community: add lingo-docs - markdown documentation translator #1899
Python CLI that translates markdown documentation to multiple languages using the Lingo.dev Python SDK. Preserves code blocks during translation and generates language selector badges.
📝 Walkthrough
This PR introduces a new lingo-docs CLI tool in the community directory that translates Markdown files into multiple languages using the Lingo.dev API, with features for code block preservation, badge generation, and language-selector creation.
Sequence Diagram
sequenceDiagram
    actor User
    participant CLI as lingo-docs CLI
    participant FileIO as File System
    participant Engine as LingoDotDevEngine
    participant API as Lingo.dev API
    User->>CLI: translate file.md --langs es,fr --source en
    CLI->>FileIO: Read markdown file
    FileIO-->>CLI: Content
    CLI->>CLI: Extract code blocks (preserve)
    CLI->>Engine: Initialize with API key
    loop For each target language
        CLI->>Engine: Translate content
        Engine->>API: POST translation request
        API-->>Engine: Translated text
        Engine-->>CLI: Translation result
        CLI->>CLI: Restore code blocks
    end
    CLI->>FileIO: Write translated files (file.es.md, file.fr.md)
    CLI->>CLI: Generate language-selector badge
    CLI->>FileIO: Optionally insert badge into original file
    CLI->>User: Display results table with status & output paths
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.changeset/fast-schools-smile.md:
- Around line 1-2: The changeset file is empty; replace it with a proper
changeset documenting the Python CLI addition by running the interactive
generator (use `pnpm new` not `pnpm new:empty`) or manually create a changeset
that names the affected package, specifies the appropriate version bump (e.g.,
minor for new functionality), and includes a short description like "add Python
CLI tool"; update the file `.changeset/fast-schools-smile.md` accordingly so the
release tooling picks up the change.
In `@community/lingo-docs/lingo_docs/cli.py`:
- Around line 53-65: The extract/restore placeholder workflow can silently fail
if translation mutates the __CODE_BLOCK_*__ tokens; update the flow that calls
extract_code_blocks and restore_code_blocks so that after localize_text(...)
returns you assert all expected placeholders (from the code_blocks list returned
by extract_code_blocks) are present in the translated string before calling
restore_code_blocks, and pass the reference parameter into
localize_text(reference="Preserve all __CODE_BLOCK_*__ tokens unchanged") to
signal the backend to avoid altering those tokens; use the extract_code_blocks,
restore_code_blocks, and localize_text identifiers to locate and implement these
checks and the reference argument.
In `@community/lingo-docs/README.md`:
- Line 37: Update the README line that currently reads "Lingo.dev API key (free
tier: 10,000 tokens/month)" to correctly reflect the billing unit: change the
phrase to "Lingo.dev API key (free tier: 10,000 translated words/month)"; locate
the string in community/lingo-docs/README.md and replace only the word "tokens"
with "translated words" to preserve surrounding context.
- Around line 56-57: Update the README signup/API key steps to be explicit:
instruct users to first sign in or create an account at the full redirect URL
https://lingo.dev/en/auth?redirect=%2Fen%2Fapp, then open their dashboard,
select the target Project, go to Project Settings → API Keys (or API key) and
copy the key; make sure to state that the API key is found inside the project
settings and that login is required before accessing Projects → API key → Copy.
🧹 Nitpick comments (4)
community/lingo-docs/lingo_docs/cli.py (2)
124-126: Optional: de-duplicate language codes.
Avoid repeated translations and duplicate badge entries when users pass duplicates.
♻️ Proposed tweak
 def parse_languages(langs: str) -> list[str]:
     """Parse comma-separated language codes."""
-    return [lang.strip() for lang in langs.split(",") if lang.strip()]
+    seen: set[str] = set()
+    ordered: list[str] = []
+    for raw in langs.split(","):
+        lang = raw.strip()
+        if not lang or lang in seen:
+            continue
+        seen.add(lang)
+        ordered.append(lang)
+    return ordered
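The same order-preserving de-duplication can also be written with `dict.fromkeys`, which keeps first-seen order on Python 3.7+. This is a minimal sketch of an alternative, not the code proposed in the review; the function name simply mirrors the one in the diff.

```python
def parse_languages(langs: str) -> list[str]:
    """Parse comma-separated language codes, dropping blanks and duplicates."""
    stripped = (raw.strip() for raw in langs.split(","))
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(lang for lang in stripped if lang))
```

Either form leaves the first occurrence of each code in place, so the order of output files and badge entries still follows the order the user typed.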
219-235: Badge injection should update existing blocks.
If users re-run with new languages, the existing badge stays stale because the code only inserts when missing. Replacing the existing block keeps badges in sync.
🔧 Proposed update
     badge_comment = f"<!-- lingo-docs-badge -->\n{badge}\n<!-- /lingo-docs-badge -->\n\n"
-    if "<!-- lingo-docs-badge -->" not in content:
-        updated_content = badge_comment + content
-        file.write_text(updated_content, encoding="utf-8")
-        console.print(f"[green]✓[/green] Badge added to {file.name}")
+    if "<!-- lingo-docs-badge -->" in content:
+        updated_content = re.sub(
+            r"<!-- lingo-docs-badge -->.*?<!-- /lingo-docs-badge -->\n*\n?",
+            badge_comment,
+            content,
+            flags=re.S,
+        )
+        console.print(f"[green]✓[/green] Badge updated in {file.name}")
+    else:
+        updated_content = badge_comment + content
+        console.print(f"[green]✓[/green] Badge added to {file.name}")
+    file.write_text(updated_content, encoding="utf-8")
community/lingo-docs/pyproject.toml (1)
1-19: Add a lockfile (or tighten version bounds) to prevent unintended breaking upgrades.
Lower-bound-only specs can pull in breaking changes: typer 0.21.x is still in the 0.x semver range where minor bumps can be breaking (e.g., 0.21.0 dropped Python 3.8), and rich 14.0.0 introduced environment-variable behavior changes. For a CLI application, consider using `pip-tools` to generate a pinned `requirements.txt` lockfile (keep ranges in `pyproject.toml`, compile to exact pins in the lock), or tighten ranges using compatible-release specifiers (e.g., `typer~=0.21.1`).
community/lingo-docs/README.md (1)
30-30: Clarify the language count or list all supported languages.
The documentation claims "20+ languages" but the table at lines 101-109 only shows 14 specific languages, followed by "And more..." This vagueness may confuse users about actual language support.
Consider either:
- Listing all supported languages explicitly in the table
- Being more specific about the total count (e.g., "25+ languages")
- Removing "And more..." if only 14 are supported
---
---
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify changeset setup and format in repository
echo "=== Check .changeset directory contents ==="
ls -la .changeset/ 2>/dev/null || echo "No .changeset directory found"
echo -e "\n=== Check for .changeset/config.json ==="
cat .changeset/config.json 2>/dev/null || echo "No config.json found"
echo -e "\n=== Check root package.json for changeset config ==="
cat package.json 2>/dev/null | head -50
echo -e "\n=== List all changeset files ==="
fd -t f . .changeset 2>/dev/null || echo "No changeset files found"
echo -e "\n=== Check if pnpm is used ==="
if [ -f "pnpm-lock.yaml" ]; then
echo "pnpm-lock.yaml found - pnpm is used"
else
echo "pnpm-lock.yaml not found"
fi
Repository: lingodotdev/lingo.dev
Length of output: 1895
Replace empty changeset with proper changeset documenting the Python CLI tool addition.
This repository uses changesets for release tracking (as configured in .changeset/config.json). The current file contains only delimiters with no content. Since this PR adds a Python CLI tool—a functional change requiring version management—use pnpm new instead of pnpm new:empty to create a proper changeset that documents the package name, version bump type, and description. Empty changesets are reserved for non-functional changes like README updates.
def extract_code_blocks(content: str) -> tuple[str, list[str]]:
    """Extract code blocks and replace with placeholders to preserve them during translation."""
    code_blocks = []

    def replacer(match):
        code_blocks.append(match.group(0))
        return f"__CODE_BLOCK_{len(code_blocks) - 1}__"

    # Match fenced code blocks (``` or ~~~)
    pattern = r"(```[\s\S]*?```|~~~[\s\S]*?~~~)"
    processed = re.sub(pattern, replacer, content)
    return processed, code_blocks
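To make the placeholder scheme concrete, here is a self-contained sketch of the extract/restore round trip. `extract_code_blocks` follows the snippet above; `restore_code_blocks` is a hypothetical reconstruction, since its real body is not shown in this conversation, and the fence string is built programmatically only so the example does not embed a literal fence.

```python
import re

TICKS = "`" * 3  # three backticks, built at runtime to avoid a literal fence here
FENCE_PATTERN = re.compile(rf"({TICKS}[\s\S]*?{TICKS}|~~~[\s\S]*?~~~)")
PLACEHOLDER_PATTERN = re.compile(r"__CODE_BLOCK_(\d+)__")


def extract_code_blocks(content: str) -> tuple[str, list[str]]:
    """Swap each fenced block for a numbered placeholder and collect the blocks."""
    code_blocks: list[str] = []

    def replacer(match: re.Match) -> str:
        code_blocks.append(match.group(0))
        return f"__CODE_BLOCK_{len(code_blocks) - 1}__"

    return FENCE_PATTERN.sub(replacer, content), code_blocks


def restore_code_blocks(content: str, code_blocks: list[str]) -> str:
    """Hypothetical inverse: put each collected block back in a single pass."""

    def replacer(match: re.Match) -> str:
        index = int(match.group(1))
        # Leave unknown indices untouched rather than raising IndexError.
        return code_blocks[index] if index < len(code_blocks) else match.group(0)

    return PLACEHOLDER_PATTERN.sub(replacer, content)


doc = f"Intro\n\n{TICKS}python\nprint('hi')\n{TICKS}\n\nOutro\n"
processed, blocks = extract_code_blocks(doc)
restored = restore_code_blocks(processed, blocks)
```

With no translation step in between, `restored` equals `doc`; the risk discussed in this thread arises only when the text between extraction and restoration is changed by the translation backend.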
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the full context of the file to understand how placeholders are used
head -80 community/lingo-docs/lingo_docs/cli.py
Repository: lingodotdev/lingo.dev
Length of output: 2160
🏁 Script executed:
# Look for the restore_code_blocks function and how placeholders are handled
sed -n '66,120p' community/lingo-docs/lingo_docs/cli.py
Repository: lingodotdev/lingo.dev
Length of output: 1999
🏁 Script executed:
# Search for usage of placeholders and how translation handles them
rg "__CODE_BLOCK_" community/lingo-docs/lingo_docs/cli.py -A 2 -B 2
Repository: lingodotdev/lingo.dev
Length of output: 415
🏁 Script executed:
# Check if there are any validation or assertion mechanisms in place
rg -i "assert|validate|check" community/lingo-docs/lingo_docs/cli.py | head -20
Repository: lingodotdev/lingo.dev
Length of output: 47
🌐 Web query:
Lingo.dev Python SDK placeholder preservation "do not translate" feature
💡 Result:
Python SDK (package lingodotdev) — placeholder preservation / “do not translate”
As of lingodotdev v1.3.0 (Jul 22, 2025), the Python SDK docs/API do not expose a dedicated “protect placeholders / do-not-translate spans” feature (e.g., no parameter like protected_terms, ignore_patterns, or do_not_translate). The documented translation params are things like source_locale, target_locale, fast, and reference (for context), plus chunking/concurrency controls. [1]
Practical workaround (SDK):
- Pre/post-process placeholders yourself: replace placeholders with unique sentinel tokens before calling `localize_text()`/`localize_object()`, then swap them back after translation (e.g., replace `{user}` → `__PH_0__`).
- Optionally also use the SDK's `reference` parameter to reinforce the rule ("keep `__PH_0__` exactly unchanged"). [1]
If you meant Lingo.dev CLI/Compiler (static i18n), not the Python SDK
Lingo.dev’s CLI does have a “do not translate” capability at the key level via lockedKeys, which copies the source value into all target locales unchanged (useful for brand names / technical strings). [2][3]
The Compiler also supports custom prompts, where you can explicitly instruct “Do not translate brand names…” etc. [4]
Sources:
[1] lingodotdev PyPI project page (Python SDK methods/params) (pypi.org)
[2] Lingo.dev CLI i18n.json config reference (lockedKeys) (lingo.dev)
[3] Lingo.dev Docs “Configuration” (key locking / locked keys retain original values) (docs.lingo.dev)
[4] Lingo.dev Compiler configuration reference (custom prompt w/ “Do not translate…”) (lingo.dev)
Citations:
- 1: https://pypi.org/project/lingodotdev/
- 2: https://lingo.dev/en/cli/fundamentals/i18n-json-config?utm_source=openai
- 3: https://docs.lingo.dev/config?utm_source=openai
- 4: https://lingo.dev/en/compiler/configuration/reference?utm_source=openai
Add post-translation validation to ensure code-block placeholders survive localization.
The Lingo.dev Python SDK (v1.3.0+) does not provide a built-in "do not translate" feature for placeholder tokens. While the current extract-before/restore-after pattern is the standard workaround, there is no guarantee that __CODE_BLOCK_n__ tokens survive unmodified through the translation engine. Consider adding:
- A post-translation assertion that verifies all expected placeholders are present in the translated content before calling `restore_code_blocks()`.
- Passing the `reference` parameter to `localize_text()` with a note like "Preserve all `__CODE_BLOCK_*__` tokens unchanged" to reinforce the intent with the backend.
This prevents silent corruption of code blocks if placeholders are altered during translation.
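The post-translation check could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `PlaceholderMismatchError`, `check_placeholders`, and `translate_checked` are hypothetical names, and the translation step is passed in as a plain callable so the example does not depend on any particular SDK signature.

```python
import re
from typing import Callable

PLACEHOLDER_PATTERN = re.compile(r"__CODE_BLOCK_(\d+)__")


class PlaceholderMismatchError(ValueError):
    """Raised when translated text lost, altered, or duplicated a placeholder."""


def check_placeholders(translated: str, block_count: int) -> None:
    """Require each of the block_count placeholders to appear exactly once."""
    found = sorted(int(n) for n in PLACEHOLDER_PATTERN.findall(translated))
    expected = list(range(block_count))
    # Sorting tolerates reordering, which a translation may legitimately do.
    if found != expected:
        raise PlaceholderMismatchError(
            f"expected placeholders {expected}, found {found}"
        )


def translate_checked(
    text: str, block_count: int, translate: Callable[[str], str]
) -> str:
    """Run the (hypothetical) translate callable, then validate its output."""
    translated = translate(text)
    check_placeholders(translated, block_count)
    return translated
```

Raising before restoration lets the CLI report the affected language as failed instead of writing a file with missing or mangled code blocks.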
- Python 3.11+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
- Lingo.dev API key (free tier: 10,000 tokens/month)
🧩 Analysis chain
🌐 Web query:
Lingo.dev free tier token limit per month
💡 Result:
Lingo.dev’s free Hobby tier is limited by translated words, not LLM “tokens”: it includes 10,000 translated words per month. [1]
Source: Lingo.dev pricing page. [1]
Update free tier terminology from "tokens" to "translated words".
Lingo.dev's free tier is measured in translated words, not LLM tokens. The correct phrasing should be "free tier: 10,000 translated words/month" rather than "10,000 tokens/month."
- Sign up at https://lingo.dev/en/auth
- Go to Projects → API key → Copy
🧩 Analysis chain
🌐 Web query:
Lingo.dev signup URL authentication
💡 Result:
Signup / login URL (same page):
https://lingo.dev/en/auth?redirect=%2Fen%2Fapp
This page says that if you don’t yet have an account, it will be created automatically after you continue (Google or email). [1]
CLI authentication (local dev):
npx lingo.dev@latest login
This opens a browser window for login/signup and then returns you to the terminal. [2][3]
Headless / CI authentication (API key):
Set:
export LINGODOTDEV_API_KEY=your_api_key_here
(or put LINGODOTDEV_API_KEY=... in a .env). [2]
Sources: [1] [2] [3]
🌐 Web query:
Lingo.dev how to obtain API key dashboard
💡 Result:
To get your Lingo.dev API key from the dashboard:
- Log in to the Lingo.dev Engine (dashboard).
- Go to the Projects page.
- In your project, click API key → Copy. [1]
- Some docs describe this as finding it in your project settings. [2]
Then use it as LINGODOTDEV_API_KEY (e.g., in .env or CI secrets). [1][3]
Sources: [1] [2] [3]
Improve signup and API key retrieval instructions for clarity.
The instructions are functionally correct but could be more explicit:
- The signup URL (https://lingo.dev/en/auth) works but the complete redirect URL is https://lingo.dev/en/auth?redirect=%2Fen%2Fapp
- The API key navigation path (Projects → API key → Copy) is accurate but should clarify that users must log in first and that the API key is located within their project settings
Consider expanding lines 56–57 to include the login step and project context for users unfamiliar with the dashboard.
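Once the key has been copied from the dashboard, a CLI would typically read it from the `LINGODOTDEV_API_KEY` environment variable mentioned above. The sketch below shows one way to fail early with a helpful message; `load_api_key` is a hypothetical helper, not a function from the PR or the SDK.

```python
import os
from collections.abc import Mapping

API_KEY_VAR = "LINGODOTDEV_API_KEY"


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise with setup guidance."""
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_VAR} is not set. Sign in at https://lingo.dev/en/auth, "
            "open your project, copy its API key, and export it."
        )
    return key
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.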
🤖 Prompt for AI Agents
In `@community/lingo-docs/README.md` around lines 56 - 57, Update the README
signup/API key steps to be explicit: instruct users to first sign in or create
an account at the full redirect URL
https://lingo.dev/en/auth?redirect=%2Fen%2Fapp, then open their dashboard,
select the target Project, go to Project Settings → API Keys (or API key) and
copy the key; make sure to state that the API key is found inside the project
settings and that login is required before accessing Projects → API key → Copy.
|
Hey @Prajwalsrinvas, please address the comments by CodeRabbit. Thank you for your work! ❤️
Python CLI that translates markdown documentation to multiple languages using the Lingo.dev Python SDK. Preserves code
blocks during translation and generates language selector badges.
Closes #1761
Lingo.dev Features Used
- `LingoDotDevEngine` for AI-powered translation
- `localize_text()` with async/await
How to Test