-
-
Notifications
You must be signed in to change notification settings - Fork 27
Plaintext RSS fixer upper #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughA new TypeScript module provides utilities to transform plain text and RSS descriptions to HTML, including functions for escaping HTML entities, detecting HTML-like content, and converting plain text with timestamps, headers, and links. The RSS module now conditionally applies this transformation when content isn't already HTML. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/lib/rss-transform.ts`:
- Around line 35-63: The current link-list detection treats any line containing
"http" as a list item, dropping trailing text; update the initial test and the
while loop in rss-transform.ts to only treat a line as a link-list entry when
the entire line matches either "URL" or "Label: URL" (e.g., use a stricter regex
that anchors the whole line such as matching optional "Label:" then a single URL
with no extra text). Apply the same anchored regex in both the outer if and the
while condition that builds linkItems, and then keep the existing parsing logic
(item.match(/^(.+?):\s*(https?:\/\/.+)$/) and fallback URL-only match) to render
the <ul> only for those fully-matching lines so inline sentences with trailing
text are left untouched.
In `@src/lib/rss.ts`:
- Around line 111-115: The truncation currently always runs
htmlToText(description), which strips literal '<'/'>' in plain text; change the
logic so description is truncated from htmlToText(description) only when
looksLikeHtml(description) is true, otherwise truncate the raw description
string; update the description assignment (alongside content which uses
looksLikeHtml and transformPlainTextToHtml) to use a conditional: if
looksLikeHtml(description) use htmlToText(description) for truncation, else use
description directly, then pass that result into truncate(..., 260).
🧹 Nitpick comments (1)
src/lib/rss-transform.ts (1)
27-33: Header handling doesn’t match the comment.
Line 27 says plain “Links” is supported, but the regex only accepts**...**. Either update the comment or add support for unbolded headers to avoid confusion.💡 Minimal fix (comment alignment)
- // Check if this is a section header (bold text like "**Links**" or just "Links") + // Check if this is a section header (bold text like "**Links**")
| // Check if this looks like a link list item (e.g., "CodeRabbit: https://...") | ||
| if (/:?\s*https?:\/\//.test(line)) { | ||
| // Collect all consecutive link lines into a list | ||
| const linkItems: string[] = []; | ||
| while (i < lines.length && /:?\s*https?:\/\//.test(lines[i])) { | ||
| linkItems.push(lines[i]); | ||
| i++; | ||
| } | ||
| html.push('<ul>'); | ||
| linkItems.forEach(item => { | ||
| // Parse "Label: URL" or just "URL" | ||
| const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/); | ||
| if (match) { | ||
| const label = escapeHtml(match[1].trim()); | ||
| const url = escapeHtml(match[2].trim()); | ||
| html.push(`<li>${label}: <a href="${url}">${url}</a></li>`); | ||
| } else { | ||
| const urlMatch = item.match(/(https?:\/\/.+)/); | ||
| if (urlMatch) { | ||
| const url = escapeHtml(urlMatch[1].trim()); | ||
| html.push(`<li><a href="${url}">${url}</a></li>`); | ||
| } else { | ||
| html.push(`<li>${escapeHtml(item)}</li>`); | ||
| } | ||
| } | ||
| }); | ||
| html.push('</ul>'); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline-URL sentences get coerced into link lists (drops surrounding text).
Any line containing http becomes a list item, so a sentence like “Sponsor: https://x (use code …)” loses the trailing text. If that’s not intended, tighten detection to only match lines that are just a URL or “Label: URL”.
🔧 Safer link-line detection
- if (/:?\s*https?:\/\//.test(line)) {
+ const linkLineRegex = /^(?:[^:]+:\s*)?https?:\/\/\S+\s*$/;
+ if (linkLineRegex.test(line)) {
// Collect all consecutive link lines into a list
const linkItems: string[] = [];
- while (i < lines.length && /:?\s*https?:\/\//.test(lines[i])) {
+ while (i < lines.length && linkLineRegex.test(lines[i])) {
linkItems.push(lines[i]);
i++;
}
html.push('<ul>');
linkItems.forEach(item => {
// Parse "Label: URL" or just "URL"
- const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/);
+ const match = item.match(/^(?:([^:]+):\s*)?(https?:\/\/\S+)\s*$/);
if (match) {
- const label = escapeHtml(match[1].trim());
- const url = escapeHtml(match[2].trim());
- html.push(`<li>${label}: <a href="${url}">${url}</a></li>`);
+ const label = match[1] ? escapeHtml(match[1].trim()) : null;
+ const url = escapeHtml(match[2].trim());
+ html.push(
+ label
+ ? `<li>${label}: <a href="${url}">${url}</a></li>`
+ : `<li><a href="${url}">${url}</a></li>`
+ );
} else {
const urlMatch = item.match(/(https?:\/\/.+)/);
if (urlMatch) {
const url = escapeHtml(urlMatch[1].trim());
html.push(`<li><a href="${url}">${url}</a></li>`);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Check if this looks like a link list item (e.g., "CodeRabbit: https://...") | |
| if (/:?\s*https?:\/\//.test(line)) { | |
| // Collect all consecutive link lines into a list | |
| const linkItems: string[] = []; | |
| while (i < lines.length && /:?\s*https?:\/\//.test(lines[i])) { | |
| linkItems.push(lines[i]); | |
| i++; | |
| } | |
| html.push('<ul>'); | |
| linkItems.forEach(item => { | |
| // Parse "Label: URL" or just "URL" | |
| const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/); | |
| if (match) { | |
| const label = escapeHtml(match[1].trim()); | |
| const url = escapeHtml(match[2].trim()); | |
| html.push(`<li>${label}: <a href="${url}">${url}</a></li>`); | |
| } else { | |
| const urlMatch = item.match(/(https?:\/\/.+)/); | |
| if (urlMatch) { | |
| const url = escapeHtml(urlMatch[1].trim()); | |
| html.push(`<li><a href="${url}">${url}</a></li>`); | |
| } else { | |
| html.push(`<li>${escapeHtml(item)}</li>`); | |
| } | |
| } | |
| }); | |
| html.push('</ul>'); | |
| continue; | |
| } | |
| // Check if this looks like a link list item (e.g., "CodeRabbit: https://...") | |
| const linkLineRegex = /^(?:[^:]+:\s*)?https?:\/\/\S+\s*$/; | |
| if (linkLineRegex.test(line)) { | |
| // Collect all consecutive link lines into a list | |
| const linkItems: string[] = []; | |
| while (i < lines.length && linkLineRegex.test(lines[i])) { | |
| linkItems.push(lines[i]); | |
| i++; | |
| } | |
| html.push('<ul>'); | |
| linkItems.forEach(item => { | |
| // Parse "Label: URL" or just "URL" | |
| const match = item.match(/^(?:([^:]+):\s*)?(https?:\/\/\S+)\s*$/); | |
| if (match) { | |
| const label = match[1] ? escapeHtml(match[1].trim()) : null; | |
| const url = escapeHtml(match[2].trim()); | |
| html.push( | |
| label | |
| ? `<li>${label}: <a href="${url}">${url}</a></li>` | |
| : `<li><a href="${url}">${url}</a></li>` | |
| ); | |
| } else { | |
| const urlMatch = item.match(/(https?:\/\/.+)/); | |
| if (urlMatch) { | |
| const url = escapeHtml(urlMatch[1].trim()); | |
| html.push(`<li><a href="${url}">${url}</a></li>`); | |
| } else { | |
| html.push(`<li>${escapeHtml(item)}</li>`); | |
| } | |
| } | |
| }); | |
| html.push('</ul>'); | |
| continue; | |
| } |
🤖 Prompt for AI Agents
In `@src/lib/rss-transform.ts` around lines 35 - 63, The current link-list
detection treats any line containing "http" as a list item, dropping trailing
text; update the initial test and the while loop in rss-transform.ts to only
treat a line as a link-list entry when the entire line matches either "URL" or
"Label: URL" (e.g., use a stricter regex that anchors the whole line such as
matching optional "Label:" then a single URL with no extra text). Apply the same
anchored regex in both the outer if and the while condition that builds
linkItems, and then keep the existing parsing logic
(item.match(/^(.+?):\s*(https?:\/\/.+)$/) and fallback URL-only match) to render
the <ul> only for those fully-matching lines so inline sentences with trailing
text are left untouched.
| content: looksLikeHtml(description) | ||
| ? description | ||
| : transformPlainTextToHtml(description), | ||
| description: truncate(htmlToText(description), 260), | ||
| duration: itunes_duration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plain-text descriptions still go through htmlToText.
When description is plaintext, running htmlToText can drop literal </> sequences. Consider using raw text for truncation when !looksLikeHtml(description).
✅ Suggested fix
- return {
+ const isHtml = looksLikeHtml(description);
+ const descriptionText = isHtml ? htmlToText(description) : description;
+ return {
id,
title: `${title}`,
- content: looksLikeHtml(description)
- ? description
- : transformPlainTextToHtml(description),
- description: truncate(htmlToText(description), 260),
+ content: isHtml ? description : transformPlainTextToHtml(description),
+ description: truncate(descriptionText, 260),🤖 Prompt for AI Agents
In `@src/lib/rss.ts` around lines 111 - 115, The truncation currently always runs
htmlToText(description), which strips literal '<'/'>' in plain text; change the
logic so description is truncated from htmlToText(description) only when
looksLikeHtml(description) is true, otherwise truncate the raw description
string; update the description assignment (alongside content which uses
looksLikeHtml and transformPlainTextToHtml) to use a conditional: if
looksLikeHtml(description) use htmlToText(description) for truncation, else use
description directly, then pass that result into truncate(..., 260).
|
closing, fix was done upstream |
if plaintext is discovered, aka unformatted hard to read garbage, fix it.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.