Plaintext RSS fixer upper #33

argyleink · 2026-01-19T18:40:13Z

If plaintext is discovered (i.e. unformatted, hard-to-read garbage), fix it.

Summary by CodeRabbit

New Features
- RSS descriptions are now automatically transformed to properly formatted HTML. Plain-text content is intelligently parsed and converted, with support for timestamp grouping, bold section headers, automatic link detection and conversion to clickable anchors, and proper HTML escaping for safe content rendering.
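The summary mentions timestamp grouping, but the transformer body is not shown here. A minimal sketch of how consecutive timestamp lines could be grouped into one list, assuming timestamps look like `00:00 Intro` or `1:02:03 Outro` and that every other non-blank line falls back to a paragraph; `groupTimestamps`, `TIMESTAMP_LINE` and this `escapeHtml` are hypothetical stand-ins, not the PR's code:

```typescript
// Hypothetical: a timestamp line starts with M:SS, MM:SS or H:MM:SS.
const TIMESTAMP_LINE = /^(\d{1,2}:)?\d{1,2}:\d{2}\b/;

// Minimal escaping stand-in (the PR exports its own escapeHtml()).
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Groups runs of timestamp lines into a single <ul>; other non-blank lines become <p>.
function groupTimestamps(text: string): string {
  const lines = text
    .split(/\r?\n/)
    .map(l => l.trim())
    .filter(l => l.length > 0);
  const html: string[] = [];
  let i = 0;
  while (i < lines.length) {
    if (TIMESTAMP_LINE.test(lines[i])) {
      html.push("<ul>");
      // Consume every consecutive timestamp line into the same list.
      while (i < lines.length && TIMESTAMP_LINE.test(lines[i])) {
        html.push(`<li>${escapeHtml(lines[i])}</li>`);
        i++;
      }
      html.push("</ul>");
    } else {
      html.push(`<p>${escapeHtml(lines[i])}</p>`);
      i++;
    }
  }
  return html.join("\n");
}
```

For example, `groupTimestamps("Chapters\n00:00 Intro\n05:12 Main topic")` yields a `<p>` for "Chapters" followed by a two-item `<ul>`. A line like "12:30pm meeting" is not treated as a timestamp because `\b` requires a word boundary after the seconds.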

vercel · 2026-01-19T18:40:17Z

The latest updates on your projects.

Project	Deployment	Review	Updated (UTC)
starpod	Ready	Preview, Comment	Jan 19, 2026 6:40pm

coderabbitai · 2026-01-19T18:40:25Z

📝 Walkthrough

A new TypeScript module provides utilities to transform plain text and RSS descriptions to HTML, including functions for escaping HTML entities, detecting HTML-like content, and converting plain text with timestamps, headers, and links. The RSS module now conditionally applies this transformation when content isn't already HTML.

Changes

Cohort / File(s)	Summary
RSS Transformation Utilities `src/lib/rss-transform.ts`	New module with three exported functions: `escapeHtml()` for safe HTML encoding, `looksLikeHtml()` for content-type detection, and `transformPlainTextToHtml()` for multi-part transformation (timestamps grouped as lists, bold headers detected, link detection with anchor tags, paragraphs as default).
RSS Module Integration `src/lib/rss.ts`	Imports transformation utilities and adds conditional logic to check if description content resembles HTML; applies `transformPlainTextToHtml()` for plain text, otherwise preserves original content.
Unit Test Coverage `tests/unit/rss.test.ts`	New comprehensive test suite validating escaping behavior, HTML detection heuristics, and multi-scenario transformation tests covering headers, timestamps, links, blank input, and proper entity escaping.
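The walkthrough does not show how `looksLikeHtml()` decides what counts as HTML. One plausible heuristic, sketched here purely as an assumption, is to require a tag from a short allowlist, so plain text that merely contains `<` or `>` (e.g. "I <3 this") is still routed through the plain-text transformer; `KNOWN_TAG` and `toContent` are hypothetical names:

```typescript
// Hypothetical heuristic; the PR's looksLikeHtml() body is not shown.
// Only tags from this allowlist count, so "<3" or "a < b" stay plain text.
const KNOWN_TAG = /<\/?(?:p|br|a|ul|ol|li|strong|em|b|i|div|span|h[1-6])\b[^>]*>/i;

function looksLikeHtml(content: string): boolean {
  return KNOWN_TAG.test(content);
}

// Mirrors the conditional described for src/lib/rss.ts: HTML passes through
// unchanged, anything else goes to a plain-text transformer supplied by the caller.
function toContent(description: string, transform: (text: string) => string): string {
  return looksLikeHtml(description) ? description : transform(description);
}
```

An allowlist trades recall for precision: a description whose only markup is an unlisted tag such as `<img>` would not be detected as HTML.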

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop through the plaintext, a bound through the bold,
Timestamps and headers transformed to behold,
Links wrapped in anchors, escapes nice and clean,
The finest HTML transformation I've seen!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Plaintext RSS fixer upper' is vague and uses colloquial language without clearly specifying what is being fixed or how.	Use a more descriptive title that clarifies the specific change, such as 'Transform plaintext RSS content to HTML' or 'Add plaintext RSS to HTML conversion utilities'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/lib/rss-transform.ts`:
- Around lines 35-63: The current link-list detection treats any line containing an `http://` or `https://` URL as a list item, so surrounding sentence text is dropped or swallowed into the link. Update the initial test and the while loop in rss-transform.ts to treat a line as a link-list entry only when the entire line matches either "URL" or "Label: URL" (e.g., use a stricter regex anchored to the whole line: an optional "Label:" followed by a single URL with no extra text). Apply the same anchored regex in both the outer if and the while condition that builds linkItems, then keep the existing parsing logic (item.match(/^(.+?):\s*(https?:\/\/.+)$/) and the URL-only fallback match) so the <ul> is rendered only for fully matching lines and inline sentences with trailing text are left untouched.

In `@src/lib/rss.ts`:
- Around lines 111-115: The truncation always runs htmlToText(description), which can strip literal '<'/'>' characters from plain text. Truncate htmlToText(description) only when looksLikeHtml(description) is true; otherwise truncate the raw description string. Update the description assignment (alongside content, which already uses looksLikeHtml and transformPlainTextToHtml) to use that conditional, then pass the result into truncate(..., 260).

🧹 Nitpick comments (1)

src/lib/rss-transform.ts (1)
27-33: Header handling doesn’t match the comment.
Line 27 says plain “Links” is supported, but the regex only accepts `**...**`. Either update the comment or add support for unbolded headers to avoid confusion.
💡 Minimal fix (comment alignment)
-    // Check if this is a section header (bold text like "**Links**" or just "Links")
+    // Check if this is a section header (bold text like "**Links**")
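If the second option (supporting unbolded headers) is preferred, one way to avoid misclassifying ordinary short lines is an explicit allowlist of header words. A sketch under that assumption; `renderHeader`, `PLAIN_HEADERS`, the allowlist contents and the `<p><strong>` output are all hypothetical:

```typescript
// Hypothetical sketch, not the PR's implementation.
// Bold headers: the whole line is "**Text**" with no other asterisks inside.
const BOLD_HEADER = /^\*\*([^*]+)\*\*$/;
// Assumption: unbolded headers are recognised only from an explicit, case-insensitive allowlist.
const PLAIN_HEADERS = new Set(["links", "chapters", "sponsors"]);

// Minimal escaping stand-in for the PR's escapeHtml().
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Returns header markup, or null when the line is not a header.
function renderHeader(line: string): string | null {
  const trimmed = line.trim();
  const bold = trimmed.match(BOLD_HEADER);
  if (bold) {
    return `<p><strong>${escapeHtml(bold[1].trim())}</strong></p>`;
  }
  if (PLAIN_HEADERS.has(trimmed.toLowerCase())) {
    return `<p><strong>${escapeHtml(trimmed)}</strong></p>`;
  }
  return null;
}
```

Using `[^*]+` inside the bold pattern keeps a sentence such as "**a** and **b**" from being read as a single header.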

coderabbitai · 2026-01-19T18:45:50Z

src/lib/rss-transform.ts

+    // Check if this looks like a link list item (e.g., "CodeRabbit: https://...")
+    if (/:?\s*https?:\/\//.test(line)) {
+      // Collect all consecutive link lines into a list
+      const linkItems: string[] = [];
+      while (i < lines.length && /:?\s*https?:\/\//.test(lines[i])) {
+        linkItems.push(lines[i]);
+        i++;
+      }
+      html.push('<ul>');
+      linkItems.forEach(item => {
+        // Parse "Label: URL" or just "URL"
+        const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/);
+        if (match) {
+          const label = escapeHtml(match[1].trim());
+          const url = escapeHtml(match[2].trim());
+          html.push(`<li>${label}: <a href="${url}">${url}</a></li>`);
+        } else {
+          const urlMatch = item.match(/(https?:\/\/.+)/);
+          if (urlMatch) {
+            const url = escapeHtml(urlMatch[1].trim());
+            html.push(`<li><a href="${url}">${url}</a></li>`);
+          } else {
+            html.push(`<li>${escapeHtml(item)}</li>`);
+          }
+        }
+      });
+      html.push('</ul>');
+      continue;
+    }


⚠️ Potential issue | 🟡 Minor

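Inline-URL sentences get coerced into link lists (surrounding text is dropped or folded into the link).
Any line containing `http://` or `https://` becomes a list item. In a sentence like “Sponsor: https://x (use code …)”, the trailing text is swallowed into the link’s href and text; in an unlabeled sentence like “Check out https://x today”, the leading text is dropped. If that’s not intended, tighten detection to match only lines that consist of just a URL or “Label: URL”.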
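A quick way to see the difference is to run both detection regexes, plus the original per-item parser, against a few sample lines. The regexes are copied from the diff and the suggested fix; the sample lines are illustrative and use placeholder `example.com` URLs, and `escapeHtml` is omitted because the samples contain no characters that need escaping:

```typescript
// Loose detection from the original diff: any line containing a URL qualifies.
const looseLinkLine = /:?\s*https?:\/\//;
// Anchored detection from the suggested fix: the whole line must be "URL" or "Label: URL".
const anchoredLinkLine = /^(?:[^:]+:\s*)?https?:\/\/\S+\s*$/;

// Original per-item parsing (escapeHtml omitted; the samples need no escaping).
function oldRenderItem(item: string): string {
  const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/);
  if (match) {
    const url = match[2].trim();
    return `<li>${match[1].trim()}: <a href="${url}">${url}</a></li>`;
  }
  const urlMatch = item.match(/(https?:\/\/.+)/);
  if (urlMatch) {
    const url = urlMatch[1].trim();
    return `<li><a href="${url}">${url}</a></li>`;
  }
  return `<li>${item}</li>`;
}

const samples = [
  "CodeRabbit: https://example.com",
  "https://example.com/episode",
  "Sponsor: https://example.com (use code PODCAST)",
  "Check out https://example.com today",
];

for (const line of samples) {
  console.log(looseLinkLine.test(line), anchoredLinkLine.test(line), oldRenderItem(line));
}
```

Both regexes accept the first two lines. Only the loose one accepts the last two, and for those the old parser folds "(use code PODCAST)" into the href, or drops "Check out" and folds "today" into the href.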

🔧 Safer link-line detection

-    if (/:?\s*https?:\/\//.test(line)) {
+    const linkLineRegex = /^(?:[^:]+:\s*)?https?:\/\/\S+\s*$/;
+    if (linkLineRegex.test(line)) {
       // Collect all consecutive link lines into a list
       const linkItems: string[] = [];
-      while (i < lines.length && /:?\s*https?:\/\//.test(lines[i])) {
+      while (i < lines.length && linkLineRegex.test(lines[i])) {
         linkItems.push(lines[i]);
         i++;
       }
       html.push('<ul>');
       linkItems.forEach(item => {
         // Parse "Label: URL" or just "URL"
-        const match = item.match(/^(.+?):\s*(https?:\/\/.+)$/);
+        const match = item.match(/^(?:([^:]+):\s*)?(https?:\/\/\S+)\s*$/);
         if (match) {
-          const label = escapeHtml(match[1].trim());
-          const url = escapeHtml(match[2].trim());
-          html.push(`<li>${label}: <a href="${url}">${url}</a></li>`);
+          const label = match[1] ? escapeHtml(match[1].trim()) : null;
+          const url = escapeHtml(match[2].trim());
+          html.push(
+            label
+              ? `<li>${label}: <a href="${url}">${url}</a></li>`
+              : `<li><a href="${url}">${url}</a></li>`
+          );
         } else {
           const urlMatch = item.match(/(https?:\/\/.+)/);
           if (urlMatch) {
             const url = escapeHtml(urlMatch[1].trim());
             html.push(`<li><a href="${url}">${url}</a></li>`);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    // Check if this looks like a link list item (e.g., "CodeRabbit: https://...")
    const linkLineRegex = /^(?:[^:]+:\s*)?https?:\/\/\S+\s*$/;
    if (linkLineRegex.test(line)) {
      // Collect all consecutive link lines into a list
      const linkItems: string[] = [];
      while (i < lines.length && linkLineRegex.test(lines[i])) {
        linkItems.push(lines[i]);
        i++;
      }
      html.push('<ul>');
      linkItems.forEach(item => {
        // Parse "Label: URL" or just "URL"
        const match = item.match(/^(?:([^:]+):\s*)?(https?:\/\/\S+)\s*$/);
        if (match) {
          const label = match[1] ? escapeHtml(match[1].trim()) : null;
          const url = escapeHtml(match[2].trim());
          html.push(
            label
              ? `<li>${label}: <a href="${url}">${url}</a></li>`
              : `<li><a href="${url}">${url}</a></li>`
          );
        } else {
          const urlMatch = item.match(/(https?:\/\/.+)/);
          if (urlMatch) {
            const url = escapeHtml(urlMatch[1].trim());
            html.push(`<li><a href="${url}">${url}</a></li>`);
          } else {
            html.push(`<li>${escapeHtml(item)}</li>`);
          }
        }
      });
      html.push('</ul>');
      continue;
    }
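Because the suggested detection regex and parsing regex have the same shape, every collected item already matches the parser, so the `else` fallback in the suggestion can never run. A condensed sketch that uses a single capturing regex for both steps; `renderLinkList` and its `{ html, next }` return shape are hypothetical, and `escapeHtml` is a minimal stand-in:

```typescript
// Same shape as the suggested regexes, with capture groups for label and URL.
const LINK_LINE = /^(?:([^:]+):\s*)?(https?:\/\/\S+)\s*$/;

// Minimal escaping stand-in for the PR's escapeHtml().
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Consumes consecutive link lines starting at `start`. Returns the <ul> markup
// (empty string if none matched) and the index of the first non-link line.
function renderLinkList(lines: string[], start: number): { html: string; next: number } {
  const items: string[] = [];
  let i = start;
  while (i < lines.length) {
    const match = lines[i].match(LINK_LINE);
    if (!match) break; // detection and parsing are one step, so no fallback branch
    const url = escapeHtml(match[2]);
    const anchor = `<a href="${url}">${url}</a>`;
    items.push(match[1] ? `<li>${escapeHtml(match[1].trim())}: ${anchor}</li>` : `<li>${anchor}</li>`);
    i++;
  }
  return { html: items.length ? `<ul>${items.join("")}</ul>` : "", next: i };
}
```

As with the suggestion, a label that itself contains a colon (e.g. "Episode 12: Notes: https://example.com") is not recognized as a link line, because the label pattern is `[^:]+`.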

coderabbitai · 2026-01-19T18:45:50Z

src/lib/rss.ts

+            content: looksLikeHtml(description)
+              ? description
+              : transformPlainTextToHtml(description),
            description: truncate(htmlToText(description), 260),
            duration: itunes_duration,


⚠️ Potential issue | 🟡 Minor

Plain-text descriptions still go through htmlToText.
When description is plain text, running htmlToText can drop literal `<` and `>` characters. Consider truncating the raw text when !looksLikeHtml(description).

✅ Suggested fix

-            return {
+            const isHtml = looksLikeHtml(description);
+            const descriptionText = isHtml ? htmlToText(description) : description;
+            return {
               id,
               title: `${title}`,
-              content: looksLikeHtml(description)
-                ? description
-                : transformPlainTextToHtml(description),
-              description: truncate(htmlToText(description), 260),
+              content: isHtml ? description : transformPlainTextToHtml(description),
+              description: truncate(descriptionText, 260),
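A self-contained sketch of the suggested fix. The real `htmlToText`, `truncate` and `looksLikeHtml` in `src/lib` are not shown, so naive stand-ins are used (a regex tag stripper, a character-count truncation with an ellipsis, and a tag allowlist); whether the real `htmlToText` drops `<`/`>` depends on its implementation. `toDescription` is a hypothetical name:

```typescript
// Hypothetical stand-in: strips anything between "<" and ">" and collapses whitespace.
function htmlToText(html: string): string {
  return html.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Hypothetical stand-in: cuts to `max` characters including a trailing ellipsis.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max - 1).replace(/\s+$/, "") + "…";
}

// Hypothetical stand-in: HTML only if a known tag is present.
function looksLikeHtml(content: string): boolean {
  return /<\/?(?:p|br|a|ul|ol|li|strong|em|b|i|div|span|h[1-6])\b[^>]*>/i.test(content);
}

function toDescription(description: string): string {
  const isHtml = looksLikeHtml(description);
  // Only strip tags from real HTML; plain text is truncated as-is so "<" and ">" survive.
  const descriptionText = isHtml ? htmlToText(description) : description;
  return truncate(descriptionText, 260);
}
```

With these stand-ins, `htmlToText("I <3 this show > the rest")` collapses to "I the rest", while `toDescription` returns the plain-text input unchanged.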

argyleink · 2026-01-19T19:00:31Z

Closing; the fix was done upstream.

argyleink added 2 commits January 19, 2026 10:18

- fixes rss plaintext (1e9dcab)
- adds tests, abstracts the primary function out of rss (e4bf561)

vercel bot deployed to Preview January 19, 2026 18:40

coderabbitai bot reviewed Jan 19, 2026

argyleink closed this Jan 19, 2026
