Skip to content

Conversation

@KeqingMoe
Copy link

Summary of Changes

This PR fixes a defect in the annotate-snippets library where patch highlighting becomes misaligned when handling Unicode characters. Currently, when a source line contains Unicode characters (either before the patch location or within the patch content itself), the color highlighting (e.g., for Removal) is incorrectly offset.

Root Cause

The issue originates in the emit_suggestion_default function (src/renderer/render.rs). This function incorrectly mixes different units when calling StyledBuffer::set_style_range:

  • The col_start and col_end parameters expect character counts
  • However, the function passes: padding (character count) + span_start/end_pos (display width) + line.len() (byte length)

This inconsistent usage of units (characters, display columns, bytes) causes the highlight ranges to be calculated incorrectly when multi-byte Unicode characters are present, leading to visual misalignment.

Fix

This PR corrects the unit calculations in emit_suggestion_default, ensuring that consistent character-based positioning is used throughout. The fix ensures that:

  • Highlight ranges are calculated based on proper character counts (or display widths where appropriate)
  • Unicode characters (including emoji, CJK characters, and combining characters) no longer cause highlight misalignment
  • The Removal color (and other patch styles) now correctly aligns with the intended text segments

Additional Context

  1. The same emit_suggestion_default function is also used by draw_code_line and render_source_line in render.rs. While these functions may have similar issues, they have not been tested or modified in this PR. Further investigation and potential fixes for these functions may be needed in future work.

  2. Regarding the .term.svg Output: In the attached .term.svg output, the underline markers ("^^^^^^^^ here") show a misalignment of approximately 1.5 units of display width. I suspect this is a rendering bug in snapbox. When tested in actual terminals, the underline markers align correctly.

@KeqingMoe KeqingMoe changed the title fix: Improve highlighting for patches con taining Unicode characters fix: Improve highlighting for patches containing Unicode characters Dec 23, 2025
@KeqingMoe KeqingMoe changed the title fix: Improve highlighting for patches containing Unicode characters fix: Fix Unicode highlight alignment in patches Dec 23, 2025
Copy link
Member

@Muscraft Muscraft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this and coming up with a fix!

Can you make the commit adding the new test, the first commit in this PR? It makes it much easier to review the change. The "tip" at the end of this section in the Cargo Contributor Guide, describes the reasoning better.

(padding as isize + span_start_pos as isize) as usize,
(padding as isize + span_end_pos as isize) as usize,
(padding as isize + span_start_char_pos as isize) as usize,
(padding as isize + span_end_char_pos as isize) as usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using char spans here (and above) will cause issues with \t as we normalize the whitespace, which replaces \t with four spaces. We currently take this into account for addition lines, doing the same here should make things work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try it. However, the library's code doesn't explicitly distinguish between character count, byte count, and display width, which makes it unclear which of the three usize values ​​it refers to. If considering normalize_whitespace, we have four different units. I might need to investigate further.

<tspan x="10px" y="334px"><tspan class="fg-bright-blue bold">20</tspan><tspan> </tspan><tspan class="fg-bright-red">- )</tspan>
</tspan>
<tspan x="10px" y="352px"><tspan class="fg-bright-blue bold">21</tspan><tspan> </tspan><tspan class="fg-bright-red">- })</tspan><tspan>.flatten()</tspan>
<tspan x="10px" y="352px"><tspan class="fg-bright-blue bold">21</tspan><tspan> </tspan><tspan class="fg-bright-red">- </tspan><tspan> }).flatten()</tspan>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression; I suspect the cause is related to the tab characters in the test's source.

<tspan x="10px" y="334px"><tspan class="fg-bright-blue bold">20</tspan><tspan> </tspan><tspan class="fg-bright-red">- )</tspan>
</tspan>
<tspan x="10px" y="352px"><tspan class="fg-bright-blue bold">21</tspan><tspan> </tspan><tspan class="fg-bright-red">- })</tspan><tspan>.flatten()</tspan>
<tspan x="10px" y="352px"><tspan class="fg-bright-blue bold">21</tspan><tspan> </tspan><tspan class="fg-bright-red">- </tspan><tspan> }).flatten()</tspan>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression; I suspect the cause is related to the tab characters in the test's source.

@KeqingMoe
Copy link
Author

Thank you for pointing out the tab character issue. After investigating, I found that simply copying the addition line's approach would cause complications because the logic there is intertwined with other formatting concerns.

Instead, I've implemented a cleaner solution: adding a separate normalized field to the Loc struct to track tab-normalized positions independently. This approach:

  1. Preserves the original fix for Unicode characters – still using character-based counting for correct multi-byte character handling.
  2. Properly handles tab expansion – by maintaining normalized position calculations alongside the original ones.
  3. Avoids interfering with existing logic – the change is localized and doesn't break other assumptions in the codebase.

The key changes:

  • In Loc, added a normalized: usize field to track the position after tab expansion
  • Updated position calculations to use this normalized value when determining highlight ranges
  • This ensures that both Unicode characters and tab characters are handled correctly in the same rendering pass

All existing tests pass with this change, including the one with tab characters that was failing before. The solution is more maintainable than trying to retrofit the addition line's complex logic into the removal handling.

Let me know if you'd like me to add more comments or if there are other edge cases we should consider.

@KeqingMoe KeqingMoe requested a review from Muscraft December 24, 2025 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants