Skip to content

Conversation

@katrinafyi
Copy link
Contributor

@katrinafyi katrinafyi commented Dec 14, 2025

This page covers the use case where a local folder is checked in preparation for being uploaded to a remote URL. The local files might reference the future URLs and those URLs should be remapped to local files.

Example of use case: lycheeverse/lychee#1918

In particular, I would like feedback on the title and on the caution box (should it be in the remaps page?).

TODO:

  • WRITE TESTS IN LYCHEE REPO

closes #133

This page covers the use case where a local folder is checked in
preparation for being uploaded to a remote URL. The local files might
reference the future URLs and those URLs should be remapped to local
files.

Example of use case: lycheeverse/lychee#1918

TODO:
- [ ] WRITE TESTS IN LYCHEE REPO
Comment on lines 52 to 56
- Remaps are applied textually. For instance, `--remap 'https://example\.com/a
file://tmp` would also apply to a URL of `https://example.com/ab`. This is
undesirable but safely excluding this is very hard. Using regex's `$` would
conflict with `#fragments` or `?query` parameters, and appending a `/` would
cause the URL without a slash to be missed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reading this, I did wonder if we should mention a workaround. For example, we could suggest the following regular expression either here or in a footnote.

^https://example\.com/a($|[/?#])

It's not flawless, but still helpful, perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good idea and, in practice, it's probably good enough for 99% of cases. I shouldn't be so pessimistic about things like this :-)

Copy link
Member

@MichaIng MichaIng Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also skip the technical explanation why this is hard to avoid. I would replace that with Matthias' example which should cover most cases properly, as you said.

Suggested change
- Remaps are applied textually. For instance, `--remap 'https://example\.com/a
file://tmp` would also apply to a URL of `https://example.com/ab`. This is
undesirable but safely excluding this is very hard. Using regex's `$` would
conflict with `#fragments` or `?query` parameters, and appending a `/` would
cause the URL without a slash to be missed.
- Remaps are applied textually. For instance, `--remap 'https://example\.com/foo'
'file://tmp'` would also apply to a URL of `https://example.com/foobar`.
To avoid this, a regex pattern can be used to match the end of a path element,
e.g `'^https://example\.com/foo($|[/?#])'`.

EDIT: Ah, that regex is interpreted is mentioned next. Maybe it should be reordered somehow. Though I have an idea how to avoid it ... see below

Copy link
Contributor Author

@katrinafyi katrinafyi Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of https://github.com/lycheeverse/lycheeverse.github.io/blob/aaebd37af4652e170bd98da8bb847fb1222cc581/src/content/docs/recipes/local-folder.mdx, this dot point looks like this. Thanks to @mre for the workaround suggestion.

  • Remaps are applied textually. As an example, the remap

    --remap "^https://example\.com/docs file://$(pwd)/out"
    

    applies to any URL beginning with that string even if it's inside a different subfolder.
    For instance, it would also apply to a URL of https://example.com/docs-2/page.

    If you need to guard against this, you can change the regex to end with
    ([?#/]|$) and add $1 to the replacement, like so:

    --remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out$1"
    

edit: need escape $1 in bash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. And yes, $1 would need to become \$1 here, not only in bash, but in any UNIX shell within double-quotes or no quotation. Single-quotes could be used to avoid this, but then $(pwd) wouldn't expand either.

But does $1 really add the (...) match to the replacement string here? Typically, regex match + replacement features use \1 (resp. \2 etc) for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mixed. sed uses \1, but rust and javascript use $1. The PR adding tests includes this at https://github.com/rina-forks/lychee/blob/2c4fb5c4c60aec74b5c5f59ccd7c1b00752eef2d/lychee-bin/tests/cli.rs#L3268

@mre
Copy link
Member

mre commented Dec 16, 2025

Oh, and feel free to delete the old examples that feel outdated. It's better to have no example than to have a misleading one.

@MichaIng
Copy link
Member

Great stuff, makes me wonder whether --base-url and a counter-part are needed at all. Just for saving users one argument compared to --remap?

@katrinafyi
Copy link
Contributor Author

katrinafyi commented Dec 21, 2025

i tried to write some tests following this approach over at lycheeverse/lychee#1965. it was not very fun, has many problems and reveals bugs in lychee.

honestly it's so troublesome that idk if it's worth pushing out this docs page. maybe we should just expedite the fixing of --root-dir and --base-url to perform the mapping properly.

edit: i'm less doomed now, it's not that bad - i'll fix up this docs page.

@katrinafyi katrinafyi changed the title Add "Checking a Local Folder with URL Remapping" page Add page for "Checking a Local Folder with URL Remapping" Jan 4, 2026
Copy link
Contributor Author

@katrinafyi katrinafyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your reviews. I've updated the text and added a simple case for the more complicated Mapping a Remote Subfolder to a Local Folder section. This simpler case is applicable to a number of cases.

Comment on lines 52 to 56
- Remaps are applied textually. For instance, `--remap 'https://example\.com/a
file://tmp` would also apply to a URL of `https://example.com/ab`. This is
undesirable but safely excluding this is very hard. Using regex's `$` would
conflict with `#fragments` or `?query` parameters, and appending a `/` would
cause the URL without a slash to be missed.
Copy link
Contributor Author

@katrinafyi katrinafyi Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of https://github.com/lycheeverse/lycheeverse.github.io/blob/aaebd37af4652e170bd98da8bb847fb1222cc581/src/content/docs/recipes/local-folder.mdx, this dot point looks like this. Thanks to @mre for the workaround suggestion.

  • Remaps are applied textually. As an example, the remap

    --remap "^https://example\.com/docs file://$(pwd)/out"
    

    applies to any URL beginning with that string even if it's inside a different subfolder.
    For instance, it would also apply to a URL of https://example.com/docs-2/page.

    If you need to guard against this, you can change the regex to end with
    ([?#/]|$) and add $1 to the replacement, like so:

    --remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out$1"
    

edit: need escape $1 in bash

@mre
Copy link
Member

mre commented Jan 4, 2026

Great work. From my side, it's good to go. The only open question is #140 (comment).

Copy link
Contributor Author

@katrinafyi katrinafyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny changes. I also added a box asking for feedback, in case users find edge cases with this guide. Hopefully this is okay and not too many coloured boxes.

Comment on lines 52 to 56
- Remaps are applied textually. For instance, `--remap 'https://example\.com/a
file://tmp` would also apply to a URL of `https://example.com/ab`. This is
undesirable but safely excluding this is very hard. Using regex's `$` would
conflict with `#fragments` or `?query` parameters, and appending a `/` would
cause the URL without a slash to be missed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mixed. sed uses \1, but rust and javascript use $1. The PR adding tests includes this at https://github.com/rina-forks/lychee/blob/2c4fb5c4c60aec74b5c5f59ccd7c1b00752eef2d/lychee-bin/tests/cli.rs#L3268

@mre
Copy link
Member

mre commented Jan 5, 2026

Thanks. I'll go ahead and merge this. It's a big step toward making checking local files less surprising. Great work!

@mre mre merged commit 7485d80 into lycheeverse:master Jan 5, 2026
2 checks passed
@katrinafyi
Copy link
Contributor Author

Thank you!! It feels like a big moment to have this merged, hopefully it can help lots of people :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write docs page for checking local folder (with root-dir, base-url)

3 participants