-
Notifications
You must be signed in to change notification settings - Fork 18
Add page for "Checking a Local Folder with URL Remapping" #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This page covers the use case where a local folder is checked in preparation for being uploaded to a remote URL. The local files might reference the future URLs and those URLs should be remapped to local files. Example of use case: lycheeverse/lychee#1918 TODO: - [ ] WRITE TESTS IN LYCHEE REPO
| - Remaps are applied textually. For instance, `--remap 'https://example\.com/a | ||
| file://tmp` would also apply to a URL of `https://example.com/ab`. This is | ||
| undesirable but safely excluding this is very hard. Using regex's `$` would | ||
| conflict with `#fragments` or `?query` parameters, and appending a `/` would | ||
| cause the URL without a slash to be missed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reading this, I did wonder if we should mention a workaround. For example, we could suggest the following regular expression either here or in a footnote.
^https://example\.com/a($|[/?#])
It's not flawless, but still helpful, perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good idea and, in practice, it's probably good enough for 99% of cases. I shouldn't be so pessimistic about things like this :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also skip the technical explanation why this is hard to avoid. I would replace that with Matthias' example which should cover most cases properly, as you said.
| - Remaps are applied textually. For instance, `--remap 'https://example\.com/a | |
| file://tmp` would also apply to a URL of `https://example.com/ab`. This is | |
| undesirable but safely excluding this is very hard. Using regex's `$` would | |
| conflict with `#fragments` or `?query` parameters, and appending a `/` would | |
| cause the URL without a slash to be missed. | |
| - Remaps are applied textually. For instance, `--remap 'https://example\.com/foo' | |
| 'file://tmp'` would also apply to a URL of `https://example.com/foobar`. | |
| To avoid this, a regex pattern can be used to match the end of a path element, | |
| e.g `'^https://example\.com/foo($|[/?#])'`. |
EDIT: Ah, that regex is interpreted is mentioned next. Maybe it should be reordered somehow. Though I have an idea how to avoid it ... see below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of https://github.com/lycheeverse/lycheeverse.github.io/blob/aaebd37af4652e170bd98da8bb847fb1222cc581/src/content/docs/recipes/local-folder.mdx, this dot point looks like this. Thanks to @mre for the workaround suggestion.
-
Remaps are applied textually. As an example, the remap
--remap "^https://example\.com/docs file://$(pwd)/out"applies to any URL beginning with that string even if it's inside a different subfolder.
For instance, it would also apply to a URL ofhttps://example.com/docs-2/page.If you need to guard against this, you can change the regex to end with
([?#/]|$)and add$1to the replacement, like so:--remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out$1"
edit: need escape $1 in bash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. And yes, $1 would need to become \$1 here, not only in bash, but in any UNIX shell within double-quotes or no quotation. Single-quotes could be used to avoid this, but then $(pwd) wouldn't expand either.
But does $1 really add the (...) match to the replacement string here? Typically, regex match + replacement features use \1 (resp. \2 etc) for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mixed. sed uses \1, but rust and javascript use $1. The PR adding tests includes this at https://github.com/rina-forks/lychee/blob/2c4fb5c4c60aec74b5c5f59ccd7c1b00752eef2d/lychee-bin/tests/cli.rs#L3268
|
Oh, and feel free to delete the old examples that feel outdated. It's better to have no example than to have a misleading one. |
|
Great stuff, makes me wonder whether |
|
i tried to write some tests following this approach over at lycheeverse/lychee#1965. it was not very fun, has many problems and reveals bugs in lychee. honestly it's so troublesome that idk if it's worth pushing out this docs page. maybe we should just expedite the fixing of --root-dir and --base-url to perform the mapping properly. edit: i'm less doomed now, it's not that bad - i'll fix up this docs page. |
katrinafyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reviews. I've updated the text and added a simple case for the more complicated Mapping a Remote Subfolder to a Local Folder section. This simpler case is applicable to a number of cases.
| - Remaps are applied textually. For instance, `--remap 'https://example\.com/a | ||
| file://tmp` would also apply to a URL of `https://example.com/ab`. This is | ||
| undesirable but safely excluding this is very hard. Using regex's `$` would | ||
| conflict with `#fragments` or `?query` parameters, and appending a `/` would | ||
| cause the URL without a slash to be missed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of https://github.com/lycheeverse/lycheeverse.github.io/blob/aaebd37af4652e170bd98da8bb847fb1222cc581/src/content/docs/recipes/local-folder.mdx, this dot point looks like this. Thanks to @mre for the workaround suggestion.
-
Remaps are applied textually. As an example, the remap
--remap "^https://example\.com/docs file://$(pwd)/out"applies to any URL beginning with that string even if it's inside a different subfolder.
For instance, it would also apply to a URL ofhttps://example.com/docs-2/page.If you need to guard against this, you can change the regex to end with
([?#/]|$)and add$1to the replacement, like so:--remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out$1"
edit: need escape $1 in bash
|
Great work. From my side, it's good to go. The only open question is #140 (comment). |
katrinafyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny changes. I also added a box asking for feedback, in case users find edge cases with this guide. Hopefully this is okay and not too many coloured boxes.
| - Remaps are applied textually. For instance, `--remap 'https://example\.com/a | ||
| file://tmp` would also apply to a URL of `https://example.com/ab`. This is | ||
| undesirable but safely excluding this is very hard. Using regex's `$` would | ||
| conflict with `#fragments` or `?query` parameters, and appending a `/` would | ||
| cause the URL without a slash to be missed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mixed. sed uses \1, but rust and javascript use $1. The PR adding tests includes this at https://github.com/rina-forks/lychee/blob/2c4fb5c4c60aec74b5c5f59ccd7c1b00752eef2d/lychee-bin/tests/cli.rs#L3268
|
Thanks. I'll go ahead and merge this. It's a big step toward making checking local files less surprising. Great work! |
|
Thank you!! It feels like a big moment to have this merged, hopefully it can help lots of people :D |
This page covers the use case where a local folder is checked in preparation for being uploaded to a remote URL. The local files might reference the future URLs and those URLs should be remapped to local files.
Example of use case: lycheeverse/lychee#1918
In particular, I would like feedback on the title and on the caution box (should it be in the remaps page?).
TODO:
closes #133