-
-
Notifications
You must be signed in to change notification settings - Fork 34.5k
src: move all 1-byte encodings to native #61118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Review requested:
|
6130c12 to
4be8283
Compare
4be8283 to
6a46ae3
Compare
6a46ae3 to
109aa87
Compare
|
#61093 landed, rebased. |
51adfc3 to
82a6ecc
Compare
src/encoding_singlebyte.h
Outdated
| const uint16_t* const tSingleByteEncodings[28] = { | ||
| tIBM866, tKOI8_R, tKOI8_U, tMacintosh, tXMacCyrillic, | ||
| tISO_8859_2, tISO_8859_3, tISO_8859_4, tISO_8859_5, tISO_8859_6, tISO_8859_7, | ||
| tISO_8859_8, tISO_8859_8, // second time for for 8-i | ||
| tISO_8859_10, tISO_8859_13, tISO_8859_14, tISO_8859_15, tISO_8859_16, | ||
| tWindows874, tWindows1250, tWindows1251, tWindows1252, tWindows1253, | ||
| tWindows1254, tWindows1255, tWindows1256, tWindows1257, tWindows1258 | ||
| }; | ||
|
|
||
| // Matches the list of encoding sin single-byte.js | ||
| const bool fSingleByteEncodings[28] = { | ||
| fIBM866, fKOI8_R, fKOI8_U, fMacintosh, fXMacCyrillic, | ||
| fISO_8859_2, fISO_8859_3, fISO_8859_4, fISO_8859_5, fISO_8859_6, fISO_8859_7, | ||
| fISO_8859_8, fISO_8859_8, // second time for for 8-i | ||
| fISO_8859_10, fISO_8859_13, fISO_8859_14, fISO_8859_15, fISO_8859_16, | ||
| fWindows874, fWindows1250, fWindows1251, fWindows1252, fWindows1253, | ||
| fWindows1254, fWindows1255, fWindows1256, fWindows1257, fWindows1258 | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the header is included in more than one translation unit, it may cause ODR violations. We can either mark them inline or move their definitions to a .cc file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed now
|
For the cpp & other lint errors, this might help: |
82a6ecc to
35b7960
Compare
b5d51ae to
ee8b680
Compare
|
@mertcanaltin lint was just not ready yet (as was noted in the PR description), fixed. |
ee8b680 to
0847010
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61118 +/- ##
==========================================
- Coverage 89.87% 88.95% -0.92%
==========================================
Files 671 670 -1
Lines 203178 203117 -61
Branches 39062 38967 -95
==========================================
- Hits 182599 180681 -1918
- Misses 12926 14717 +1791
- Partials 7653 7719 +66
🚀 New features to boost your workflow:
|
0847010 to
1d96511
Compare
gurgunday
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
lgtm other than one comment
| return; | ||
| } | ||
|
|
||
| uint16_t* dst = node::UncheckedMalloc<uint16_t>(length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can allocate a huge uint16_t[length] before any V8 string-length limit check, and if it ends up being higher than what V8 accepts, we would allocate for no reason
Maybe it would be better to guard this with sth like:
| uint16_t* dst = node::UncheckedMalloc<uint16_t>(length); | |
| if (length > static_cast<size_t>(v8::String::kMaxLength)) { | |
| // throw | |
| } | |
| uint16_t* dst = node::UncheckedMalloc<uint16_t>(length); |
Let me know if I'm missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
0ff7b97 to
7e40472
Compare
| static v8::MaybeLocal<v8::Value> Raw(v8::Isolate* isolate, | ||
| uint16_t* buf, | ||
| size_t buflen); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about adding this
cc @addaleax @joyeecheung perhaps, wdyt?
7e40472 to
fee12a7
Compare
Co-authored-by: Mert Can Altin <mertgold60@gmail.com>
fee12a7 to
bec0adb
Compare
Tracking: #61041
This builds on top of #61093 and gives an additional ~2x improvement by moving the same logic to native
The old native fast path removed in #61093 had a bunch of nested ifs and converted data from utf16 to utf8 and then back to utf16
This instead constructs strings using direct maps as #61093 and returns them as raw buffers
windows-1252, main pre-#61093:windows-1252, main with #61093:windows-1252, this PR:This also similarly improves all other 1-byte encodings compared to #61093
For comparison, Bun:
v8/jsc is not an issue, unoptimal code is
cc @nodejs/performance