Cut a bunch of unused code out of mbstring #20912

alexdowad · 2026-01-12T00:34:13Z

The legacy mbfl_strcut function is only used to implement mb_strcut
for legacy text encodings which 1) do not use a fixed number of bytes
per codepoint, 2) do not have an 'mblen_table' which can be used to
quickly determine the codepoint length of a byte sequence, and 3) do
not have a specialized 'mb_cut' function which implements mb_strcut
for that text encoding.

Remove unused code from mbfl_strcut, and leave only what is currently
needed for the implementation of mb_strcut.

Over the last few years, I refactored mbstring to perform encoding conversion
a buffer at a time, rather than a single byte at a time. This resulted in a
huge performance increase.

After the refactoring, the old "byte-at-a-time" code was retained for two
reasons:

It was used by the mailparse PECL extension.
It was used to implement mb_strcut for some text encodings.

However, after reviewing mailparse's use of mbstring, it is clear that
mailparse only relies on mbstring for decoding of QPrint, and possibly
Base64. It does not use the byte-at-a-time conversion code for any
other encoding.

Further, mb_strcut only relies on the byte-at-a-time conversion code
for a limited number of legacy text encodings, such as ISO-2022-JP,
HZ, UTF-7, etc.

Hence, we can remove over 5000 lines of unused code without breaking
anything. This will help to reduce binary size, and make the mbstring
codebase easier to navigate for new contributors.

The legacy mbfl_strcut function is only used to implement mb_strcut for legacy text encodings which 1) do not use a fixed number of bytes per codepoint, 2) do not have an 'mblen_table' which can be used to quickly determine the codepoint length of a byte sequence, and 3) do not have a specialized 'mb_cut' function which implements mb_strcut for that text encoding. Remove unused code from mbfl_strcut, and leave only what is currently needed for the implementation of mb_strcut.

Over the last few years, I refactored mbstring to perform encoding conversion a buffer at a time, rather than a single byte at a time. This resulted in a huge performance increase. After the refactoring, the old "byte-at-a-time" code was retained for two reasons: 1) It was used by the mailparse PECL extension. 2) It was used to implement mb_strcut for some text encodings. However, after reviewing mailparse's use of mbstring, it is clear that mailparse only relies on mbstring for decoding of QPrint, and possibly Base64. It does not use the byte-at-a-time conversion code for any other encoding. Further, mb_strcut only relies on the byte-at-a-time conversion code for a limited number of legacy text encodings, such as ISO-2022-JP, HZ, UTF-7, etc. Hence, we can remove over 5000 lines of unused code without breaking anything. This will help to reduce binary size, and make the mbstring codebase easier to navigate for new contributors.

alexdowad · 2026-01-12T00:36:22Z

@remicollet I ran the tests for mailparse against this change to ensure it doesn't break anything. Tests for mailparse are still passing.

alexdowad added 2 commits January 12, 2026 09:26

alexdowad requested a review from youkidearitai as a code owner January 12, 2026 00:34

github-actions bot added the Extension: mbstring label Jan 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cut a bunch of unused code out of mbstring #20912

Cut a bunch of unused code out of mbstring #20912

alexdowad commented Jan 12, 2026

Uh oh!

alexdowad commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Cut a bunch of unused code out of mbstring #20912

Are you sure you want to change the base?

Cut a bunch of unused code out of mbstring #20912

Conversation

alexdowad commented Jan 12, 2026

Uh oh!

alexdowad commented Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant