[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADUfDZq4c3dRgWpevv3+29frvd6L8G9RRdoVFpFnyRsF3Eve1Q@mail.gmail.com>
Date: Wed, 1 Oct 2025 09:20:27 -0700
From: Caleb Sander Mateos <csander@...estorage.com>
To: Guan-Chun Wu <409411716@....tku.edu.tw>
Cc: akpm@...ux-foundation.org, axboe@...nel.dk, ceph-devel@...r.kernel.org,
ebiggers@...nel.org, hch@....de, home7438072@...il.com, idryomov@...il.com,
jaegeuk@...nel.org, kbusch@...nel.org, linux-fscrypt@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
sagi@...mberg.me, tytso@....edu, visitorckw@...il.com, xiubli@...hat.com
Subject: Re: [PATCH v3 2/6] lib/base64: Optimize base64_decode() with reverse
lookup tables
On Wed, Oct 1, 2025 at 3:18 AM Guan-Chun Wu <409411716@....tku.edu.tw> wrote:
>
> On Fri, Sep 26, 2025 at 04:33:12PM -0700, Caleb Sander Mateos wrote:
> > On Thu, Sep 25, 2025 at 11:59 PM Guan-Chun Wu <409411716@....tku.edu.tw> wrote:
> > >
> > > From: Kuan-Wei Chiu <visitorckw@...il.com>
> > >
> > > Replace the use of strchr() in base64_decode() with precomputed reverse
> > > lookup tables for each variant. This avoids repeated string scans and
> > > improves performance. Use -1 in the tables to mark invalid characters.
> > >
> > > Decode:
> > > 64B ~1530ns -> ~75ns (~20.4x)
> > > 1KB ~27726ns -> ~1165ns (~23.8x)
> > >
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@...il.com>
> > > Co-developed-by: Guan-Chun Wu <409411716@....tku.edu.tw>
> > > Signed-off-by: Guan-Chun Wu <409411716@....tku.edu.tw>
> > > ---
> > > lib/base64.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++----
> > > 1 file changed, 61 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/base64.c b/lib/base64.c
> > > index 1af557785..b20fdf168 100644
> > > --- a/lib/base64.c
> > > +++ b/lib/base64.c
> > > @@ -21,6 +21,63 @@ static const char base64_tables[][65] = {
> > > [BASE64_IMAP] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,",
> > > };
> > >
> > > +static const s8 base64_rev_tables[][256] = {
> > > + [BASE64_STD] = {
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
> > > + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
> > > + -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> > > + 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
> > > + -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
> > > + 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + },
> > > + [BASE64_URLSAFE] = {
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1,
> > > + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
> > > + -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> > > + 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, 63,
> > > + -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
> > > + 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + },
> > > + [BASE64_IMAP] = {
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, 63, -1, -1, -1,
> > > + 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
> > > + -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> > > + 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
> > > + -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
> > > + 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
> > > + },
> >
> > Do we actually need 3 separate lookup tables? It looks like all 3
> > variants agree on the value of any characters they have in common. So
> > we could combine them into a single lookup table that would work for a
> > valid base64 string of any variant. The only downside I can see is
> > that base64 strings which are invalid in some variants might no longer
> > be rejected by base64_decode().
> >
>
> In addition to the approach David mentioned, maybe we can use a common
> lookup table for A–Z, a–z, and 0–9, and then handle the variant-specific
> symbols with a switch.
>
> For example:
>
> static const s8 base64_rev_common[256] = {
> [0 ... 255] = -1,
> ['A'] = 0, ['B'] = 1, /* ... */, ['Z'] = 25,
> ['a'] = 26, /* ... */, ['z'] = 51,
> ['0'] = 52, /* ... */, ['9'] = 61,
> };
>
> static inline int base64_rev_lookup(u8 c, enum base64_variant variant) {
> s8 v = base64_rev_common[c];
> if (v != -1)
> return v;
>
> switch (variant) {
> case BASE64_STD:
> if (c == '+') return 62;
> if (c == '/') return 63;
> break;
> case BASE64_IMAP:
> if (c == '+') return 62;
> if (c == ',') return 63;
> break;
> case BASE64_URLSAFE:
> if (c == '-') return 62;
> if (c == '_') return 63;
> break;
> }
> return -1;
> }
>
> What do you think?
That adds several branches in the hot loop, at least 2 of which are
unpredictable for valid base64 input of a given variant (v != -1 as
well as the first c check in the applicable switch case). That seems
like it would hurt performance, no? I think having 3 separate tables
would be preferable to making the hot loop more branchy.
Best,
Caleb
Powered by blists - more mailing lists