[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ec88d9a2-272e-973e-55ce-f3d411aafcd1@rasmusvillemoes.dk>
Date: Mon, 1 Apr 2019 22:43:13 +0200
From: Rasmus Villemoes <linux@...musvillemoes.dk>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Nathan Chancellor <natechancellor@...il.com>
Subject: Re: [RFCv2] string: Use faster alternatives when constant arguments
are used
On 30/03/2019 23.59, Sultan Alsawaf wrote:
> How can the memcmps cross a page boundary when memcmp itself will
> only read in large buffers of data at word boundaries?
Consider your patch replacing !strcmp(buf, "123") by !memcmp(buf, "123",
4). buf is known to point to a nul-terminated string. But it may point
at, say, the second-last byte in a page, with the last byte in that page
being a nul byte, and the following page being MMIO or unmapped or all
kinds of bad things. On e.g. x86 where unaligned accesses are cheap, and
seeing that you're only comparing for equality, gcc is likely to compile
the memcmp version into
*(u32*)buf == 0x00333231
because you've told the compiler that there's no problem accessing four
bytes starting at buf. Boom. Even without unaligned access being cheap
this can happen; suppose the length is 8 instead, and gcc somehow knows
that buf is four-byte aligned (and in this case it happens to point four
bytes before a page boundary), so it could compile the memcmp(,,8) into
*(u32*)(buf+4) == secondword && *(u32*)buf == firstword
(or do the comparisons in the "natural" order, but it might still do
both loads first).
> And if there are concerns for some arches but not others, then couldn't this be
> a feasible optimization for those which would work well with it?
No. First, these are concerns for all arches. Second, if you can find
some particular place where string parsing/matching is in any way
performance relevant and not just done once during driver init or
whatnot, maybe the maintainers of that file would take a patch
hand-optimizing some strcmps to memcmps, or, depending on what the code
does, perhaps replacing the whole *cmp logic with a custom hash table.
But a patch implicitly and silently touching thousands of lines of code,
without an analysis of why none of the above is a problem for any of
those lines, for any .config, arch, compiler version? No.
Rasmus
Powered by blists - more mailing lists