Date:   Sat, 4 Mar 2023 12:48:06 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Mateusz Guzik <mjguzik@...il.com>, Borislav Petkov <bp@...e.de>
Cc:     Alexander Potapenko <glider@...gle.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Eric Biggers <ebiggers@...gle.com>,
        Christian Brauner <brauner@...nel.org>, serge@...lyn.com,
        paul@...l-moore.com, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-security-module@...r.kernel.org
Subject: Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible

On Sat, Mar 4, 2023 at 12:31 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> Good news: gcc provides a lot of control as to how it inlines string
> ops, most notably:
>        -mstringop-strategy=alg

Note that any static decision is always going to be crap somewhere.
You can make it do the "optimal" thing for any particular machine, but
I consider that to be just garbage.

What I would actually like to see is the compiler always generate an
out-of-line call for the "big enough to not just do inline trivially"
case, but do so with the "rep stosb/movsb" calling convention.

Then we'd just mark those with objtool, and patch it up dynamically to
either use the right out-of-line memset/memcpy function, *or* just
replace it entirely with 'rep stosb' inline.

Because the cores that do this right *do* exist, despite your hatred
of the rep string instructions. At least Borislav claims that the
modern AMD cores do better with 'rep stosb'.

In particular, see what we do for 'clear_user()', where we effectively
can do the above (because unlike memset, we control it entirely). See
commit 0db7058e8e23 ("x86/clear_user: Make it faster").

Once we'd have that kind of infrastructure, we could then control
exactly what 'memset()' does.

And I note that we should probably have added Borislav to the cc when
memset came up, exactly because he's been looking at it anyway. Even
if AMD seems to have slightly different optimization rules than Intel
cores probably do. But again, that only emphasizes the whole "we
should not have a static choice here".

                 Linus
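
A rough userspace sketch of the "no static choice" idea from the message
above: pick the memset implementation once at startup instead of baking
one strategy in at compile time. This is only an illustration. The kernel
approach described above rewrites the call sites themselves, whereas this
sketch uses an ordinary function pointer as a stand-in. The names
memset_generic, memset_rep_stosb, select_memset and dispatch_memset, and
the cpu_has_fast_rep_stosb flag, are hypothetical and not taken from the
kernel.

```c
#include <stddef.h>

typedef void *(*memset_fn)(void *dst, int c, size_t n);

/* Plain byte loop: stands in for a tuned out-of-line memset. */
static void *memset_generic(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/*
 * 'rep stosb' takes the destination in (r/e)di, the count in (r/e)cx
 * and the fill byte in al, which is the "calling convention" the
 * message refers to. Assumes the direction flag is clear, as the
 * x86 ABIs require on function entry.
 */
static void *memset_rep_stosb(void *dst, int c, size_t n)
{
    void *d = dst;

    __asm__ volatile("rep stosb"
                     : "+D"(d), "+c"(n)
                     : "a"(c)
                     : "memory");
    return dst;
}
#else
/* Not x86 or not a GNU-compatible compiler: fall back to the loop. */
#define memset_rep_stosb memset_generic
#endif

/* Chosen once at startup; the default is the generic loop. */
static memset_fn active_memset = memset_generic;

/*
 * cpu_has_fast_rep_stosb is an assumed input: a real program would
 * derive it from a CPU feature check rather than take it as a flag.
 */
void select_memset(int cpu_has_fast_rep_stosb)
{
    active_memset = cpu_has_fast_rep_stosb ? memset_rep_stosb
                                           : memset_generic;
}

void *dispatch_memset(void *dst, int c, size_t n)
{
    return active_memset(dst, c, n);
}
```

A caller would run select_memset() once during initialization and then
use dispatch_memset() wherever it would otherwise call memset(). The
indirect call has a cost of its own, which is part of why patching the
call site directly is the more attractive option in the kernel.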
