Message-ID: <CAHk-=wh57tMaJxcH=kWE4xdKLjayKSDEVvMwHG4fKZ5tUHF6mg@mail.gmail.com>
Date:   Mon, 30 Aug 2021 15:00:06 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Dan Williams <dan.j.williams@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [patch 01/10] x86/fpu/signal: Clarify exception handling in restore_fpregs_from_user()

On Mon, Aug 30, 2021 at 2:33 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> There's a place where we care about #PF vs. #MC (see upthread)...

Interestingly (or perhaps not), that case is a problem case in general
for "fault_in_pages_readable()".

That function will only access data every PAGE_SIZE bytes, but if we
have other exceptions that can happen at a cacheline granularity, the
whole "retry after faulting pages in" may fail.

So that kind of

 - try to copy from user space

 - if that fails, do fault_in_pages_readable() and retry

loop can loop forever.
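A user-space toy model may make this failure mode concrete. It is only a sketch under stated assumptions. The sim_* names, the 4096-byte page, the 64-byte cache line, and the "poisoned cache line" flag that stands in for a machine check are all hypothetical. The fault-in probe is also simplified to one byte per page. In the model, a copy that hits a poisoned line fails every time. A page-granular fault-in that never touches that line keeps reporting success, so only an artificial attempt cap stops the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a user buffer; not kernel code. */
#define SIM_PAGE_SIZE 4096
#define SIM_CACHELINE 64
#define SIM_NPAGES    2
#define SIM_SIZE      ((size_t)SIM_NPAGES * SIM_PAGE_SIZE)

enum { SIM_EFAULT = -1, SIM_STUCK = -2 };

static bool sim_present[SIM_NPAGES];                /* page mapped? */
static bool sim_poisoned[SIM_SIZE / SIM_CACHELINE]; /* read raises #MC? */

static void sim_reset(void)
{
    memset(sim_present, 0, sizeof(sim_present));
    memset(sim_poisoned, 0, sizeof(sim_poisoned));
}

/* True if reading the byte at 'off' would not trap (#PF or #MC). */
static bool sim_readable(size_t off)
{
    return sim_present[off / SIM_PAGE_SIZE] &&
           !sim_poisoned[off / SIM_CACHELINE];
}

/* copy_from_user()-style result: number of bytes that could not be read. */
static size_t sim_copy_from_user(size_t off, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!sim_readable(off + i))
            return len - i;
    return 0;
}

/* Page-granular fault-in: maps each page and touches one byte in it. */
static int sim_fault_in_pages_readable(size_t off, size_t len)
{
    if (len == 0)
        return 0;
    size_t last = off + len - 1;
    for (size_t pg = off / SIM_PAGE_SIZE; pg <= last / SIM_PAGE_SIZE; pg++) {
        size_t base = pg * SIM_PAGE_SIZE;
        size_t probe = base > off ? base : off;
        sim_present[pg] = true;      /* the #PF handler maps the page */
        if (!sim_readable(probe))
            return SIM_EFAULT;       /* only if the probed byte is poisoned */
    }
    return 0;
}

/* Copy; on failure, fault in and retry. Capped so the model terminates.
 * Returns the number of attempts on success, SIM_EFAULT, or SIM_STUCK. */
static int sim_copy_retry(size_t off, size_t len, int max_attempts)
{
    assert(len > 0 && off + len <= SIM_SIZE);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (sim_copy_from_user(off, len) == 0)
            return attempt;
        if (sim_fault_in_pages_readable(off, len) != 0)
            return SIM_EFAULT;
    }
    return SIM_STUCK;
}
```

With nothing mapped and no poison, sim_copy_retry(0, SIM_SIZE, 100) succeeds on the second attempt. With both pages mapped and sim_poisoned[2] set (bytes 128 to 191), it returns SIM_STUCK. Without the cap, that case would never return.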

restore_fpregs_from_user() is odd and special in trying to deal with
it by looking at the error code. I'm not convinced it's the right
thing to do, since it just means that all the other places we do this
can be problematic.
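The error-code approach can be modeled the same way. This is a hedged sketch with hypothetical sim_* names. It is not the actual restore_fpregs_from_user() code, only the shape of the check. The trap numbers 14 and 18 mirror the x86 vectors the kernel calls X86_TRAP_PF and X86_TRAP_MC. The failing access reports which trap it took. The caller faults pages in and retries only after a #PF, and treats anything else as fatal. The loop terminates, but only because this one caller inspects the trap number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not kernel code. */
#define SIM_PAGE_SIZE 4096
#define SIM_CACHELINE 64
#define SIM_NPAGES    2
#define SIM_SIZE      ((size_t)SIM_NPAGES * SIM_PAGE_SIZE)

enum { SIM_OK = 0, SIM_TRAP_PF = 14, SIM_TRAP_MC = 18 }; /* x86 vectors */

static bool sim_present[SIM_NPAGES];
static bool sim_poisoned[SIM_SIZE / SIM_CACHELINE];

static void sim_reset(void)
{
    memset(sim_present, 0, sizeof(sim_present));
    memset(sim_poisoned, 0, sizeof(sim_poisoned));
}

/* Which trap a read of the byte at 'off' would raise, or SIM_OK. */
static int sim_trap_for(size_t off)
{
    if (!sim_present[off / SIM_PAGE_SIZE])
        return SIM_TRAP_PF;
    if (sim_poisoned[off / SIM_CACHELINE])
        return SIM_TRAP_MC;
    return SIM_OK;
}

/* A restore whose fixup hands back the trap number of the first failure. */
static int sim_restore_from_user(size_t off, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int trap = sim_trap_for(off + i);
        if (trap != SIM_OK)
            return trap;
    }
    return SIM_OK;
}

/* Effect of a successful page-granular fault-in: every page is mapped. */
static void sim_map_pages(size_t off, size_t len)
{
    for (size_t pg = off / SIM_PAGE_SIZE;
         pg <= (off + len - 1) / SIM_PAGE_SIZE; pg++)
        sim_present[pg] = true;
}

/* Retry only after #PF; any other trap is fatal. */
static bool sim_restore_with_retry(size_t off, size_t len)
{
    assert(len > 0 && off + len <= SIM_SIZE);
    for (;;) {
        int trap = sim_restore_from_user(off, len);
        if (trap == SIM_OK)
            return true;
        if (trap != SIM_TRAP_PF)
            return false;            /* e.g. #MC: retrying cannot help */
        sim_map_pages(off, len);     /* after this, no further #PF */
    }
}
```

Any other copy-and-retry caller that only sees "it failed" gets none of this protection, which is the objection raised here.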

But since the Intel machine check stuff is so misdesigned and doesn't
work on any normal machines, most people can't test any of this, none
of this matters, and it's only broken on those "serious enterprise
machines" setups that people think are better, but are actually just
almost entirely untested and thus don't work right.

I'm not sure what the right model here is. We might need to make
fault_in_pages_readable() do things a cacheline at a time, at which
point those repeat loops start working, and the error code thing
becomes pointless.
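A cacheline-at-a-time fault-in, as suggested above, could look roughly like this in the same kind of toy model. The sim_* names and the 64-byte line are hypothetical assumptions. The probe reads one byte in every cache line of the range, so a poisoned line makes the fault-in itself fail. The plain copy/fault-in/retry loop then stops with an error instead of spinning, and it needs neither a trap number nor an attempt cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; not kernel code. */
#define SIM_PAGE_SIZE 4096
#define SIM_CACHELINE 64
#define SIM_NPAGES    2
#define SIM_SIZE      ((size_t)SIM_NPAGES * SIM_PAGE_SIZE)

enum { SIM_EFAULT = -1 };

static bool sim_present[SIM_NPAGES];
static bool sim_poisoned[SIM_SIZE / SIM_CACHELINE];

static void sim_reset(void)
{
    memset(sim_present, 0, sizeof(sim_present));
    memset(sim_poisoned, 0, sizeof(sim_poisoned));
}

static bool sim_readable(size_t off)
{
    return sim_present[off / SIM_PAGE_SIZE] &&
           !sim_poisoned[off / SIM_CACHELINE];
}

/* copy_from_user()-style result: number of bytes that could not be read. */
static size_t sim_copy_from_user(size_t off, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!sim_readable(off + i))
            return len - i;
    return 0;
}

/* Touches one byte in every cache line of [off, off + len). */
static int sim_fault_in_cachelines_readable(size_t off, size_t len)
{
    size_t last = off + len - 1;
    for (size_t line = off / SIM_CACHELINE; line <= last / SIM_CACHELINE; line++) {
        size_t base = line * SIM_CACHELINE;
        size_t probe = base > off ? base : off;
        sim_present[probe / SIM_PAGE_SIZE] = true; /* #PF handler maps page */
        if (!sim_readable(probe))
            return SIM_EFAULT;                     /* poisoned line: give up */
    }
    return 0;
}

/* The plain retry loop: no trap-number inspection, no attempt cap.
 * Returns 0 on success or SIM_EFAULT. */
static int sim_copy_retry(size_t off, size_t len)
{
    assert(len > 0 && off + len <= SIM_SIZE);
    while (sim_copy_from_user(off, len) != 0)
        if (sim_fault_in_cachelines_readable(off, len) != 0)
            return SIM_EFAULT;
    return 0;
}
```

In this model the loop runs at most twice. After one fault-in pass, either every page is mapped and the copy succeeds, or the probe has already hit the poisoned line.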

What I _am_ sure about is that the error code model doesn't work. It
may work in that one special case, but that just means that all the
non-special cases are broken.

So I'll argue that it's a fundamentally broken model, and that
_ASM_EXTABLE_FAULT thing is not just confusing, but actively hurtful.

            Linus
