Message-ID: <YoRFjTIzMYZu8Hq8@zx2c4.com>
Date:   Wed, 18 May 2022 03:02:05 +0200
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Filipe Manana <fdmanana@...e.com>,
        Vadim Galitsin <vadim.galitsyn@...cle.com>
Subject: Re: [patch 0/3] x86/fpu: Prevent FPU state corruption

Hey Thomas,

On Wed, May 04, 2022 at 05:40:26PM +0200, Jason A. Donenfeld wrote:
> Hi Thomas,
> 
> On Sun, May 01, 2022 at 09:31:42PM +0200, Thomas Gleixner wrote:
> > The recent changes in the random code unearthed a long standing FPU state
> > corruption due do a buggy condition for granting in-kernel FPU usage.
>  
> Thanks for working that out. I've been banging my head over [1] for a
> few days now trying to see if it's a mis-bisect or a real thing. I'll
> ask Larry to retry with this patchset.

So, Larry's debugging was inconsistent and didn't result in anything I
could piece together into basic cause and effect. But luckily Vadim, who
maintains the VirtualBox drivers for Oracle, was able to reproduce the
issue and conduct some real debugging. I've CC'd him here.
From talking with Vadim, here are some findings thus far:

  - Certain Linux guest processes crash under high load.
  - Windows guest kernels panic.

Observation: the Windows kernel uses compiler-generated SSSE3
instructions all over the place.

  - Moving the mouse around helps induce the crash.

Observation: add_input_randomness() -> .. -> kernel_fpu_begin() -> blake2s_compress().

  - The problem exhibits itself in rc7, so this patchset does not fix
    the issue.
  - Applying https://xn--4db.cc/ttEUSvdC fixes the issue.

Observation: the problem is definitely related to using the FPU in a
hard IRQ.

I went reading KVM to get some idea of why KVM does *not* have this
problem, and it looks like there's some careful code there about doing
xsave and such around IRQs. So my current theory is that VirtualBox's
VMM just forgot to do this, and until now this bug went unnoticed.

Since VirtualBox is out of tree (and an extremely messy codebase), and
this appears to be an out of tree module problem rather than a kernel
problem, I'm inclined to think that there's not much for us to do, at
least until we receive information to the contrary of this presumption.

But in case you do want to do something proactively, I don't have any
objections to just disabling the FPU in hard IRQ for 5.18. And in 5.19,
add_input_randomness() isn't going to hit that path anyway. But also,
doing nothing and letting the VirtualBox people figure out their bug
would be fine with me too. Either way, just wanted to give you a heads
up.
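
To make "disabling the FPU in hard IRQ" concrete, here is a minimal
user-space sketch of that kind of policy. It is only a model under
stated assumptions: struct ctx_model, simd_allowed() and
pick_compress_path() are hypothetical names invented for illustration,
and this does not reproduce the kernel's actual check. The idea is that
a caller asks whether SIMD is allowed before entering the
kernel_fpu_begin() path, and falls back to a generic implementation
otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the execution context; not a kernel type. */
struct ctx_model {
	bool in_nmi;            /* running in NMI context */
	bool in_hardirq;        /* running in hard interrupt context */
	bool kernel_fpu_active; /* an in-kernel FPU section is already open */
};

/*
 * Policy sketch (an assumption, not the kernel's code): refuse SIMD in
 * NMI context, in hard IRQ context, or while an in-kernel FPU section
 * is already active.
 */
static bool simd_allowed(const struct ctx_model *c)
{
	assert(c != NULL);
	if (c->in_nmi || c->in_hardirq)
		return false;
	return !c->kernel_fpu_active;
}

enum compress_path { PATH_GENERIC, PATH_SIMD };

/* Caller-side pattern: take the SIMD path only when allowed, else fall back. */
static enum compress_path pick_compress_path(const struct ctx_model *c)
{
	return simd_allowed(c) ? PATH_SIMD : PATH_GENERIC;
}
```

With a policy like this, a compression call reached from a hard IRQ
would always take the generic path and never touch FPU state there.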

Jason
