linux-kernel - Re: [PATCH] x86/fpu: verify xstate buffer size according with requested features

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANaxB-zQYC8=7frWGU2pRTDJrkVu0iR8QZCmUxSzGmBG-_b1cg@mail.gmail.com>
Date: Wed, 17 Jan 2024 23:59:37 -0800
From: Andrei Vagin <avagin@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Andrei Vagin <avagin@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, LKML <linux-kernel@...r.kernel.org>, 
	x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/fpu: verify xstate buffer size according with
 requested features

On Wed, Jan 17, 2024 at 3:52 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 1/17/24 14:30, Andrei Vagin wrote:
> > On Wed, Jan 17, 2024 at 11:34 AM Dave Hansen <dave.hansen@...elcom> wrote:
> >> Seems like the real problem here is that the fault_in_readable() doesn't
> >> match the XRSTOR.  It's going to continue to be a problem as long as we
> >> don't know what memory XRSTOR tried to access.  We can try all day long
> >> to precalculate what XRSTOR _will_ do, but that seems a bit silly
> >
> > I don't understand this part. The behavior of XRSTOR is well-defined
> > by CPU specifications, allowing us to easily precalculate the memory it
> > will attempt to access. What does it mean "we don't know what memory
> > XRSTOR tried to access"?
> >
> > xrstor restores only features that are set in fx_sw->xfeatures.
>
> XSAVE is a big old pile of fun.  Someone is confused here and that
> someone might be me.  Let me walk through my logic a bit.  Maybe you can
> point out where I've gone wrong. :)

I could be wrong too, so thank you for looking at this and helping to
find the right solution.

>
> Take a look at the "Legacy Region of an XSAVE Area" in the SDM.
>
>         The x87 state component comprises bytes 23:0 and bytes 159:32.
>         The SSE state component comprises bytes 31:24 and bytes 415:160.
>         The XSAVE feature set does not use bytes 511:416; bytes 463:416
>         are reserved.
>
> and the next section:
>
>         The XSAVE header of an XSAVE area comprises the 64 bytes
>         starting at offset 512 from the area’s base address:
>         ...
>         Bytes 7:0 of the XSAVE header is a state-component bitmap (see
>         Section 13.1) called XSTATE_BV.
>
> XRSTOR accesses memory based on XSTATE_BV (and XCOMP_BV which should be
> irrelevant here).  So we're looking for something at 512 bytes up in the
> XSAVE buffer.
>
> Let's have gdb help us out a bit.  Where is sw_reserved?
>
> (gdb) print/d &((struct fxregs_state *)0)->sw_reserved
> $4 = 464
>
> Where does XRSTOR itself look?
>
> (gdb) print/d &((union fpregs_state *)0)->xsave->header.xfeatures
> $5 = 512
>
> (xfeatures aka. XSTATE_BV)
>
> There _was_ a reason once upon a time that the kernel started sticking a
> copy of XSTATE_BV in 'sw_reserved'.  I just forget what it is at the
> moment.  It's horribly confusing to it laid out like this, but the SDM
> is pretty clear: "the XSAVE feature set does not use bytes 511:416".
>
> "fx_sw" is actually a software-defined and software-only-consumed area
> of the XSAVE buffer, thus the '_sw'.  Nothing in the '_sw' section tells
> us how the hardware will behave.

I think you don't take into account the requested-feature bitmap (RFBM),
which is the logical-and of edx:eax and XCR0. In our case, edx:eax
is set to the value of fx_sw->xfeatures.

>
> >> because the CPU knows where the fault happened.  It told us in CR2 and
> >> all we have to do is plumb that back to fault_in_readable().
> >
> > I considered this option as well, but then I decided that this approach
> > is better. The most important aspect is that it always rejects bad
> > buffers, allowing a user space to detect an issue even when a fault
> > isn't triggered. I believe proper handling of xrstor page faults could
> > be a valuable additional improvement to this change. If we detect a
> > fault outside of a provided buffer, we can print a warning to signal
> > that check_xstate_in_sigframe is incomplete.
>
> I'm not really following the logic there.  What's the downside of taking
> the fault?

Let's consider a scenario where someone messed up with an fpu state on a
signal frame. With my approach, a mistake can be promptly detected.
However, if we incorporate the page fault handling of xrstor, a mistake
will only be identified if xrstor triggers a fault. In cases where a
buffer is allocated in a large memory mapping, xrstor may silently read
memory beyond the buffer. Next time, a page beyond a buffer might be
swapped out, xrstore triggers a fault leading to application crashes.

Thanks,
Andrei