lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9A4E3B85-BE0C-40C6-B261-E634CFCD9819@intel.com>
Date:   Wed, 3 Feb 2021 04:10:24 +0000
From:   "Bae, Chang Seok" <chang.seok.bae@...el.com>
To:     Borislav Petkov <bp@...e.de>
CC:     Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, x86-ml <x86@...nel.org>,
        "Brown, Len" <len.brown@...el.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 07/21] x86/fpu/xstate: Introduce helpers to manage
 dynamic xstate buffers

On Jan 27, 2021, at 02:41, Borislav Petkov <bp@...e.de> wrote:
> On Wed, Jan 27, 2021 at 01:23:57AM +0000, Bae, Chang Seok wrote:
>> The xstate buffer may expand on the fly. The size has to be correctly
>> calculated if needed. CPUID provides essential information for the
>> calculation. Instead of reading CPUID repeatedly, store them -- the offset and
>> size are already stored here. The 64B alignment looks to be missing, so added
>> here.
> 
> /me goes and digs into the SDM.
> 
> Do you mean this:
> 
> "Bit 01 is set if, when the compacted format of an XSAVE area is used,
> this extended state component located on the next 64-byte boundary
> following the preceding state component (otherwise, it is located
> immediately following the preceding state component)."
> 
> So judging by your variable naming, you wanna record here whether the
> buffer aligns on 64 bytes.
> 
> Yes, no?

Yes, you’re right.

> How about a comment over that variable so that people reading the code,
> know what it records and do not have to open the SDM each time.

Okay, how about:
“
This alignment bit is set if the state is saved on a 64B-aligned address in
the compacted format buffer.
"

>> Maybe:
>>    "When the buffer is more than this size, the current mechanism is
>>     potentially marginal to support the allocations."
> 
> Where do you get those formulations?!
> 
> Are you simply trying to say that for buffers larger than 64K, the
> kernel needs "a more sophisticated allocation scheme"?
> 
> I'd suggest you try simple formulations first.
> 
> And why does it need a more sophisticated allocation scheme? Is 64K
> magical?
> 
> Also, I'm assuming here - since you're using vmalloc - that XSAVE* can
> handle virtually contiguous memory. SDM says it saves to "mem" and
> doesn't specify so it sounds like it does but let's have a confirmation
> here pls.

Yes, correct.

>>>> +#define XSTATE_BUFFER_MAX_BYTES		(64 * 1024)
>>> 
>>> What's that thing for when we have fpu_kernel_xstate_max_size too?
>> 
>> The threshold size is what the current mechanism can comfortably allocate
>> (maybe at most). The warning is left when the buffer size goes beyond the 
>> threshold. Then, we may need to consider a better allocation mechanism.
> 
> As above, why?
> 
>> Although a warning is given, vmalloc() may manage to allocate this size. So,
>> it was not considered a hard hit yet. vmalloc() failure will return an error
>> later.
> 
> And that warning is destined for whom, exactly?
> 
> When can that state become more than 64K?
> 
> What is that artificial limit for?
> 
> A whole lot of questions…

Okay, let me try to explain..

The threshold here could be more than that. But the intention is a heads-up to
(re-)consider (a) a new allocation mechanism and (b) to shrink the memory
allocation.

Also, the AMX state size is limited to (a bit less than) 64KB and it was
discussed that vmalloc() will be okay with AMX [2].

DaveH, correct me if I'm wrong.

>>>> +	state_ptr = vmalloc(newsz);
>>>> +	if (!state_ptr) {
>>>> +		trace_x86_fpu_xstate_alloc_failed(fpu);
>>> 
>>> WTH is that tracepoint here for?
>> 
>> While it returns an error, this function can be on the path of NMI handling.
> 
> How?
> 
> You're allocating an xstate buffer in NMI context?!

Oh, sorry. The typo could make it confusing here -- s/NMI/#NM/.

>> Then, likely only with the “unexpected #NM exception” message. So, logging a
>> tracepoint can provide evidence of the allocation failure in that case.
> 
> Who's going to see that tracepoint, people who are tracing the system
> but not normal users.

Maybe it is possible to backtrack this allocation failure out of #NM handling.
But the tracepoint can provide a clear context, although limited to those
using it.

>> PATCH9 introduces a wrapper that determines which to take. It simply returns
>> state_ptr when not a null pointer. So, the logic is to use the dynamic buffer
>> when available.
> 
> Why not allocate the xstate buffer by default instead of being embedded
> in struct fpu?

Indeed, this is the most preferred way on one hand. But there was a change to
the current allocation approach by Ingo about 6 years ago [3].

So, I’m wondering his current thought on this suggestion.

> You're already determining its max_size and you can use that to do the
> allocation. Two buffers is calling for trouble.

But if so, every task will consume 8KB (or up to 64KB) with AMX. Bad is the
waste of memory for those not using the state at all.

>> [1] https://lore.kernel.org/lkml/69721125-4e1c-ca9c-ff59-8e1331933e6c@intel.com/#t
> 
> Ok, I read that subthread.
> 
> The reasoning *why* we're using vmalloc() needs to be explained in a
> comment over alloc_xstate_buffer() otherwise we will forget and that is
> important.

Maybe:
“
If the task with vmalloc()-allocated buffer tends to terminate quickly,
vfree()-induced IPIs may be a concern. Implement cache may be helpful on this.
But the task with large state is likely to live longer. So, use vmalloc()
simply.
"
Let me know if this is not enough.

Thanks,
Chang

[2] https://lore.kernel.org/lkml/CALCETrW8u5rUsZvoo5t4YtC+4boBVcK__-srtA1+-YX06QYD1w@mail.gmail.com/
[3] https://lore.kernel.org/lkml/1430848300-27877-56-git-send-email-mingo@kernel.org/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ