linux-kernel - Re: [RFC PATCH 07/10] x86/fpu: Rellocate fpstate on save_fpregs_to


Message-Id: <d662bbd4-ed39-47ef-b2a4-012c765ec4ad@www.fastmail.com>
Date:   Fri, 03 Dec 2021 15:51:26 +0000
From:   "Jiaxun Yang" <jiaxun.yang@...goat.com>
To:     "Dave Hansen" <dave.hansen@...el.com>, x86@...nel.org
Cc:     "Thomas Gleixner" <tglx@...utronix.de>,
        "Ingo Molnar" <mingo@...hat.com>, "Borislav Petkov" <bp@...en8.de>,
        dave.hansen@...ux.intel.com, hpa@...or.com,
        chang.seok.bae@...el.com, linux-kernel@...r.kernel.org,
        "Jiaxun Yang" <j.yang-87@....ed.ac.uk>
Subject: Re: [RFC PATCH 07/10] x86/fpu: Rellocate fpstate on save_fpregs_to_fpstate



On Dec 3, 2021, at 3:18 PM, Dave Hansen wrote:
> On 12/3/21 3:39 AM, Jiaxun Yang wrote:
>>>>  	if (likely(use_xsave())) {
>>>> +		xstate_update_size(fpu);
>>>>  		os_xsave(fpu->fpstate);
>>>>  		update_avx_timestamp(fpu);
>>>>  		return;
>>> Have you considered what exactly happens when you hit that WARN_ON_FPU()
>>> which otherwise ignores the allocation error?  Have you considered what
>>> happens on the os_xsave() that follows it immediately?  How about what
>>> happens the next time this task runs after that failure?
>> Thank you for the catch.
>> This is a few questions that I don't have answer, so it's a RFC.
>> 
>> I thought it is unlikely to happen as kmalloc has emergency pool.
>> But in case it happens, I guess the best way to handle it is just
>> send SIGILL to corresponding user process or panic if it's kernel
>> fpu use?
>
> We've thought a *LOT* about this exact problem over the past few years.
>
> Intel even added hardware (XFD) to prevent the situation where you land
> in the context switch code, fail a memory allocation, and have to
> destroy user data in registers.  Without XFD, there are also zero ways
> to avoid this happening to apps, *other* than preallocating the memory
> in the first place.
>
> I don't think there is *any* viable path forward with this series.

Hmm, actually I can come up with some ways to work around it.
For example, we could keep a preallocated emergency pool of buffers sized
for max_feature and use them if an allocation fails during context switch.
We would get a chance to refill the pool after returning from interrupt context :-)
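
To make the idea concrete, here is a rough user-space sketch of such a
pool. It is not kernel code and not part of the patch series: the names
(emergency_pool, pool_refill, state_alloc, pool_destroy), the sizes, and
the simulate_failure flag are all hypothetical, and malloc() stands in
for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS     4     /* hypothetical pool depth */
#define MAX_STATE_SIZE 8192  /* hypothetical stand-in for the max_feature size */

struct emergency_pool {
	void *slot[POOL_SLOTS];  /* preallocated buffers, slot[0..avail-1] valid */
	size_t avail;            /* number of buffers currently in the pool */
};

/*
 * Top the pool back up. Meant to be called from a context where a failed
 * allocation is harmless (the "after returning from interrupt context" step).
 * Returns the number of buffers available afterwards.
 */
static size_t pool_refill(struct emergency_pool *p)
{
	assert(p != NULL);
	while (p->avail < POOL_SLOTS) {
		void *buf = malloc(MAX_STATE_SIZE);
		if (!buf)
			break;  /* try again at the next refill point */
		p->slot[p->avail++] = buf;
	}
	return p->avail;
}

/*
 * Allocate a state buffer: try the normal allocator first and fall back to
 * the pool on failure. simulate_failure models the normal allocator failing.
 * Returns NULL if both paths fail; *from_pool reports which path was used.
 */
static void *state_alloc(struct emergency_pool *p, size_t size,
			 bool simulate_failure, bool *from_pool)
{
	void *buf = NULL;

	assert(p != NULL && from_pool != NULL);
	*from_pool = false;

	if (!simulate_failure)
		buf = malloc(size);
	if (buf)
		return buf;

	/* Fallback: pool buffers are MAX_STATE_SIZE bytes each. */
	if (size > MAX_STATE_SIZE || p->avail == 0)
		return NULL;

	*from_pool = true;
	return p->slot[--p->avail];
}

/* Release every buffer still held by the pool. */
static void pool_destroy(struct emergency_pool *p)
{
	assert(p != NULL);
	while (p->avail > 0)
		free(p->slot[--p->avail]);
}
```

In this sketch a caller would run pool_refill() once up front and again at
each safe point. Note that state_alloc() can still return NULL once the pool
is drained, so the failure case is made rarer rather than removed.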

But maybe you are right, it's not for me, a first-year undergraduate student,
to comment on solutions from thousands of brilliant brains at Intel.

I appreciate your comments, which helped me understand the nature of the problem.

Thanks.
-- 
- Jiaxun