linux-kernel - Re: CONFIG_PREEMPT causes corruption of application's FPU stack

Date:	Wed, 4 Jun 2008 12:53:17 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Jürgen Mell <j.mell@...nline.de>
Cc:	Suresh Siddha <suresh.b.siddha@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org, arjan@...ux.intel.com, hpa@...or.com,
	tglx@...utronix.de,
	Simon Holm Thøgersen <odie@...aau.dk>
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack


* Jürgen Mell <j.mell@...nline.de> wrote:

> > Jürgen, I think I found the reason for your issue as well.
> >
> > As you observed, it is probably coming from the commit
> > acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> > optimization
> >
> > It's a side effect though. This is the failing scenario:
> >
> > Process 'A' is in save_i387_ia32(), just after clear_used_math().
> >
> > It takes an interrupt and is preempted.
> >
> > At the next context switch back to process 'A', the kernel tries to
> > restore the math state proactively and sees fpu_counter > 0 and
> > !tsk_used_math().
> >
> > This results in an init_fpu() call during __switch_to()'s
> > math_state_restore().
> >
> > This in turn corrupts the FPU state, and the corrupted state is what
> > gets saved/restored (save_i387_fxsave and restore_i387_fxsave) during
> > the remaining part of the signal handling after the context switch.
> >
> > So in short, yes, the problem shows up on preempt-enabled kernels, and
> > the same patch I sent out 30 minutes ago (appended again) should fix your
> > issue as well. Can you please test this and check whether my theory is
> > indeed correct? If it fixes your issue as well, I will re-post the patch
> > with a new changelog and updated comments in the patch.
> >
> 
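[Editorial illustration, not part of the original message.] The failing
scenario quoted above can be sketched as a small user-space model. This is
not kernel code: struct model_task, the model_* helpers and run_scenario()
are hypothetical names invented for illustration. The "fpu_counter > 0" test
follows the wording of the quoted description and may not match the exact
check in the kernel source. The model only shows why an init_fpu()-style
reset between clear_used_math() and the actual save leaves wiped state in
the signal frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_FPU_REGS 8

/* Hypothetical stand-in for the per-task FPU bookkeeping. */
struct model_task {
    bool used_math;               /* models tsk_used_math() */
    int fpu_counter;              /* > 0 triggers the proactive restore */
    double saved[MODEL_FPU_REGS]; /* FPU state saved at switch-out */
};

/* Models init_fpu(): reset the saved state to a clean default. */
static void model_init_fpu(struct model_task *t)
{
    memset(t->saved, 0, sizeof(t->saved));
    t->used_math = true;
}

/* Models math_state_restore(): load the saved state into the registers. */
static void model_math_state_restore(struct model_task *t, double *hw)
{
    if (!t->used_math)
        model_init_fpu(t);        /* wipes the state saved at switch-out */
    memcpy(hw, t->saved, sizeof(t->saved));
}

/* Switch-out: the live registers are saved into the task. */
static void model_switch_out(struct model_task *t, const double *hw)
{
    memcpy(t->saved, hw, sizeof(t->saved));
}

/* Switch-in: proactive restore when the task has been using the FPU. */
static void model_switch_in(struct model_task *t, double *hw)
{
    if (t->fpu_counter > 0)
        model_math_state_restore(t, hw);
}

/*
 * Returns 1 if the signal frame ends up with corrupted FPU state,
 * 0 if it matches what the application had in its registers.
 */
int run_scenario(bool preempt_after_clear)
{
    double hw[MODEL_FPU_REGS], original[MODEL_FPU_REGS], frame[MODEL_FPU_REGS];
    struct model_task a = { .used_math = true, .fpu_counter = 3 };
    int i;

    assert(a.fpu_counter > 0);    /* the scenario needs an FPU-heavy task */
    for (i = 0; i < MODEL_FPU_REGS; i++)
        hw[i] = original[i] = i + 1.0;

    a.used_math = false;          /* the clear_used_math() step */

    if (preempt_after_clear) {
        model_switch_out(&a, hw); /* interrupt, task is preempted */
        for (i = 0; i < MODEL_FPU_REGS; i++)
            hw[i] = -1.0;         /* another task uses the registers */
        model_switch_in(&a, hw);  /* sees !used_math, resets the state */
    }

    /* Remaining part of signal handling: save registers to the frame. */
    memcpy(frame, hw, sizeof(frame));
    return memcmp(frame, original, sizeof(frame)) != 0;
}
```

In this model, run_scenario(true) reports corruption and run_scenario(false)
does not, which matches the observation that only preempt-enabled kernels
were affected.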
> I have applied your patch to both an openSUSE 2.6.22.17 kernel and a 
> 2.6.26-rc4 kernel.org kernel and run the test with Einstein@...e on 
> two different machines. One machine has been running for 24 hours now, 
> the other for 18 hours.
> 
> During this time there were no faults on either machine.
> 
> As it never before took more than 12 hours until the first appearance 
> of the problem, I think your patch fixed it. Very good work!
> 
> I will continue running the test, but I believe we can call this 
> fixed.
> 
> Thank you again!

fix applied to tip/x86/urgent. Thanks everyone, nice find!

	Ingo