Message-ID: <20080604105317.GA17874@elte.hu>
Date: Wed, 4 Jun 2008 12:53:17 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Jürgen Mell <j.mell@...nline.de>
Cc: Suresh Siddha <suresh.b.siddha@...el.com>,
Andi Kleen <andi@...stfloor.org>,
Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org, arjan@...ux.intel.com, hpa@...or.com,
tglx@...utronix.de,
Simon Holm Thøgersen <odie@...aau.dk>
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack
* Jürgen Mell <j.mell@...nline.de> wrote:
> > Jürgen, I think I found the reason for your issue as well.
> >
> > As you observed, it is probably coming from the commit
> > acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> > optimization
> >
> > It's a side effect though. This is the failing scenario:
> >
> > Process 'A' is in save_i387_ia32(), just after clear_used_math().
> >
> > It gets an interrupt and is preempted out.
> >
> > At the next context switch back to process 'A', the kernel tries to
> > restore the math state proactively and sees fpu_counter > 0 and
> > !tsk_used_math().
> >
> > This results in init_fpu() during __switch_to()'s
> > math_state_restore().
> >
> > The result is FPU corruption, which is then saved and restored
> > (save_i387_fxsave and restore_i387_fxsave) during the remaining
> > part of the signal handling after the context switch.
> >
> > So in short, yes, the problem shows up for preempt-enabled kernels, and
> > the same patch I sent out 30 mins back (appended again) should fix your
> > issue as well. Can you please test this and check whether my theory is
> > indeed correct? If it fixes your issue as well, then I will re-post the
> > patch with a new changelog and updated comments in the patch.
> >
>
> I have applied your patch to both an openSUSE 2.6.22.17 kernel and a
> 2.6.26-rc4 kernel.org kernel and run the test with Einstein@...e on
> two different machines. One machine is running 24 hours now, the other
> 18 hours.
>
> During this time there were no faults on either machine.
>
> As it never before took more than 12 hours until the first appearance
> of the problem, I think your patch fixed it. Very good work!
>
> I will continue running the test, but I believe we can call this
> fixed.
>
> Thank you again!
fix applied to tip/x86/urgent. Thanks everyone, nice find!
Ingo