Date:	Tue, 3 Jun 2008 08:02:13 +0200
From:	Jürgen Mell <j.mell@...nline.de>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org, arjan@...ux.intel.com, mingo@...e.hu,
	hpa@...or.com, tglx@...utronix.de
Subject: Re: CONFIG_PREEMPT causes corruption of application's FPU stack

On Tuesday, 3 June 2008, Suresh Siddha wrote:
> On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> > On Sun, Jun 01, 2008 at 06:47:29PM +0200, Jürgen Mell wrote:
> > > On Sunday, 1 June 2008, Andi Kleen wrote:
> > > > j.mell@...nline.de writes:
> > > > > or it is restored more than
> > > > > once. Please keep in mind, that I am always running two Einstein
> > > > > processes simultaneously on my two cores!
> > > > > I am willing to do further testing of this problem if someone
> > > > > can give me a hint how to continue.
> > > >
> > > > My bet would have been actually on
> > > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some
> > > > nasty things (enable interrupts in the middle of __switch_to).
> > > >
> > > > I looked through the old patchkit and couldn't find any specific
> > > > PREEMPT problems. All code it changes should run with preempt_off
> > > >
> > > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > > changes (__unlazy_fpu, math_state_restore) and see if that
> > > > triggers anywhere.
> > >
> > > No, that did not trigger. I put the WARN_ON_ONCE into process.c,
> > > traps.c and also into the __unlazy_fpu macro in i387.h but I got no
> > > messages anywhere (dmesg, /var/log/messages, /var/log/warn) when the
> > > trap #8 occurred.
> > > Meanwhile I am also running the tests on another machine to make
> > > sure it is not a hardware-related problem.
> > >
> > > Any new ideas are welcome!
> > >
> > > Meanwhile I will go back to 2.6.20 and revert
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong
> > > track...
> >
> > 2.6.20 doesn't have the commit
> > 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> >
> > As you are seeing this corruption problem starting from 2.6.20,
> > at least the recent (2.6.26 series) FPU changes don't play a role
> > in this.
> >
> > I will try to reproduce your issue.
>
> Jürgen, I think I found the reason for your issue as well.
>
> As you observed, it is probably coming from the commit
> acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> optimization
>
> It's a side effect though. This is the failing scenario:
>
> process 'A' in save_i387_ia32() just after clear_used_math()
>
> Got an interrupt and pre-empted out.
>
> At the next context switch to process 'A' again, kernel tries to restore
> the math state proactively and sees a fpu_counter > 0 and
> !tsk_used_math()
>
> This results in init_fpu() during the __switch_to()'s
> math_state_restore()
>
> This in turn results in FPU corruption, which will be saved/restored
> (save_i387_fxsave and restore_i387_fxsave) during the remaining
> part of the signal handling after the context switch.
>
> So in short, yes, the problem shows up for preempt-enabled kernels, and
> the same patch I sent out 30 mins back (appended again) should fix your
> issue as well. Can you please test this and check whether my theory is
> indeed correct? If it fixes your issue as well, then I will re-post the
> patch with a new changelog and updated comments in the patch.
>
> thanks,
> suresh
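
The failing scenario quoted above can be modeled in user space. This is only a
toy sketch, not kernel code: the struct, the toy_* functions and TOY_FPU_INIT
are hypothetical stand-ins for tsk_used_math(), fpu_counter, init_fpu(),
math_state_restore() and __switch_to(), and a single int stands in for the
whole FPU register state.

```c
#include <assert.h>
#include <stdbool.h>

/* Value a freshly initialised FPU holds in this toy model (assumption). */
#define TOY_FPU_INIT 0

/* Toy model of one task's lazy-FPU bookkeeping; all fields are hypothetical. */
struct toy_task {
    bool used_math;        /* models tsk_used_math() */
    unsigned fpu_counter;  /* models the per-task FPU usage counter */
    int fpu_regs;          /* models the live FPU registers */
    int saved_regs;        /* models the in-memory FPU save area */
};

/* A task that has been using the FPU and holds 'regs' in its registers. */
struct toy_task toy_task_with_fpu(int regs)
{
    struct toy_task t = { true, 1, regs, regs };
    return t;
}

/* Models being switched out: the live registers go to the save area. */
void toy_switch_out(struct toy_task *t)
{
    t->saved_regs = t->fpu_regs;
}

/* Models the restore path: without used_math the save area is re-initialised. */
void toy_math_state_restore(struct toy_task *t)
{
    if (!t->used_math) {
        t->saved_regs = TOY_FPU_INIT;  /* models init_fpu() */
        t->used_math = true;
    }
    t->fpu_regs = t->saved_regs;
    t->fpu_counter++;
}

/* Models switching back in: restore proactively when fpu_counter > 0. */
void toy_switch_to(struct toy_task *t)
{
    if (t->fpu_counter > 0)
        toy_math_state_restore(t);
}

/*
 * Models the signal save path. When preempt_in_window is true, the task is
 * preempted right after the used_math flag is cleared and later switched
 * back in. Returns the value that would land in the signal frame.
 */
int toy_save_for_signal(struct toy_task *t, bool preempt_in_window)
{
    t->used_math = false;              /* models clear_used_math() */
    if (preempt_in_window) {
        toy_switch_out(t);
        toy_switch_to(t);
    }
    return t->fpu_regs;
}
```

In this model, calling toy_save_for_signal() on a task created with
toy_task_with_fpu(42) returns 42 when no preemption occurs, but returns
TOY_FPU_INIT when the preemption lands inside the window, which mirrors the
corruption described in the scenario.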

Many thanks for the patch!
I will test this immediately, but since it takes a while to confirm that
the problem is really gone, it will be some time before I can report back.

Thanks,
               Jürgen