Date:	Tue, 26 Jan 2016 14:09:36 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Petr Mladek <pmladek@...e.com>
cc:	Jan Kara <jack@...e.cz>, Ben Hutchings <ben@...adent.org.uk>,
	Tejun Heo <tj@...nel.org>,
	Sasha Levin <sasha.levin@...cle.com>, Shaohua Li <shli@...com>,
	LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
	Daniel Bilik <daniel.bilik@...system.cz>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25

On Tue, 26 Jan 2016, Petr Mladek wrote:
> On Tue 2016-01-26 10:34:00, Jan Kara wrote:
> > On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
> > > On Sat, 23 Jan 2016, Ben Hutchings wrote:
> > > > On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
> > > > > > Looks like it requires more than trivial backport (I think). Tejun?
> > > > > 
> > > > > The timer migration has changed quite a bit.  Given that we've never
> > > > > seen vmstat work crashing in 3.18 era, I wonder whether the right
> > > > > thing to do here is reverting 874bbfe600a6 from 3.18 stable?
> > > > 
> > > > It's not just 3.18 that has this; 874bbfe600a6 was backported to all
> > > > stable branches from 3.10 onward.  Only the 4.2-ckt branch has
> > > > 22b886dd10180939.
> > > 
> > > 22b886dd10180939 fixes a bug which was introduced with the timer wheel
> > > overhaul in 4.2. So only 4.2/3 should have it backported.
> > 
> > Thanks for explanation. So do I understand right that timers are always run
> > on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6 (to
> > run timer for delayed work on the calling CPU) doesn't make sense there? If
> > that is true then reverting the commit from older stable kernels is
> > probably the easiest way to resolve the crashes.
> 
> The commit 874bbfe600a6 ("workqueue: make sure delayed work run in
> local cpu") forces the timer to run on the local CPU. It might be correct
> for vmstat. But I wonder if it might break some other delayed work
> user that depends on running on different CPU.

The default of add_timer() is to run on the current cpu. It only moves the
timer to a different cpu when the power saving code says so. So 874bbfe600a6
enforces that the timer runs on the cpu on which queue_delayed_work() is
called, but before that commit it was likely that the timer was queued on the
calling cpu. So there is nothing which can depend on running on a different
CPU, except callers of queue_delayed_work_on() which provide the target cpu
explicitly. 874bbfe600a6 does not affect those callers at all.

Now, what's different is:

+       if (cpu == WORK_CPU_UNBOUND)
+               cpu = raw_smp_processor_id();
        dwork->cpu = cpu;

So before that change dwork->cpu was set to WORK_CPU_UNBOUND. Now it's set to
the current cpu, but I can't see how that matters.
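
To make the difference concrete, here is a minimal user-space model of the
assignment above. It is a sketch, not kernel code: struct delayed_work_model,
model_current_cpu, queue_before() and queue_after() are hypothetical names
made up for illustration, and the value given to WORK_CPU_UNBOUND is a
placeholder (in the kernel it is NR_CPUS).

```c
/* User-space model only; none of these names exist in the kernel. */
#define WORK_CPU_UNBOUND 64	/* placeholder value, NR_CPUS in the kernel */

/* Models only the ->cpu field of struct delayed_work. */
struct delayed_work_model {
	int cpu;
};

/* Stand-in for what raw_smp_processor_id() would return. */
int model_current_cpu = 0;

/* Before 874bbfe600a6: the requested cpu is recorded unchanged. */
void queue_before(struct delayed_work_model *dwork, int cpu)
{
	dwork->cpu = cpu;
}

/* After 874bbfe600a6: an unbound request is resolved to the calling cpu. */
void queue_after(struct delayed_work_model *dwork, int cpu)
{
	if (cpu == WORK_CPU_UNBOUND)
		cpu = model_current_cpu;
	dwork->cpu = cpu;
}
```

With model_current_cpu set to 3, queue_before(&dw, WORK_CPU_UNBOUND) leaves
dw.cpu at WORK_CPU_UNBOUND, while queue_after(&dw, WORK_CPU_UNBOUND) leaves it
at 3. An explicit target cpu is stored unchanged by both variants, which
matches the point that explicit-cpu callers are not affected.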

Thanks,

	tglx
