linux-kernel - Re: Crashes with 874bbfe600a6 in 3.18.25

Message-ID: <20160203122855.GB6762@dhcp22.suse.cz>
Date:	Wed, 3 Feb 2016 13:28:56 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Jiri Slaby <jslaby@...e.cz>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
	Ben Hutchings <ben@...adent.org.uk>, Tejun Heo <tj@...nel.org>,
	Sasha Levin <sasha.levin@...cle.com>, Shaohua Li <shli@...com>,
	LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
	Daniel Bilik <daniel.bilik@...system.cz>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25

[I wasn't aware of this email thread before so I am jumping in late]

On Wed 03-02-16 10:35:32, Jiri Slaby wrote:
> On 01/26/2016, 02:09 PM, Thomas Gleixner wrote:
> > On Tue, 26 Jan 2016, Petr Mladek wrote:
[...]
> >> The commit 874bbfe600a6 ("workqueue: make sure delayed work run in
> >> local cpu") forces the timer to run on the local CPU. It might be correct
> >> for vmstat. But I wonder if it might break some other delayed work
> >> user that depends on running on different CPU.
> > 
> > The default of add_timer() is to run on the current cpu. It only moves the
> > timer to a different cpu when the power saving code says so. So 874bbfe600a6
> > enforces that the timer runs on the cpu on which queue_delayed_work() is
> > called, but before that commit it was likely that the timer was queued on the
> > calling cpu. So there is nothing which can depend on running on a different
> > CPU, except callers of queue_delayed_work_on() which provide the target cpu
> > explicitly. 874bbfe600a6 does not affect those callers at all.
> > 
> > Now, what's different is:
> > 
> > +       if (cpu == WORK_CPU_UNBOUND)
> > +               cpu = raw_smp_processor_id();
> >         dwork->cpu = cpu;
> > 
> > So before that change dwork->cpu was set to WORK_CPU_UNBOUND. Now it's set to
> > the current cpu, but I can't see how that matters.

It matters because if somebody calls queue_delayed_work() and the
current cpu is then offlined, the associated timer gets migrated, but
__queue_work() no longer sees the associated cpu as WORK_CPU_UNBOUND.
It therefore won't reset the cpu to the one it is running on, and the
following path will go kaboom...

> The CPU was 168, and that one was offlined in the meantime. So
> __queue_work fails at:
>   if (!(wq->flags & WQ_UNBOUND))
>     pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
>   else
>     pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
>     ^^^                           ^^^^ NODE is -1
>       \ pwq is NULL
> 
>   if (last_pool && last_pool != pwq->pool) { <--- BOOM
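
To make the sequence above concrete, here is a minimal user-space sketch
of it. This is a toy model, not kernel code: every name ending in _model,
the constants, the CPU-to-node mapping and the scenario() helper are
hypothetical and exist only for illustration. The one behaviour taken
from the trace quoted above is that an offlined CPU maps to node -1 and
therefore yields a NULL pwq.

```c
/* Toy user-space model of the crash path; NOT kernel code.
 * All *_model names and constants are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL     4
#define CPU_UNBOUND_MODEL NR_CPUS_MODEL  /* stands in for WORK_CPU_UNBOUND */
#define NO_NODE_MODEL     (-1)

struct pwq_model { int node; };

static struct pwq_model node_pwqs[2] = { { 0 }, { 1 } };
static bool cpu_online_model[NR_CPUS_MODEL] = { true, true, true, true };

/* Assumption: an offlined CPU reports no node ("NODE is -1" in the trace). */
static int cpu_to_node_model(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return cpu_online_model[cpu] ? cpu / 2 : NO_NODE_MODEL;
}

static struct pwq_model *unbound_pwq_by_node_model(int node)
{
    return node == NO_NODE_MODEL ? NULL : &node_pwqs[node];
}

/* Value stored in dwork->cpu at queue time, with or without 874bbfe600a6. */
static int record_dwork_cpu(int requested_cpu, int current_cpu, bool with_commit)
{
    if (with_commit && requested_cpu == CPU_UNBOUND_MODEL)
        return current_cpu;
    return requested_cpu;
}

/* Timer callback running on running_cpu: choose the pwq for an unbound wq. */
static struct pwq_model *queue_work_model(int req_cpu, int running_cpu)
{
    int cpu = req_cpu;

    if (req_cpu == CPU_UNBOUND_MODEL)
        cpu = running_cpu;  /* the reset that is skipped with the commit */
    return unbound_pwq_by_node_model(cpu_to_node_model(cpu));
}

/* Queue on CPU 3 without a target, offline CPU 3, fire the timer on CPU 0. */
static struct pwq_model *scenario(bool with_commit)
{
    int stored = record_dwork_cpu(CPU_UNBOUND_MODEL, 3, with_commit);
    struct pwq_model *pwq;

    cpu_online_model[3] = false;
    pwq = queue_work_model(stored, 0);
    cpu_online_model[3] = true;  /* restore state for the next run */
    return pwq;
}
```

In this model scenario(false) returns the pwq of the node that CPU 0
belongs to, while scenario(true) returns NULL, which corresponds to the
pointer that the pwq->pool dereference then trips over.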

So I think 874bbfe600a6 is really bogus. It should be reverted. We
already have a proper fix for vmstat, 176bed1de5bf ("vmstat: explicitly
schedule per-cpu work on the CPU we need it to run on"), which should
be used for the stable trees as a replacement.

-- 
Michal Hocko
SUSE Labs