lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1454651069.3545.41.camel@gmail.com>
Date:	Fri, 05 Feb 2016 06:44:29 +0100
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Tejun Heo <tj@...nel.org>, Michal Hocko <mhocko@...nel.org>
Cc:	Jiri Slaby <jslaby@...e.cz>, Thomas Gleixner <tglx@...utronix.de>,
	Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
	Ben Hutchings <ben@...adent.org.uk>,
	Sasha Levin <sasha.levin@...cle.com>, Shaohua Li <shli@...com>,
	LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
	Daniel Bilik <daniel.bilik@...system.cz>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25

On Wed, 2016-02-03 at 11:24 -0500, Tejun Heo wrote:
> On Wed, Feb 03, 2016 at 01:28:56PM +0100, Michal Hocko wrote:
> > > The CPU was 168, and that one was offlined in the meantime. So
> > > __queue_work fails at:
> > >   if (!(wq->flags & WQ_UNBOUND))
> > >     pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
> > >   else
> > >     pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
> > >     ^^^                           ^^^^ NODE is -1
> > >       \ pwq is NULL
> > > 
> > >   if (last_pool && last_pool != pwq->pool) { <--- BOOM
> 
> So, the proper fix here is keeping cpu <-> node mapping stable across
> cpu on/offlining which has been being worked on for a long time now.
> The patchst is pending and it fixes other issues too.
> 
> > So I think 874bbfe600a6 is really bogus. It should be reverted. We
> > already have a proper fix for vmstat 176bed1de5bf ("vmstat:
> > explicitly
> > schedule per-cpu work on the CPU we need it to run on"). This which
> > should be used for the stable trees as a replacement.
> 
> It's not bogus.  We can't flip a property that has been guaranteed
> without any provision for verification.  Why do you think vmstat blow
> up in the first place?  vmstat would be the canary case as it runs
> frequently on all systems.  It's exactly the sign that we can't break
> this guarantee willy-nilly.

If the intent of the below is to fulfill a guarantee...

+       /* timer isn't guaranteed to run in this cpu, record earlier */
+       if (cpu == WORK_CPU_UNBOUND)
+               cpu = raw_smp_processor_id();
        dwork->cpu = cpu;
        timer->expires = jiffies + delay;
  
-       if (unlikely(cpu != WORK_CPU_UNBOUND))
-               add_timer_on(timer, cpu);
-       else
-               add_timer(timer);
+       add_timer_on(timer, cpu);

...it appears to be incomplete.  Hotplug aside, when adding a timer
with the expectation that it stay put, should it not also be pinned?

	-Mike

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ