Date:	Wed, 3 Feb 2016 14:28:10 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Michal Hocko <mhocko@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
	Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
	Ben Hutchings <ben@...adent.org.uk>,
	Sasha Levin <sasha.levin@...cle.com>, Shaohua Li <shli@...com>,
	Daniel Bilik <daniel.bilik@...system.cz>
Subject: Re: [PATCH wq/for-4.5-fixes] workqueue: handle NUMA_NO_NODE for
 unbound pool_workqueue lookup

Hello,

On Wed, Feb 03, 2016 at 08:12:19PM +0100, Thomas Gleixner wrote:
> > Signed-off-by: Tejun Heo <tj@...nel.org>
> > Reported-by: Mike Galbraith <umgwanakikbuti@...il.com>
> > Cc: Tang Chen <tangchen@...fujitsu.com>
> > Cc: Rafael J. Wysocki <rafael@...nel.org>
> > Cc: Len Brown <len.brown@...el.com>
> > Cc: stable@...r.kernel.org # v4.3+
> 
> 4.3+ ? Hasn't 874bbfe600a6 been backported to older stable kernels?
> 
> Adding a 'Fixes: 874bbfe600a6 ...' tag is what you really want here.

Oops, you're right.  Will add that once Mike confirms the fix.

> > @@ -570,6 +570,16 @@ static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq,
> >  						  int node)
> >  {
> >  	assert_rcu_or_wq_mutex_or_pool_mutex(wq);
> > +
> > +	/*
> > +	 * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a
> > +	 * delayed item is pending.  The plan is to keep CPU -> NODE
> > +	 * mapping valid and stable across CPU on/offlines.  Once that
> > +	 * happens, this workaround can be removed.
> 
> So what happens if the complete node is offline?

The pool_workqueue lookup itself should be fine, as dfl_pwq is assigned
to all nodes by default.  When the node comes back online, things can
currently break because the cpu -> node mapping may change.  That's what
Tang has been working on.  It's a bigger problem throughout the memory
allocation path, though, because there's no synchronization around the
cpu -> node mapping.  Hopefully, the pending patchset can get through
sooner rather than later.
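
To illustrate why the lookup stays safe, here is a minimal user-space
sketch, not the kernel code.  The quoted hunk is cut off before the
actual check, so the fallback to dfl_pwq on NUMA_NO_NODE is an
assumption based on the explanation above.  MAX_NODES, the struct
layouts and the model_* function names are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* "no node" marker, as in the kernel */
#define MAX_NODES 4		/* hypothetical node count for this model */

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct pool_workqueue {
	int id;
};

struct workqueue_struct {
	struct pool_workqueue *dfl_pwq;			/* default fallback */
	struct pool_workqueue *numa_pwq_tbl[MAX_NODES];	/* per-node entries */
};

/* Point every node at the default pwq, so any valid node lookup works. */
static void model_wq_init(struct workqueue_struct *wq,
			  struct pool_workqueue *dfl)
{
	wq->dfl_pwq = dfl;
	for (int node = 0; node < MAX_NODES; node++)
		wq->numa_pwq_tbl[node] = dfl;
}

/*
 * Model of the lookup with the workaround: a node of NUMA_NO_NODE must
 * never be used as a table index, so fall back to the default pwq.
 */
static struct pool_workqueue *model_pwq_by_node(struct workqueue_struct *wq,
						int node)
{
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;

	assert(node >= 0 && node < MAX_NODES);
	return wq->numa_pwq_tbl[node];
}
```

In this model, a lookup with NUMA_NO_NODE returns the default pwq, and
so does a lookup for any node that never had a dedicated pwq installed.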

Thanks.

-- 
tejun
