lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170727192410.GI742618@devbig577.frc2.facebook.com>
Date:   Thu, 27 Jul 2017 15:24:10 -0400
From:   Tejun Heo <tj@...nel.org>
To:     Michael Bringmann <mwb@...ux.vnet.ibm.com>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>,
        linux-kernel@...r.kernel.org, nfont@...ux.vnet.ibm.com
Subject: Re: [PATCH v6] workqueue: Fix edge cases for calc of pool's cpumask

Hello, Michael.

On Thu, Jul 27, 2017 at 02:07:53PM -0500, Michael Bringmann wrote:
> The problem lies with the ordering of events with respect to the order in
> which we add (or remove) CPUs to NUMA systems, and make use of that knowledge.

Isn't the root cause that the upper layers including workqueue expect
cpu <-> node mapping to be static but powerpc doesn't follow that?  I
don't get why ordering matters here.

> The CPUs present are assigned to nodes, and workqueues and their infrastructure
> are created to use the CPUs in a node.  Workqueues are created at boot time
> and updated or created as CPUs are added or removed.  However, there is little
> or no synchronization or ordering of these events, and the data structures

What I meant was that there's no synchronization construct protection
cpu <-> node mapping.  If arch code changes it during hot plug, it's
changing it underneath anybody who might be using that association.

> mapping CPUs to nodes may not be updated before the workqueue infrastructure
> is built for a node.  Thus we have the possibility of an invalid CPU mask
> attribute being attached to a newly created workqueue before the CPUs have
> been properly registered and published to a node.
> 
> This patch attempts to provide a partial ordering of events within workqueue
> by delaying the use of newly calculated CPU masks as the value for a workqueue
> attribute until they have valid content.  Instead the workqueue code must delay
> creating new workqueues until this function succeeds, or it can use a previously
> calculated cpumask attribute that is known to be valid.
> 
> This patch attempts to ensure that a valid, usable cpumask is used to set up
> newly created pools for workqueues.  This patch provides a fix for NUMA systems
> which can add/subtract processors dynamically.  The patch is expected to be an
> intermediate one while developers find and correct any underlying issues.

And what this patch does is adding a bandaid so that we at least don't
crash immediately when this condition triggers until the arch code can
be fixed properly.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ