[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <7c21c88d-6ce3-ae1a-d257-3042efa1bddf@linux.vnet.ibm.com>
Date:   Thu, 27 Jul 2017 15:15:47 -0500
From:   Michael Bringmann <mwb@...ux.vnet.ibm.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>,
        linux-kernel@...r.kernel.org, nfont@...ux.vnet.ibm.com
Subject: Re: [PATCH v6] workqueue: Fix edge cases for calc of pool's cpumask
Revised text:
There is an underlying assumption in many layers / modules of the Linux
system that CPU <-> node mapping is static.  This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory.  PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.
Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules.  In systems which can change 
this mapping during 'hotplug', like PowerPC, the information is changing 
underneath all layers that might reference it.
This patch attempts to ensure that a valid, usable cpumask attribute is
used by the workqueue infrastructure when setting up new resource pools.
It prevents a crash that has been observed when an 'empty' cpumask is
passed along to the worker/task scheduling code.  It is intended as an
intermediate fix until a more fundamental review and correction of the
issue can be done.
On 07/27/2017 02:24 PM, Tejun Heo wrote:
> Hello, Michael.
> 
> On Thu, Jul 27, 2017 at 02:07:53PM -0500, Michael Bringmann wrote:
>> The problem lies with the ordering of events with respect to the order in
>> which we add (or remove) CPUs to NUMA systems, and make use of that knowledge.
> 
> Isn't the root cause that the upper layers including workqueue expect
> cpu <-> node mapping to be static but powerpc doesn't follow that?  I
> don't get why ordering matters here.
> 
>> The CPUs present are assigned to nodes, and workqueues and their infrastructure
>> are created to use the CPUs in a node.  Workqueues are created at boot time
>> and updated or created as CPUs are added or removed.  However, there is little
>> or no synchronization or ordering of these events, and the data structures
> 
> What I meant was that there's no synchronization construct protection
> cpu <-> node mapping.  If arch code changes it during hot plug, it's
> changing it underneath anybody who might be using that association.
> 
>> mapping CPUs to nodes may not be updated before the workqueue infrastructure
>> is built for a node.  Thus we have the possibility of an invalid CPU mask
>> attribute being attached to a newly created workqueue before the CPUs have
>> been properly registered and published to a node.
>>
>> This patch attempts to provide a partial ordering of events within workqueue
>> by delaying the use of newly calculated CPU masks as the value for a workqueue
>> attribute until they have valid content.  Instead the workqueue code must delay
>> creating new workqueues until this function succeeds, or it can use a previously
>> calculated cpumask attribute that is known to be valid.
>>
>> This patch attempts to ensure that a valid, usable cpumask is used to set up
>> newly created pools for workqueues.  This patch provides a fix for NUMA systems
>> which can add/subtract processors dynamically.  The patch is expected to be an
>> intermediate one while developers find and correct any underlying issues.
> 
> And what this patch does is adding a bandaid so that we at least don't
> crash immediately when this condition triggers until the arch code can
> be fixed properly.
> 
> Thanks.
> 
-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@...ux.vnet.ibm.com
Powered by blists - more mailing lists
 
