Message-ID: <20170824135122.GM491396@devbig577.frc2.facebook.com>
Date: Thu, 24 Aug 2017 06:51:23 -0700
From: Tejun Heo <tj@...nel.org>
To: Laurent Vivier <lvivier@...hat.com>
Cc: Michael Ellerman <mpe@...erman.id.au>,
linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>,
Lai Jiangshan <jiangshanlai@...il.com>,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH 1/2] powerpc/workqueue: update list of possible CPUs
Hello, Laurent.
On Thu, Aug 24, 2017 at 02:10:31PM +0200, Laurent Vivier wrote:
> > Yeah, it just needs to match up new cpus to the cpu ids assigned to
> > the right node.
>
> We are not able to assign the cpu ids to the right node before the CPU
> is present, because firmware doesn't provide CPU mapping <-> node id
> before that.
What I meant was to assign the matching CPU ID when the CPU becomes
present - i.e. have CPU IDs available for different nodes and allocate
them to the new CPU according to its node mapping when it actually
comes up. Please note that I'm not saying this is the way to go, just
that it is a solvable problem from the arch code.
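
To make that concrete, here is a minimal sketch, written as a userspace C
model rather than actual arch code. Every name and size in it
(NR_NODES, IDS_PER_NODE, alloc_cpu_id_for_node() and so on) is
hypothetical and made up for illustration. It assumes a fixed pool of
logical CPU IDs is reserved per node at boot, so an ID's node never
changes, and a newly present CPU takes a free ID from the pool of the
node firmware reports for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes: 2 nodes, 4 logical CPU IDs reserved for each. */
#define NR_NODES      2
#define IDS_PER_NODE  4
#define NR_CPU_IDS    (NR_NODES * IDS_PER_NODE)

/* Which logical CPU IDs are currently handed out. */
static bool cpu_id_used[NR_CPU_IDS];

/* Fixed at boot: the node of a logical CPU ID never changes. */
static int cpu_id_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPU_IDS);
	return cpu / IDS_PER_NODE;
}

/*
 * Called once a CPU is present and its node is finally known.
 * Returns a free logical ID from that node's pool, or -1 if the
 * pool is exhausted.
 */
static int alloc_cpu_id_for_node(int node)
{
	assert(node >= 0 && node < NR_NODES);
	for (int i = 0; i < IDS_PER_NODE; i++) {
		int cpu = node * IDS_PER_NODE + i;

		if (!cpu_id_used[cpu]) {
			cpu_id_used[cpu] = true;
			return cpu;
		}
	}
	return -1;
}

/* Called on unplug so the ID can be reused by the same node later. */
static void free_cpu_id(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPU_IDS);
	cpu_id_used[cpu] = false;
}
```

The cost is visible in the constants: each node needs enough reserved
IDs for the worst case, so the number of possible CPUs can end up
larger than what is ever plugged in.
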
> > The node mapping for that cpu id changes *dynamically* while the
> > system is running and that can race with node-affinity sensitive
> > operations such as memory allocations.
>
> Memory is mapped to the node through its own firmware entry, so I don't
> think cpu id change can affect memory affinity, and before we know the
> node id of the CPU, the CPU is not present and thus it can't use memory.
The latter part isn't true. For example, percpu memory gets allocated
for all possible CPUs according to their node affinity, so the memory
node association change which happens when the CPU comes up for the
first time can race against such allocations. I don't know whether
that's actually problematic but we don't have *any* synchronization
around it. If you think it's safe to have such races, please explain
why that is.
> > Please take a step back and think through the problem again. You
> > can't bandaid it this way.
>
> Could you give some ideas, proposals?
> As the firmware doesn't provide the information before the CPU is really
> plugged, I really don't know how to manage this problem.
There are two possible approaches, I think.
1. Make physical cpu -> logical cpu mapping indirect so that the
   kernel's cpu ID assignment is always on the right numa node. This
   may mean that the kernel might have to keep more possible CPUs
   around than necessary but it does have the benefit that all memory
   allocations are affine to the right node.
2. Make cpu <-> node mapping properly dynamic. Identify what sort of
   synchronization we'd need around the mapping changing dynamically.
   Note that we might not need much but it'll most likely need some.
   Build synchronization and notification infrastructure around it.
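
As a rough illustration of what the second approach might involve,
here is a userspace C11 sketch, not kernel code. All names in it
(set_cpu_node(), cpu_node_read_begin(), migrate_percpu_area() and the
rest) are hypothetical. It assumes a single writer, as if hotplug were
already serialized, uses a sequence counter so that node-sensitive
readers can detect a concurrent mapping change and retry, and uses a
small callback table as the notification side. Every CPU starts on
node 0 purely as a placeholder.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS        8
#define MAX_NOTIFIERS  4

typedef void (*cpu_node_notifier_fn)(int cpu, int old_node, int new_node);

static atomic_int cpu_node[NR_CPUS];   /* cpu -> node, placeholder node 0 */
static atomic_uint cpu_node_seq;       /* odd while an update is in flight */
static cpu_node_notifier_fn notifiers[MAX_NOTIFIERS];
static size_t nr_notifiers;

static int register_cpu_node_notifier(cpu_node_notifier_fn fn)
{
	if (nr_notifiers == MAX_NOTIFIERS)
		return -1;
	notifiers[nr_notifiers++] = fn;
	return 0;
}

/* Reader side: bracket any node-sensitive operation with begin/retry. */
static unsigned int cpu_node_read_begin(void)
{
	unsigned int seq;

	while ((seq = atomic_load(&cpu_node_seq)) & 1)
		;	/* an update is in progress, wait for it to finish */
	return seq;
}

static bool cpu_node_read_retry(unsigned int seq)
{
	return atomic_load(&cpu_node_seq) != seq;
}

/* Smallest possible reader: a lookup that never sees a half-done update. */
static int cpu_to_node_stable(int cpu)
{
	unsigned int seq;
	int node;

	do {
		seq = cpu_node_read_begin();
		node = atomic_load(&cpu_node[cpu]);
	} while (cpu_node_read_retry(seq));
	return node;
}

/* Writer side: single writer assumed; notifies subscribers afterwards. */
static void set_cpu_node(int cpu, int node)
{
	int old;

	atomic_fetch_add(&cpu_node_seq, 1);	/* counter becomes odd */
	old = atomic_exchange(&cpu_node[cpu], node);
	atomic_fetch_add(&cpu_node_seq, 1);	/* counter becomes even */

	for (size_t i = 0; i < nr_notifiers; i++)
		notifiers[i](cpu, old, node);
}

/* Example subscriber: a stand-in for something that re-homes percpu memory. */
static int last_migrated_cpu = -1;

static void migrate_percpu_area(int cpu, int old_node, int new_node)
{
	(void)old_node;
	(void)new_node;
	last_migrated_cpu = cpu;
}
```

A node-affine allocation would sit between cpu_node_read_begin() and
cpu_node_read_retry() and be redone if the retry check fires. Anything
allocated before the change would have to be fixed up from a notifier.
Whether that is enough synchronization is exactly the part that still
needs to be worked out.
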
Thanks.
--
tejun