Message-ID: <CAJhGHyBhtTDWw_xZ28_+CguhVx=x7pds0dZVkUT7YqjkjUdbNQ@mail.gmail.com>
Date:   Tue, 15 Dec 2020 17:46:23 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Hillf Danton <hdanton@...a.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Qian Cai <cai@...hat.com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH 00/10] workqueue: break affinity initiatively

On Tue, Dec 15, 2020 at 4:49 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Dec 15, 2020 at 04:14:26PM +0800, Lai Jiangshan wrote:
> > On Tue, Dec 15, 2020 at 3:50 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Tue, Dec 15, 2020 at 01:44:53PM +0800, Lai Jiangshan wrote:
> > > > I don't know how the scheduler distinguishes all these
> > > > different cases under the "new assumption".
> > >
> > > The special case is:
> > >
> > >   (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1
> > >
> > >
> >
> > So unbound per-node workers can possibly match this test.  So there is
> > code needed to handle unbound workers/pools, which is done by this patchset.
>
> Curious; how could a per-node worker match this? Only if the node is a
> single CPU, or otherwise too?

We have /sys/devices/virtual/workqueue/cpumask, which can be read/written
to access wq_unbound_cpumask.

A per-node worker's cpumask is wq_unbound_cpumask & possible_cpumask_of_the_node.
Since wq_unbound_cpumask can be changed by the system admin, a per-node
worker's cpumask can end up being a single CPU.

wq_unbound_cpumask is used when a system admin wants to isolate some
CPUs from unbound workqueues.  But I think it is a rare case that the
admin causes a per-node worker's cpumask to be a single CPU.
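
To illustrate (just a sketch, not the actual workqueue code; the helper
name is made up, and it assumes it sits in kernel/workqueue.c where
wq_unbound_cpumask is visible):

	/* needs <linux/cpumask.h> and <linux/topology.h> */
	static bool node_pool_is_single_cpu(int node)
	{
		cpumask_var_t effective;
		bool single;

		if (!zalloc_cpumask_var(&effective, GFP_KERNEL))
			return false;

		/* the pool's effective mask: wq_unbound_cpumask & the node's CPUs */
		cpumask_and(effective, wq_unbound_cpumask, cpumask_of_node(node));

		/*
		 * If the admin-written mask intersects this node in only one
		 * CPU, the pool's workers run with nr_cpus_allowed == 1 and
		 * match the PF_KTHREAD special case above.
		 */
		single = cpumask_weight(effective) == 1;
		free_cpumask_var(effective);
		return single;
	}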

Even though it is a rare case, we have to handle it.

>
> > Is this the check in is_per_cpu_kthread()?  I think I should also have
> > used this function in workqueue and not break affinity for unbound
> > workers that have more than 1 CPU.
>
> Yes, that function captures it. If you want to use it, feel free to move
> it to include/linux/sched.h.

I will.  A single CPU for unbound workers/pools is the rare case, but it
is enough to require code that breaks affinity for unbound workers.
If we optimize for the common case (multiple CPUs for unbound workers),
the optimization amounts to additional code that only matters in the
slow path (hot-unplug).
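
Concretely, the change I have in mind would look something like this
(a sketch only, not tested; the helper name and placement are made up,
and is_per_cpu_kthread() is quoted roughly as it stands in
kernel/sched/sched.h):

	/* roughly the scheduler-side check, from kernel/sched/sched.h */
	static inline bool is_per_cpu_kthread(struct task_struct *p)
	{
		if (!(p->flags & PF_KTHREAD))
			return false;

		if (p->nr_cpus_allowed != 1)
			return false;

		return true;
	}

	/*
	 * Sketch only: on CPU-down, only bother resetting the affinity of
	 * workers the scheduler would treat as per-cpu kthreads; unbound
	 * workers that still have more than one allowed CPU are left alone.
	 */
	static void maybe_break_affinity(struct worker *worker)
	{
		if (!is_per_cpu_kthread(worker->task))
			return;

		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
						  cpu_possible_mask) < 0);
	}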

I will try it and see if it is worth it.

>
> This class of threads is 'special', since it needs to violate the
> regular hotplug rules, and migrate_disable() made it just this little
> bit more special. It basically comes down to how we need certain per-cpu
> kthreads to run on a CPU while it's brought up, before userspace is
> allowed on, and similarly they need to run on the CPU after userspace is
> no longer allowed on in order to bring it down.
>
> (IOW, they must be allowed to violate the active mask)
>
> Due to migrate_disable() we had to move the migration code from the very
> last cpu-down stage, to earlier. This in turn brought the expectation
> (which is normally met) that per-cpu kthreads will stop/park or
> otherwise make themselves scarce when the CPU goes down. We can no
> longer force migrate them.

Thanks for explaining the rationale.

>
> Workqueues are the sole exception to that, they've got some really
> 'dodgy' hotplug behaviour.
>

Indeed.  No one wants to wait for workqueues during hot-unplug, so we
have to do something after the fact.
