Date:   Wed, 27 Jul 2022 13:38:49 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Valentin Schneider <vschneid@...hat.com>
Cc:     Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()

On Wed, Jul 27, 2022 at 4:36 AM Valentin Schneider <vschneid@...hat.com> wrote:
>
> On 26/07/22 07:30, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
> >> Hm so my choice of words in the changelog wasn't great - "initial setup"
> >> can be kernel init, but *also* setup of whatever workload is being deployed
> >> onto the system.
> >>
> >> So you can be having "normal" background activity (I've seen some IRQs end
> >> up with schedule_work() on isolated CPUs, they're not moved away at boot
> >> time but rather shortly before launching the latency-sensitive app), some
> >> preliminary stats collection / setup to make sure the CPU will be quiet
> >> (e.g. refresh_vm_stats()), and *then* the application starts with
> >> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.
> >
> > Ah, I see. I guess we'll need to figure out how to unbind the workers then.
> >
>
> I've been playing with different ways to unbind & wake the workers in a
> sleepable context, but so far I haven't been happy with any of my
> experiments.


I'm writing code to handle the problems of CPU affinity and premature
wakeups of newly created workers.

Unbinding dying workers is also on the list, but I haven't figured out
a good solution yet.

I was planning to add set_cpus_allowed_ptr_off_rq(), which sets the
cpumask of a task only if the task is sleeping and returns -EBUSY otherwise.
It would be guaranteed and documented to be usable in atomic context,
and recommended for use on dying tasks only.
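
To make the intended semantics concrete, here is a rough userspace model
(not kernel code). struct model_task, its on_rq flag and the -EINVAL check
for an empty mask are assumptions made up for illustration; only the
"set if sleeping, else -EBUSY, never block" behaviour comes from the
description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct model_task {
	bool on_rq;              /* true while the task is runnable or running */
	unsigned long cpus_mask; /* one bit per allowed CPU */
};

/*
 * Model of the proposed semantics: update the mask only when the task
 * is sleeping (off the runqueue), otherwise fail with -EBUSY.
 * The function never waits, which is what would make the real thing
 * usable from atomic context.
 */
static int model_set_cpus_allowed_off_rq(struct model_task *t,
					 unsigned long new_mask)
{
	assert(t != NULL);

	if (new_mask == 0)
		return -EINVAL; /* assumption: reject an empty mask */
	if (t->on_rq)
		return -EBUSY;  /* task is not sleeping: leave it alone */

	t->cpus_mask = new_mask;
	return 0;
}
```

A caller that gets -EBUSY has to decide for itself whether to retry later
or give up; the model deliberately does not migrate a running task.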

I can't really ensure it would be implemented the way I expect, since
it touches scheduler code.

I'd better back off.

>
> What hasn't changed much between my attempts is transferring to-be-destroyed
> kworkers from their pool->idle_list to a reaper_list which is walked by
> *something* that does unbind+wakeup. AFAIA as long as the kworker is off
> the pool->idle_list we can play with it (i.e. unbind+wake) off the
> pool->lock.
>
> It's the *something* that's annoying to get right, I don't want it to be
> overly complicated given most users are probably not impacted by what I'm
> trying to fix, but I'm getting the feeling it should still be a per-pool
> kthread. I toyed with a single reaper kthread but a central synchronization
> for all the pools feels like a stupid overhead.

I think fixing it in workqueue.c is complicated.

Nevertheless, I will also try to fix it inside workqueue only, to see
what comes up.
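
For reference, the reaper_list idea quoted above could be modeled in
userspace roughly as below. This is only a sketch: struct model_worker,
struct model_pool and both functions are hypothetical names, the lock is
represented by comments only, and nothing here is taken from workqueue.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct worker / worker_pool. */
struct model_worker {
	struct model_worker *next;
	bool bound; /* still pinned to the pool's CPU */
	bool woken; /* has been woken up so it can exit */
};

struct model_pool {
	struct model_worker *idle_list;
	int nr_idle;
};

/*
 * Step 1 (would run with pool->lock held): detach up to nr workers from
 * the idle list and hand them back as a private reaper list.
 */
static struct model_worker *model_cull_idle(struct model_pool *pool, int nr)
{
	struct model_worker *reaper_list = NULL;

	assert(pool != NULL);
	while (nr-- > 0 && pool->idle_list) {
		struct model_worker *w = pool->idle_list;

		pool->idle_list = w->next;
		pool->nr_idle--;
		w->next = reaper_list;
		reaper_list = w;
	}
	return reaper_list;
}

/*
 * Step 2 (would run without pool->lock, in sleepable context): unbind
 * each detached worker, then wake it. Returns the number processed.
 */
static int model_reap(struct model_worker *reaper_list)
{
	int reaped = 0;

	while (reaper_list) {
		struct model_worker *w = reaper_list;

		reaper_list = w->next;
		w->next = NULL;
		w->bound = false; /* unbind first ... */
		w->woken = true;  /* ... then wake */
		reaped++;
	}
	return reaped;
}
```

The model leaves open the hard part discussed above, namely what the
*something* calling model_reap() should be.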

>
> If any of that sounds ludicrous please shout, otherwise I'm going to keep
> tinkering :)
>
> > Thanks.
> >
> > --
> > tejun
>