linux-kernel - Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively

Message-ID: <X/hGHNGB9fltElWB@hirez.programming.kicks-ass.net>
Date:   Fri, 8 Jan 2021 12:46:36 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        Valentin Schneider <valentin.schneider@....com>,
        Qian Cai <cai@...hat.com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Dexuan Cui <decui@...rosoft.com>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Paul McKenney <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively

On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
> 
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that scheduler will not force break affinity for us.

So I've been looking at this the past day or so, and the more I look,
the more I think commit:

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

is a real problem and we need to revert it (at least for now).

Let me attempt a brain dump:

 - the assumption that per-cpu kernel threads are 'well behaved' on
   hot-plug has, I think, been proven incorrect; it's far worse than
   just bound workqueues. Therefore, it makes sense to provide the old
   semantics.

 - making the current code provide the old semantics (force-breaking
   affinity on per-cpu kernel threads) is tricky, but could probably be
   done:

    * we need to disallow new per-cpu kthreads while going down
    * we need to force push more aggressively; basically, when
      rcuwait_active(rq->hotplug_wait), push everything except that task,
      irrespective of is_per_cpu_kthread()
    * we need to disallow wakeups of anything not the hotplug thread or
      stop-machine from happening from the rcuwait_wait_event()

   and I have patches for most of that... except they're adding more
   complexity than 1cf12e08bc4d ever deleted.
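
To make the three rules above concrete, here is a rough userspace toy
model in C. It is only a sketch of the policy as described in this mail,
not the patches mentioned and not kernel code: every toy_* name is
hypothetical, the structs are stand-ins for the real ones, and the
"normal" (non-waiting) push rule is an assumption added so the contrast
is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; NOT the real definitions. */
struct toy_task {
	const char *comm;
	bool per_cpu_kthread;	/* models is_per_cpu_kthread(p) */
	bool is_stopper;	/* models the stop-machine thread */
};

struct toy_rq {
	bool going_down;			/* CPU is being unplugged */
	const struct toy_task *hotplug_waiter;	/* non-NULL models
						 * rcuwait_active(rq->hotplug_wait) */
};

/* Rule 1: no new per-cpu kthreads while the CPU is going down. */
static bool toy_may_create_per_cpu_kthread(const struct toy_rq *rq)
{
	assert(rq);
	return !rq->going_down;
}

/* Rule 2: while the hotplug thread waits, push everything but it. */
static bool toy_should_push(const struct toy_rq *rq, const struct toy_task *p)
{
	assert(rq && p);
	if (rq->hotplug_waiter)
		return p != rq->hotplug_waiter;	/* ignores per_cpu_kthread */
	/* Assumed baseline for the model: per-cpu kthreads are left alone. */
	return !p->per_cpu_kthread;
}

/* Rule 3: while waiting, only the waiter or the stopper may be woken. */
static bool toy_may_wake(const struct toy_rq *rq, const struct toy_task *p)
{
	assert(rq && p);
	if (!rq->hotplug_waiter)
		return true;
	return p == rq->hotplug_waiter || p->is_stopper;
}
```

In this model a per-cpu kworker is left in place until the hotplug
thread starts waiting, after which it is pushed like any other task and
can no longer be woken on that runqueue.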

However, even with all that, there's a further problem...

Fundamentally, waiting for !rq_has_pinned_tasks() so late in
hot-un-plug is wrong, I think. It means that migrate_disable() code
might encounter a mostly torn down CPU. This is OK-ish for per-cpu
kernel threads [*], but is now exposed to any random odd kernel code
that does migrate_disable().

[*] arguably running 'work' this late is similarly problematic.
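
For readers less familiar with the pinning logic, the following is a
minimal userspace sketch in C of what "waiting for
!rq_has_pinned_tasks()" means. It assumes a per-runqueue counter of
tasks that currently have migration disabled; the pin_* names and
struct layouts are hypothetical and simplified, not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy runqueue: only tracks how many tasks are pinned to this CPU. */
struct pin_rq {
	unsigned int nr_pinned;
};

/* Toy task: nesting depth of migrate_disable() sections. */
struct pin_task {
	struct pin_rq *rq;
	unsigned int migration_disabled;
};

/* Models migrate_disable(): the outermost call pins the task. */
static void pin_migrate_disable(struct pin_task *p)
{
	if (p->migration_disabled++ == 0)
		p->rq->nr_pinned++;
}

/* Models migrate_enable(): the outermost call unpins the task. */
static void pin_migrate_enable(struct pin_task *p)
{
	assert(p->migration_disabled > 0);
	if (--p->migration_disabled == 0)
		p->rq->nr_pinned--;
}

/* Models rq_has_pinned_tasks(): unplug must wait while this is true. */
static bool pin_rq_has_pinned_tasks(const struct pin_rq *rq)
{
	return rq->nr_pinned != 0;
}
```

The point made above is about timing: any code between
pin_migrate_disable() and pin_migrate_enable() keeps running on the
outgoing CPU, so the later in the teardown sequence the wait happens,
the less of the CPU that code can rely on.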

Let me go do lunch and ponder this further..