Date:   Mon, 10 Oct 2022 12:40:02 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Connor O'Brien <connoro@...gle.com>, linux-kernel@...r.kernel.org
Cc:     kernel-team@...roid.com, John Stultz <jstultz@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Qais Yousef <qais.yousef@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Connor O'Brien <connoro@...gle.com>
Subject: Re: [RFC PATCH 09/11] sched/rt: Fix proxy/current (push,pull)ability

On 03/10/22 21:44, Connor O'Brien wrote:
> From: Valentin Schneider <valentin.schneider@....com>

This was one of my attempts at fixing RT load balancing (the BUG_ON in
pick_next_pushable_task() was quite easy to trigger), but I ended up
convincing myself it was insufficient: it only "tags" the donor and the
proxy, whereas the entire blocked chain needs tagging. Hopefully not all of
what I'm about to write is nonsense; some of the neurons I need for this
haven't been used in a while, so take it with a grain of salt.

Consider pick_highest_pushable_task() - we don't want any task in a blocked
chain to be pickable. There's no point in migrating it, we'll just hit
schedule()->proxy(), follow p->blocked_on and most likely move it back to
where the rest of the chain is. This applies to any sort of balancing (CFS,
RT, DL).
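
A rough userspace sketch of that round trip, purely for illustration and not
kernel code: struct toy_task, its cpu field and toy_chain_pull_back_cpu() are
made-up names, and only blocked_on mirrors the p->blocked_on pointer mentioned
above. The walk is bounded by max_depth, on the assumption that a chain could
be long or cyclic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; this is not the kernel's struct task_struct. */
struct toy_task {
	const char *name;
	int cpu;                      /* CPU whose runqueue holds the task */
	struct toy_task *blocked_on;  /* task this one waits on, or NULL */
};

/*
 * Follow p->blocked_on for at most max_depth hops. Return the CPU of the
 * first chain member that sits on a different CPU than p (i.e. where a
 * proxy()-like walk would send p back to), or -1 if none was found.
 */
static int toy_chain_pull_back_cpu(const struct toy_task *p, size_t max_depth)
{
	const struct toy_task *next;

	assert(p != NULL);
	next = p->blocked_on;

	for (size_t hop = 0; next && hop < max_depth; hop++) {
		if (next->cpu != p->cpu)
			return next->cpu;
		next = next->blocked_on;
	}
	return -1;
}
```

If p1 is blocked on p2 and p2 stays on CPU0, calling this for a p1 that was
just migrated to CPU1 reports CPU0, i.e. the migration bought nothing.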

ATM I think PE breaks the "run the N highest priority tasks on our N CPUs"
policy. Consider:

   p0 (FIFO42)
    |
    | blocked_on
    v
   p1 (FIFO41)
    |
    | blocked_on
    v
   p2 (FIFO40)

  On top of that, add p3, an unrelated FIFO1 task, and p4, an unrelated CFS task.

  CPU0
  current:  p0
  proxy:    p2
  enqueued: p0, p1, p2, p3

  CPU1
  current:  p4
  proxy:    p4
  enqueued: p4


pick_next_pushable_task() on CPU0 would pick p1 as the next highest
priority task to push away to e.g. CPU1, but that would be undone as soon
as proxy() happens on CPU1: we'd notice the CPU boundary and punt it back
to CPU0. What we would want here is to pick p3 instead to have it run on
CPU1.
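
The CPU0 scenario above can be replayed with a small userspace toy model.
This is only a sketch under assumptions, not kernel code: struct toy_task,
the in_chain flag and all toy_*() helpers are hypothetical, and priorities
use the user-visible convention (higher value wins). It contrasts a picker
that skips only current and proxy with one that skips every tagged chain
member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; this is not the kernel's struct task_struct. */
struct toy_task {
	const char *name;
	int rt_prio;                  /* higher value = higher priority */
	struct toy_task *blocked_on;  /* task this one waits on, or NULL */
	bool in_chain;                /* set by toy_tag_blocked_chain() */
};

/* Tag head and everything reachable via blocked_on; stops on cycles. */
static void toy_tag_blocked_chain(struct toy_task *head)
{
	assert(head != NULL);
	if (!head->blocked_on)
		return; /* not blocked: there is no chain to tag */

	for (struct toy_task *p = head; p && !p->in_chain; p = p->blocked_on)
		p->in_chain = true;
}

/* Highest-priority task that is neither curr nor proxy. */
static struct toy_task *toy_pick_skip_two(struct toy_task **rq, size_t n,
					  const struct toy_task *curr,
					  const struct toy_task *proxy)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_task *p = rq[i];

		if (p == curr || p == proxy)
			continue;
		if (!best || p->rt_prio > best->rt_prio)
			best = p;
	}
	return best;
}

/* Highest-priority task that is not part of any tagged blocked chain. */
static struct toy_task *toy_pick_chain_aware(struct toy_task **rq, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_task *p = rq[i];

		if (p->in_chain)
			continue;
		if (!best || p->rt_prio > best->rt_prio)
			best = p;
	}
	return best;
}
```

With p0..p3 enqueued as in the diagram and the chain tagged starting from p0,
toy_pick_skip_two() returns p1 while toy_pick_chain_aware() returns p3. p4 and
CPU1 are left out of the model.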

I *think* we want only the proxy of an entire blocked-chain to be visible
to load-balance. Unfortunately, PE gathers the blocked-chain onto the
donor's CPU, which kinda undoes that.

Having the blocked tasks remain in the rq is very handy as it directly
gives us the scheduling context and we can unwind the blocked chain for the
execution context, but it does wreak havoc in load-balancing :/
