Date:   Sat, 15 Oct 2022 17:28:36 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Connor O'Brien <connoro@...gle.com>, linux-kernel@...r.kernel.org,
        kernel-team@...roid.com, John Stultz <jstultz@...gle.com>,
        Qais Yousef <qais.yousef@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>, youssefesmat@...gle.com
Subject: Re: [RFC PATCH 07/11] sched: Add proxy execution

On Wed, Oct 12, 2022 at 01:54:26AM +0000, Joel Fernandes wrote:

> > +migrate_task:
> > +	/*
> > +	 * The blocked-on relation must not cross CPUs, if this happens
> > +	 * migrate @p to the @owner's CPU.
> > +	 *
> > +	 * This is because we must respect the CPU affinity of execution
> > +	 * contexts (@owner) but we can ignore affinity for scheduling
> > +	 * contexts (@p). So we have to move scheduling contexts towards
> > +	 * potential execution contexts.
> > +	 *
> > +	 * XXX [juril] what if @p is not the highest prio task once migrated
> > +	 * to @owner's CPU?
> 
> Then that sounds like the right thing is happening, and @p will not proxy()
> to @owner. Why does @p need to be highest prio?

So all this is a head-ache and definitely introduces some inversion
cases -- doubly so when combined with lovely things like
migrate_disable().

But even aside from the obvious affinity related pain; there is another
issue. Per:

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1548,7 +1548,8 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)

	enqueue_rt_entity(rt_se, flags);

-       if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
+       if (!task_current(rq, p) && p->nr_cpus_allowed > 1 &&
+           !task_is_blocked(p))
		enqueue_pushable_task(rq, p);
}

Blocked entries are NOT put on the pushable list -- this means that the
normal mitigation for resolving a priority inversion like the one
described above (having both P_max and P_max-1 on the same CPU) no
longer works. That is, normally we would resolve the situation by
pushing P_max-1 to another CPU. But not with PE as it currently stands.
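
To make the effect of that extra condition concrete, here is a toy
userspace model of the check. It is only a sketch: struct toy_task and
the toy_* helpers are made up for illustration and are not kernel code;
only the shape of the condition is taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a task; NOT the kernel's struct task_struct. */
struct toy_task {
	int  prio;            /* lower value = higher priority, as in rt.c */
	int  nr_cpus_allowed; /* number of CPUs in the affinity mask */
	bool blocked;         /* models task_is_blocked(p) */
	bool current;         /* models task_current(rq, p) */
};

/*
 * Mirrors the condition in the hunk: a task is a push candidate only if
 * it is not running, may run elsewhere, and (with PE) is not blocked.
 */
static bool toy_is_pushable(const struct toy_task *p, bool pe_enabled)
{
	if (p->current || p->nr_cpus_allowed <= 1)
		return false;
	if (pe_enabled && p->blocked)
		return false;
	return true;
}

/* Count the tasks on one toy runqueue that a push could move away. */
static int toy_count_pushable(const struct toy_task *tasks, int n,
			      bool pe_enabled)
{
	int i, count = 0;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (toy_is_pushable(&tasks[i], pe_enabled))
			count++;
	return count;
}
```

With P_max running and a blocked P_max-1 queued behind it on the same
CPU, toy_count_pushable() reports one candidate without the
task_is_blocked() term and none with it, so in this model there is
nothing left for a push to move.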

The reason we remove blocked entries from the pushable list is that we
must migrate them towards the execution context (and respect the
execution context's affinity constraints).
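
As a rough illustration of that constraint, the sketch below contrasts
the destinations open to an ordinary push with the single destination
open to a blocked scheduling context. Everything here is hypothetical
(struct toy_ctx, the 64-bit mask, the toy_* helpers); it models the rule
stated above, not the actual implementation in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy context: the CPU it is queued on plus a 64-CPU affinity bitmask. */
struct toy_ctx {
	int      cpu;
	uint64_t cpus_allowed;
};

/* True if @cpu is set in @t's (toy) affinity mask. */
static bool toy_cpu_allowed(const struct toy_ctx *t, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((t->cpus_allowed >> cpu) & 1);
}

/*
 * May @p be moved to @cpu?
 *  - not blocked: any CPU in @p's own mask is acceptable;
 *  - blocked:     @p's mask is ignored and only the owner's CPU will do,
 *                 because the owner is what will actually execute.
 */
static bool toy_move_target_ok(const struct toy_ctx *p,
			       const struct toy_ctx *owner,
			       int cpu, bool p_blocked)
{
	if (p_blocked)
		return cpu == owner->cpu;
	return toy_cpu_allowed(p, cpu);
}
```

In this model a blocked task has exactly one valid destination, which
may even lie outside its own mask, so a balancer that picks the
destination freely has no choice left to make.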


Basically the whole push-pull RT balancer scheme is broken vs PE, and
that is a bit of a head-ache :/
