Message-ID: <jhjblbx7glh.mognet@arm.com>
Date:   Fri, 05 Mar 2021 15:41:46 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Qais Yousef <qais.yousef@....com>
Cc:     tglx@...utronix.de, mingo@...nel.org, linux-kernel@...r.kernel.org,
        bigeasy@...utronix.de, swood@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vincent.donnefort@....com, tj@...nel.org,
        ouwen210@...mail.com
Subject: Re: [PATCH v4 15/19] sched: Fix migrate_disable() vs rt/dl balancing

On 05/03/21 15:56, Peter Zijlstra wrote:
> On Sat, Dec 26, 2020 at 01:54:45PM +0000, Qais Yousef wrote:
>>
>> > +static inline struct task_struct *get_push_task(struct rq *rq)
>> > +{
>> > +	struct task_struct *p = rq->curr;
>>
>> Shouldn't we verify the class of the task here? The RT task in migration
>> disabled could have been preempted by a dl or stopper task. Similarly, the dl
>> task could have been preempted by a stopper task.
>>
>> I don't think an RT task should be allowed to push a dl task under any
>> circumstances?
>
> Hmm, quite. Fancy doing a patch?

Last time we talked about this, I looked into

  push_rt_task() + find_lowest_rq()

IIRC, with how

  find_lowest_rq() + cpupri_find_fitness()

currently work, find_lowest_rq() should return -1 in push_rt_task() if
rq->curr is DL (CPUPRI_INVALID). IOW, Migration-Disabled RT tasks shouldn't
actually interfere with DL tasks (unless a DL task gets scheduled after we
drop the rq lock and kick the stopper, but we have that problem everywhere
including CFS active balance).
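
To make the above a bit more concrete, here is a toy model of that lookup.
It is a sketch, not the kernel code: toy_find_lowest_cpu(), the TOY_PRI_IDLE
level and the "higher number == higher priority" encoding are all made up
for illustration. The only things taken from the discussion above are the
assumptions that a DL task maps to CPUPRI_INVALID and that the search only
considers CPUs running something strictly below the task's own level.

```c
#include <assert.h>

#define NR_CPUS        3
#define CPUPRI_INVALID (-1)  /* assumed level for a DL (or stopper) task */
#define TOY_PRI_IDLE   0     /* hypothetical: idle CPU; RTn maps to level n */

/*
 * Toy stand-in for the find_lowest_rq() + cpupri lookup: return the CPU
 * (other than this_cpu) whose current level is the lowest one strictly
 * below task_pri, or -1 if there is none.
 */
static int toy_find_lowest_cpu(const int cpu_pri[NR_CPUS], int task_pri,
			       int this_cpu)
{
	int best = -1;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			continue;
		/* A CPU currently running DL is never a candidate. */
		if (cpu_pri[cpu] == CPUPRI_INVALID)
			continue;
		/* Only levels strictly below the task's own are eligible. */
		if (cpu_pri[cpu] >= task_pri)
			continue;
		if (best == -1 || cpu_pri[cpu] < cpu_pri[best])
			best = cpu;
	}
	return best;
}
```

With CPU0 running DL, CPU1 running RT1 and CPU2 idle, passing
CPUPRI_INVALID as task_pri (i.e. rq->curr is DL) finds no eligible level in
this model and yields -1, whereas an RT2-level task would be pointed at the
idle CPU2.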


Now, for some blabbering. Re SMP invariant; wouldn't we actually want this
to happen? Consider:

  MD := Migration-Disabled.

  rq
           DL
           RT3
           RT2 (MD)   RT1

  current  DL         RT1        idle
           CPU0       CPU1       CPU2

If we were to ignore MD, the best spread for this would be something
like:

  rq
                                 RT1
           DL         RT3        RT2

  current  DL         RT3        RT2
           CPU0       CPU1       CPU2

Now, with Migration-Disabled we can't move RT2 to CPU2 - it has to stay
on CPU0 for as long as it is Migration-Disabled. Thus, a possible spread
would be:

  rq
           RT1
           RT2 (MD)   DL         RT3

  current  RT2        DL         RT3
           CPU0       CPU1       CPU2

If you look closely, this is exactly the same as the previous spread
modulo CPU numbers. IOW, this is (again) a CPU renumbering exercise.
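
The "same spread modulo CPU numbers" claim can be checked mechanically. The
sketch below is purely illustrative: the string encoding of a runqueue
("RT2,RT1" meaning RT2 is current and RT1 is queued, with the MD tag
dropped) and the helper name are hypothetical. It only tests whether one
spread is a permutation of the other across CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 3

/*
 * Return true if spread b is spread a with the CPUs renumbered, i.e. every
 * per-CPU runqueue in a has a distinct, identical runqueue in b.
 * String equality is an equivalence relation, so greedy matching suffices.
 */
static bool same_modulo_renumbering(const char *a[NR_CPUS],
				    const char *b[NR_CPUS])
{
	bool used[NR_CPUS] = { false };

	assert(a && b);

	for (int i = 0; i < NR_CPUS; i++) {
		bool found = false;

		for (int j = 0; j < NR_CPUS && !found; j++) {
			if (!used[j] && strcmp(a[i], b[j]) == 0) {
				used[j] = true;
				found = true;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

Feeding it { "DL", "RT3", "RT2,RT1" } for the MD-agnostic spread and
{ "RT2,RT1", "DL", "RT3" } for the MD-aware one returns true; the initial
layout, { "DL,RT3,RT2", "RT1", "" }, does not match either.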

To respect the aforementioned scheduling invariant, we've had to move
that DL task, and while it does add interference, it's similar to why we
push higher RT priority tasks to make room for lower RT priority, migration
disabled tasks. You get interference caused by a lower-priority entity for
the sake of your SMP scheduling invariant.
