linux-kernel - Re: [PATCH 23/30] sched/fair: handle tick expiry under lazy preemption


Message-ID: <871q8v7otl.fsf@oracle.com>
Date: Wed, 28 Feb 2024 22:43:34 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, peterz@...radead.org,
        torvalds@...ux-foundation.org, paulmck@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jpoimboe@...nel.org, mark.rutland@....com, jgross@...e.com,
        andrew.cooper3@...rix.com, bristot@...nel.org,
        mathieu.desnoyers@...icios.com, geert@...ux-m68k.org,
        glaubitz@...sik.fu-berlin.de, anton.ivanov@...bridgegreys.com,
        mattst88@...il.com, krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...lab.com, richard@....at, mjguzik@...il.com,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 23/30] sched/fair: handle tick expiry under lazy preemption


Juri Lelli <juri.lelli@...hat.com> writes:

> Hi Ankur,
>
> On 12/02/24 21:55, Ankur Arora wrote:
>> The default policy for lazy scheduling is to schedule in exit-to-user,
>> assuming that would happen within the remaining time quanta of the
>> task.
>>
>> However, that runs into the 'hog' problem -- the target task might
>> be running in the kernel and might not relinquish CPU on its own.
>>
>> Handle that by upgrading the ignored tif_resched(NR_lazy) bit to
>> tif_resched(NR_now) at the next tick.
>>
>> Cc: Ingo Molnar <mingo@...hat.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Juri Lelli <juri.lelli@...hat.com>
>> Cc: Vincent Guittot <vincent.guittot@...aro.org>
>> Originally-by: Thomas Gleixner <tglx@...utronix.de>
>> Link: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
>> Signed-off-by: Ankur Arora <ankur.a.arora@...cle.com>
>>
>> ---
>> Note:
>>   Instead of special casing the tick, it might be simpler to always
>>   do the upgrade on the second resched_curr().
>>
>>   The theoretical problem with doing that is that the current
>>   approach deterministically provides a well-defined extra unit of
>>   time. Going with a second resched_curr() might mean that the
>>   amount of extra time the task gets depends on the vagaries of
>>   the incoming resched_curr() (which is fine if it's mostly from
>>   the tick; not fine if we could get it due to other reasons.)
>>
>>   Practically, both performed equally well in my tests.
>>
>>   Thoughts?
>
> I'm still digesting the series, so I could simply be confused, but I
> have the impression that the extra unit of time might be a problem for
> deadline (and maybe rt as well?).
>
> For deadline we call resched_curr_tick() from the throttle part of
> update_curr_dl_se() if the dl_se happens to not be the leftmost anymore,
> so in this case I believe we really want to reschedule straight away and
> not wait for the second time around (otherwise we might be breaking the
> new leftmost task's guarantees)?

Yes, agreed, this looks like it breaks the deadline invariant for both
preempt=none and preempt=voluntary.

For RT, update_curr_rt() seems to have a similar problem if the task
doesn't have RUNTIME_INF.

Relatedly, do you think there's a similar problem when switching to
a task with a higher scheduling class?
(The related code is in patches 25 and 26.)

For preempt=voluntary, wakeup_preempt() will do the right thing, but
for preempt=none, we only reschedule lazily so the target might
continue to run until the end of the tick.
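
To make the difference concrete, here is a toy user-space model of the
two cases. It is only a sketch: the toy_* names, the enums and the
helper functions are made up for illustration and are not the kernel's
tif_resched()/resched_curr() interfaces. It assumes the current task
stays in the kernel and never reaches exit-to-user.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_preempt_mode { TOY_PREEMPT_NONE, TOY_PREEMPT_VOLUNTARY };
enum toy_resched { TOY_RESCHED_CLEAR, TOY_RESCHED_LAZY, TOY_RESCHED_NOW };

struct toy_task {
	enum toy_resched resched;
};

/* A task in a higher scheduling class becomes runnable. */
static void toy_wakeup_higher_class(struct toy_task *curr,
				    enum toy_preempt_mode mode)
{
	assert(curr != NULL);
	if (mode == TOY_PREEMPT_VOLUNTARY)
		curr->resched = TOY_RESCHED_NOW;	/* reschedule right away */
	else if (curr->resched == TOY_RESCHED_CLEAR)
		curr->resched = TOY_RESCHED_LAZY;	/* wait for exit-to-user */
}

/* Tick: a lazy bit that is still pending is upgraded to "now". */
static void toy_tick(struct toy_task *curr)
{
	assert(curr != NULL);
	if (curr->resched == TOY_RESCHED_LAZY)
		curr->resched = TOY_RESCHED_NOW;
}

/*
 * Number of ticks a kernel-bound current task keeps running after a
 * higher-class wakeup before an immediate reschedule is requested.
 */
static int toy_ticks_until_forced(enum toy_preempt_mode mode)
{
	struct toy_task curr = { TOY_RESCHED_CLEAR };
	int ticks = 0;

	toy_wakeup_higher_class(&curr, mode);
	while (curr.resched != TOY_RESCHED_NOW) {
		toy_tick(&curr);
		ticks++;
	}
	return ticks;
}
```

In this model the preempt=voluntary case asks for a reschedule with no
tick in between, while the preempt=none case only gets there once the
next tick upgrades the lazy bit.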

Thanks for the review, btw.

--
ankur