linux-kernel - Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20231108023702.GA2992223@google.com>
Date:   Wed, 8 Nov 2023 02:37:02 +0000
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Daniel Bristot de Oliveira <bristot@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vineeth Pillai <vineeth@...byteword.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v5 6/7] sched/deadline: Deferrable dl server

On Tue, Nov 07, 2023 at 11:47:32AM -0500, Steven Rostedt wrote:
> On Mon, 6 Nov 2023 16:37:32 -0500
> Joel Fernandes <joel@...lfernandes.org> wrote:
> 
> > Say CFS-server runtime is 0.3s and period is 1s.
> > 
> > At 0.7s, 0-laxity timer fires. CFS runs for 0.29s, then sleeps for
> > 0.005s and wakes up at 0.295s (its remaining runtime is 0.01s at this
> > point which is < the "time till deadline" of 0.005s)
> > 
> > Now the runtime of the CFS-server will be replenished to the full 0.3s
> > (due to CBS) and the deadline
> > pushed out.
> > 
> > The end result is, the total runtime that the CFS-server actually gets
> > is 0.595s (though yes it did sleep for 5ms in between, still that's
> > tiny -- say if it briefly blocked on a kernel mutex). That's almost
> > double the allocated runtime.
> > 
> > This is just theoretical and I have yet to see if it is actually an
> > issue in practice.
> 
> Let me see if I understand what you are asking. By pushing the execution of
> the CFS-server to the end of its period, if it it was briefly blocked and
> was not able to consume all of its zerolax time, its bandwidth gets
> refreshed. Then it can run again, basically doubling its total time.

I think my assumption about what happens during blocking was wrong. If it
blocked, the server is actually stopped via dl_server_stop() and it starts
all over again on enqueue.

That makes me worry about the opposite issue now. If the server restarts
because it blocked briefly, that means again it starts in a throttled state
and has to wait to run till zero-lax time. If CFS is a 99% load but blocks
very briefly after getting to run a little bit (totalling 1% of the time),
then it wont get 30% because it will keep getting delayed to the new 0-lax
every time it wakes up from its very-brief nap. Is that really Ok?

> But this is basically saying that it ran for its runtime at the start of
> one period and at the beginning of another, right?

I am not sure if this can happen but I could be missing something. AFAICS,
there is no scenario where the DL server gets to run at the start of a new
period unless RT is not running. The way the patch is written AFAICS,
whenever the DL-server runs out of runtime, it gets throttled and a timer
fires to go off at the beginning of the next period.
(update_curr_dl_se() -> dl_runtime_exceeded() -> start_dl_timer()).

In this timer handler (which fired at next period beginning), it will
actually replenish_dl_entity() to refresh the runtime and push the period
forward. Then it will throttle the server till the 0-lax time. That  means we
always end up running at the 0-lax time when starting a new period if RT is
running, and never at the beginning. Did I miss something?

On the other hand, if it does not run out of runtime, it will keep running
within its 0-lax time. We know there is enough time within its 0-lax time for
it to run because when we unthrottled it, we checked for that.

Switching gears, another (most likely theoretical) concern I had is what if
the 0-lax timer interrupt gets delayed a little bit. Then we will always end
up not having enough 0-lax time and keep requeuing the timer, that means CFS
will be starved always as we keep pushing the execution to the next period's
0-lax time.

Anyway, I guess I better get to testing this stuff tomorrow and day after on
ChromeOS before LPC starts. Personally I feel this is a great first cut and
hope we can get v5 into mainline and iteratively improve. :)

thanks,

 - Joel