linux-kernel - Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock

Message-ID: <20241119081740.GB11903@noisy.programming.kicks-ass.net>
Date: Tue, 19 Nov 2024 09:17:40 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Changwoo Min <changwoo@...lia.com>
Cc: tj@...nel.org, void@...ifault.com, mingo@...hat.com,
	kernel-dev@...lia.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock

On Tue, Nov 19, 2024 at 10:19:44AM +0900, Changwoo Min wrote:

> > What's the purpose of that flag? Why can't BPF use sched_clock_local()
> > and call it a day?
> 
> Let's suppose the following timeline:
> 
>   T1. rq_lock(rq)
>   T2. update_rq_clock(rq)
>   T3. a sched_ext BPF operation
>   T4. rq_unlock(rq)
>   T5. a sched_ext BPF operation
>   T6. rq_lock(rq)
>   T7. update_rq_clock(rq)
> 
> For [T2, T4), we consider that rq clock is valid
> (SCX_RQ_CLK_UPDATED is set), so scx_bpf_clock_get_ns calls during
> [T2, T4) (including T3) will return the rq clock updated at T2.
> Let's think about what we should do for the duration [T4, T7)
> when a BPF scheduler can still call scx_bpf_clock_get_ns (T5).
> During that duration, we consider the rq clock is invalid
> (SCX_RQ_CLK_UPDATED is unset). So when calling
> scx_bpf_clock_get_ns at T5, we call sched_clock() to get the
> fresh clock.
> 
> I think the term `UPDATED` was misleading. I will change it to
> `VALID` in the next version.
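
The quoted timeline can be sketched as a small user-space model. This is
only an illustration of the behaviour described above, not the patch's
code: struct model_rq, fake_now and every model_* function are
hypothetical stand-ins for struct rq, sched_clock(), update_rq_clock(),
rq_unlock() and scx_bpf_clock_get_ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; this is not the kernel's struct rq. */
struct model_rq {
	uint64_t clock;       /* value cached by the last clock update */
	bool clock_valid;     /* models SCX_RQ_CLK_UPDATED (to be renamed VALID) */
};

/* Stand-in for sched_clock(); the caller advances it by hand. */
static uint64_t fake_now;

static uint64_t model_sched_clock(void)
{
	return fake_now;
}

/* T2/T7: refresh the cached clock and mark it valid. */
static void model_update_rq_clock(struct model_rq *rq)
{
	rq->clock = model_sched_clock();
	rq->clock_valid = true;
}

/* T4: once the lock is dropped, the cached clock is no longer trusted. */
static void model_rq_unlock(struct model_rq *rq)
{
	rq->clock_valid = false;
}

/* T3/T5: cached value while valid, otherwise a fresh reading. */
static uint64_t model_clock_get_ns(const struct model_rq *rq)
{
	return rq->clock_valid ? rq->clock : model_sched_clock();
}

/* Walk the T2..T7 part of the timeline with made-up timestamps. */
static void model_timeline_demo(void)
{
	struct model_rq rq = { 0 };

	fake_now = 100;
	model_update_rq_clock(&rq);              /* T2 */
	fake_now = 150;
	assert(model_clock_get_ns(&rq) == 100);  /* T3: cached value */
	model_rq_unlock(&rq);                    /* T4 */
	fake_now = 200;
	assert(model_clock_get_ns(&rq) == 200);  /* T5: fresh value */
	fake_now = 250;
	model_update_rq_clock(&rq);              /* T7 */
	assert(model_clock_get_ns(&rq) == 250);  /* cached again */
}
```

In this model a call at T3 sees the value captured at T2 even though
time has moved on, while a call at T5 sees whatever the clock source
reports at that moment.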

So the reason rq->clock is tied to rq->lock is to ensure that a scheduling
operation happens at a single point in time.

Take re-nice: you dequeue the task, modify its properties (weight), and
then requeue it. If time were passing 'normally', the task would lose
the time between dequeue and enqueue -- this is not right.
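
A toy model of the re-nice case, to make the "single point in time"
argument concrete. Everything here is hypothetical (struct model_task,
the lost_ns counter, the model_* helpers); real scheduler accounting is
far more involved. It only shows that when dequeue and enqueue read the
same clock value the gap is zero, and when each reads a fresh clock the
task is charged for the gap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical task model; not the kernel's task_struct. */
struct model_task {
	unsigned int weight;
	uint64_t dequeued_at;  /* clock value seen at dequeue */
	uint64_t lost_ns;      /* time the task spent off the queue */
};

static void model_dequeue(struct model_task *p, uint64_t now)
{
	p->dequeued_at = now;
}

static void model_enqueue(struct model_task *p, uint64_t now)
{
	/* Whatever elapsed between dequeue and enqueue is lost to the task. */
	p->lost_ns += now - p->dequeued_at;
}

/*
 * Re-nice as dequeue, modify weight, requeue. The two timestamps are
 * the clock values observed by dequeue and enqueue respectively.
 * Returns the total time the task has lost so far.
 */
static uint64_t model_renice(struct model_task *p, unsigned int new_weight,
			     uint64_t t_dequeue, uint64_t t_enqueue)
{
	assert(t_enqueue >= t_dequeue);  /* the clock never runs backwards */

	model_dequeue(p, t_dequeue);
	p->weight = new_weight;
	model_enqueue(p, t_enqueue);
	return p->lost_ns;
}
```

With the clock held for the whole operation, model_renice(&p, 512,
1000, 1000) leaves lost_ns at 0; with a fresh reading at each step,
model_renice(&p, 512, 1000, 1040) charges the task 40ns it never had a
chance to use.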

The only obvious exception here is a migration.

So the question then becomes, what is T5 doing and is it 'right' for it
to get a fresh clock value.

Please give an example of T5 -- I really don't know this BPF crap much
-- and reason about how the clock should behave.