Message-ID: <6e9a6084-2e78-4139-8c61-4d2f2ccd068e@igalia.com>
Date: Wed, 27 Nov 2024 09:41:26 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: tj@...nel.org, void@...ifault.com, mingo@...hat.com,
 kernel-dev@...lia.com, linux-kernel@...r.kernel.org,
 Changwoo Min <changwoo@...lia.com>
Subject: Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock

Hello,

On 24. 11. 20. 00:57, Changwoo Min wrote:
> Hello,
> 
> On 24. 11. 19. 17:17, Peter Zijlstra wrote:
>> On Tue, Nov 19, 2024 at 10:19:44AM +0900, Changwoo Min wrote:
> 
>>> Let's suppose the following timeline:
>>>
>>>    T1. rq_lock(rq)
>>>    T2. update_rq_clock(rq)
>>>    T3. a sched_ext BPF operation
>>>    T4. rq_unlock(rq)
>>>    T5. a sched_ext BPF operation
>>>    T6. rq_lock(rq)
>>>    T7. update_rq_clock(rq)
>>>
>>> For [T2, T4), we consider that rq clock is valid
>>> (SCX_RQ_CLK_UPDATED is set), so scx_bpf_clock_get_ns calls during
>>> [T2, T4) (including T3) will return the rq clock updated at T2.
>>> Let's think about what we should do for the duration [T4, T7)
>>> when a BPF scheduler can still call scx_bpf_clock_get_ns (T5).
>>> During that duration, we consider the rq clock is invalid
>>> (SCX_RQ_CLK_UPDATED is unset). So when calling
>>> scx_bpf_clock_get_ns at T5, we call sched_clock() to get the
>>> fresh clock.
> 
>> So the question then becomes, what is T5 doing and is it 'right' for it
>> to get a fresh clock value.
>>
>> Please give an example of T5 -- I really don't know this BPF crap much
>> -- and reason about how the clock should behave.
> 
> Here is one example. `scx_central` uses a BPF timer for
> preemptive scheduling. Every msec, the timer callback checks
> whether the currently running tasks have exceeded their
> timeslices. At the beginning of the BPF timer callback
> (central_timerfn in scx_central.bpf.c), scx_central gets the
> current time. When the BPF timer callback runs, the rq clock can
> be invalid, just like at T5. In this case, it is reasonable to
> return a fresh clock value rather than returning the old one
> (the one cached at T2).
> 
> Besides this timer example, scx_bpf_clock_get_ns() can be called
> from any callback defined in `struct sched_ext_ops`. Some
> callbacks can be called without holding an rq lock (e.g.,
> ops.cpu_online, ops.cgroup_init). In these cases, it is also
> reasonable to return a fresh clock value rather than returning
> the old one.

I wonder if the above example is sufficient for you. If you need
more examples or clarification, please let me know.
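
To make the intended semantics a bit more concrete, what I have in
mind is roughly the following (just a sketch, not the actual patch;
the scx_rq field and flag names are illustrative, and the
locking/preemption details are omitted):

    /* Sketch only -- illustrative names, details omitted. */
    u64 scx_bpf_clock_get_ns(void)
    {
            struct rq *rq = this_rq();

            /*
             * Within [T2, T4), update_rq_clock() has run under the
             * rq lock, so the cached rq clock is still valid.
             */
            if (rq->scx.flags & SCX_RQ_CLK_UPDATED)
                    return rq->scx.clock;

            /*
             * Outside that window (e.g., T5, or callbacks invoked
             * without holding the rq lock), fall back to a fresh
             * sched_clock() reading.
             */
            return sched_clock();
    }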


Regarding my following comment from the previous email, ...

On 24. 11. 19. 10:19, Changwoo Min wrote:
>> Growing sched_clock_data shouldn't be a problem, it's only 24 bytes,
>> so we have plenty free bytes there.
> 
> Alright. I will change the current implementation and extend
> `struct sched_clock_data` to store the `VALID` flag in the next
> version.


I found that `struct sched_clock_data` is defined only when
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, so I think extending
`struct sched_clock_data` is not a viable approach. After ruling
out sched_clock_data, extending `struct scx_rq` seems the best
option. I will make sure the cached clock value and the flag in
scx_rq are on the same cache line to minimize cache misses. What
do you think?
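
For illustration, the layout I have in mind is roughly this (field
names and placement are tentative):

    struct scx_rq {
            /* ... existing fields ... */

            /*
             * Cached rq clock and its validity flag, kept adjacent
             * so that scx_bpf_clock_get_ns() touches a single cache
             * line in the common case.
             */
            u64             clock;  /* cached at update_rq_clock() */
            unsigned long   flags;  /* SCX_RQ_CLK_UPDATED, etc. */
    };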

Thanks!
Changwoo Min
