linux-kernel - Re: [PATCH v4 4/6] sched_ext: Implement scx_bpf_now

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <f7924c58-dfaa-424c-9dd4-cec9a50137c9@igalia.com>
Date: Fri, 13 Dec 2024 10:41:55 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, mingo@...hat.com, peterz@...radead.org,
 kernel-dev@...lia.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 4/6] sched_ext: Implement scx_bpf_now_ns()

Hello,

On 24. 12. 11. 17:14, Tejun Heo wrote:
> Hello,
> 
> I'd roll the preceding two patches into this one.
Sure. I will merge patches 2, 3, 4 into one.

> On Mon, Dec 09, 2024 at 03:15:29PM +0900, Changwoo Min wrote:
> ...
>> 3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now_ns()
>>   guarantees the clock never goes backward when comparing them in the same
>>   CPU. On the other hand, when comparing clocks in different CPUs, there
>>   is no such guarantee -- the clock can go backward. It provides a
>>   monotonically *non-decreasing* clock so that it would provide the same
>>   clock values in two different scx_bpf_now_ns() calls in the same CPU
>>   during the same period of when the rq clock is valid.
> 
> We probably should provide helpers to calculate deltas between timestamps
> and use them consitently in SCX scheds. e.g. ops.runnable() and
> ops.running() can run on different CPUs and it'd be useful and common to
> calculate the delta between the two points in time.

If I understand correctly, it should be something similar to
jiffies_delta_to_msecs(). Regarding the API name, what about
scx_time_delta(s64 time_delta) and/or scx_time_diff(u64 time_a,
u64 time_b)?

>> +	if (!(rq->scx.flags & SCX_RQ_CLK_VALID) ||
>> +	    (rq->scx.prev_clock >= clock)) {
> 
> The clocks usually start at zero but it'd still be a good idea to use
> time_after64() and friends when comparing the ordering between timestamps.

Sure. I will update the code as suggested.

> 
>> +		/*
>> +		 * If the rq clock is invalid or goes backward,
>> +		 * start a new rq clock period with a fresh sched_clock_cpu().
>> +		 *
>> +		 * The cached rq clock can go backward because there is a
>> +		 * race with a timer interrupt. Suppose that a timer interrupt
> 
> This is not limited to timer interrupts, right? This kfunc can be called
> from anywhere including tracepoints for code running in IRQ
Yup, you are right. I will update the comments.


> 
>> +		 * occurred while running scx_bpf_now_ns() *after* reading the
>> +		 * rq clock and *before* comparing the if condition. The timer
>> +		 * interrupt will eventually call a BPF scheduler's ops.tick(),
>> +		 * and the BPF scheduler can call scx_bpf_now_ns(). Since the
>> +		 * scheduler core updates the rq clock before calling
>> +		 * ops.tick(), the scx_bpf_now_ns() call will get the fresh
>> +		 * clock. After handling the timer interrupt, the interrupted
> 
> This might be easier to explain with two column table explaning what each
> party is doing in what order.
I will beautify the text for readability.

> 
>> +		 * scx_bpf_now_ns() will be resumed, so the if condition will
>> +		 * be compared. In this case, the clock, which was read before
>> +		 * the timer interrupt, will be the same as rq->scx.prev_clock.
>> +		 * When such a case is detected, start a new rq clock period
>> +		 * with a fresh sched_clock_cpu().
>> +		 */
>> +		clock = sched_clock_cpu(cpu_of(rq));
>> +		scx_rq_clock_update(rq, clock);
> 
> Hmmm... what happens if e.g. a timer ends up performing multiple operations
> each going through rq pin/unpin?

That should be okay. After multiple rq pin/unpin operations, the
resumed scx_bpf_now_ns() will found that the prev_clock is
greater (not equal) than the current clock, so it will get the
fresh clock.

Thanks!
Changwoo Min