linux-kernel - Re: [PATCH v4 4/6] sched_ext: Implement scx_bpf_now

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Z1lJ-AzRlFIv4OuP@slm.duckdns.org>
Date: Tue, 10 Dec 2024 22:14:48 -1000
From: Tejun Heo <tj@...nel.org>
To: Changwoo Min <multics69@...il.com>
Cc: void@...ifault.com, mingo@...hat.com, peterz@...radead.org,
	changwoo@...lia.com, kernel-dev@...lia.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 4/6] sched_ext: Implement scx_bpf_now_ns()

Hello,

I'd roll the preceding two patches into this one.

On Mon, Dec 09, 2024 at 03:15:29PM +0900, Changwoo Min wrote:
...
> 3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now_ns()
>  guarantees the clock never goes backward when comparing them in the same
>  CPU. On the other hand, when comparing clocks in different CPUs, there
>  is no such guarantee -- the clock can go backward. It provides a
>  monotonically *non-decreasing* clock so that it would provide the same
>  clock values in two different scx_bpf_now_ns() calls in the same CPU
>  during the same period of when the rq clock is valid.

We probably should provide helpers to calculate deltas between timestamps
and use them consitently in SCX scheds. e.g. ops.runnable() and
ops.running() can run on different CPUs and it'd be useful and common to
calculate the delta between the two points in time.

...
> +__bpf_kfunc u64 scx_bpf_now_ns(void)
> +{
> +	struct rq *rq;
> +	u64 clock;
> +
> +	preempt_disable();
> +
> +	/*
> +	 * If the rq clock is valid, use the cached rq clock
> +	 * whenever the clock does not go backward.
> +	 */
> +	rq = this_rq();
> +	clock = rq->scx.clock;
> +
> +	if (!(rq->scx.flags & SCX_RQ_CLK_VALID) ||
> +	    (rq->scx.prev_clock >= clock)) {

The clocks usually start at zero but it'd still be a good idea to use
time_after64() and friends when comparing the ordering between timestamps.

> +		/*
> +		 * If the rq clock is invalid or goes backward,
> +		 * start a new rq clock period with a fresh sched_clock_cpu().
> +		 *
> +		 * The cached rq clock can go backward because there is a
> +		 * race with a timer interrupt. Suppose that a timer interrupt

This is not limited to timer interrupts, right? This kfunc can be called
from anywhere including tracepoints for code running in IRQ.

> +		 * occurred while running scx_bpf_now_ns() *after* reading the
> +		 * rq clock and *before* comparing the if condition. The timer
> +		 * interrupt will eventually call a BPF scheduler's ops.tick(),
> +		 * and the BPF scheduler can call scx_bpf_now_ns(). Since the
> +		 * scheduler core updates the rq clock before calling
> +		 * ops.tick(), the scx_bpf_now_ns() call will get the fresh
> +		 * clock. After handling the timer interrupt, the interrupted

This might be easier to explain with two column table explaning what each
party is doing in what order.

> +		 * scx_bpf_now_ns() will be resumed, so the if condition will
> +		 * be compared. In this case, the clock, which was read before
> +		 * the timer interrupt, will be the same as rq->scx.prev_clock.
> +		 * When such a case is detected, start a new rq clock period
> +		 * with a fresh sched_clock_cpu().
> +		 */
> +		clock = sched_clock_cpu(cpu_of(rq));
> +		scx_rq_clock_update(rq, clock);

Hmmm... what happens if e.g. a timer ends up performing multiple operations
each going through rq pin/unpin?

Thanks.

-- 
tejun