linux-kernel - Re: [PATCH v8 0/6] sched_ext: Support high-performance monotonically non-decreasing clock

Message-ID: <Z4Djg_Va5wZ90ZoV@gpd3>
Date: Fri, 10 Jan 2025 10:08:19 +0100
From: Andrea Righi <arighi@...dia.com>
To: Changwoo Min <changwoo@...lia.com>
Cc: tj@...nel.org, void@...ifault.com, mingo@...hat.com,
	peterz@...radead.org, kernel-dev@...lia.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 0/6] sched_ext: Support high-performance monotonically
 non-decreasing clock

Hi Changwoo,

On Thu, Jan 09, 2025 at 10:14:50PM +0900, Changwoo Min wrote:
> Many BPF schedulers (such as scx_central, scx_lavd, scx_rusty, scx_bpfland,
> and scx_flash) frequently call bpf_ktime_get_ns() for tracking tasks' runtime
> properties. If supported, bpf_ktime_get_ns() eventually reads a hardware
> timestamp counter (TSC). However, reading a hardware TSC is not
> performant on some hardware platforms, degrading IPC.
> 
> This patchset addresses the performance problem of reading hardware TSC
> by leveraging the rq clock in the scheduler core, introducing a
> scx_bpf_now() function for BPF schedulers. Whenever the rq clock
> is fresh and valid, scx_bpf_now() provides the rq clock, which is
> already updated by the scheduler core (update_rq_clock), so it reduces the
> number of hardware TSC reads.
> 
> When the rq lock is released (rq_unpin_lock), the rq clock is invalidated,
> so a subsequent scx_bpf_now() call gets the fresh sched_clock for the caller.
> 
> In addition, scx_bpf_now() guarantees the clock is monotonically
> non-decreasing for the same CPU, so the clock cannot go backward
> on the same CPU.
> 
> Using scx_bpf_now() reduces the number of hardware TSC reads
> by 50-80% (76% for scx_lavd, 82% for scx_bpfland, and 51% for scx_rusty)
> for the following benchmark:
> 
>     perf bench -f simple sched messaging -t -g 20 -l 6000
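
For anyone following along, the behavior described in the cover letter above
can be modeled roughly as below. This is only a userspace sketch under my own
assumptions, not the code from the patch set: struct model_rq, the model_*()
helpers, and the hw_value parameter (a stand-in for an expensive hardware
clock read) are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; not the kernel's struct rq. */
struct model_rq {
	uint64_t clock;       /* clock cached by model_update_rq_clock() */
	bool     clock_valid; /* true while the cached clock may be reused */
	uint64_t prev_clock;  /* last value returned by model_now() */
	uint64_t hw_reads;    /* number of simulated hardware clock reads */
};

/* Stand-in for an expensive hardware timestamp read. */
static uint64_t model_read_hw_clock(struct model_rq *rq, uint64_t hw_value)
{
	rq->hw_reads++;
	return hw_value;
}

/* Models the scheduler core refreshing the rq clock under the rq lock. */
static void model_update_rq_clock(struct model_rq *rq, uint64_t hw_value)
{
	rq->clock = model_read_hw_clock(rq, hw_value);
	rq->clock_valid = true;
}

/* Models releasing the rq lock, which invalidates the cached clock. */
static void model_unpin_lock(struct model_rq *rq)
{
	rq->clock_valid = false;
}

/*
 * Models the clock query: reuse the cached rq clock while it is valid,
 * otherwise fall back to a fresh hardware read. The result is clamped so
 * it never goes backward on this (modeled) CPU.
 */
static uint64_t model_now(struct model_rq *rq, uint64_t hw_value)
{
	uint64_t now = rq->clock_valid ? rq->clock
				       : model_read_hw_clock(rq, hw_value);

	if (now < rq->prev_clock)
		now = rq->prev_clock;
	rq->prev_clock = now;
	return now;
}
```

In this model, repeated model_now() calls between model_update_rq_clock() and
model_unpin_lock() cost no additional hardware reads, which is the effect the
reported reduction in TSC reads relies on.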

Looks good to me. I also ran some stress tests using scx_bpf_now() with
this new patch set and haven't noticed any issues.

Acked-by: Andrea Righi <arighi@...dia.com>

Thanks,
-Andrea