Message-ID: <ZnSJ67xyroVUwIna@slm.duckdns.org>
Date: Thu, 20 Jun 2024 09:58:35 -1000
From: Tejun Heo <tj@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, mingo@...hat.com,
	peterz@...radead.org, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
	daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
	joshdon@...gle.com, brho@...gle.com, pjt@...gle.com,
	derkling@...gle.com, haoluo@...gle.com, dvernet@...a.com,
	dschatzberg@...a.com, dskarlat@...cmu.edu, riel@...riel.com,
	changwoo@...lia.com, himadrics@...ia.fr, memxor@...il.com,
	andrea.righi@...onical.com, joel@...lfernandes.org,
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class

Hello,

On Thu, Jun 20, 2024 at 08:47:23PM +0200, Thomas Gleixner wrote:
> One example I very explicitly mentioned back then is the dance around
> fork().  It took me at least an hour last year to grok the convoluted
> logic and it did not get any faster when I stared at it today again.
> 
> fork()
>   sched_fork()
>     scx_pre_fork()
>       percpu_down_rwsem(&scx_fork_rwsem);
> 
>     if (dl_prio(p)) {
>         ret = -EINVAL;
>         goto cancel; // required to release the semaphore
>     }
> 
>   sched_cgroup_fork()
>     return scx_fork();
> 
>   sched_post_fork()
>     scx_post_fork()
>       percpu_up_rwsem(&scx_fork_rwsem);
> 
> Plus the extra scx_cancel_fork(), which releases the scx_fork_rwsem in
> case any call after sched_fork() fails.

This part is actually tricky. The sched_cgroup_fork() part is mostly just me
trying to find the right place among the existing hooks. We can either
rename sched_cgroup_fork() to something more generic or separate out the SCX
hook in the fork path.

When a BPF scheduler attaches, it needs to establish its base operating
condition, i.e., allocate per-task data structures, change sched class, and
so on. There is a trade-off between how fine-grained the synchronization can
be and how easy things are for the BPF schedulers, and we really do wanna
make it easy for the BPF schedulers.

So, the current approach is just locking things down while attaching, which
makes things a lot easier for the BPF schedulers. The locking is through a
percpu_rwsem, so it's super heavy on the writer side but really light on the
reader (fork) side. Maybe the overhead can be further reduced by guarding it
with a static_key, but the savings would be small and I doubt they'd be
noticeable in the fork path.

Thanks.

-- 
tejun
