Date: Thu, 13 Jan 2022 16:08:57 -0500
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
    Alexander Duyck <alexanderduyck@...com>,
    Alex Williamson <alex.williamson@...hat.com>,
    Andrew Morton <akpm@...ux-foundation.org>,
    Ben Segall <bsegall@...gle.com>,
    Cornelia Huck <cohuck@...hat.com>,
    Dan Williams <dan.j.williams@...el.com>,
    Dave Hansen <dave.hansen@...ux.intel.com>,
    Dietmar Eggemann <dietmar.eggemann@....com>,
    Herbert Xu <herbert@...dor.apana.org.au>,
    Ingo Molnar <mingo@...hat.com>,
    Jason Gunthorpe <jgg@...dia.com>,
    Johannes Weiner <hannes@...xchg.org>,
    Josh Triplett <josh@...htriplett.org>,
    Michal Hocko <mhocko@...e.com>,
    Nico Pache <npache@...hat.com>,
    Pasha Tatashin <pasha.tatashin@...een.com>,
    Steffen Klassert <steffen.klassert@...unet.com>,
    Steve Sistare <steven.sistare@...cle.com>,
    Tim Chen <tim.c.chen@...ux.intel.com>,
    Vincent Guittot <vincent.guittot@...aro.org>,
    linux-mm@...ck.org, kvm@...r.kernel.org,
    linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org
Subject: Re: [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth

On Wed, Jan 12, 2022 at 10:18:16AM -1000, Tejun Heo wrote:
> Hello,

Hi, Tejun.

> On Tue, Jan 11, 2022 at 11:29:50AM -0500, Daniel Jordan wrote:
> ...
> > This problem arises with multithreaded jobs, but is also an issue in other
> > places.  CPU activity from async memory reclaim (kswapd, cswapd?[5]) should be
> > accounted to the cgroup that the memory belongs to, and similarly CPU activity
> > from net rx should be accounted to the task groups that correspond to the
> > packets being received.  There are also vague complaints from Android[6].
>
> These are pretty big holes in CPU cycle accounting right now and I think
> spend-first-and-backcharge is the right solution for most of them given
> experiences from other controllers. That said,
>
> > Each use case has its own requirements[7].  In padata and reclaim, the task
> > group to account to is known ahead of time, but net rx has to spend cycles
> > processing a packet before its destination task group is known, so any solution
> > should be able to work without knowing the task group in advance.  Furthermore,
> > the CPU controller shouldn't throttle reclaim or net rx in real time since both
> > are doing high priority work.  These make approaches that run kthreads directly
> > in a task group, like cgroup-aware workqueues[8] or a kernel path for
> > CLONE_INTO_CGROUP, infeasible.  Running kthreads directly in cgroups also has a
> > downside for padata because helpers' MAX_NICE priority is "shadowed" by the
> > priority of the group entities they're running under.
> >
> > The proposed solution of remote charging can accrue debt to a task group to be
> > paid off or forgiven later, addressing all these issues.  A kthread calls the
> > interface
> >
> >     void cpu_cgroup_remote_begin(struct task_struct *p,
> >                                  struct cgroup_subsys_state *css);
> >
> > to begin remote charging to @css, causing @p's current sum_exec_runtime to be
> > updated and saved.  The @css arg isn't required and can be removed later to
> > facilitate the unknown cgroup case mentioned above.  Then the kthread calls
> > another interface
> >
> >     void cpu_cgroup_remote_charge(struct task_struct *p,
> >                                   struct cgroup_subsys_state *css);
> >
> > to account the sum_exec_runtime that @p has used since the first call.
> > Internally, a new field cfs_bandwidth::debt is added to keep track of unpaid
> > debt that's only used when the debt exceeds the quota in the current period.
> >
> > Weight-based control isn't implemented for now since padata helpers run at
> > MAX_NICE and so always yield to anything higher priority, meaning they would
> > rarely compete with other task groups.
>
> If we're gonna do this, let's please do it right and make weight based
> control work too. Otherwise, its usefulness is pretty limited.

Ok, understood.  Doing it as presented is an incremental step and all
that's required for this.  I figured weight could be added later with
the first user that actually needs it.

I did prototype weight too, though, just to see if it was all gonna work
together, so given how the discussion elsewhere in the thread is going,
I might respin the scheduler part of this with another use case and
weight-based control included.

I got this far, do the interface and CFS skeleton seem sane?  Both are
basically unchanged with weight-based control included, the weight parts
are just more code on top.

Thanks for looking.
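
For readers following the begin/charge description quoted above, here is a rough userspace model of the spend-first-and-backcharge idea. It is only a sketch, not the code from the patch: the `model_*` names and both structs are hypothetical, and two details are assumptions of this sketch: a charge first consumes whatever runtime is left in the period and only the excess becomes debt, and debt is paid down before new runtime is handed out at the next period refill.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not kernel types. */
struct model_task {
	uint64_t sum_exec_runtime;	/* total ns the helper has run so far */
	uint64_t remote_start;		/* snapshot saved by model_remote_begin() */
};

struct model_group {
	uint64_t quota;		/* ns granted to the group per period */
	uint64_t runtime;	/* ns still available in the current period */
	uint64_t debt;		/* unpaid ns carried over from remote charges */
};

/* Start of a remote-charging window: remember the runtime used so far. */
void model_remote_begin(struct model_task *p)
{
	p->remote_start = p->sum_exec_runtime;
}

/*
 * Charge the runtime used since the last begin/charge to group g.
 * Assumption of this sketch: remaining runtime is consumed first and
 * only the part that does not fit is recorded as debt.
 */
void model_remote_charge(struct model_task *p, struct model_group *g)
{
	uint64_t delta;

	assert(p->sum_exec_runtime >= p->remote_start);
	delta = p->sum_exec_runtime - p->remote_start;
	p->remote_start = p->sum_exec_runtime;

	if (delta <= g->runtime) {
		g->runtime -= delta;
		return;
	}
	delta -= g->runtime;
	g->runtime = 0;
	g->debt += delta;
}

/*
 * New period: pay off as much debt as the quota allows, then hand out
 * what is left (again an assumption, not taken from the patch).
 */
void model_period_refill(struct model_group *g)
{
	uint64_t pay = g->debt < g->quota ? g->debt : g->quota;

	g->debt -= pay;
	g->runtime = g->quota - pay;
}
```

In this model the helper is never stopped in the middle of its work. The group simply starts its next period with less runtime, which mirrors the requirement that reclaim and net rx not be throttled in real time.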