Message-ID: <27b7882e-1201-b173-6f56-9ececb5780e8@linux.intel.com>
Date: Thu, 2 Feb 2023 14:26:06 +0000
From: Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To: Tejun Heo <tj@...nel.org>
Cc: Intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
	Johannes Weiner <hannes@...xchg.org>, Zefan Li <lizefan.x@...edance.com>,
	Dave Airlie <airlied@...hat.com>, Daniel Vetter <daniel.vetter@...ll.ch>,
	Rob Clark <robdclark@...omium.org>, Stéphane Marchesin <marcheu@...omium.org>,
	"T . J . Mercier" <tjmercier@...gle.com>, Kenny.Ho@....com,
	Christian König <christian.koenig@....com>,
	Brian Welty <brian.welty@...el.com>,
	Tvrtko Ursulin <tvrtko.ursulin@...el.com>
Subject: Re: [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control

On 28/01/2023 01:11, Tejun Heo wrote:
> On Thu, Jan 12, 2023 at 04:56:07PM +0000, Tvrtko Ursulin wrote:
> ...
>> +	/*
>> +	 * 1st pass - reset working values and update hierarchical weights and
>> +	 * GPU utilisation.
>> +	 */
>> +	if (!__start_scanning(root, period_us))
>> +		goto out_retry; /*
>> +				 * Always come back later if scanner races with
>> +				 * core cgroup management. (Repeated pattern.)
>> +				 */
>> +
>> +	css_for_each_descendant_pre(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +		struct cgroup_subsys_state *css;
>> +		unsigned int over_weights = 0;
>> +		u64 unused_us = 0;
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out_retry;
>> +
>> +		/*
>> +		 * 2nd pass - calculate initial budgets, mark over budget
>> +		 * siblings and add up unused budget for the group.
>> +		 */
>> +		css_for_each_child(css, &drmcs->css) {
>> +			struct drm_cgroup_state *sibling = css_to_drmcs(css);
>> +
>> +			if (!css_tryget_online(css)) {
>> +				css_put(node);
>> +				goto out_retry;
>> +			}
>> +
>> +			sibling->per_s_budget_us =
>> +				DIV_ROUND_UP_ULL(drmcs->per_s_budget_us *
>> +						 sibling->weight,
>> +						 drmcs->sum_children_weights);
>> +
>> +			sibling->over = sibling->active_us >
>> +					sibling->per_s_budget_us;
>> +			if (sibling->over)
>> +				over_weights += sibling->weight;
>> +			else
>> +				unused_us += sibling->per_s_budget_us -
>> +					     sibling->active_us;
>> +
>> +			css_put(css);
>> +		}
>> +
>> +		/*
>> +		 * 3rd pass - spread unused budget according to relative weights
>> +		 * of over budget siblings.
>> +		 */
>> +		css_for_each_child(css, &drmcs->css) {
>> +			struct drm_cgroup_state *sibling = css_to_drmcs(css);
>> +
>> +			if (!css_tryget_online(css)) {
>> +				css_put(node);
>> +				goto out_retry;
>> +			}
>> +
>> +			if (sibling->over) {
>> +				u64 budget_us =
>> +					DIV_ROUND_UP_ULL(unused_us *
>> +							 sibling->weight,
>> +							 over_weights);
>> +				sibling->per_s_budget_us += budget_us;
>> +				sibling->over = sibling->active_us >
>> +						sibling->per_s_budget_us;
>> +			}
>> +
>> +			css_put(css);
>> +		}
>> +
>> +		css_put(node);
>> +	}
>> +
>> +	/*
>> +	 * 4th pass - send out over/under budget notifications.
>> +	 */
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out_retry;
>> +
>> +		if (drmcs->over || drmcs->over_budget)
>> +			signal_drm_budget(drmcs,
>> +					  drmcs->active_us,
>> +					  drmcs->per_s_budget_us);
>> +		drmcs->over_budget = drmcs->over;
>> +
>> +		css_put(node);
>> +	}
>
> It keeps bothering me that the distribution logic has no memory. Maybe this
> is good enough for coarse control with long cycle durations but it likely
> will get in trouble if pushed to finer grained control. State keeping
> doesn't require a lot of complexity. The only state that needs tracking is
> each cgroup's vtime and then the core should be able to tell specific
> drivers how much each cgroup is over or under fairly accurately at any given
> time.
>
> That said, this isn't a blocker. What's implemented can work well enough
> with coarse enough time grain and that might be enough for the time being
> and we can get back to it later. I think Michal already mentioned it but it
> might be a good idea to track active and inactive cgroups and build the
> weight tree with only active ones. There are machines with a lot of mostly
> idle cgroups (> tens of thousands) and tree wide scanning even at low
> frequency can become a pretty bad bottleneck.

Right, that's the kind of experience (tens of thousands) I was missing, thank
you. Another item on my TODO list then, but I have a question first. When you
say active/inactive, what are you referring to in the cgroup world?
Offline/online? My understanding was that offline is a temporary state while
a css is being destroyed.

Also, I am really postponing implementing those changes until I hear at least
something from the DRM community.

Regards,

Tvrtko
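[Editor's note: the per-group budget split in passes 2 and 3 of the quoted patch can be modelled outside the kernel. The sketch below is a userspace approximation, assuming a flat array of siblings instead of a css tree; `struct sibling` and `distribute()` are hypothetical names, not part of the patch. It keeps the patch's arithmetic: an initial weight-proportional split, then the unused budget of under-budget siblings redistributed to over-budget ones in proportion to their weights.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one child cgroup's accounting state. */
struct sibling {
	uint64_t weight;
	uint64_t active_us;       /* observed GPU time in the period */
	uint64_t per_s_budget_us; /* computed budget */
	int over;                 /* over budget after the split */
};

#define DIV_ROUND_UP_ULL(n, d) (((n) + (d) - 1) / (d))

/*
 * Mirror of passes 2 and 3 from the patch: first give each sibling a
 * budget proportional to its weight, then spread the unused budget of
 * under-budget siblings across the over-budget ones, again by weight.
 */
static void distribute(struct sibling *s, int n, uint64_t parent_budget_us)
{
	uint64_t sum_weights = 0, unused_us = 0, over_weights = 0;
	int i;

	for (i = 0; i < n; i++)
		sum_weights += s[i].weight;

	for (i = 0; i < n; i++) {
		s[i].per_s_budget_us =
			DIV_ROUND_UP_ULL(parent_budget_us * s[i].weight,
					 sum_weights);
		s[i].over = s[i].active_us > s[i].per_s_budget_us;
		if (s[i].over)
			over_weights += s[i].weight;
		else
			unused_us += s[i].per_s_budget_us - s[i].active_us;
	}

	if (!over_weights)
		return;

	for (i = 0; i < n; i++) {
		if (!s[i].over)
			continue;
		s[i].per_s_budget_us +=
			DIV_ROUND_UP_ULL(unused_us * s[i].weight,
					 over_weights);
		s[i].over = s[i].active_us > s[i].per_s_budget_us;
	}
}
```

For example, with a 1s parent budget and weights 100/100/200, an idle first sibling donates its slack to the two busy ones; a sibling can still end the period marked over if even the topped-up budget does not cover its usage, which is what triggers the 4th-pass notification in the patch.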
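[Editor's note: Tejun's "the only state that needs tracking is each cgroup's vtime" points at the scheme CFS uses for CPU time. A minimal sketch of that idea follows; `struct drm_vtime`, `vtime_charge()`, `vtime_lag()` and the weight scale of 100 are all assumptions for illustration, not anything proposed in the thread. The point is that usage charged inversely to weight accumulates across periods, so over/under can be read off at any time without a fresh tree-wide scan.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cgroup state for a vtime-based approach. */
struct drm_vtime {
	uint64_t weight;
	uint64_t vtime;	/* usage scaled by 1/weight; persists across periods */
};

/* Charge delta_us of GPU time; a higher weight advances vtime more slowly. */
static void vtime_charge(struct drm_vtime *v, uint64_t delta_us)
{
	v->vtime += delta_us * 100 / v->weight; /* 100 = assumed weight scale */
}

/* Signed lag versus a sibling: positive means `a` is over its fair share. */
static int64_t vtime_lag(const struct drm_vtime *a, const struct drm_vtime *b)
{
	return (int64_t)(a->vtime - b->vtime);
}
```

Two siblings charged the same wall time then diverge in vtime according to their weights, which is the "memory" the fresh-split-every-period design lacks.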