Message-ID: <30f42096-3f42-594e-8ff1-c09341925518@linux.intel.com>
Date: Thu, 24 Nov 2022 14:32:25 +0000
From: Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To: Tejun Heo <tj@...nel.org>
Cc: Intel-gfx@...ts.freedesktop.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
Zefan Li <lizefan.x@...edance.com>,
Dave Airlie <airlied@...hat.com>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Rob Clark <robdclark@...omium.org>,
Stéphane Marchesin <marcheu@...omium.org>,
"T . J . Mercier" <tjmercier@...gle.com>, Kenny.Ho@....com,
Christian König <christian.koenig@....com>,
Brian Welty <brian.welty@...el.com>,
Tvrtko Ursulin <tvrtko.ursulin@...el.com>
Subject: Re: [RFC 11/13] cgroup/drm: Introduce weight based drm cgroup control
On 22/11/2022 21:29, Tejun Heo wrote:
> On Wed, Nov 09, 2022 at 04:11:39PM +0000, Tvrtko Ursulin wrote:
>> +DRM scheduling soft limits
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Because of heterogeneous hardware and driver DRM capabilities, soft limits
>> +are implemented as a loose co-operative (bi-directional) interface between the
>> +controller and DRM core.
>> +
>> +The controller configures the GPU time allowed per group and periodically scans
>> +its member tasks to detect the over-budget condition, at which point it
>> +invokes a callback notifying the DRM core of the condition.
>> +
>> +DRM core provides an API to query per-process GPU utilization and a second API
>> +to receive notifications from the cgroup controller when the group enters or
>> +exits the over-budget condition.
>> +
>> +Individual DRM drivers which implement the interface are expected to act on this
>> +in a best-effort manner only. There are no guarantees that the soft limits
>> +will be respected.
>
> Soft limits is a bit of a misnomer and can be confused with best-effort limits
> such as memory.high. Prolly best to not use the term.
Are you suggesting "best effort limits" or "best effort <something>"? It
would sound good to me if we found the right <something>. Best effort
budget perhaps?
>> +static bool
>> +__start_scanning(struct drm_cgroup_state *root, unsigned int period_us)
>> +{
>> + struct cgroup_subsys_state *node;
>> + bool ok = false;
>> +
>> + rcu_read_lock();
>> +
>> + css_for_each_descendant_post(node, &root->css) {
>> + struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +
>> + if (!css_tryget_online(node))
>> + goto out;
>> +
>> + drmcs->active_us = 0;
>> + drmcs->sum_children_weights = 0;
>> +
>> + if (node == &root->css)
>> + drmcs->per_s_budget_ns =
>> + DIV_ROUND_UP_ULL(NSEC_PER_SEC * period_us,
>> + USEC_PER_SEC);
>> + else
>> + drmcs->per_s_budget_ns = 0;
>> +
>> + css_put(node);
>> + }
>> +
>> + css_for_each_descendant_post(node, &root->css) {
>> + struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> + struct drm_cgroup_state *parent;
>> + u64 active;
>> +
>> + if (!css_tryget_online(node))
>> + goto out;
>> + if (!node->parent) {
>> + css_put(node);
>> + continue;
>> + }
>> + if (!css_tryget_online(node->parent)) {
>> + css_put(node);
>> + goto out;
>> + }
>> + parent = css_to_drmcs(node->parent);
>> +
>> + active = drmcs_get_active_time_us(drmcs);
>> + if (active > drmcs->prev_active_us)
>> + drmcs->active_us += active - drmcs->prev_active_us;
>> + drmcs->prev_active_us = active;
>> +
>> + parent->active_us += drmcs->active_us;
>> + parent->sum_children_weights += drmcs->weight;
>> +
>> + css_put(node);
>> + css_put(&parent->css);
>> + }
>> +
>> + ok = true;
>> +
>> +out:
>> + rcu_read_unlock();
>> +
>> + return ok;
>> +}
>
> A more conventional and scalable way to go about this would be using an
> rbtree keyed by virtual time. Both CFS and blk-iocost are examples of this,
> but I think for drm, it can be a lot simpler.
It's well impressive you were able to figure out what I am doing there.
:) And probably you can see that this is the first time I am attempting
an algorithm like this one. I think I made it /dtrt/ with a few post/pre
walks so the right pieces of data propagate correctly.
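To show why the walk order matters, here is a simplified userspace sketch (not the patch) over a hierarchy flattened into a post-order array, so every child precedes its parent. Walks 1 and 2 mirror the two quoted loops, including skipping the root in the second one. Walk 3, which splits each parent's budget among its children in proportion to weight in reverse post-order (parents first), is an assumption, since the quoted excerpt stops before the budget distribution. struct cg_node and scan_hierarchy() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define USEC_PER_SEC 1000000ULL

/* Hypothetical flattened cgroup node; the array is in post-order. */
struct cg_node {
	int parent;                  /* index of parent, -1 for the root */
	uint64_t weight;
	uint64_t cumulative_us;      /* monotonic GPU time counter */
	uint64_t prev_cumulative_us; /* counter value at the previous scan */
	uint64_t active_us;          /* subtree GPU time in this period */
	uint64_t sum_children_weights;
	uint64_t budget_ns;
};

static void scan_hierarchy(struct cg_node *n, size_t count, uint64_t period_us)
{
	size_t i;

	/* Walk 1: reset per-period state; only the root gets a budget. */
	for (i = 0; i < count; i++) {
		/* Post-order invariant: a parent sits after its children. */
		assert(n[i].parent < 0 || (size_t)n[i].parent > i);
		n[i].active_us = 0;
		n[i].sum_children_weights = 0;
		n[i].budget_ns = n[i].parent < 0 ?
			(NSEC_PER_SEC * period_us + USEC_PER_SEC - 1) / USEC_PER_SEC : 0;
	}

	/* Walk 2, post-order: a node's active_us already holds its whole
	 * subtree by the time it is pushed up to the parent. */
	for (i = 0; i < count; i++) {
		struct cg_node *p;
		uint64_t now = n[i].cumulative_us;

		if (n[i].parent < 0)
			continue;
		if (now > n[i].prev_cumulative_us)
			n[i].active_us += now - n[i].prev_cumulative_us;
		n[i].prev_cumulative_us = now;

		p = &n[n[i].parent];
		p->active_us += n[i].active_us;
		p->sum_children_weights += n[i].weight;
	}

	/* Walk 3, reverse post-order: assumed weight-proportional split. */
	for (i = count; i-- > 0;) {
		const struct cg_node *p;

		if (n[i].parent < 0)
			continue;
		p = &n[n[i].parent];
		n[i].budget_ns = p->sum_children_weights ?
			p->budget_ns * n[i].weight / p->sum_children_weights : 0;
	}
}
```

With period_us = 100000 the root budget comes out as 100,000,000 ns, and two children weighted 100 and 300 receive 25,000,000 ns and 75,000,000 ns respectively.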
Are you suggesting a parallel/shadow tree to be kept in the drm
controller (which would shadow the cgroup hierarchy)? Or something else?
The mention of rbtree is not telling me much, but I will look into the
referenced examples. (Although I will refrain from major rework until
more people start "biting" into all this.)
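A rough sketch of the virtual-time idea, loosely modelled on CFS and not taken from either CFS or blk-iocost: each group's clock advances by its GPU time divided by its weight, so groups consuming in proportion to their weights keep equal clocks, and a group whose clock runs ahead of the laggard is over its share. A linear scan stands in for the rbtree keyed by vtime; struct vgroup, the helper functions, VTIME_SCALE and the slack threshold are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point scale so integer division keeps precision. */
#define VTIME_SCALE 1024ULL

/* Hypothetical sibling group with a weighted virtual clock. */
struct vgroup {
	uint64_t weight; /* relative share, must be non-zero */
	uint64_t vtime;  /* GPU time charged, scaled by 1/weight */
};

/* Charge used_us of GPU time: heavier groups' clocks advance more slowly. */
static void vgroup_charge(struct vgroup *g, uint64_t used_us)
{
	assert(g->weight > 0);
	g->vtime += used_us * VTIME_SCALE / g->weight;
}

/*
 * Return the group furthest behind (smallest vtime). This linear scan
 * stands in for an rbtree keyed by vtime, where the same group would
 * simply be the leftmost node.
 */
static struct vgroup *vgroup_min(struct vgroup *g, size_t count)
{
	struct vgroup *min = NULL;
	size_t i;

	for (i = 0; i < count; i++)
		if (!min || g[i].vtime < min->vtime)
			min = &g[i];
	return min;
}

/* Over its share once its clock leads the laggard by more than slack. */
static int vgroup_over_share(const struct vgroup *g,
			     const struct vgroup *min, uint64_t slack)
{
	return g->vtime > min->vtime + slack;
}
```

For example, with weights 100 and 300 and both groups charged 3000 us, the first group's vtime is 30720 against 10240 for the second, so with a slack of 10000 only the first is flagged as over its share.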
Also, when you mention scalability, are you concerned about the multiple
tree walks I do per iteration? I wasn't so worried about that, definitely
not for the RFC, and not much even in general, given the relatively low
scanning frequency and that a good share of the less trivial cost lies
outside the actual tree walks (drm client walks, GPU utilisation
calculations, maybe more). But perhaps I don't have the right idea of
how big cgroup hierarchies can be compared to the number of drm clients etc.
Regards,
Tvrtko