Message-ID: <6818e34e-d41c-67b7-85dd-76d2e47bc078@linux.intel.com>
Date: Thu, 7 Oct 2021 10:09:51 +0100
From: Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To: Barry Song <21cnbao@...il.com>
Cc: "Wanghui (John)" <john.wanghui@...wei.com>,
Intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
LKML <linux-kernel@...r.kernel.org>,
Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [RFC 1/8] sched: Add nice value change notifier
On 07/10/2021 09:50, Tvrtko Ursulin wrote:
>
> On 06/10/2021 21:21, Barry Song wrote:
>> On Thu, Oct 7, 2021 at 2:44 AM Tvrtko Ursulin
>> <tvrtko.ursulin@...ux.intel.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> On 06/10/2021 08:58, Barry Song wrote:
>>>> On Wed, Oct 6, 2021 at 5:15 PM Wanghui (John)
>>>> <john.wanghui@...wei.com> wrote:
>>>>>
>>>>> HI Tvrtko
>>>>>
>>>>> On 2021/10/4 22:36, Tvrtko Ursulin wrote:
>>>>>>  void set_user_nice(struct task_struct *p, long nice)
>>>>>>  {
>>>>>>          bool queued, running;
>>>>>> -        int old_prio;
>>>>>> +        int old_prio, ret;
>>>>>>          struct rq_flags rf;
>>>>>>          struct rq *rq;
>>>>>>
>>>>>> @@ -6915,6 +6947,9 @@ void set_user_nice(struct task_struct *p, long nice)
>>>>>>
>>>>>>  out_unlock:
>>>>>>          task_rq_unlock(rq, p, &rf);
>>>>>> +
>>>>>> +        ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
>>>>>> +        WARN_ON_ONCE(ret != NOTIFY_DONE);
>>>>>>  }
>>>>> How about adding a new "io_nice" to task_struct, and moving the call
>>>>> chain to sched_setattr/getattr? There are two benefits:
>>>>
>>>> We already have an ionice for the block I/O scheduler. This new io_nice
>>>> can hardly be generic to all I/O. It seems the patchset is trying to link
>>>> a process's nice with the GPU's scheduler. To some extent, that makes more
>>>> sense than having a common ionice, because we have a lot of I/O devices in
>>>> the system and we don't know which I/O the ionice of task_struct should be
>>>> applied to.
>>>>
>>>> Maybe we could have an ionice dedicated to the GPU, just like the ionice
>>>> for the CFQ bio/request scheduler.
>>>
>>> Thought crossed my mind but I couldn't see the practicality of a 3rd
>>> nice concept. I mean even to start with I struggle a bit with the
>>> usefulness of existing ionice vs nice. Like coming up with practical
>>> examples of usecases where it makes sense to decouple the two
>>> priorities.
>>>
>>> From a different angle I did think inheriting CPU nice makes sense for
>>> GPU workloads. This is because today, and more so in the future,
>>> computations on the same data set do flow from one to the other.
>>>
>>> Like maybe a simple example of batch image processing where CPU decodes,
>>> GPU does a transform and then CPU encodes. Or a different mix, doesn't
>>> really matter, since the main point is that it is one computing pipeline
>>> from the user's point of view.
>>>
>>
>> I am on it. But I am also seeing two problems here:
>> 1. nice is not global in Linux. For example, if you have two cgroups and
>> cgroup A has more quota than cgroup B, tasks in B won't win even if they
>> have a lower nice. cgroups run a proportional-weight, time-based division
>> of CPU.
>>
>> 2. Historically, we had dynamic nice, which was adjusted based on the
>> average sleep/running time. Right now we don't have dynamic nice, but
>> virtual time still makes tasks which sleep more preempt other tasks with
>> the same nice or even a lower nice:
>> virtual time += physical time/weight by nice
>> So a static nice number doesn't always make sense for deciding preemption.
>>
>> So it seems your patch only works in some simple situations, for example
>> no cgroups and tasks with similar sleep/running time.
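
To make the "virtual time += physical time/weight by nice" point above concrete, here is a rough user-space sketch of how the same stretch of physical runtime turns into different amounts of virtual time depending on nice. The table values are meant to match the kernel's nice-to-weight table as I recall it, and the helper names (nice_to_weight, vruntime_delta) are made up for illustration; the real scheduler uses precomputed inverse weights and fixed-point shifts rather than a plain division.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of a nice 0 task; other weights are relative to this. */
#define NICE_0_WEIGHT 1024u

/* Hypothetical helper: map nice (-20..19) to a load weight.
 * Values are assumed to match the kernel's nice-to-weight table;
 * each nice step changes the weight by roughly 1.25x. */
static uint32_t nice_to_weight(int nice)
{
    static const uint32_t table[40] = {
        /* -20 */ 88761, 71755, 56483, 46273, 36291,
        /* -15 */ 29154, 23254, 18705, 14949, 11916,
        /* -10 */  9548,  7620,  6100,  4904,  3906,
        /*  -5 */  3121,  2501,  1991,  1586,  1277,
        /*   0 */  1024,   820,   655,   526,   423,
        /*   5 */   335,   272,   215,   172,   137,
        /*  10 */   110,    87,    70,    56,    45,
        /*  15 */    36,    29,    23,    18,    15,
    };

    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    return table[nice + 20];
}

/* Simplified model: virtual time advances by the physical runtime
 * scaled by NICE_0_WEIGHT / weight, so heavier (lower nice) tasks
 * accumulate virtual time more slowly. */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, int nice)
{
    uint32_t weight = nice_to_weight(nice);

    assert(weight > 0);
    return delta_exec_ns * NICE_0_WEIGHT / weight;
}
```

In this model a task that has slept for a while has a smaller accumulated virtual time than its peers, which is why it can preempt tasks of equal or even lower nice when it wakes up, independent of the static nice value.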
>
> Yes, I broadly agree with your assessment. Although there are plans for
> adding cgroup support to i915 scheduling, I doubt as fine grained
> control and exact semantics as there are on the CPU side will happen.
>
> Mostly because the drive seems to be for more micro-controller managed
> scheduling which adds further challenges in connecting the two sides
> together.
>
> But when you say it is a problem, I would characterize it more as a
> weakness in terms of being only a subset of possible control. It is
> still richer (better?) than what currently exists and as demonstrated
> with benchmarks in my cover letter it can deliver improvements in user
> experience. If in the mid term future we can extend it with cgroup
> support then the concept should still apply and get closer to how you
> described nice works in the CPU world.
>
> Main question in my mind is whether the idea of adding the
> sched_attr/priority notifier to the kernel can be justified. Because as
> mentioned before, everything apart from adjusting currently running GPU
> jobs could be done purely in userspace. Stack changes would be quite
> extensive and all, but that is not usually a good enough reason to put
> something in the kernel. That's why it is an RFC, an invitation to discuss.
>
> Even ionice inherits from nice (see task_nice_ioprio()), so I think an
> argument can be made for drivers as well.
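
For reference, the inheritance mentioned above is a simple linear fold of the 40 nice levels onto the 8 I/O priority levels. A user-space sketch of that mapping, as I understand task_nice_ioprio() to work (the function name nice_to_ioprio and the clamping are my additions for illustration, not kernel code):

```c
/* Number of I/O priority levels within a class (0 = highest). */
#define IOPRIO_LEVELS 8

/* Hypothetical stand-in for the kernel helper: fold nice -20..19
 * onto I/O priority levels 0..7, five nice steps per level. */
static int nice_to_ioprio(int nice)
{
    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    return (nice + 20) / 5;
}
```

So the default nice 0 lands on level 4, and a driver-side equivalent could follow the same pattern with its own priority range.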
Now that I wrote this, I had a little bit of a light bulb moment. If I
abandon the idea of adjusting the priority of already submitted work
items, then I can do much of what I want purely from within the confines
of i915.

I simply add code to inherit from the current task's nice on every new work
item submission. This should probably bring the majority of the benefit
I measured.
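
A minimal user-space sketch of the "inherit at submission" idea, not the actual i915 change: each new work item snapshots the submitter's nice and converts it to a scheduling priority once, and is never adjusted afterwards. The struct, the function names, and the -1023..1023 priority range are all assumptions made for illustration; in the kernel the nice value would come from task_nice(current) rather than getpriority().

```c
#include <errno.h>
#include <sys/resource.h>

/* Assumed priority range for illustration; higher runs first. */
#define GPU_PRIO_MIN (-1023)
#define GPU_PRIO_MAX 1023

/* Hypothetical work item carrying the priority it was submitted with. */
struct work_item {
    int priority;
};

/* Linearly map nice -20..19 onto GPU_PRIO_MAX..GPU_PRIO_MIN,
 * keeping nice 0 at priority 0. */
static int nice_to_gpu_priority(int nice)
{
    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    if (nice <= 0)
        return -nice * GPU_PRIO_MAX / 20;
    return nice * GPU_PRIO_MIN / 19;
}

/* User-space stand-in for task_nice(current). getpriority() can
 * legitimately return -1, so errno is used to detect failure. */
static int current_task_nice(void)
{
    int nice;

    errno = 0;
    nice = getpriority(PRIO_PROCESS, 0);
    if (nice == -1 && errno != 0)
        return 0; /* fall back to the default nice */
    return nice;
}

/* Snapshot the submitter's nice at submission time. */
static struct work_item submit_work(void)
{
    struct work_item item = {
        .priority = nice_to_gpu_priority(current_task_nice()),
    };

    return item;
}
```

The trade-off is the one described above: a later renice only affects work submitted after it, since already queued items keep the priority they were created with.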
Regards,
Tvrtko