Message-ID: <390db8af-1510-580b-133c-dacf5adc56d1@amd.com>
Date:   Wed, 27 Sep 2023 13:54:38 +0200
From:   Christian König <christian.koenig@....com>
To:     Boris Brezillon <boris.brezillon@...labora.com>
Cc:     Danilo Krummrich <dakr@...hat.com>, airlied@...il.com,
        daniel@...ll.ch, matthew.brost@...el.com,
        faith.ekstrand@...labora.com, luben.tuikov@....com,
        dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org,
        Donald Robson <Donald.Robson@...tec.com>,
        Frank Binns <Frank.Binns@...tec.com>,
        Sarah Walker <sarah.walker@...tec.com>
Subject: Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow
 control

On 26.09.23 at 09:11, Boris Brezillon wrote:
> On Mon, 25 Sep 2023 19:55:21 +0200
> Christian König <christian.koenig@....com> wrote:
>
>> On 25.09.23 at 14:55, Boris Brezillon wrote:
>>> +The Imagination team, who are probably interested too.
>>>
>>> On Mon, 25 Sep 2023 00:43:06 +0200
>>> Danilo Krummrich <dakr@...hat.com> wrote:
>>>   
>>>> Currently, job flow control is implemented simply by limiting the number
>>>> of jobs in flight. Therefore, a scheduler is initialized with a
>>>> submission limit that corresponds to a certain number of jobs.
>>>>
>>>> This implies that for each job drivers need to account for the maximum
>>>> job size possible in order to not overflow the ring buffer.
>>>>
>>>> However, there are drivers, such as Nouveau, where the job size has a
>>>> rather large range. For such drivers it can easily happen that job
>>>> submissions not even filling the ring by 1% can block subsequent
>>>> submissions, which, in the worst case, can lead to the ring running dry.
>>>>
>>>> In order to overcome this issue, allow for tracking the actual job size
>>>> instead of the number of jobs. Therefore, add a field to track a job's
>>>> submission units, which represents the number of units a job contributes
>>>> to the scheduler's submission limit.
>>> As mentioned earlier, this might allow some simplifications in the
>>> PowerVR driver where we do flow-control using a dma_fence returned
>>> through ->prepare_job(). The only thing that'd be missing is a way to
>>> dynamically query the size of a job (a new hook?), instead of having the
>>> size fixed at creation time, because PVR jobs embed native fence waits,
>>> and the number of native fences will decrease if some of these fences
>>> are signalled before ->run_job() is called, thus reducing the job size.
>> Exactly that is a little bit questionable since it allows the device
>> to postpone jobs indefinitely.
>>
>> It would be good if the scheduler were able to validate whether it will
>> ever be able to run the job when it is pushed into the entity.
> Yes, we do that already. We check that the immutable part of the job
> (everything that's not a native fence wait) fits in the ringbuf.

Yeah, but thinking more about it, there might be really bad side effects.
We shouldn't use a callback or job credits because they might badly
influence fairness between entities.

In other words, when one entity always submits large jobs and another one
always submits small ones, the scheduler would prefer the one submitting
the smaller jobs because they are easier to fit into the ring buffer.

What we can do is the following:
1. The scheduler has some initial credits it can use to push jobs.
2. Each scheduler fence (and *not* the job) has a credits field for how
much it will use.
3. After letting a job run, the credits of its fence are subtracted
from the available credits of the scheduler.
4. The scheduler can keep running jobs as long as it has a positive
credit count.
5. When the credit count becomes negative it goes to sleep until a
scheduler fence signals and the count becomes positive again.

This way jobs are handled equally, you can still push jobs worth up to at
least half your ring buffer size, and you should be able to handle your
PowerVR case by calculating the credits actually used in your
run_job() callback.

As far as I can see that approach should work, shouldn't it?
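
For illustration only, here is a rough standalone userspace sketch of that
credit bookkeeping. All names (sim_sched, sim_fence, sim_run_job, ...) and
the credit values are made up for this example; this is not the drm/sched
API, just the accounting from points 1-5 above:

#include <stdbool.h>
#include <stdio.h>

/* Point 2: the credits live on the fence, not on the job. */
struct sim_fence {
	int credits;
	bool signaled;
};

/* Point 1: the scheduler starts with an initial credit budget. */
struct sim_sched {
	int credits;
};

/* Point 3: running a job subtracts its fence's credits. */
static void sim_run_job(struct sim_sched *sched, struct sim_fence *f)
{
	sched->credits -= f->credits;
	printf("ran job worth %d credits, %d credits left\n",
	       f->credits, sched->credits);
}

/* Point 5: a signaling fence gives its credits back. */
static void sim_fence_signal(struct sim_sched *sched, struct sim_fence *f)
{
	f->signaled = true;
	sched->credits += f->credits;
}

/* Point 4: keep running jobs as long as the credit count is positive. */
static bool sim_can_run(const struct sim_sched *sched)
{
	return sched->credits > 0;
}

int main(void)
{
	struct sim_sched sched = { .credits = 8 };
	struct sim_fence a = { .credits = 6 };
	struct sim_fence b = { .credits = 6 };
	struct sim_fence c = { .credits = 2 };

	if (sim_can_run(&sched))
		sim_run_job(&sched, &a);	/* 8 -> 2, still positive */
	if (sim_can_run(&sched))
		sim_run_job(&sched, &b);	/* 2 -> -4, now negative */
	if (!sim_can_run(&sched))		/* would sleep here ... */
		sim_fence_signal(&sched, &a);	/* ... until a signals: -4 -> 2 */
	if (sim_can_run(&sched))
		sim_run_job(&sched, &c);	/* 2 -> 0 */
	return 0;
}

Note how the second large job is allowed to drive the count negative, so a
job is not starved just because it exceeds the currently available credits;
it only blocks further submissions until enough credits come back.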

Regards,
Christian.
