linux-kernel - Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control

Message-ID: <20230926091129.2d7d7472@collabora.com>
Date:   Tue, 26 Sep 2023 09:11:29 +0200
From:   Boris Brezillon <boris.brezillon@...labora.com>
To:     Christian König <christian.koenig@....com>
Cc:     Danilo Krummrich <dakr@...hat.com>, airlied@...il.com,
        daniel@...ll.ch, matthew.brost@...el.com,
        faith.ekstrand@...labora.com, luben.tuikov@....com,
        dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org,
        Donald Robson <Donald.Robson@...tec.com>,
        Frank Binns <Frank.Binns@...tec.com>,
        Sarah Walker <sarah.walker@...tec.com>
Subject: Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow
 control

On Mon, 25 Sep 2023 19:55:21 +0200
Christian König <christian.koenig@....com> wrote:

> Am 25.09.23 um 14:55 schrieb Boris Brezillon:
> > +The Imagination team, who are probably interested too.
> >
> > On Mon, 25 Sep 2023 00:43:06 +0200
> > Danilo Krummrich <dakr@...hat.com> wrote:
> >  
> >> Currently, job flow control is implemented simply by limiting the amount
> >> of jobs in flight. Therefore, a scheduler is initialized with a
> >> submission limit that corresponds to a certain amount of jobs.
> >>
> >> This implies that for each job drivers need to account for the maximum
> >> job size possible in order to not overflow the ring buffer.
> >>
> >> However, there are drivers, such as Nouveau, where the job size has a
> >> rather large range. For such drivers it can easily happen that job
> >> submissions not even filling the ring by 1% can block subsequent
> >> submissions, which, in the worst case, can lead to the ring running dry.
> >>
> >> In order to overcome this issue, allow for tracking the actual job size
> >> instead of the number of jobs. Therefore, add a field to track a job's
> >> submission units, which represents the number of units a job contributes
> >> to the scheduler's submission limit.
> > As mentioned earlier, this might allow some simplifications in the
> > PowerVR driver where we do flow-control using a dma_fence returned
> > through ->prepare_job(). The only thing that'd be missing is a way to
> > dynamically query the size of a job (a new hook?), instead of having the
> > size fixed at creation time, because PVR jobs embed native fence waits,
> > and the number of native fences will decrease if some of these fences
> > are signalled before ->run_job() is called, thus reducing the job size.  
> 
> Exactly that is a little bit questionable since it allows for the device 
> to postpone jobs infinitely.
> 
> It would be good if the scheduler is able to validate if it's ever able 
> to run the job when it is pushed into the entity.

Yes, we do that already. We check that the immutable part of the job
(everything that's not a native fence wait) fits in the ringbuf.
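
To make the two checks discussed above concrete, here is a minimal
standalone sketch in C. It is not the drm_sched API and not the PowerVR
code: every struct, field and function name below is hypothetical, and it
assumes each native fence wait takes a fixed number of submission units.
It shows a push-time check on the immutable part of the job (can the job
ever run?) next to a dynamic size query that shrinks as native fences
signal before ->run_job().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the drm_sched structures. */
struct sketch_job {
	uint32_t fixed_units;    /* immutable part: everything but native fence waits */
	uint32_t units_per_wait; /* assumed ring space taken by one native fence wait */
	uint32_t pending_waits;  /* native fences not yet signalled */
};

struct sketch_sched {
	uint32_t limit;     /* submission limit (ring capacity) in units */
	uint32_t in_flight; /* units currently consumed by running jobs */
};

/*
 * Push-time validation: once all native fences have signalled, only the
 * immutable part remains, so the job can eventually run if that part fits
 * in an empty ring.
 */
static bool sketch_job_can_ever_run(const struct sketch_sched *s,
				    const struct sketch_job *j)
{
	return j->fixed_units <= s->limit;
}

/* Dynamic size query: shrinks as native fences signal. */
static uint64_t sketch_job_units(const struct sketch_job *j)
{
	return (uint64_t)j->fixed_units +
	       (uint64_t)j->pending_waits * j->units_per_wait;
}

/* Flow control: does the job, at its current size, fit in the free space? */
static bool sketch_job_fits_now(const struct sketch_sched *s,
				const struct sketch_job *j)
{
	assert(s->in_flight <= s->limit);
	return sketch_job_units(j) <= (uint64_t)(s->limit - s->in_flight);
}
```

With limit = 64 and in_flight = 48, a job with fixed_units = 10,
units_per_wait = 4 and pending_waits = 3 needs 22 units and has to wait;
once two of its fences signal it needs 14 and fits in the 16 free units.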