linux-kernel - Re: [PATCH drm-misc-next v3] drm/sched: implement dynamic job-flow control

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <ZUEW1mxwGO3bxxGM@pollux>
Date:   Tue, 31 Oct 2023 16:01:42 +0100
From:   Danilo Krummrich <dakr@...hat.com>
To:     Christian König <christian.koenig@....com>
Cc:     Boris Brezillon <boris.brezillon@...labora.com>, airlied@...il.com,
        daniel@...ll.ch, matthew.brost@...el.com, faith@...strand.net,
        luben.tuikov@....com, dri-devel@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH drm-misc-next v3] drm/sched: implement dynamic job-flow
 control

On Tue, Oct 31, 2023 at 02:20:50PM +0100, Christian König wrote:
> Hi Danilo,
> 
> sorry for splitting up the mail thread. I had to fetch this mail from my
> spam folder for some reason.
> 
> Am 30.10.23 um 18:56 schrieb Danilo Krummrich:
> > Hi Christian,
> > 
> > [SNIP]
> > > > And yes, we can live with the overhead of making jobs
> > > > slightly bigger than they actually are, thus potentially delaying their
> > > > execution. It's just that I don't quite understand the rational behind
> > > > this conservatism, as I don't really see what negative impact this extra
> > > > ->update_job_credits() call in the credit checking path has, other than
> > > > the slight overhead of an if-check for drivers that don't need it.
> > >  From experience it showed that we should not make the scheduler more
> > > complicated than necessary. And I still think that the ring buffers only
> > > need to be filled enough to keep the hardware busy.
> > Right, and this callback contributes to exactly that.
> > 
> > I don't really think there is much to worry about in terms of introducing more
> > complexity. The implementation behind this callback is fairly trivial - it's
> > simply called right before we check whether a job fits on the ring, to fetch
> > the job's actual size.
> > 
> > I would agree if the implementation of that would be bulky, tricky and asking
> > for a compromise. But, since it actually is simple and straight forward I really
> > think that if we implement some kind of dynamic job-flow control it should be
> > implemented as acurate as possible rather than doing it half-baked.
> 
> Yeah, I see the intention here and can perfectly relate to it it's just that
> I have prioritize other things.

I don't see any work being required from your side for this.

> 
> Adding this callback allows for the driver to influence the job submission
> and while this might seems useful now I'm just to much of a burned child to
> do stuff like this without having a really good reason for it.

It does influence the job submission in the exact same way as the initial credit
count set through drm_sched_job_init() does. There is absolutely nothing with
this callback a driver couldn't mess up in the exact same way with the initial
credit count set through drm_sched_job_init(). Following this logic we'd need to
abandon the whole patch.

Hence, I don't really understand why you're so focused on this callback.
Especially, since it's an optional one.

> 
> > > If this here has some measurable positive effect then yeah we should
> > > probably do it, but as long as it's only nice to have I have some objections
> > > to that.
> > Can't answer this, since Nouveau doesn't support native fence waits. However, I
> > guess it depends on how many native fences a job carries. So, if we'd have two
> > jobs with each of them carrying a lot of native fences, but not a lot of actual
> > work, I can very well imagine that over-accounting can have a measureable
> > impact.
> 
> What I can imagine as well is things like the hardware or firmware is
> looking at the fullness of the ring buffer to predict how much pending work
> there is.
> 
> > I just wonder if we really want to ask for real measurements given that the
> > optimization is rather trivial and cheap and we already have enough evidence
> > that it makes sense conceptually.
> 
> Well that's the point I disagree on, this callback isn't trivial. If (for
> example) the driver returns a value larger than initially estimated for the
> job we can run into an endless loop.

I agree it doesn't make sense to increase, but it wouldn't break anything,
unless the job size starts exceeding the capacity of the ring. And this case is
handled anyway.

> 
> It's just one more thing which can go boom. At bare minimum we should check
> that the value is always decreasing.

Considering the above I still agree, such a check makes sense - gonna add one.

- Danilo

> 
> Christian.
> 
> > 
> > - Danilo
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Regards,
> > > > 
> > > > Boris