Message-ID: <4933d674-0b3e-0b79-7749-a796f7b1cb6f@linux.intel.com>
Date: Tue, 19 Jul 2022 08:19:54 +0100
From: Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To: Mauro Carvalho Chehab <mauro.chehab@...ux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@...nel.org>,
Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
David Airlie <airlied@...ux.ie>,
dri-devel@...ts.freedesktop.org,
Lucas De Marchi <lucas.demarchi@...el.com>,
linux-kernel@...r.kernel.org,
Chris Wilson <chris.p.wilson@...el.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Dave Airlie <airlied@...hat.com>, stable@...r.kernel.org,
intel-gfx@...ts.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
On 18/07/2022 17:06, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:45:22 +0100
> Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com> wrote:
>
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson@...el.com>
>>>
>>> Skip all further TLB invalidations once the device is wedged and
>>> has been reset since, in such cases, it can no longer process instructions
>>> on the GPU and the user no longer has access to the TLBs in each engine.
>>>
>>> That helps to reduce the performance regression introduced by the TLB
>>> invalidation logic.
>>>
>>> Cc: stable@...r.kernel.org
>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>
>> Is the claim that this solves a performance regression based on a wedged
>> GPU that no longer works, to the extent that MMIO TLB invalidation
>> requests keep timing out? If so, please clarify that in the commit text;
>> then it looks good to me. Even so, IMO it is a very borderline situation
>> to declare this a fix.
>
> Indeed, this helps in a borderline situation: if the GT is wedged, TLB
> invalidation will time out, so it makes sense to keep the patch with a
> commit message like:
>
> drm/i915/gt: Skip TLB invalidations once wedged
>
> Skip all further TLB invalidations once the device is wedged and
> has been reset since, in such cases, it can no longer process instructions
> on the GPU and the user no longer has access to the TLBs in each engine.
>
> Any attempt at a TLB cache invalidation will therefore time out.
>
> That helps to reduce the performance regression introduced by the TLB
> invalidation logic.
Yeah, that is better, but whether it is worth bothering stable with it is
the question. A wedged GPU means constant, endless -EIO to userspace, so it
is very hard to imagine that after a TLB invalidation timeout or two there
would be further ones. But okay, it's tiny, so fine, I guess.
Regards,
Tvrtko