Message-ID: <4933d674-0b3e-0b79-7749-a796f7b1cb6f@linux.intel.com>
Date:   Tue, 19 Jul 2022 08:19:54 +0100
From:   Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To:     Mauro Carvalho Chehab <mauro.chehab@...ux.intel.com>
Cc:     Mauro Carvalho Chehab <mchehab@...nel.org>,
        Thomas Hellström 
        <thomas.hellstrom@...ux.intel.com>,
        David Airlie <airlied@...ux.ie>,
        dri-devel@...ts.freedesktop.org,
        Lucas De Marchi <lucas.demarchi@...el.com>,
        linux-kernel@...r.kernel.org,
        Chris Wilson <chris.p.wilson@...el.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Dave Airlie <airlied@...hat.com>, stable@...r.kernel.org,
        intel-gfx@...ts.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations
 once wedged


On 18/07/2022 17:06, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:45:22 +0100
> Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com> wrote:
> 
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson@...el.com>
>>>
>>> Skip all further TLB invalidations once the device is wedged and
>>> had been reset, as, on such cases, it can no longer process instructions
>>> on the GPU and the user no longer has access to the TLB's in each engine.
>>>
>>> That helps to reduce the performance regression introduced by TLB
>>> invalidate logic.
>>>
>>> Cc: stable@...r.kernel.org
>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>
>> Is the claim that this solves a performance regression based on a
>> wedged GPU which no longer works, to the extent that MMIO TLB
>> invalidation requests keep timing out? If so, please clarify that in
>> the commit text and then it looks good to me. Even if it is, IMO, a
>> very borderline situation to declare something a fix.
> 
> Indeed, this helps in a borderline situation: if the GT is wedged, TLB
> invalidation will time out, so it makes sense to keep the patch with a
> comment like:
> 
>      drm/i915/gt: Skip TLB invalidations once wedged
>      
>      Skip all further TLB invalidations once the device is wedged and
>      had been reset, as, on such cases, it can no longer process instructions
>      on the GPU and the user no longer has access to the TLB's in each engine.
>      
>      So, an attempt to do a TLB cache invalidation will produce a timeout.
>      
>      That helps to reduce the performance regression introduced by TLB
>      invalidate logic.

Yeah, that is better, but the question is whether it is worth bothering
stable with it. A wedged GPU means constant, endless -EIO to userspace, so
it is very hard to imagine that after a TLB invalidation timeout or two
there would be further ones. But okay, it's tiny, so fine I guess.
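
For illustration only, the early return being discussed could be sketched
roughly as below. This is a standalone mock under stated assumptions, not
the i915 code: struct mock_gt, mock_mmio_invalidate() and
mock_invalidate_tlbs() are hypothetical names invented for the example, and
the timeout is simulated with a counter rather than a real MMIO wait.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the GT state; not the real struct intel_gt. */
struct mock_gt {
	bool wedged;            /* set once the GPU has been declared wedged */
	unsigned int timeouts;  /* simulated invalidation requests that timed out */
};

/*
 * Simulated MMIO invalidation request (assumption: a wedged GPU never
 * acknowledges the request, so every attempt ends in a timeout).
 */
static bool mock_mmio_invalidate(struct mock_gt *gt)
{
	if (gt->wedged) {
		gt->timeouts++;
		fprintf(stderr, "TLB invalidation timed out\n");
		return false;
	}
	return true;
}

/*
 * Skip the invalidation entirely once wedged, so no further timeouts are
 * paid. Returns true if an invalidation was actually attempted.
 */
static bool mock_invalidate_tlbs(struct mock_gt *gt)
{
	if (gt->wedged)
		return false;

	mock_mmio_invalidate(gt);
	return true;
}
```

With the check in place, a wedged mock_gt never reaches
mock_mmio_invalidate(), so its timeout counter stays at zero; without the
check, every call would add one more simulated timeout.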

Regards,

Tvrtko
