lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d79492ad-b99a-f9a9-f64a-52b94db68a3b@linux.intel.com>
Date:   Tue, 28 Jun 2022 16:49:23 +0100
From:   Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>
To:     "Mauro Carvalho Chehab (by way of Mauro Carvalho Chehab" 
        <mauro.chehab@...ux.intel.com>
Cc:     Andi Shyti <andi.shyti@...ux.intel.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Chris Wilson <chris.p.wilson@...el.com>,
        Fei Yang <fei.yang@...el.com>,
        Thomas Hellstrom <thomas.hellstrom@...el.com>,
        Bruce Chang <yu.bruce.chang@...el.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Dave Airlie <airlied@...hat.com>,
        David Airlie <airlied@...ux.ie>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        John Harrison <John.C.Harrison@...el.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Matt Roper <matthew.d.roper@...el.com>,
        Matthew Brost <matthew.brost@...el.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@...el.com>,
        Umesh Nerlige Ramappa <umesh.nerlige.ramappa@...el.com>,
        dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org,
        Mika Kuoppala <mika.kuoppala@...ux.intel.com>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        stable@...r.kernel.org,
        Thomas Hellström 
        <thomas.hellstrom@...ux.intel.com>
Subject: Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple
 engine resets


Hi,

On 27/06/2022 10:00, Mauro Carvalho Chehab (by way of Mauro Carvalho 
Chehab <mauro.chehab@...ux.intel.com>) wrote:
> Hi Tvrtko,
> 
> On Fri, 24 Jun 2022 09:34:21 +0100
> Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com> wrote:
> 
>> On 23/06/2022 12:17, Andi Shyti wrote:
>>> Hi Mauro,
>>>
>>> On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
>>>> From: Chris Wilson <chris.p.wilson@...el.com>
>>>>
>>>> Don't allow two engines to be reset in parallel, as they would both
>>>> try to select a reset bit (and send requests to common registers)
>>>> and wait on that register, at the same time. Serialize control of
>>>> the reset requests/acks using the uncore->lock, which will also ensure
>>>> that no other GT state changes at the same time as the actual reset.
>>>>
>>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>>>
>>>> Reported-by: Mika Kuoppala <mika.kuoppala@...ux.intel.com>
>>>> Signed-off-by: Chris Wilson <chris@...is-wilson.co.uk>
>>>> Cc: Mika Kuoppala <mika.kuoppala@...ux.intel.com>
>>>> Cc: Andi Shyti <andi.shyti@...el.com>
>>>> Cc: stable@...r.kernel.org
>>>> Acked-by: Thomas Hellström <thomas.hellstrom@...ux.intel.com>
>>>> Signed-off-by: Mauro Carvalho Chehab <mchehab@...nel.org>
>>>
>>> Reviewed-by: Andi Shyti <andi.shyti@...ux.intel.com>
>>
>> Notice I had a bunch of questions and asks in this series so please do
>> not merge until those are addressed.
>>
>> In this particular patch (and some others) for instance Fixes: tag, at
>> least against that sha, shouldn't be there.
> 
> Hmm... I sent an answer to your points, but I can't see it at:
> 
> 	https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com/
> 
> Maybe it got lost somewhere, I dunno.

Yeah, no replies received on my end I'm afraid.

> 
> Yeah, indeed the fixes tag on patch 5/6 should be removed as this is not
> directly related to changeset 7938d61591d3. Yet, this one is required for
> patch 6 to work.
> 
> The other patches on this series, though, are modifying the code
> introduced by changeset 7938d61591d3.

Modifying the code does not strictly means something is a fix for a 
certain patch.

> Patch 2 is clearly a workaround needed for TLB cache invalidation to
> work on some GPUs. So, while not related to Broadwell, they're also
> fixing some TLB cache issues. So, IMO, it should keep the fixes.

Umesh commented that patch 2 is not needed - who is right then? :)

> I tried to port just the two serialize patches to drm-tip, in order
> to solve the issues on Broadwell, but it didn't work, as the logic
> inside the spinlock could be calling schedule() with a spinlock hold:
>   
> 	Jun 14 17:38:48 silver kernel: [   23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
> 	Jun 14 17:38:48 silver kernel: [   23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
> 	Jun 14 17:38:48 silver kernel: [   23.227818] preempt_count: 1, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227819] RCU nest depth: 0, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227820] 5 locks held by kworker/u8:1/37:
> 	Jun 14 17:38:48 silver kernel: [   23.227822]  #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227831]  #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227837]  #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228283]  #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228663]  #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]
> 
> I didn't investigate the root cause, but it seems related to PM, so
> patches 1 and 3 seem to be required for the serialization logic
> to actually work.

Yes that is clear, what is needed is the split of the for_each_engine 
loop into request and wait.

But question is how much backporting trouble will the _extra_ changes 
patch 1 brings create.

In the ideal world patch 1 wouldn't be an optimising one, I mean adding 
skipping of TLB invalidations on idle engines but just the loop split. 
That would make it smaller and more suitable for Cc: stable. Because 
both i915_gem_pages.c and intel_gt_pm.h hunks wouldn't even be there. 
And the refactor in intel_gt_invalidate_tlbs would be smaller since it 
wouldn't be adding the engine awake checks...

> So, I would keep the Fixes: tag mentioning changeset 7938d61591d3
> on patches: 1, 2, 3 and 6.

... which for me means a different patch 1, followed by patch 6 (moved 
to be patch 2) would be ideal stable material.

Then we have the current patch 2 which is open/unknown (to me at least).

And the rest seem like optimisations which shouldn't be tagged as fixes.

Apart from patch 5 which should be cc: stable, but no fixes as agreed.

Could you please double check if what I am suggesting here is feasible 
to implement and if it is just send those minimal patches out alone?

Maybe it even makes sense to squash such 1&2 into a single patch.

Again, since the original TLB flush was backported quite far back into 
long term stable releases I think it would be much easier to really have 
a minimal patch/series to fix Broadwell in those kernels.

Regards,

Tvrtko

> 
> Yet, IMO the entire series should be merged on -stable.
> 
> If that's OK for you and there's no additional issues to be
> addressed, I'll submit a v2 of this series removing the Fixes tag
> from patches 4 and 5.
> 
> Regards,
> Mauro

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ