Message-ID: <20200109145305.GV20978@mellanox.com>
Date:   Thu, 9 Jan 2020 14:53:09 +0000
From:   Jason Gunthorpe <jgg@...lanox.com>
To:     Ralph Campbell <rcampbell@...dia.com>
CC:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "nouveau@...ts.freedesktop.org" <nouveau@...ts.freedesktop.org>,
        Jerome Glisse <jglisse@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ben Skeggs <bskeggs@...hat.com>
Subject: Re: [BUG] nouveau lockdep splat

On Wed, Jan 08, 2020 at 05:16:40PM -0800, Ralph Campbell wrote:
> I hit this while testing HMM with nouveau on linux-5.5-rc5.
> I'm not a lockdep expert but my understanding of this is that an
> invalidation callback could potentially call kzalloc(GFP_KERNEL)
> which could cause another invalidation and recursively deadlock.
> Looking at the drivers/gpu/drm/nouveau/nvkm/ layer, I do see a
> number of places where GFP_KERNEL is used for allocations and I
> don't see an easy way to avoid that.

Not quite.

Any lock held by the invalidation callback becomes a lock where
GFP_KERNEL cannot be used within its critical region.

I.e., a notifier callback must not block on a lock held by another
thread that is itself blocked in a GFP_KERNEL allocation: if that
allocation triggers reclaim, we risk deadlocking on other mm locks.

AFAIK there is no fix from the core side. The driver must respect this
and be organized to deal with it. Daniel fixed the Intel driver
already, I fixed RDMA recently, the other drivers must also be fixed.

Some choices:
 - Split up the lock held by the notifier callback so it doesn't need
   to cover allocations
 - Use GFP_ATOMIC for allocations
 - Speculatively do allocations before obtaining the lock and free if
   they were not needed.
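The third option (allocate speculatively, then free if unused) can be sketched as a userspace analogue. This is an illustration only, not nouveau or kernel code: malloc() stands in for a kmalloc(GFP_KERNEL) call, the pthread mutex stands in for the driver lock that the invalidation callback also takes, and all names (struct node, table_lock, table_insert) are hypothetical. The point is that the allocation happens before the lock is taken, so nothing inside the critical section can enter reclaim.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object tracked under a lock that an imagined
 * invalidation callback would also need to take. */
struct node {
    int key;
    struct node *next;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *table_head;

/* Returns true if key is present. Caller must hold table_lock. */
static bool table_contains_locked(int key)
{
    for (struct node *n = table_head; n; n = n->next)
        if (n->key == key)
            return true;
    return false;
}

/* Insert key if absent. Returns 1 if inserted, 0 if already present,
 * -1 if the allocation failed. */
int table_insert(int key)
{
    /* Allocate BEFORE taking the lock: in the kernel analogue this is
     * where a GFP_KERNEL allocation may enter reclaim, which is safe
     * only because table_lock is not yet held. */
    struct node *prealloc = malloc(sizeof(*prealloc));
    if (!prealloc)
        return -1;
    prealloc->key = key;

    pthread_mutex_lock(&table_lock);
    if (table_contains_locked(key)) {
        pthread_mutex_unlock(&table_lock);
        free(prealloc); /* speculative allocation turned out unneeded */
        return 0;
    }
    /* No allocation happens while the lock is held. */
    prealloc->next = table_head;
    table_head = prealloc;
    pthread_mutex_unlock(&table_lock);
    return 1;
}
```

The cost of this pattern is an occasional wasted allocation when the object already exists; in exchange, the lock's critical section never allocates, so the callback can take the lock without risking the reclaim recursion described above. Build with `-pthread`.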

I suppose it will be some trouble for nouveau, but it must be done
there.

Jason
