Message-ID: <CAPM=9twm1x9rH==uoGQLYa8b4feQMz=Ne14WPuhCPy9_H1u5Tw@mail.gmail.com>
Date: Sat, 31 Jan 2026 13:00:58 +1000
From: Dave Airlie <airlied@...il.com>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: John Hubbard <jhubbard@...dia.com>, Danilo Krummrich <dakr@...nel.org>, Zhi Wang <zhiw@...dia.com>,
linux-kernel@...r.kernel.org,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, Simona Vetter <simona@...ll.ch>, Jonathan Corbet <corbet@....net>,
Alex Deucher <alexander.deucher@....com>, Christian Koenig <christian.koenig@....com>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>, Rodrigo Vivi <rodrigo.vivi@...el.com>,
Tvrtko Ursulin <tursulin@...ulin.net>, Rui Huang <ray.huang@....com>,
Matthew Auld <matthew.auld@...el.com>, Matthew Brost <matthew.brost@...el.com>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Thomas Hellstrom <thomas.hellstrom@...ux.intel.com>, Helge Deller <deller@....de>,
Alice Ryhl <aliceryhl@...gle.com>, Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>,
Gary Guo <gary@...yguo.net>, Bjorn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>, Alistair Popple <apopple@...dia.com>, Timur Tabi <ttabi@...dia.com>,
Edwin Peer <epeer@...dia.com>, Alexandre Courbot <acourbot@...dia.com>, Andrea Righi <arighi@...dia.com>,
Andy Ritger <aritger@...dia.com>, Alexey Ivanov <alexeyi@...dia.com>,
Balbir Singh <balbirs@...dia.com>, Philipp Stanner <phasta@...nel.org>,
Elle Rhumsaa <elle@...thered-steel.dev>, Daniel Almeida <daniel.almeida@...labora.com>,
nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
rust-for-linux@...r.kernel.org, linux-doc@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org, linux-fbdev@...r.kernel.org
Subject: Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN
windows to write to VRAM
On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <joelagnelf@...dia.com> wrote:
>
>
>
> On 1/29/2026 10:38 PM, John Hubbard wrote:
> > On 1/29/26 5:59 PM, Joel Fernandes wrote:
> >> On 1/29/26 8:12 PM, John Hubbard wrote:
> >>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
> >>>> Based on the below discussion and research, I came up with some deadlock
> >>>> scenarios that we need to handle in the v6 series of these patches.
> >>>> [...]
> >>>> memory allocations under locks that we need in the dma-fence signaling
> >>>> critical path (when doing the virtual memory map/unmap)
> >>>
> >>> unmap? Are you seeing any allocations happening during unmap? I don't
> >>> immediately see any, but that sounds surprising.
> >>
> >> Not allocations but we are acquiring locks during unmap. My understanding
> >> is (at least some) unmaps have to also be done in the dma fence signaling
> >> critical path (the run stage), but Danilo/you can correct me if I am wrong
> >> on that. We cannot avoid all locking but those same locks cannot be held in
> >> any other paths which do a memory allocation (as mentioned in one of the
> >> deadlock scenarios), that is probably the main thing to check for unmap.
> >>
> >
> > Right, OK we are on the same page now: no allocations happening on unmap,
> > but it can still deadlock, because the driver is typically going to
> > use a single lock to protect both map- and unmap-related calls
> > to the buddy allocator.
>
> Yes exactly!
>
> >
> > For the deadlock above, I think a good way to break it is to disallow
> > taking that lock in a fence signaling call path.
> >
> > So during an unmap, instead of "lock, unmap/free, unlock" it should
> > move the item to a deferred-free list, which is processed separately.
> > Of course, this is a little complex, because the allocation and reclaim
> > has to be aware of such lists if they get large.
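A rough sketch of that deferred-free idea (plain Rust with std types just
to illustrate; DeferredFree, VmState etc. are made-up names, not nova or
nouveau API). The point is only that the fence-signalling side queues and
takes nothing but its own lock, and all buddy-allocator work happens later
from a worker:

use std::collections::VecDeque;
use std::sync::Mutex;

// Unit placeholder for the buddy-allocator state the "alloc" lock protects.
struct BuddyState;

// Range that has been unmapped but whose buddy bookkeeping is not freed yet.
struct DeferredFree {
    offset: u64,
    size: u64,
}

struct VmState {
    // Protects the buddy allocator; never taken in fence-signalling paths.
    alloc: Mutex<BuddyState>,
    // The only lock the fence-signalling (run) path is allowed to take.
    deferred: Mutex<VecDeque<DeferredFree>>,
}

impl VmState {
    // Fence-signalling path: only queue the range, never touch `alloc`.
    // (Pushing can still grow the VecDeque; a real version would reserve
    // capacity up front or use an intrusive list to stay allocation-free.)
    fn unmap_deferred(&self, offset: u64, size: u64) {
        self.deferred
            .lock()
            .unwrap()
            .push_back(DeferredFree { offset, size });
    }

    // Worker/workqueue context, outside fence signalling: drain the list,
    // then return the ranges to the buddy allocator under the alloc lock.
    fn process_deferred(&self) {
        let pending: VecDeque<DeferredFree> =
            std::mem::take(&mut *self.deferred.lock().unwrap());
        let _buddy = self.alloc.lock().unwrap();
        for item in pending {
            // Real code would free item.offset .. item.offset + item.size
            // back to the buddy allocator here.
            let _ = (item.offset, item.size);
        }
    }
}

The awkward part, as noted above, is that the worker and any reclaim path
then need to be aware of the pending list so it cannot grow without bound.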
> Yes. We also need to avoid GFP_KERNEL allocations while holding any of these
> mm locks (whichever we take during map). The GPU buddy actually does
> GFP_KERNEL allocations internally, which is problematic.
>
> Some solutions / next steps:
>
> 1. Allocating memory (VRAM and system memory) outside the mm locks, just before acquiring them.
>
> 2. Pre-allocating all of the VRAM and system memory needed before entering the
> DMA fence critical paths (the issue here is figuring out how much memory to
> pre-allocate for the page table pages based on the VM_BIND request; I think we
> can analyze the page tables in the submit stage to make an estimate).
>
> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
> allocate path. I am not sure what to do about this yet. ISTR the maple tree
> also has similar issues.
>
> 4. Using non-reclaimable memory allocations where pre-allocation or
> pre-allocated memory pools are not possible (I'd like to avoid this #4 so we
> don't fail allocations when memory is scarce).
>
> Will work on these issues for the v7. Thanks,
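For 2. above, the shape I'd expect is roughly the below (again just a
sketch in plain Rust; PtPagePool and the worst-case estimate are made up,
none of it is actual nova code): estimate the page-table pages at
submit/prepare time, allocate them there where GFP_KERNEL is fine, and
have the run-stage map consume only from that pool.

use std::sync::Mutex;

// Hypothetical pool of pre-allocated page-table pages for one VM_BIND job.
struct PtPagePool {
    pages: Mutex<Vec<Box<[u8; 4096]>>>,
}

impl PtPagePool {
    // Submit/prepare stage: allowed to allocate (GFP_KERNEL equivalent).
    // `levels`/`entries_per_page` are illustrative; a real estimate would
    // walk the existing page tables for the VM_BIND range instead.
    fn prealloc_for_range(va_size: u64, levels: u32, entries_per_page: u64) -> Self {
        let mut n = 0u64;
        for level in 0..levels {
            // Bytes of VA covered by one page-table page at this level.
            let covered = 4096u64 * entries_per_page.pow(level + 1);
            // Worst case: one new PT page per covered unit, +1 for misalignment.
            n += (va_size + covered - 1) / covered + 1;
        }
        let pages = (0..n).map(|_| Box::new([0u8; 4096])).collect();
        PtPagePool { pages: Mutex::new(pages) }
    }

    // Run stage (fence-signalling critical path): never allocates, it only
    // hands out pages reserved above; None here means the estimate was wrong.
    fn take_page(&self) -> Option<Box<[u8; 4096]>> {
        self.pages.lock().unwrap().pop()
    }
}

Whether over-estimating like this is acceptable depends on how big the
worst case gets for large sparse binds; any unused pages can go back to a
shared pool afterwards.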
The way this works on nouveau at least (I haven't yet read the nova
code in depth) is that we have 4 stages of vmm page table management:
ref - locked with a ref lock - can allocate/free memory - just makes
sure the page tables exist and are reference counted
map - locked with a map lock - cannot allocate memory - fills in the
PTEs in the page tables
unmap - locked with a map lock - cannot allocate memory - clears the
PTEs again
unref - locked with a ref lock - can allocate/free memory - just drops
the references and frees the page tables (not sure if it ever merges).
So maps and unmaps can be in fence signalling paths, but unrefs are
done in the free-job path from a workqueue.
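In Rust terms the property that matters is just which lock each stage
takes and whether it may allocate, something like the below (very rough
sketch with made-up types, not nouveau's or nova's actual code):

use std::sync::Mutex;

// Placeholder for the refcounted radix of page-table pages.
struct PageTables;

struct Vmm {
    // "ref lock": taken by ref/unref, which may allocate and free.
    ref_lock: Mutex<PageTables>,
    // "map lock": taken by map/unmap, which must not allocate.
    map_lock: Mutex<()>,
}

impl Vmm {
    // ref: ensure the page tables covering [addr, addr+size) exist and
    // take references on them. Allocation is fine here; this is never
    // called from a fence-signalling path.
    fn pt_ref(&self, addr: u64, size: u64) {
        let _tables = self.ref_lock.lock().unwrap();
        let _ = (addr, size); // real code would walk/allocate PT pages here
    }

    // map: fill in PTEs. No allocation; safe in the fence-signalling run path.
    fn pt_map(&self, addr: u64, size: u64) {
        let _guard = self.map_lock.lock().unwrap();
        let _ = (addr, size); // real code would write PTEs here
    }

    // unmap: clear PTEs again. No allocation; also safe in fence signalling.
    fn pt_unmap(&self, addr: u64, size: u64) {
        let _guard = self.map_lock.lock().unwrap();
        let _ = (addr, size); // real code would clear PTEs here
    }

    // unref: drop the references and free now-empty page-table pages.
    // Freeing (and allocating) is fine; called from the free-job side on a
    // workqueue, never from the run stage.
    fn pt_unref(&self, addr: u64, size: u64) {
        let _tables = self.ref_lock.lock().unwrap();
        let _ = (addr, size); // real code would drop refs / free PT pages here
    }
}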
Dave.
>
> --
> Joel Fernandes
>