Message-ID: <20260123121343.396bc4cd.zhiw@nvidia.com>
Date: Fri, 23 Jan 2026 12:13:43 +0200
From: Zhi Wang <zhiw@...dia.com>
To: Joel Fernandes <joelagnelf@...dia.com>
CC: <linux-kernel@...r.kernel.org>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Simona Vetter <simona@...ll.ch>, Jonathan Corbet <corbet@....net>, "Alex
Deucher" <alexander.deucher@....com>, Christian Koenig
<christian.koenig@....com>, Jani Nikula <jani.nikula@...ux.intel.com>,
"Joonas Lahtinen" <joonas.lahtinen@...ux.intel.com>, Rodrigo Vivi
<rodrigo.vivi@...el.com>, Tvrtko Ursulin <tursulin@...ulin.net>, Huang Rui
<ray.huang@....com>, Matthew Auld <matthew.auld@...el.com>, Matthew Brost
<matthew.brost@...el.com>, Lucas De Marchi <lucas.demarchi@...el.com>,
"Thomas Hellstrom" <thomas.hellstrom@...ux.intel.com>, Helge Deller
<deller@....de>, Danilo Krummrich <dakr@...nel.org>, Alice Ryhl
<aliceryhl@...gle.com>, "Miguel Ojeda" <ojeda@...nel.org>, Alex Gaynor
<alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo
<gary@...yguo.net>, Bjorn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin
<lossin@...nel.org>, "Andreas Hindborg" <a.hindborg@...nel.org>, Trevor Gross
<tmgross@...ch.edu>, "Alistair Popple" <apopple@...dia.com>, Alexandre
Courbot <acourbot@...dia.com>, "Andrea Righi" <arighi@...dia.com>, Alexey
Ivanov <alexeyi@...dia.com>, "Philipp Stanner" <phasta@...nel.org>, Elle
Rhumsaa <elle@...thered-steel.dev>, "Daniel Almeida"
<daniel.almeida@...labora.com>, <nouveau@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, <rust-for-linux@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <amd-gfx@...ts.freedesktop.org>,
<intel-gfx@...ts.freedesktop.org>, <intel-xe@...ts.freedesktop.org>,
<linux-fbdev@...r.kernel.org>
Subject: Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN
windows to write to VRAM
On Thu, 22 Jan 2026 18:16:00 -0500
Joel Fernandes <joelagnelf@...dia.com> wrote:
> On Wed, 21 Jan 2026 12:52:10 -0500, Joel Fernandes wrote:
> > I think we can incrementally build on this series to add support for
> > the same; it is not something this series directly addresses, since I
> > have spent the majority of my time over the last several months making
> > translation *work*, which is itself no easy task. This series is just
> > preliminary, based on the work of the last several months, and is
> > meant to make BAR1 work. For instance, I kept PRAMIN simple based on
> > feedback that we don't want to over-complicate things without fully
> > understanding all the requirements. There are also additional
> > requirements for the locking design that have implications for DMA
> > fencing etc., for instance.
> >
> > Anyway, thinking out loud: for handling concurrency at the page table
> > entry level (if we ever need it), we could use per-PT spinlocks
> > similar to the Linux kernel. But let's plan how to do this properly
> > and based on actual requirements.
>
> Thanks for the discussion on concurrency, Zhi.
>
> My plan is to make TLB and PRAMIN use immutable references in their
> function calls and then implement internal locking. I've already done
> this for the GPU buddy functions, so it should be doable, and we'll keep
> it consistent. As a result, we will have finer-grained locking on the
> memory management objects instead of requiring a global lock on a common
> GpuMm object. I'll plan on doing this for v7.
>
> Also, the PTE allocation race you mentioned is already handled by PRAMIN
> serialization. Since threads must hold the PRAMIN lock to write page
> table entries, concurrent writers are not possible:
>
> Thread A: acquire PRAMIN lock
> Thread A: read PDE (via PRAMIN) -> NULL
> Thread A: alloc PT page, write PDE
> Thread A: release PRAMIN lock
>
> Thread B: acquire PRAMIN lock
> Thread B: read PDE (via PRAMIN) -> sees A's pointer
> Thread B: uses existing PT page, no allocation needed
>
> No atomic compare-and-swap on VRAM is needed because the PRAMIN lock
> serializes access. Please let me know if you had a different scenario in
> mind, but I think this covers it.
>
> Zhi, feel free to use v6 though for any testing you are doing while I
> rework the locking.
>
Hi Joel:
Thanks so much for the work and the discussion. This effort is super
important for me to move forward with the vGPU work. :)
As we discussed, the concurrency matters most when booting multiple vGPUs.
At that time, the concurrency happens at:
1) Allocating GPU memory chunks
2) Reserving GPU channels
3) Mapping GPU memory to BAR1 page table
We basically need some kind of protection there, e.g. Guard/Access on
immutable references backed by a mutex. I believe there shouldn't be a
non-sleepable path reaching those, so this should be fine.
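Just to illustrate the pattern I have in mind, here is a rough sketch.
The names (Bar1PageTable, Bar1PtState, map_range, next_free_pde) are made
up for illustration and are not the actual nova-core types; it just
follows the documented kernel::sync::Mutex usage:

    use kernel::prelude::*;
    use kernel::sync::{new_mutex, Mutex};

    // Hypothetical mutable state; the real driver state would live here.
    struct Bar1PtState {
        next_free_pde: usize,
    }

    // Callers only ever hold an immutable reference; all mutation goes
    // through the internal mutex, so the &self methods are safe to share.
    #[pin_data]
    struct Bar1PageTable {
        #[pin]
        state: Mutex<Bar1PtState>,
    }

    impl Bar1PageTable {
        fn new() -> impl PinInit<Self> {
            pin_init!(Self {
                state <- new_mutex!(Bar1PtState { next_free_pde: 0 }),
            })
        }

        // Map a range into BAR1: takes &self, serializes internally.
        fn map_range(&self) -> Result {
            let mut st = self.state.lock(); // guard dropped at end of scope
            st.next_free_pde += 1;
            // ...program the PTEs through PRAMIN while holding the guard...
            Ok(())
        }
    }

That way the callers don't need to hold a global lock on a common object,
as you described.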
I can see you are thinking of a fine-granularity locking scheme, which I
think is the right direction to go. I agree with the above two locks.
For 1), I recall you mentioned there is already some lock protection
there.
For 2), we can think about it when we get there.
However, for 3), we need one there as well, besides the above two locks.
Do you already have one in the GPU VA allocator?
If yes, the above two locks should be good enough so far, IMO.
Z.