Message-ID: <c0a3ac65-e2e5-4b62-bc75-49b1599e160f@nvidia.com>
Date: Wed, 28 Jan 2026 10:27:07 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: Zhi Wang <zhiw@...dia.com>, linux-kernel@...r.kernel.org,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Jonathan Corbet <corbet@....net>, Alex Deucher <alexander.deucher@....com>,
Christian Koenig <christian.koenig@....com>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>, Tvrtko Ursulin
<tursulin@...ulin.net>, Huang Rui <ray.huang@....com>,
Matthew Auld <matthew.auld@...el.com>,
Matthew Brost <matthew.brost@...el.com>,
Lucas De Marchi <lucas.demarchi@...el.com>,
Thomas Hellstrom <thomas.hellstrom@...ux.intel.com>,
Helge Deller <deller@....de>, Alice Ryhl <aliceryhl@...gle.com>,
Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Bjorn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin
<lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>, John Hubbard <jhubbard@...dia.com>,
Alistair Popple <apopple@...dia.com>, Timur Tabi <ttabi@...dia.com>,
Edwin Peer <epeer@...dia.com>, Alexandre Courbot <acourbot@...dia.com>,
Andrea Righi <arighi@...dia.com>, Andy Ritger <aritger@...dia.com>,
Alexey Ivanov <alexeyi@...dia.com>, Balbir Singh <balbirs@...dia.com>,
Philipp Stanner <phasta@...nel.org>, Elle Rhumsaa
<elle@...thered-steel.dev>, Daniel Almeida <daniel.almeida@...labora.com>,
nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
rust-for-linux@...r.kernel.org, linux-doc@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org, linux-fbdev@...r.kernel.org
Subject: Re: [PATCH RFC v6 05/26] nova-core: mm: Add support to use PRAMIN
windows to write to VRAM
On 1/28/2026 7:04 AM, Danilo Krummrich wrote:
> On Fri Jan 23, 2026 at 12:16 AM CET, Joel Fernandes wrote:
>> My plan is to make TLB and PRAMIN use immutable references in their function
>> calls and then implement internal locking. I've already done this for the GPU
>> buddy functions, so it should be doable, and we'll keep it consistent. As a
>> result, we will have finer-grained locking on the memory management objects
>> instead of requiring a global lock on a common GpuMm object. I plan to do
>> this for v7.
>>
>> Also, the PTE allocation race you mentioned is already handled by PRAMIN
>> serialization. Since threads must hold the PRAMIN lock to write page table
>> entries, concurrent writers are not possible:
>>
>> Thread A: acquire PRAMIN lock
>> Thread A: read PDE (via PRAMIN) -> NULL
>> Thread A: alloc PT page, write PDE
>> Thread A: release PRAMIN lock
>>
>> Thread B: acquire PRAMIN lock
>> Thread B: read PDE (via PRAMIN) -> sees A's pointer
>> Thread B: uses existing PT page, no allocation needed
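
For reference, the serialization I described above would look roughly like
this (a rough sketch only; all names are hypothetical and std's Mutex
merely stands in for the kernel lock type):

use std::sync::Mutex;

struct PraminWindow {
    pdes: Vec<u64>, // stand-in for the VRAM-backed page directory
}

impl PraminWindow {
    fn read_pde(&self, index: usize) -> u64 {
        self.pdes[index]
    }
    fn write_pde(&mut self, index: usize, val: u64) {
        self.pdes[index] = val;
    }
}

fn alloc_pt_page() -> u64 {
    0x1000 // stand-in for a real PT page allocation in VRAM
}

struct Pramin {
    // Interior lock so that methods can take &self, as planned for v7.
    window: Mutex<PraminWindow>,
}

impl Pramin {
    // Read the PDE; if empty, allocate a PT page and publish it. Both
    // steps happen under the same lock, so a second thread entering
    // later observes the pointer written by the first.
    fn get_or_alloc_pde(&self, index: usize) -> u64 {
        let mut win = self.window.lock().unwrap();
        match win.read_pde(index) {
            0 => {
                let pt = alloc_pt_page(); // Thread A: alloc PT page
                win.write_pde(index, pt); // Thread A: write PDE
                pt
            }
            pde => pde, // Thread B: sees A's pointer, no allocation
        }
    }
}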
>
> This won't work unfortunately.
>
> We have to separate allocations and modifications of the page table. Or in other
> words, we must not allocate new PDEs or PTEs while holding the lock protecting
> the page table from modifications.
I will go over these concerns. Just to clarify - do you mean forbidding
*any* lock, or only forbidding non-atomic locks? I believe we can avoid
non-atomic locks completely - in fact, I wrote a patch to do just that
before I read this email. If we have to forbid any locking at all, that
might require some careful redesign to handle the above race, AFAICS.
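
To make the question concrete, what I have in mind looks roughly like this
(a sketch only, with hypothetical names; std's Mutex stands in here for an
atomic-context-safe kernel lock such as a spinlock). The non-atomic
allocation happens with no locks held, and only the publish step runs
under the lock:

use std::sync::Mutex;

fn alloc_pt_page() -> u64 { 0x2000 } // stand-in for a VRAM allocation
fn free_pt_page(_page: u64) {}       // stand-in for the matching free

struct PageDir {
    // Kernel code would use an atomic-context-safe lock here.
    pdes: Mutex<Vec<u64>>,
}

impl PageDir {
    fn ensure_pde(&self, index: usize) -> u64 {
        // Speculatively allocate up front, with no locks held.
        let new_pt = alloc_pt_page();

        let (published, lost) = {
            let mut pdes = self.pdes.lock().unwrap();
            if pdes[index] == 0 {
                pdes[index] = new_pt; // publish our allocation
                (new_pt, None)
            } else {
                (pdes[index], Some(new_pt)) // lost the race
            }
        }; // lock dropped here

        if let Some(page) = lost {
            free_pt_page(page); // freeing also happens outside the lock
        }
        published
    }
}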
>
> Once we have VM_BIND in nova-drm, we will have the situation that userspace
> passes jobs to modify the GPU's virtual address space and hence the page tables.
Thanks for listing all the concerns below, this is very valuable. I will
go over all of these cases before posting v7, now that I have this list.
--
Joel Fernandes
> Such a job mainly has three stages.
>
> (1) The submit stage.
>
> This is where the job is initialized, dependencies are set up and the
> driver has to pre-allocate all kinds of structures that are required
> throughout the subsequent stages of the job.
>
> (2) The run stage.
>
> This is the stage where the job is staged for execution and its DMA fence
> has been made public (i.e. it is accessible by userspace).
>
> This is the stage where we are in the DMA fence signalling critical
> section, hence we can't do any non-atomic allocations, since otherwise we
> could deadlock in MMU notifier callbacks for instance.
>
> This is the stage where the page table is actually modified. Hence, we
> can't acquire any locks that might be held elsewhere while doing
> non-atomic allocations. Also note that this is transitive, e.g. if you
> take lock A and somewhere else a lock B is taken while A is already held
> and we do non-atomic allocations while holding B, then A can't be held in
> the DMA fence signalling critical path either.
>
> It is also worth noting that this is the stage where we know the exact
> operations we have to execute based on the VM_BIND request from userspace.
>
> For instance, in the submit stage we may only know that userspace wants
> us to map a BO with a certain offset into the GPU's virtual address
> space at [0x0, 0x1000000]. What we don't know is which exact operations
> this requires, i.e. "What do we have to unmap first?", "Are there any
> overlapping mappings that we have to truncate?", etc.
>
> So, we have to consider this when we pre-allocate in the submit stage.
>
> (3) The cleanup stage.
>
> This is where the job has been signalled and hence has left the DMA
> fence signalling critical section.
>
> In this stage the job is cleaned up, which includes freeing data that is
> not required anymore, such as PTEs and PDEs.
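
To check my understanding of the three stages before reworking this for
v7, here is a condensed sketch of the pattern as I read it (all types and
names are hypothetical, not actual nova or GPUVM API, and plain Rust
stands in for kernel types):

struct PtPage(u64);

fn alloc_vram_page() -> u64 { 0x3000 } // stand-in; may sleep
fn free_vram_page(_addr: u64) {}       // stand-in

struct BindJob {
    // Filled during submit: a worst-case set of page table pages, since
    // the exact map/unmap/truncate operations are unknown until run.
    prealloc: Vec<PtPage>,
    // Pages unlinked during run; they must not be freed until cleanup.
    retired: Vec<PtPage>,
}

impl BindJob {
    // (1) Submit: fallible, may sleep; the DMA fence is not public yet.
    fn submit(worst_case_pages: usize) -> Self {
        let prealloc = (0..worst_case_pages)
            .map(|_| PtPage(alloc_vram_page()))
            .collect();
        BindJob { prealloc, retired: Vec::new() }
    }

    // (2) Run: inside the DMA fence signalling critical section. Only
    // consumes pre-allocated pages; no allocations, and no locks that
    // are held elsewhere around non-atomic allocations.
    fn run(&mut self) {
        // Consume a pre-allocated page for a new PT level; stale pages
        // that get unlinked are parked in self.retired, not freed here.
        let _new_pt = self.prealloc.pop().expect("pre-allocation too small");
    }

    // (3) Cleanup: the fence has signalled; freeing is safe again.
    fn cleanup(self) {
        for page in self.prealloc.into_iter().chain(self.retired) {
            free_vram_page(page.0);
        }
    }
}

Does that match what you have in mind?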