Message-ID: <634b7879-055e-4d5c-aaa8-25f88bbdae75@nvidia.com>
Date: Sun, 2 Nov 2025 17:04:04 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Timur Tabi <ttabi@...dia.com>, "dakr@...nel.org" <dakr@...nel.org>
Cc: Alexandre Courbot <acourbot@...dia.com>,
 "lossin@...nel.org" <lossin@...nel.org>,
 "a.hindborg@...nel.org" <a.hindborg@...nel.org>,
 "boqun.feng@...il.com" <boqun.feng@...il.com>,
 "aliceryhl@...gle.com" <aliceryhl@...gle.com>, Zhi Wang <zhiw@...dia.com>,
 "simona@...ll.ch" <simona@...ll.ch>,
 "alex.gaynor@...il.com" <alex.gaynor@...il.com>,
 "ojeda@...nel.org" <ojeda@...nel.org>, "tmgross@...ch.edu"
 <tmgross@...ch.edu>,
 "nouveau@...ts.freedesktop.org" <nouveau@...ts.freedesktop.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "rust-for-linux@...r.kernel.org" <rust-for-linux@...r.kernel.org>,
 "bjorn3_gh@...tonmail.com" <bjorn3_gh@...tonmail.com>,
 Edwin Peer <epeer@...dia.com>, "airlied@...il.com" <airlied@...il.com>,
 Joel Fernandes <joelagnelf@...dia.com>,
 "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
 "gary@...yguo.net" <gary@...yguo.net>, Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH v4 3/3] gpu: nova-core: add boot42 support for next-gen
 GPUs

On 11/2/25 10:14 AM, Timur Tabi wrote:
> On Sat, 2025-11-01 at 18:36 -0700, John Hubbard wrote:
>> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
>> architecture and revision details, and will instead use NV_PMC_BOOT_42
>> in the future. NV_PMC_BOOT_0 will be zeroed out.
> 
> You missed this one.  Boot0 will not be completely zeroed out.
> 

Thanks for catching that. I'll word it the same way as the other case.

>>
>>   
>> +impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
>> +    type Error = Error;
>> +
>> +    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
>> +        Ok(Self {
>> +            chipset: boot42.chipset()?,
>> +            revision: boot42.revision(),
>> +        })
>> +    }
>> +}
>> +
>>   impl fmt::Display for Revision {
>>       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>>           write!(f, "{:x}.{:x}", self.major, self.minor)
>> @@ -169,9 +180,34 @@ pub(crate) struct Spec {
>>   
>>   impl Spec {
>>       fn new(bar: &Bar0) -> Result<Spec> {
>> +        // Some brief notes about boot0 and boot42, in chronological order:
>> +        //
>> +        // NV04 through Volta:
>> +        //
>> +        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
>> +        //    boot42 may not even exist on some of these GPUs.boot42
> 
> Did you intend to write more than just "boot42" at the end here?

Nope, that's just an odd typo fragment that I need to delete, thanks
for spotting it.

...
>>           let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>>   
>> -        Spec::try_from(boot0)
>> +        if boot0.use_boot42_instead() {
>> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
>> +        } else {
>> +            Spec::try_from(boot0)
>> +        }
>>       }
>>   }
>>   
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index 207b865335af..8b5ff3858210 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -25,6 +25,13 @@
>>   });
>>   
>>   impl NV_PMC_BOOT_0 {
>> +    pub(crate) fn use_boot42_instead(self) -> bool {
>> +        // "Future" GPUs (some time after Rubin) will set `architecture_0`
>> +        // to 0, and `architecture_1` to 1, and put the architecture details in
>> +        // boot42 instead.
>> +        self.architecture_0() == 0 && self.architecture_1() == 1
>> +    }
> 
> So this was the crux of my initial objection, and I just don't think this is truly "forward
> looking".  The code is using boot42 only if boot0 is "zeroed out".  So sometimes Nova will use

To put it another way: the code is only using boot42 if boot0 is
encoded, by the HW team, to go read boot42. As you know, the future
ref manual literally says "go read boot42."

> boot0 and sometimes it will use boot42, depending on the GPU.  It's this inconsistency that
> bothers me.
> 
> Instead, I think Nova should use only boot42, so that we have consistent information across all
> GPUs.  boot0 should only be used to avoid accidentally reading boot42 when it doesn't exist.

I am convinced that the most appropriate thing for a device driver
to do is to match what the HW configuration says. We should draw
the dividing line at the changeover point, which is in an upcoming
ref manual.

Once boot0 has the encoding set to "go read boot42", the driver
does that. Until then, HW promises that boot0 is correct.

It may look nice and neat to use "Nova is a new driver" as the
point at which to switch, but again, it's more accurate and
appropriate for a device driver to follow HW's lead and do
what boot0 says to do.
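
To make the rule concrete, here is a minimal standalone sketch of "let boot0 decide which register to trust". It is not the patch itself: `Boot0`, `SpecSource` and `spec_source()` are hypothetical stand-ins, and the two plain integer fields only imitate the `architecture_0()`/`architecture_1()` accessors quoted above.

```rust
// Hypothetical stand-in for the real NV_PMC_BOOT_0 register type. The two
// fields only imitate the architecture_0()/architecture_1() accessors; the
// actual bit layout is not modeled here.
#[derive(Clone, Copy)]
struct Boot0 {
    architecture_0: u8,
    architecture_1: u8,
}

impl Boot0 {
    // Same condition as in the quoted patch: this particular encoding is
    // the hardware's way of saying "go read boot42".
    fn use_boot42_instead(self) -> bool {
        self.architecture_0 == 0 && self.architecture_1 == 1
    }
}

// Hypothetical marker for which register the driver should decode.
#[derive(Debug, PartialEq)]
enum SpecSource {
    Boot0,
    Boot42,
}

// The driver follows what boot0 says: boot42 only when boot0 carries the
// redirect encoding, boot0 in every other case.
fn spec_source(boot0: Boot0) -> SpecSource {
    if boot0.use_boot42_instead() {
        SpecSource::Boot42
    } else {
        SpecSource::Boot0
    }
}
```

With this shape, the changeover point lives entirely in the value that the hardware reports, so the driver needs no per-generation table to decide which register to read.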


> 
> Previously, Danilo said this:
> 
>> I think you're indeed talking about the same thing, but thinking differently
>> about the implementation details.
>>
>> A standalone is_ancient_gpu() function called from probe() like
>>
>> 	if is_ancient_gpu(bar) {
>> 		return Err(ENODEV);
>> 	}
>>
>> is what we would probably do in C, but in Rust we should just call
>>
>> 	let spec = Spec::new()?;
>>
>> from probe() and Spec::new() will return Err(ENODEV) when it runs into an ancient
>> GPU spec internally.
> 
> This I agree with.  The first thing that Spec::new() should do is check whether we're on an
> ancient GPU that does not even have boot42.  If so, return Err(ENODEV).  Otherwise, from that
> point onward, no code will ever look at boot0 again.  boot0 should never be used to return the
> actual architecture/gpu information.
> 

I don't think we have a conflict on this point, if you read through how
the code works. The only difference is the point I wrote about above.
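
For reference, the pattern Danilo describes can be sketched in a few standalone lines. Everything here is an assumption made for illustration: `Error::NoDevice` stands in for the kernel's ENODEV, `OLDEST_SUPPORTED_ARCH` is a made-up threshold, and `Spec::new()` takes a raw value instead of reading a BAR.

```rust
// All names and values below are hypothetical, not nova-core code.
#[derive(Debug, PartialEq)]
enum Error {
    NoDevice, // stands in for the kernel's ENODEV
}

#[derive(Debug, PartialEq)]
struct Spec {
    architecture: u32,
}

// Assumed cutoff for illustration only: anything below it counts as an
// "ancient" GPU that the driver refuses to bind to.
const OLDEST_SUPPORTED_ARCH: u32 = 0x16;

impl Spec {
    // The rejection of ancient GPUs lives inside the constructor, so there
    // is no separate is_ancient_gpu() helper.
    fn new(raw_architecture: u32) -> Result<Spec, Error> {
        if raw_architecture < OLDEST_SUPPORTED_ARCH {
            return Err(Error::NoDevice);
        }
        Ok(Spec { architecture: raw_architecture })
    }
}

// probe() just propagates the error with `?`.
fn probe(raw_architecture: u32) -> Result<Spec, Error> {
    let spec = Spec::new(raw_architecture)?;
    Ok(spec)
}
```

Under these assumptions, `probe(0x10)` fails with `Error::NoDevice` and `probe(0x17)` returns a `Spec`, without probe() ever inspecting the register contents itself.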

I'm hoping you'll allow me to proceed with that.

thanks,
-- 
John Hubbard

