Message-ID: <4c6df1aa00dab5b7e2a43c952180fc74e40f146a.camel@nvidia.com>
Date: Sun, 2 Nov 2025 18:14:30 +0000
From: Timur Tabi <ttabi@...dia.com>
To: "dakr@...nel.org" <dakr@...nel.org>, John Hubbard <jhubbard@...dia.com>
CC: Alexandre Courbot <acourbot@...dia.com>, "lossin@...nel.org"
	<lossin@...nel.org>, "a.hindborg@...nel.org" <a.hindborg@...nel.org>,
	"boqun.feng@...il.com" <boqun.feng@...il.com>, "aliceryhl@...gle.com"
	<aliceryhl@...gle.com>, Zhi Wang <zhiw@...dia.com>, "simona@...ll.ch"
	<simona@...ll.ch>, "alex.gaynor@...il.com" <alex.gaynor@...il.com>,
	"ojeda@...nel.org" <ojeda@...nel.org>, "tmgross@...ch.edu"
	<tmgross@...ch.edu>, "nouveau@...ts.freedesktop.org"
	<nouveau@...ts.freedesktop.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "rust-for-linux@...r.kernel.org"
	<rust-for-linux@...r.kernel.org>, "bjorn3_gh@...tonmail.com"
	<bjorn3_gh@...tonmail.com>, Edwin Peer <epeer@...dia.com>,
	"airlied@...il.com" <airlied@...il.com>, Joel Fernandes
	<joelagnelf@...dia.com>, "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
	"gary@...yguo.net" <gary@...yguo.net>, Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH v4 3/3] gpu: nova-core: add boot42 support for next-gen
 GPUs
On Sat, 2025-11-01 at 18:36 -0700, John Hubbard wrote:
> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
> architecture and revision details, and will instead use NV_PMC_BOOT_42
> in the future. NV_PMC_BOOT_0 will be zeroed out.
You missed this one.  boot0 will not be completely zeroed out: per the comment further down, only
ARCH_0 is cleared, while ARCH_1 is set.
> 
>  
> +impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
> +    type Error = Error;
> +
> +    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
> +        Ok(Self {
> +            chipset: boot42.chipset()?,
> +            revision: boot42.revision(),
> +        })
> +    }
> +}
> +
>  impl fmt::Display for Revision {
>      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>          write!(f, "{:x}.{:x}", self.major, self.minor)
> @@ -169,9 +180,34 @@ pub(crate) struct Spec {
>  
>  impl Spec {
>      fn new(bar: &Bar0) -> Result<Spec> {
> +        // Some brief notes about boot0 and boot42, in chronological order:
> +        //
> +        // NV04 through Volta:
> +        //
> +        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
> +        //    boot42 may not even exist on some of these GPUs.boot42
Did you intend to write more than just "boot42" at the end here?
> +        //
> +        // Turing through Blackwell:
> +        //
> +        //     Supported by both Nouveau and Nova. boot0 is still necessary and sufficient to
> +        //     identify these GPUs. boot42 exists on these GPUs but we don't need to use it.
> +        //
> +        // Rubin:
> +        //
> +        //     Only supported by Nova. Need to use boot42 to fully identify these GPUs.
> +        //
> +        // "Future" (after Rubin) GPUs:
> +        //
> +        //    Only supported by Nova. NV_PMC_BOOT's ARCH_0 (bits 28:24) will be zeroed out, and
> +        //    ARCH_1 (bit 8:8) will be set to 1, which will mean, "refer to NV_PMC_BOOT_42".
> +
>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>  
> -        Spec::try_from(boot0)
> +        if boot0.use_boot42_instead() {
> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
> +        } else {
> +            Spec::try_from(boot0)
> +        }
>      }
>  }
>  
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 207b865335af..8b5ff3858210 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -25,6 +25,13 @@
>  });
>  
>  impl NV_PMC_BOOT_0 {
> +    pub(crate) fn use_boot42_instead(self) -> bool {
> +        // "Future" GPUs (some time after Rubin) will set `architecture_0`
> +        // to 0, and `architecture_1` to 1, and put the architecture details in
> +        // boot42 instead.
> +        self.architecture_0() == 0 && self.architecture_1() == 1
> +    }
So this was the crux of my initial objection, and I just don't think this is truly
"forward-looking".  The code is using boot42 only if boot0 is "zeroed out".  So sometimes Nova will
use boot0 and sometimes it will use boot42, depending on the GPU.  It's this inconsistency that
bothers me.

Instead, I think Nova should use only boot42, so that we have consistent information across all
GPUs.  boot0 should only be used to avoid accidentally reading boot42 when it doesn't exist.

Previously, Danilo said this:
> I think you're indeed talking about the same thing, but thinking differently
> about the implementation details.
> 
> A standalone is_ancient_gpu() function called from probe() like
> 
> 	if is_ancient_gpu(bar) {
> 		return Err(ENODEV);
> 	}
> 
> is what we would probably do in C, but in Rust we should just call
> 
> 	let spec = Spec::new()?;
> 
> from probe() and Spec::new() will return Err(ENODEV) when it runs into an ancient
> GPU spec internally.
This I agree with.  The first thing that Spec::new() should do is check whether we're on an
ancient GPU that does not even have boot42.  If so, return Err(ENODEV).  Otherwise, from that
point onward, no code will ever look at boot0 again.  boot0 should never be used to return the
actual architecture/GPU information.
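
To make the proposed flow concrete, here is a rough standalone sketch (plain Rust, not kernel
code).  Everything in it is a stand-in: the Regs trait, FakeRegs, the Error enum, is_ancient_gpu()
and especially the ASSUMED_MIN_ARCH_0_WITH_BOOT42 threshold are hypothetical names and values made
up for illustration.  Only the ARCH_0 (bits 28:24) and ARCH_1 (bit 8) positions come from the
comment in the patch.  Decoding of the boot42 fields is left out on purpose.

```rust
/// Stand-in for the kernel's ENODEV error code.
#[derive(Debug, PartialEq)]
enum Error {
    NoDevice,
}

/// Hypothetical register access, so the sketch runs without hardware.
trait Regs {
    fn read_boot0(&self) -> u32;
    fn read_boot42(&self) -> u32;
}

/// Simplified Spec: keeps the raw boot42 value, field decoding is omitted.
#[derive(Debug, PartialEq)]
struct Spec {
    boot42: u32,
}

/// ARCH_0 lives in bits 28:24 of boot0 (per the patch comment).
fn arch_0(boot0: u32) -> u32 {
    (boot0 >> 24) & 0x1f
}

/// ARCH_1 lives in bit 8 of boot0 (per the patch comment).
fn arch_1(boot0: u32) -> u32 {
    (boot0 >> 8) & 0x1
}

/// ASSUMPTION: placeholder cutoff, not taken from this thread. The real
/// "does boot42 exist?" criterion has to come from the hardware manuals.
const ASSUMED_MIN_ARCH_0_WITH_BOOT42: u32 = 0x0c;

/// Hypothetical check: true if this GPU is too old to have boot42.
fn is_ancient_gpu(boot0: u32) -> bool {
    arch_1(boot0) == 0 && arch_0(boot0) < ASSUMED_MIN_ARCH_0_WITH_BOOT42
}

impl Spec {
    /// boot0 is consulted once, only to decide whether boot42 can be read.
    /// All identification data then comes from boot42.
    fn new(regs: &impl Regs) -> Result<Spec, Error> {
        if is_ancient_gpu(regs.read_boot0()) {
            return Err(Error::NoDevice);
        }

        Ok(Spec {
            boot42: regs.read_boot42(),
        })
    }
}

/// Fake register file for trying the flow out.
struct FakeRegs {
    boot0: u32,
    boot42: u32,
}

impl Regs for FakeRegs {
    fn read_boot0(&self) -> u32 {
        self.boot0
    }

    fn read_boot42(&self) -> u32 {
        self.boot42
    }
}
```

With this shape, a GPU whose boot0 has ARCH_0 cleared and ARCH_1 set takes exactly the same path
as a current one, so there is no second, boot0-based way to build a Spec.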