Message-ID: <35d74754-ed0c-4f6d-817e-86638ca2bb70@nvidia.com>
Date: Fri, 31 Jan 2025 20:01:00 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>, <airlied@...il.com>,
	<simona@...ll.ch>, <corbet@....net>, <maarten.lankhorst@...ux.intel.com>,
	<mripard@...nel.org>, <tzimmermann@...e.de>, <ajanulgu@...hat.com>,
	<lyude@...hat.com>, <pstanner@...hat.com>, <zhiw@...dia.com>,
	<cjia@...dia.com>, <bskeggs@...dia.com>, <acurrid@...dia.com>
CC: <ojeda@...nel.org>, <alex.gaynor@...il.com>, <boqun.feng@...il.com>,
	<gary@...yguo.net>, <bjorn3_gh@...tonmail.com>, <benno.lossin@...ton.me>,
	<a.hindborg@...nel.org>, <aliceryhl@...gle.com>, <tmgross@...ch.edu>,
	<dri-devel@...ts.freedesktop.org>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <nouveau@...ts.freedesktop.org>,
	<rust-for-linux@...r.kernel.org>
Subject: Re: [PATCH 1/2] gpu: nova-core: add initial driver stub

On 1/31/25 2:04 PM, Danilo Krummrich wrote:
> Add the initial nova-core driver stub.
> 
> nova-core is intended to serve as a common base for nova-drm (the
> corresponding DRM driver) and the vGPU manager VFIO driver, serving as a
> hard- and firmware abstraction layer for GSP-based NVIDIA GPUs.
> 
> The Nova project, including nova-core and nova-drm, in the long term,
> is intended to serve as the successor of Nouveau for all GSP-based GPUs.
> 
> The motivation for both, starting a successor project for Nouveau and
> doing so using the Rust programming language, is documented in detail
> through a previous post on the mailing list [1], an LWN article [2] and a
> talk from LPC '24.
> 
> In order to avoid the chicken and egg problem to require a user to
> upstream Rust abstractions, but at the same time require the Rust
> abstractions to implement the driver, nova-core kicks off as a driver
> stub and is subsequently developed upstream.
> 
> Link: https://lore.kernel.org/dri-devel/Zfsj0_tb-0-tNrJy@cassiopeiae/T/#u [1]
> Link: https://lwn.net/Articles/990736/ [2]
> Link: https://youtu.be/3Igmx28B3BQ?si=sBdSEer4tAPKGpOs [3]
> Signed-off-by: Danilo Krummrich <dakr@...nel.org>
> ---
>   MAINTAINERS                        |  10 ++
>   drivers/gpu/Makefile               |   1 +
>   drivers/gpu/nova-core/Kconfig      |  13 +++
>   drivers/gpu/nova-core/Makefile     |   3 +
>   drivers/gpu/nova-core/driver.rs    |  47 ++++++++
>   drivers/gpu/nova-core/gpu.rs       | 171 +++++++++++++++++++++++++++++
>   drivers/gpu/nova-core/nova_core.rs |  14 +++
>   drivers/video/Kconfig              |   1 +
>   8 files changed, 260 insertions(+)
>   create mode 100644 drivers/gpu/nova-core/Kconfig
>   create mode 100644 drivers/gpu/nova-core/Makefile
>   create mode 100644 drivers/gpu/nova-core/driver.rs
>   create mode 100644 drivers/gpu/nova-core/gpu.rs
>   create mode 100644 drivers/gpu/nova-core/nova_core.rs

Hi Danilo,

This is clean, and even elegant. I was pleasantly surprised at the level
of firmware loading support in Rust, and at how this approach takes
advantage of our r535 firmware snapshot, which is available for Turing,
Ampere, and Ada.

It loads up on my GA104 system and I've been poking around at it.

Some minor comments below, but this looks like a very good starting "stub"
to get merged.

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d1086e53a317..f7ddca7de0ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7446,6 +7446,16 @@ T:	git https://gitlab.freedesktop.org/drm/nouveau.git
>   F:	drivers/gpu/drm/nouveau/
>   F:	include/uapi/drm/nouveau_drm.h
>   
> +CORE DRIVER FOR NVIDIA GPUS [RUST]
> +M:	Danilo Krummrich <dakr@...nel.org>
> +L:	nouveau@...ts.freedesktop.org
> +S:	Supported
> +Q:	https://patchwork.freedesktop.org/project/nouveau/

Are you sure? I don't know how the patchwork side of things works, but it
seems unfortunate to mix up Nova and nouveau here.

> +B:	https://gitlab.freedesktop.org/drm/nova/-/issues
> +C:	irc://irc.oftc.net/nouveau
> +T:	git https://gitlab.freedesktop.org/drm/nova.git nova-next
> +F:	drivers/gpu/nova-core/
> +
>   DRM DRIVER FOR OLIMEX LCD-OLINUXINO PANELS
>   M:	Stefan Mavrodiev <stefan@...mex.com>
>   S:	Maintained
> diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
> index 8997f0096545..36a54d456630 100644
> --- a/drivers/gpu/Makefile
> +++ b/drivers/gpu/Makefile
> @@ -5,3 +5,4 @@
>   obj-y			+= host1x/ drm/ vga/
>   obj-$(CONFIG_IMX_IPUV3_CORE)	+= ipu-v3/
>   obj-$(CONFIG_TRACE_GPU_MEM)		+= trace/
> +obj-$(CONFIG_NOVA_CORE)		+= nova-core/
> diff --git a/drivers/gpu/nova-core/Kconfig b/drivers/gpu/nova-core/Kconfig
> new file mode 100644
> index 000000000000..33ac937b244a
> --- /dev/null
> +++ b/drivers/gpu/nova-core/Kconfig
> @@ -0,0 +1,13 @@
> +config NOVA_CORE
> +	tristate "Nova Core GPU driver"
> +	depends on PCI
> +	depends on RUST
> +	depends on RUST_FW_LOADER_ABSTRACTIONS
> +	default n
> +	help
> +	  Choose this if you want to build the Nova Core driver for Nvidia
> +	  GSP-based GPUs.

Maybe a short note about what "GSP" is, and how to tell whether you have
one, would help. Turing and later architectures have GSP firmware. The
user might only know the GPU architecture; I wouldn't expect them to know
whether the GPU has a "GSP".

> +
> +	  This driver is work in progress and may not be functional.
> +
> +	  If M is selected, the module will be called nova-core.

Or nova_core? I realize the driver core translates between "-" and "_";
I'm just trying to be consistent.

It does show up as /sys/module/nova_core .


> diff --git a/drivers/gpu/nova-core/Makefile b/drivers/gpu/nova-core/Makefile
> new file mode 100644
> index 000000000000..2d78c50126e1
> --- /dev/null
> +++ b/drivers/gpu/nova-core/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_NOVA_CORE) += nova_core.o
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> new file mode 100644
> index 000000000000..2a2aa9b0630b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -0,0 +1,47 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use kernel::{bindings, c_str, pci, prelude::*};
> +
> +use crate::gpu::Gpu;
> +
> +#[pin_data]
> +pub(crate) struct NovaCore {
> +    #[pin]
> +    pub(crate) gpu: Gpu,
> +}
> +
> +const BAR0_SIZE: usize = 8;
> +pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
> +
> +kernel::pci_device_table!(
> +    PCI_TABLE,
> +    MODULE_PCI_TABLE,
> +    <NovaCore as pci::Driver>::IdInfo,
> +    [(
> +        pci::DeviceId::from_id(bindings::PCI_VENDOR_ID_NVIDIA, bindings::PCI_ANY_ID as _),
> +        ()
> +    )]
> +);
> +
> +impl pci::Driver for NovaCore {
> +    type IdInfo = ();
> +    const ID_TABLE: pci::IdTable<Self::IdInfo> = &PCI_TABLE;
> +
> +    fn probe(pdev: &mut pci::Device, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> {
> +        dev_dbg!(pdev.as_ref(), "Probe Nova Core GPU driver.\n");
> +
> +        pdev.enable_device_mem()?;
> +        pdev.set_master();
> +
> +        let bar = pdev.iomap_region_sized::<BAR0_SIZE>(0, c_str!("nova-core"))?;

Same question here: should this be nova-core or nova_core?

> +
> +        let this = KBox::pin_init(
> +            try_pin_init!(Self {
> +                gpu <- Gpu::new(pdev, bar)?,
> +            }),
> +            GFP_KERNEL,
> +        )?;
> +
> +        Ok(this)
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> new file mode 100644
> index 000000000000..cf62390e72eb
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use kernel::{
> +    device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
> +};
> +
> +use crate::driver::Bar0;
> +use core::fmt::Debug;
> +
> +/// Enum representation of the GPU chipset.
> +#[derive(Debug)]
> +pub(crate) enum Chipset {
> +    TU102 = 0x162,
> +    TU104 = 0x164,
> +    TU106 = 0x166,
> +    TU117 = 0x167,
> +    TU116 = 0x168,
> +    GA102 = 0x172,
> +    GA103 = 0x173,
> +    GA104 = 0x174,
> +    GA106 = 0x176,
> +    GA107 = 0x177,
> +    AD102 = 0x192,
> +    AD103 = 0x193,
> +    AD104 = 0x194,
> +    AD106 = 0x196,
> +    AD107 = 0x197,
> +}
> +
> +/// Enum representation of the GPU generation.
> +#[derive(Debug)]
> +pub(crate) enum CardType {
> +    /// Turing
> +    TU100 = 0x160,
> +    /// Ampere
> +    GA100 = 0x170,
> +    /// Ada Lovelace
> +    AD100 = 0x190,
> +}
> +
> +/// Structure holding the metadata of the GPU.
> +#[allow(dead_code)]
> +pub(crate) struct GpuSpec {
> +    /// Contents of the boot0 register.
> +    boot0: u64,

Storing boot0 is redundant, because all of the following fields are
derived from it.
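
To illustrate what I mean, here is a rough sketch in plain userspace Rust
(no kernel crate). DecodedSpec and from_boot0 are hypothetical names that
I made up for this example; only the masks and shifts are taken from the
patch. The raw register value is consumed once and not kept around:

```rust
/// Hypothetical stand-in for GpuSpec that keeps only decoded fields.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedSpec {
    /// Chip ID, bits 28:20 of boot0 (same mask as in the patch).
    pub chip: u32,
    /// Chipset revision, bits 7:0 of boot0.
    pub chiprev: u8,
}

impl DecodedSpec {
    /// Decode the fields from the raw boot0 value without storing it.
    /// Returns None for the case the patch rejects with ENODEV.
    pub fn from_boot0(boot0: u64) -> Option<Self> {
        if boot0 & 0x1f00_0000 == 0 {
            return None;
        }

        Some(Self {
            chip: ((boot0 & 0x1ff0_0000) >> 20) as u32,
            chiprev: (boot0 & 0xff) as u8,
        })
    }
}
```

If the raw value is still wanted for the dev_info!() line, it could be
printed at decode time instead of being carried in the struct.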

> +    card_type: CardType,
> +    chipset: Chipset,
> +    /// The revision of the chipset.
> +    chiprev: u8,
> +}
> +
> +/// Structure encapsulating the firmware blobs required for the GPU to operate.
> +#[allow(dead_code)]
> +pub(crate) struct Firmware {
> +    booter_load: firmware::Firmware,
> +    booter_unload: firmware::Firmware,
> +    gsp: firmware::Firmware,
> +}
> +
> +/// Structure holding the resources required to operate the GPU.
> +#[allow(dead_code)]
> +#[pin_data]
> +pub(crate) struct Gpu {
> +    spec: GpuSpec,
> +    /// MMIO mapping of PCI BAR 0
> +    bar: Devres<Bar0>,
> +    fw: Firmware,
> +}
> +
> +// TODO replace with something like derive(FromPrimitive)
> +impl Chipset {
> +    fn from_u32(value: u32) -> Option<Chipset> {
> +        match value {
> +            0x162 => Some(Chipset::TU102),
> +            0x164 => Some(Chipset::TU104),
> +            0x166 => Some(Chipset::TU106),
> +            0x167 => Some(Chipset::TU117),
> +            0x168 => Some(Chipset::TU116),
> +            0x172 => Some(Chipset::GA102),
> +            0x173 => Some(Chipset::GA103),
> +            0x174 => Some(Chipset::GA104),
> +            0x176 => Some(Chipset::GA106),
> +            0x177 => Some(Chipset::GA107),
> +            0x192 => Some(Chipset::AD102),
> +            0x193 => Some(Chipset::AD103),
> +            0x194 => Some(Chipset::AD104),
> +            0x196 => Some(Chipset::AD106),
> +            0x197 => Some(Chipset::AD107),
> +            _ => None,
> +        }
> +    }
> +}
> +
> +// TODO replace with something like derive(FromPrimitive)
> +impl CardType {
> +    fn from_u32(value: u32) -> Option<CardType> {
> +        match value {
> +            0x160 => Some(CardType::TU100),
> +            0x170 => Some(CardType::GA100),
> +            0x190 => Some(CardType::AD100),

Is this how nouveau does it too? I mean, using names like GA100 for the
card type, and names like TU102 for the chipset. It feels wrong to me,
because there are actual GA100 GPUs, for example. GA100 is the same kind
of thing as GA102: each is a GPU.

If I were naming card types, I'd call them by their architecture names:
Turing, Ampere, Ada.
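
Something along these lines, perhaps. This is only a plain-Rust sketch;
Architecture and from_chip are names I made up, and the 0x1f0 mask and
the three values are copied from the CardType code above:

```rust
/// Hypothetical replacement for CardType, named after the GPU
/// architecture rather than after a particular chip.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    Turing,
    Ampere,
    Ada,
}

impl Architecture {
    /// `chip` is the chip ID extracted from boot0, e.g. 0x174 for GA104.
    pub fn from_chip(chip: u32) -> Option<Self> {
        match chip & 0x1f0 {
            0x160 => Some(Self::Turing),
            0x170 => Some(Self::Ampere),
            0x190 => Some(Self::Ada),
            _ => None,
        }
    }
}
```

That way a GA104 reports Ampere as its architecture, and the name GA100
stays available for the actual GA100 chip.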

> +            _ => None,
> +        }
> +    }
> +}
> +
> +impl GpuSpec {
> +    fn new(bar: &Devres<Bar0>) -> Result<GpuSpec> {
> +        let bar = bar.try_access().ok_or(ENXIO)?;
> +        let boot0 = u64::from_le(bar.readq(0));
> +        let chip = ((boot0 & 0x1ff00000) >> 20) as u32;
> +
> +        if boot0 & 0x1f000000 == 0 {
> +            return Err(ENODEV);
> +        }
> +
> +        let Some(chipset) = Chipset::from_u32(chip) else {
> +            return Err(ENODEV);
> +        };
> +
> +        let Some(card_type) = CardType::from_u32(chip & 0x1f0) else {
> +            return Err(ENODEV);
> +        };
> +
> +        Ok(Self {
> +            boot0,
> +            card_type,
> +            chipset,
> +            chiprev: (boot0 & 0xff) as u8,
> +        })
> +    }
> +}
> +
> +impl Firmware {
> +    fn new(dev: &device::Device, spec: &GpuSpec, ver: &str) -> Result<Firmware> {
> +        let mut chip_name = CString::try_from_fmt(fmt!("{:?}", spec.chipset))?;
> +        chip_name.make_ascii_lowercase();
> +
> +        let fw_booter_load_path =
> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_load-{}.bin", &*chip_name, ver))?;
> +        let fw_booter_unload_path =
> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_unload-{}.bin", &*chip_name, ver))?;
> +        let fw_gsp_path =
> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/gsp-{}.bin", &*chip_name, ver))?;
> +
> +        let booter_load = firmware::Firmware::request(&fw_booter_load_path, dev)?;
> +        let booter_unload = firmware::Firmware::request(&fw_booter_unload_path, dev)?;
> +        let gsp = firmware::Firmware::request(&fw_gsp_path, dev)?;
> +
> +        Ok(Firmware {
> +            booter_load,
> +            booter_unload,
> +            gsp,
> +        })
> +    }
> +}
> +
> +impl Gpu {
> +    pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
> +        let spec = GpuSpec::new(&bar)?;
> +        let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;

lol there it is: our one "stable" set of GSP firmware. Maybe a one-line
comment above would be appropriate, to mention that this version is
hardcoded, but that newer firmware versions will not be. On the other
hand, that's obvious. :)
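
For example, the version could live in a commented constant. The sketch
below is plain userspace Rust that uses String instead of the kernel's
CString; GSP_FW_VERSION and gsp_fw_path are hypothetical names, and only
the path layout and the version string come from the patch:

```rust
/// Hypothetical constant: the one GSP firmware version that this stub
/// requests. It is hardcoded for now; newer versions will not be.
pub const GSP_FW_VERSION: &str = "535.113.01";

/// Build a firmware path using the same layout as Firmware::new().
/// `blob` is e.g. "booter_load", "booter_unload" or "gsp".
pub fn gsp_fw_path(chip_name: &str, blob: &str, ver: &str) -> String {
    format!("nvidia/{}/gsp/{}-{}.bin", chip_name, blob, ver)
}
```

With that, the call site would pass GSP_FW_VERSION rather than a bare
string literal.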

> +
> +        dev_info!(
> +            pdev.as_ref(),
> +            "NVIDIA {:?} ({:#x})",
> +            spec.chipset,
> +            spec.boot0
> +        );
> +
> +        Ok(pin_init!(Self { spec, bar, fw }))
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> new file mode 100644
> index 000000000000..b130d9ca6a0f
> --- /dev/null
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -0,0 +1,14 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Nova Core GPU Driver
> +
> +mod driver;
> +mod gpu;
> +
> +kernel::module_pci_driver! {
> +    type: driver::NovaCore,
> +    name: "NovaCore",
> +    author: "Danilo Krummrich",
> +    description: "Nova Core GPU driver",
> +    license: "GPL v2",
> +}
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index 44c9ef1435a2..5df981920a94 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -39,6 +39,7 @@ source "drivers/gpu/vga/Kconfig"
>   
>   source "drivers/gpu/host1x/Kconfig"
>   source "drivers/gpu/ipu-v3/Kconfig"
> +source "drivers/gpu/nova-core/Kconfig"
>   
>   source "drivers/gpu/drm/Kconfig"
>   
> 
> base-commit: 69b8923f5003664e3ffef102e73333edfa2abdcf

I'm always grateful when anyone uses "git format-patch --base"; it makes
life simpler.


thanks,
-- 
John Hubbard

