[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D7LF554L2J0N.JRPHDUCHVKP3@nvidia.com>
Date: Thu, 06 Feb 2025 23:05:37 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>, <airlied@...il.com>,
<simona@...ll.ch>, <corbet@....net>, <maarten.lankhorst@...ux.intel.com>,
<mripard@...nel.org>, <tzimmermann@...e.de>, <ajanulgu@...hat.com>,
<lyude@...hat.com>, <pstanner@...hat.com>, <zhiw@...dia.com>,
<cjia@...dia.com>, <jhubbard@...dia.com>, <bskeggs@...dia.com>,
<acurrid@...dia.com>
Cc: <ojeda@...nel.org>, <alex.gaynor@...il.com>, <boqun.feng@...il.com>,
<gary@...yguo.net>, <bjorn3_gh@...tonmail.com>, <benno.lossin@...ton.me>,
<a.hindborg@...nel.org>, <aliceryhl@...gle.com>, <tmgross@...ch.edu>,
<dri-devel@...ts.freedesktop.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <nouveau@...ts.freedesktop.org>,
<rust-for-linux@...r.kernel.org>, "Nouveau"
<nouveau-bounces@...ts.freedesktop.org>
Subject: Re: [PATCH v2 1/2] gpu: nova-core: add initial driver stub
Hi Danilo,
Here are few comments - or maybe I should say nits, as they are really
minor. Note that all my experience in Rust is from user-space, so feel
free to ignore anything that does not make sense in the context of the
kernel or is too pedantic.
On Wed Feb 5, 2025 at 4:03 AM JST, Danilo Krummrich wrote:
> Add the initial nova-core driver stub.
>
> nova-core is intended to serve as a common base for nova-drm (the
> corresponding DRM driver) and the vGPU manager VFIO driver, serving as a
> hard- and firmware abstraction layer for GSP-based NVIDIA GPUs.
>
> The Nova project, including nova-core and nova-drm, in the long term,
> is intended to serve as the successor of Nouveau for all GSP-based GPUs.
>
> The motivation for both, starting a successor project for Nouveau and
> doing so using the Rust programming language, is documented in detail
> through a previous post on the mailing list [1], an LWN article [2] and a
> talk from LPC '24.
>
> In order to avoid the chicken and egg problem to require a user to
> upstream Rust abstractions, but at the same time require the Rust
> abstractions to implement the driver, nova-core kicks off as a driver
> stub and is subsequently developed upstream.
>
> Link: https://lore.kernel.org/dri-devel/Zfsj0_tb-0-tNrJy@cassiopeiae/T/#u [1]
> Link: https://lwn.net/Articles/990736/ [2]
> Link: https://youtu.be/3Igmx28B3BQ?si=sBdSEer4tAPKGpOs [3]
> Signed-off-by: Danilo Krummrich <dakr@...nel.org>
> ---
> Changes in v2:
> - Fix module name in Kconfig description. (John)
> - Expand Kconfig description a bit. (John)
> - Expand name for PCI BAR0 region.
> - Do not store / print boot0 raw register value. (John)
> - Rename CardType to Architecture, rename enum names to represent the
> architecture name and adjust enum values according to the register
> definition. (John)
> - Add an abstraction for register accesses.
> - Print chipset, architecture and revision.
> - Load bootloader firmware. (Timur)
> ---
<snip>
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> new file mode 100644
> index 000000000000..be260a8ffe46
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -0,0 +1,187 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use kernel::{
> + device, devres::Devres, error::code::*, firmware, fmt, pci, prelude::*, str::CString,
> +};
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::fmt;
> +
> +/// Enum representation of the GPU chipset.
> +#[derive(fmt::Debug)]
I suspect you will eventually want to also derive Copy and Clone, as
well as PartialEq and Eq (so the assigned values can be used), but it's
of course fine to postpone this until we actually need them.
Note that the usage made of Debug suggests that you actually want
Display - but I understand that implementing Display would be more
cumbersome.
> +pub(crate) enum Chipset {
> + TU102 = 0x162,
> + TU104 = 0x164,
> + TU106 = 0x166,
> + TU117 = 0x167,
> + TU116 = 0x168,
> + GA102 = 0x172,
> + GA103 = 0x173,
> + GA104 = 0x174,
> + GA106 = 0x176,
> + GA107 = 0x177,
> + AD102 = 0x192,
> + AD103 = 0x193,
> + AD104 = 0x194,
> + AD106 = 0x196,
> + AD107 = 0x197,
> +}
> +
> +/// Enum representation of the GPU generation.
> +#[derive(fmt::Debug)]
> +pub(crate) enum Architecture {
> + Turing = 0x16,
> + Ampere = 0x17,
> + Ada = 0x19,
> +}
> +
> +pub(crate) struct Revision {
> + major: u8,
> + minor: u8,
> +}
> +
> +/// Structure holding the metadata of the GPU.
> +pub(crate) struct Spec {
> + chipset: Chipset,
> + arch: Architecture,
> + /// The revision of the chipset.
> + revision: Revision,
> +}
> +
> +/// Structure encapsulating the firmware blobs required for the GPU to operate.
> +#[allow(dead_code)]
> +pub(crate) struct Firmware {
> + booter_load: firmware::Firmware,
> + booter_unload: firmware::Firmware,
> + bootloader: firmware::Firmware,
> + gsp: firmware::Firmware,
> +}
> +
> +/// Structure holding the resources required to operate the GPU.
> +#[allow(dead_code)]
> +#[pin_data]
> +pub(crate) struct Gpu {
> + spec: Spec,
> + /// MMIO mapping of PCI BAR 0
> + bar: Devres<Bar0>,
> + fw: Firmware,
> +}
> +
> +// TODO replace with something like derive(FromPrimitive)
> +impl Chipset {
> + fn from_u32(value: u32) -> Option<Chipset> {
> + match value {
> + 0x162 => Some(Chipset::TU102),
> + 0x164 => Some(Chipset::TU104),
> + 0x166 => Some(Chipset::TU106),
> + 0x167 => Some(Chipset::TU117),
> + 0x168 => Some(Chipset::TU116),
> + 0x172 => Some(Chipset::GA102),
> + 0x173 => Some(Chipset::GA103),
> + 0x174 => Some(Chipset::GA104),
> + 0x176 => Some(Chipset::GA106),
> + 0x177 => Some(Chipset::GA107),
> + 0x192 => Some(Chipset::AD102),
> + 0x193 => Some(Chipset::AD103),
> + 0x194 => Some(Chipset::AD104),
> + 0x196 => Some(Chipset::AD106),
> + 0x197 => Some(Chipset::AD107),
> + _ => None,
> + }
> + }
> +}
Shouldn't this be an implementation of TryFrom<u32>? By doing so you can
return ENODEV as the error and simplify the caller code below.
> +
> +// TODO:
> +// - replace with something like derive(FromPrimitive)
> +// - consider to store within Chipset, if arbitrary_enum_discriminant becomes stable
> +impl Architecture {
> + fn from_u32(value: u32) -> Option<Architecture> {
> + match value {
> + 0x16 => Some(Architecture::Turing),
> + 0x17 => Some(Architecture::Ampere),
> + 0x19 => Some(Architecture::Ada),
> + _ => None,
> + }
> + }
> +}
> +
> +impl Revision {
> + fn new(major: u8, minor: u8) -> Self {
> + Self { major, minor }
> + }
Suggestion: add a version that takes a Boot0 as argument and call the
right methods directly in the method instead of relying on the caller to
do that for us, e.g:
fn from_boot0(boot0: ®s::Boot0) -> Self {
Self::new(boot0.major_rev(), boot0.minor_rev())
}
Then new() can also be removed if Boot0 is the only sensible source of
Revision.
(I'd argue that Boot0 should also implement Copy, that way this method
can take it by value directly)
> +}
> +
> +impl fmt::Display for Revision {
> + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> + write!(f, "{:x}.{:x}", self.major, self.minor)
> + }
> +}
> +
> +impl Spec {
> + fn new(bar: &Devres<Bar0>) -> Result<Spec> {
> + let bar = bar.try_access().ok_or(ENXIO)?;
> + let boot0 = regs::Boot0::read(&bar);
> +
> + let Some(chipset) = Chipset::from_u32(boot0.chipset()) else {
> + return Err(ENODEV);
> + };
> +
> + let Some(arch) = Architecture::from_u32(boot0.arch()) else {
> + return Err(ENODEV);
> + };
Technically the Architecture is already known if the Chipset has been
built successfully, so there should be no need to build it again (and
test for a failure that cannot happen at this point).
Since the architecture information is already embedded in Chipset, maybe
we can have an arch() method there?
Something like:
impl Chipset {
pub(crate) fn arch(self) -> Architecture {
match self as u32 & !0xf {
0x160 => Architecture::Turing,
0x170 => Architecture::Ampere,
0x190 => Architecture::Ada,
_ => unreachable!(),
}
}
}
This would also enable us to remove Architecture::from_u32() and
Spec::arch, which is redundant with Spec::chipset anyway.
A better (but more verbose) implementation of Chipset::arch() might be
to match every possible variant, so we get a build error if we forget to
handle a new chipset instead of hitting the unreachable!() at runtime...
> +
> + let revision = Revision::new(boot0.major_rev(), boot0.minor_rev());
> +
> + Ok(Self {
> + arch,
> + chipset,
> + revision,
> + })
> + }
> +}
> +
> +impl Firmware {
> + fn new(dev: &device::Device, spec: &Spec, ver: &str) -> Result<Firmware> {
> + let mut chip_name = CString::try_from_fmt(fmt!("{:?}", spec.chipset))?;
> + chip_name.make_ascii_lowercase();
> +
> + let fw_booter_load_path =
> + CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_load-{}.bin", &*chip_name, ver))?;
> + let fw_booter_unload_path =
> + CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_unload-{}.bin", &*chip_name, ver))?;
> + let fw_bootloader_path =
> + CString::try_from_fmt(fmt!("nvidia/{}/gsp/bootloader-{}.bin", &*chip_name, ver))?;
> + let fw_gsp_path =
> + CString::try_from_fmt(fmt!("nvidia/{}/gsp/gsp-{}.bin", &*chip_name, ver))?;
> +
> + let booter_load = firmware::Firmware::request(&fw_booter_load_path, dev)?;
> + let booter_unload = firmware::Firmware::request(&fw_booter_unload_path, dev)?;
> + let bootloader = firmware::Firmware::request(&fw_bootloader_path, dev)?;
> + let gsp = firmware::Firmware::request(&fw_gsp_path, dev)?;
> +
> + Ok(Firmware {
> + booter_load,
> + booter_unload,
> + bootloader,
> + gsp,
> + })
This looks like a good opportunity to use a closure and avoid
repeating the code:
let request_fw = |type_| {
CString::try_from_fmt(fmt!("nvidia/{}/gsp/{}-{}.bin", type_, &*chip_name, ver))
.and_then(|path| firmware::Firmware::request(&path, dev))
};
It is also short enough that you can directly invoke it when building
the Firmware object, without using temporary variables:
Ok(Firmware {
booter_load: request_fw("booter_load")?,
booter_unload: request_fw("booter_unload")?,
bootloader: request_fw("bootloader")?,
gsp: request_fw("gsp")?,
})
IMHO this has the benefit of being more concise and keeping related
operations closer.
Thanks!
Alex.
Powered by blists - more mailing lists