[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9977ad2e-ce2d-48b5-a222-f74a821abfeb@nvidia.com>
Date: Wed, 30 Apr 2025 10:38:11 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: Alexandre Courbot <acourbot@...dia.com>,
Danilo Krummrich <dakr@...nel.org>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>,
Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>,
Trevor Gross <tmgross@...ch.edu>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Jonathan Corbet <corbet@....net>, John Hubbard <jhubbard@...dia.com>,
Ben Skeggs <bskeggs@...dia.com>, Timur Tabi <ttabi@...dia.com>,
Alistair Popple <apopple@...dia.com>, linux-kernel@...r.kernel.org,
rust-for-linux@...r.kernel.org, nouveau@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions and
base code
On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
> Hi Danilo,
>
> On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
>> This patch could probably split up a bit, to make it more pleasant to review. :)
>
> Probably yes. I thought since it is mostly new files, splitting up
> wouldn't change much. Let me see what I can do.
>
>>
>> On Sun, Apr 20, 2025 at 09:19:43PM +0900, Alexandre Courbot wrote:
>>>
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone)]
>>> +pub(crate) enum FalconSecurityModel {
>>> + #[default]
>>> + None = 0,
>>> + Light = 2,
>>> + Heavy = 3,
>>> +}
>>
>> Please add an explanation for the different security modules. Where are the
>> differences?
>>
>> I think most of the structures, registers, abbreviations, etc. introduced in
>> this patch need some documentation.
>
> I've documented things a bit better for the next revision.
>
>>
>> Please see https://docs.kernel.org/gpu/nova/guidelines.html#documentation.
>>
>>> +
>>> +impl TryFrom<u32> for FalconSecurityModel {
>>> + type Error = Error;
>>> +
>>> + fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
>>> + use FalconSecurityModel::*;
>>> +
>>> + let sec_model = match value {
>>> + 0 => None,
>>> + 2 => Light,
>>> + 3 => Heavy,
>>> + _ => return Err(EINVAL),
>>> + };
>>> +
>>> + Ok(sec_model)
>>> + }
>>> +}
>>> +
>>> +#[repr(u8)]
>>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>>> +pub(crate) enum FalconCoreRevSubversion {
>>> + #[default]
>>> + Subversion0 = 0,
>>> + Subversion1 = 1,
>>> + Subversion2 = 2,
>>> + Subversion3 = 3,
>>> +}
>>> +
>>> +impl From<u32> for FalconCoreRevSubversion {
>>> + fn from(value: u32) -> Self {
>>> + use FalconCoreRevSubversion::*;
>>> +
>>> + match value & 0b11 {
>>> + 0 => Subversion0,
>>> + 1 => Subversion1,
>>> + 2 => Subversion2,
>>> + 3 => Subversion3,
>>> + // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
>>> + 4..=u32::MAX => unsafe { unreachable_unchecked() },
>>> + }
>>
>> FalconCoreRev uses TryFrom to avoid unsafe code, I think FalconCoreRevSubversion
>> should do the same thing.
>
> Since the field from which `FalconCoreRevSubversion` is built is only 2
> bits, I thought we could avoid using `TryFrom` since we are effectively
> covering all possible values (I wish Rust has n-bit integer types :)).
> But yeah I have probably overthought that, and that unsafe block is
> unsightly. Converted to `TryFrom`.
>
>>
>>> +/// Trait defining the parameters of a given Falcon instance.
>>> +pub(crate) trait FalconEngine: Sync {
>>> + /// Base I/O address for the falcon, relative from which its registers are accessed.
>>> + const BASE: usize;
>>> +}
>>> +
>>> +/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
>>> +#[derive(Debug)]
>>> +pub(crate) struct FalconLoadTarget {
>>> + /// Offset from the start of the source object to copy from.
>>> + pub(crate) src_start: u32,
>>> + /// Offset from the start of the destination memory to copy into.
>>> + pub(crate) dst_start: u32,
>>> + /// Number of bytes to copy.
>>> + pub(crate) len: u32,
>>> +}
>>> +
>>> +#[derive(Debug)]
>>> +pub(crate) struct FalconBromParams {
>>> + pub(crate) pkc_data_offset: u32,
>>> + pub(crate) engine_id_mask: u16,
>>> + pub(crate) ucode_id: u8,
>>> +}
>>> +
>>> +pub(crate) trait FalconFirmware {
>>> + type Target: FalconEngine;
>>> +
>>> + /// Returns the DMA handle of the object containing the firmware.
>>> + fn dma_handle(&self) -> bindings::dma_addr_t;
>>> +
>>> + /// Returns the load parameters for `IMEM`.
>>> + fn imem_load(&self) -> FalconLoadTarget;
>>> +
>>> + /// Returns the load parameters for `DMEM`.
>>> + fn dmem_load(&self) -> FalconLoadTarget;
>>> +
>>> + /// Returns the parameters to write into the BROM registers.
>>> + fn brom_params(&self) -> FalconBromParams;
>>> +
>>> + /// Returns the start address of the firmware.
>>> + fn boot_addr(&self) -> u32;
>>> +}
>>> +
>>> +/// Contains the base parameters common to all Falcon instances.
>>> +pub(crate) struct Falcon<E: FalconEngine> {
>>> + pub hal: Arc<dyn FalconHal<E>>,
>>
>> This should probably be private and instead should be exposed via Deref.
>
> Agreed - actually not all the HAL is supposed to be exposed, so I've
> added a proxy method for the only method that needs to be called from
> outside this module.
>
>>
>> Also, please see my comment at create_falcon_hal() regarding the dynamic
>> dispatch.
>>
>>> +}
>>> +
>>> +impl<E: FalconEngine + 'static> Falcon<E> {
>>> + pub(crate) fn new(
>>> + pdev: &pci::Device,
>>> + chipset: Chipset,
>>> + bar: &Devres<Bar0>,
>>> + need_riscv: bool,
>>> + ) -> Result<Self> {
>>> + let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
>>> + // Ensure that the revision and security model contain valid values.
>>> + let _rev = hwcfg1.core_rev()?;
>>> + let _sec_model = hwcfg1.security_model()?;
>>> +
>>> + if need_riscv {
>>> + let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
>>> + if !hwcfg2.riscv() {
>>> + dev_err!(
>>> + pdev.as_ref(),
>>> + "riscv support requested on falcon that does not support it\n"
>>> + );
>>> + return Err(EINVAL);
>>> + }
>>> + }
>>> +
>>> + Ok(Self {
>>> + hal: hal::create_falcon_hal(chipset)?,
>>
>> I'd prefer to move the contents of create_falcon_hal() into this constructor.
>
> I think it is actually beneficial to have this in a dedicated method:
> that way the individual HAL constructors do not need to be visible to
> the `falcon` module and can be contained in the `hal` sub-module, which
> I think helps keeping things at their place. Is there a good reason to
> prefer doing it here?
>
> Ah, maybe you are thinking that we are returning a Boxed HAL because we
> are going through this function? It's actually on purpose - see below.
>
>>> +pub(crate) struct Gsp;
>>> +impl FalconEngine for Gsp {
>>> + const BASE: usize = 0x00110000;
>>> +}
>>> +
>>> +pub(crate) type GspFalcon = Falcon<Gsp>;
>>
>> Please drop this type alias, Falcon<Gsp> seems simple enough and is much more
>> obvious IMHO.
>
> Yeah, I wanted to avoid having to import two symbols into the gpu
> module, but I've probably been overthinking it again.
>
>>
>>> +
>>> +impl Falcon<Gsp> {
>>> + /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
>>> + /// allow GSP to signal CPU for processing new messages in message queue.
>>> + pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
>>> + with_bar!(bar, |b| regs::FalconIrqsclr::default()
>>> + .set_swgen0(true)
>>> + .write(b, Gsp::BASE))
>>> + }
>>> +}
>>> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/falcon/hal.rs
>>> @@ -0,0 +1,54 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +use kernel::devres::Devres;
>>> +use kernel::prelude::*;
>>> +use kernel::sync::Arc;
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::falcon::{FalconBromParams, FalconEngine};
>>> +use crate::gpu::Chipset;
>>> +use crate::timer::Timer;
>>> +
>>> +mod ga102;
>>> +
>>> +/// Hardware Abstraction Layer for Falcon cores.
>>> +///
>>> +/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
>>> +/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
>>> +/// registers.
>>> +pub(crate) trait FalconHal<E: FalconEngine>: Sync {
>>> + // Activates the Falcon core if the engine is a risvc/falcon dual engine.
>>> + fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
>>> + Ok(())
>>> + }
>>> +
>>> + fn get_signature_reg_fuse_version(
>>> + &self,
>>> + bar: &Devres<Bar0>,
>>> + engine_id_mask: u16,
>>> + ucode_id: u8,
>>> + ) -> Result<u32>;
>>> +
>>> + // Program the BROM registers prior to starting a secure firmware.
>>> + fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
>>> +}
>>> +
>>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>>> +///
>>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>>> +/// requested HAL.
>>
>> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
>> relevant to FalconHal impls?
>>
>> Can't we do something like I do in the following example [1]?
>>
>> ```
>> use std::marker::PhantomData;
>> use std::ops::Deref;
>>
>> trait Engine {
>> const BASE: u32;
>> }
>>
>> trait Hal<E: Engine> {
>> fn access(&self);
>> }
>>
>> struct Gsp;
>>
>> impl Engine for Gsp {
>> const BASE: u32 = 0x1;
>> }
>>
>> struct Sec2;
>>
>> impl Engine for Sec2 {
>> const BASE: u32 = 0x2;
>> }
>>
>> struct GA100<E: Engine>(PhantomData<E>);
>>
>> impl<E: Engine> Hal<E> for GA100<E> {
>> fn access(&self) {
>> println!("Base: {}", E::BASE);
>> }
>> }
>>
>> impl<E: Engine> GA100<E> {
>> fn new() -> Self {
>> Self(PhantomData)
>> }
>> }
>>
>> //struct Falcon<E: Engine>(GA100<E>);
>>
>> struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);
>>
>> impl<H: Hal<E>, E: Engine> Falcon<H, E> {
>> fn new(hal: H) -> Self {
>> Self(hal, PhantomData)
>> }
>> }
>>
>> impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
>> type Target = H;
>>
>> fn deref(&self) -> &Self::Target {
>> &self.0
>> }
>> }
>>
>> fn main() {
>> let gsp = Falcon::new(GA100::<Gsp>::new());
>> let sec2 = Falcon::new(GA100::<Sec2>::new());
>>
>> gsp.access();
>> sec2.access();
>> }
>> ```
>>
>> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
>
> So are you have noticed there are two dimensions from which the falcons
> can be instantiated:
>
> - The engine, which determines its register BASE,
> - The HAL, which is determined by the chipset.
>
> For the engine, I want to keep things static for the main reason that if
> BASE was dynamic, we would have to do all our IO using
> try_read()/try_write() and check for an out-of-bounds error at each
> register access. The cost of monomorphization is limited as there are
> only a handful of engines.
>
> But the HAL introduces a second dimension to this, and if we support N
> engines then the amount of monomorphized code would then increase by N
> for each new HAL we add. Chipsets are released at a good cadence, so
> this is the dimension that risks growing the most.
>
> It is also the one that makes use of methods to abstract things (vs.
> fixed parameters), so it is a natural candidate for using virtual
> methods. I am not a fan of having ever-growing boilerplate match
> statements for each method that needs to be abstracted, especially since
> this is that virtual methods do without requiring extra code, and for a
> runtime penalty that is completely negligible in our context and IMHO
> completely balanced by the smaller binary size that results from their
> use.
Adding to what Alex said, note that the runtime cost is still there even without
using dyn. Because at runtime, the match conditionals need to route function
calls to the right place. I am just not seeing the benefits of not using dyn for
this use case and only drawbacks. IMHO, we should try to not be doing the
compiler's job.
Maybe the only benefit is you don't need an Arc or Kbox wrapper?
- Joel
Powered by blists - more mailing lists