lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D9K09AU4KTQJ.3TVVZPMDB0H7I@nvidia.com>
Date: Wed, 30 Apr 2025 22:25:03 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
 <alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
 <gary@...yguo.net>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <benno.lossin@...ton.me>,
 "Andreas Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl"
 <aliceryhl@...gle.com>, "Trevor Gross" <tmgross@...ch.edu>, "David Airlie"
 <airlied@...il.com>, "Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "Jonathan Corbet"
 <corbet@....net>, "John Hubbard" <jhubbard@...dia.com>, "Ben Skeggs"
 <bskeggs@...dia.com>, "Joel Fernandes" <joelagnelf@...dia.com>, "Timur
 Tabi" <ttabi@...dia.com>, "Alistair Popple" <apopple@...dia.com>,
 <linux-kernel@...r.kernel.org>, <rust-for-linux@...r.kernel.org>,
 <nouveau@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions
 and base code

Hi Danilo,

On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
> This patch could probably split up a bit, to make it more pleasant to review. :)

Probably yes. I thought since it is mostly new files, splitting up
wouldn't change much. Let me see what I can do.

>
> On Sun, Apr 20, 2025 at 09:19:43PM +0900, Alexandre Courbot wrote:
>> 
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone)]
>> +pub(crate) enum FalconSecurityModel {
>> +    #[default]
>> +    None = 0,
>> +    Light = 2,
>> +    Heavy = 3,
>> +}
>
> Please add an explanation for the different security modules. Where are the
> differences?
>
> I think most of the structures, registers, abbreviations, etc. introduced in
> this patch need some documentation.

I've documented things a bit better for the next revision.

>
> Please see https://docs.kernel.org/gpu/nova/guidelines.html#documentation.
>
>> +
>> +impl TryFrom<u32> for FalconSecurityModel {
>> +    type Error = Error;
>> +
>> +    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
>> +        use FalconSecurityModel::*;
>> +
>> +        let sec_model = match value {
>> +            0 => None,
>> +            2 => Light,
>> +            3 => Heavy,
>> +            _ => return Err(EINVAL),
>> +        };
>> +
>> +        Ok(sec_model)
>> +    }
>> +}
>> +
>> +#[repr(u8)]
>> +#[derive(Debug, Default, Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
>> +pub(crate) enum FalconCoreRevSubversion {
>> +    #[default]
>> +    Subversion0 = 0,
>> +    Subversion1 = 1,
>> +    Subversion2 = 2,
>> +    Subversion3 = 3,
>> +}
>> +
>> +impl From<u32> for FalconCoreRevSubversion {
>> +    fn from(value: u32) -> Self {
>> +        use FalconCoreRevSubversion::*;
>> +
>> +        match value & 0b11 {
>> +            0 => Subversion0,
>> +            1 => Subversion1,
>> +            2 => Subversion2,
>> +            3 => Subversion3,
>> +            // SAFETY: the `0b11` mask limits the possible values to `0..=3`.
>> +            4..=u32::MAX => unsafe { unreachable_unchecked() },
>> +        }
>
> FalconCoreRev uses TryFrom to avoid unsafe code, I think FalconCoreRevSubversion
> should do the same thing.

Since the field from which `FalconCoreRevSubversion` is built is only 2
bits, I thought we could avoid using `TryFrom` since we are effectively
covering all possible values (I wish Rust has n-bit integer types :)).
But yeah I have probably overthought that, and that unsafe block is
unsightly. Converted to `TryFrom`.

>
>> +/// Trait defining the parameters of a given Falcon instance.
>> +pub(crate) trait FalconEngine: Sync {
>> +    /// Base I/O address for the falcon, relative from which its registers are accessed.
>> +    const BASE: usize;
>> +}
>> +
>> +/// Represents a portion of the firmware to be loaded into a particular memory (e.g. IMEM or DMEM).
>> +#[derive(Debug)]
>> +pub(crate) struct FalconLoadTarget {
>> +    /// Offset from the start of the source object to copy from.
>> +    pub(crate) src_start: u32,
>> +    /// Offset from the start of the destination memory to copy into.
>> +    pub(crate) dst_start: u32,
>> +    /// Number of bytes to copy.
>> +    pub(crate) len: u32,
>> +}
>> +
>> +#[derive(Debug)]
>> +pub(crate) struct FalconBromParams {
>> +    pub(crate) pkc_data_offset: u32,
>> +    pub(crate) engine_id_mask: u16,
>> +    pub(crate) ucode_id: u8,
>> +}
>> +
>> +pub(crate) trait FalconFirmware {
>> +    type Target: FalconEngine;
>> +
>> +    /// Returns the DMA handle of the object containing the firmware.
>> +    fn dma_handle(&self) -> bindings::dma_addr_t;
>> +
>> +    /// Returns the load parameters for `IMEM`.
>> +    fn imem_load(&self) -> FalconLoadTarget;
>> +
>> +    /// Returns the load parameters for `DMEM`.
>> +    fn dmem_load(&self) -> FalconLoadTarget;
>> +
>> +    /// Returns the parameters to write into the BROM registers.
>> +    fn brom_params(&self) -> FalconBromParams;
>> +
>> +    /// Returns the start address of the firmware.
>> +    fn boot_addr(&self) -> u32;
>> +}
>> +
>> +/// Contains the base parameters common to all Falcon instances.
>> +pub(crate) struct Falcon<E: FalconEngine> {
>> +    pub hal: Arc<dyn FalconHal<E>>,
>
> This should probably be private and instead should be exposed via Deref.

Agreed - actually not all the HAL is supposed to be exposed, so I've
added a proxy method for the only method that needs to be called from
outside this module.

>
> Also, please see my comment at create_falcon_hal() regarding the dynamic
> dispatch.
>
>> +}
>> +
>> +impl<E: FalconEngine + 'static> Falcon<E> {
>> +    pub(crate) fn new(
>> +        pdev: &pci::Device,
>> +        chipset: Chipset,
>> +        bar: &Devres<Bar0>,
>> +        need_riscv: bool,
>> +    ) -> Result<Self> {
>> +        let hwcfg1 = with_bar!(bar, |b| regs::FalconHwcfg1::read(b, E::BASE))?;
>> +        // Ensure that the revision and security model contain valid values.
>> +        let _rev = hwcfg1.core_rev()?;
>> +        let _sec_model = hwcfg1.security_model()?;
>> +
>> +        if need_riscv {
>> +            let hwcfg2 = with_bar!(bar, |b| regs::FalconHwcfg2::read(b, E::BASE))?;
>> +            if !hwcfg2.riscv() {
>> +                dev_err!(
>> +                    pdev.as_ref(),
>> +                    "riscv support requested on falcon that does not support it\n"
>> +                );
>> +                return Err(EINVAL);
>> +            }
>> +        }
>> +
>> +        Ok(Self {
>> +            hal: hal::create_falcon_hal(chipset)?,
>
> I'd prefer to move the contents of create_falcon_hal() into this constructor.

I think it is actually beneficial to have this in a dedicated method:
that way the individual HAL constructors do not need to be visible to
the `falcon` module and can be contained in the `hal` sub-module, which
I think helps keeping things at their place. Is there a good reason to
prefer doing it here?

Ah, maybe you are thinking that we are returning a Boxed HAL because we
are going through this function? It's actually on purpose - see below.

>> +pub(crate) struct Gsp;
>> +impl FalconEngine for Gsp {
>> +    const BASE: usize = 0x00110000;
>> +}
>> +
>> +pub(crate) type GspFalcon = Falcon<Gsp>;
>
> Please drop this type alias, Falcon<Gsp> seems simple enough and is much more
> obvious IMHO.

Yeah, I wanted to avoid having to import two symbols into the gpu
module, but I've probably been overthinking it again.

>
>> +
>> +impl Falcon<Gsp> {
>> +    /// Clears the SWGEN0 bit in the Falcon's IRQ status clear register to
>> +    /// allow GSP to signal CPU for processing new messages in message queue.
>> +    pub(crate) fn clear_swgen0_intr(&self, bar: &Devres<Bar0>) -> Result<()> {
>> +        with_bar!(bar, |b| regs::FalconIrqsclr::default()
>> +            .set_swgen0(true)
>> +            .write(b, Gsp::BASE))
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/falcon/hal.rs b/drivers/gpu/nova-core/falcon/hal.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..5ebf4e88f1f25a13cf47859a53507be53e795d34
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/falcon/hal.rs
>> @@ -0,0 +1,54 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +use kernel::sync::Arc;
>> +
>> +use crate::driver::Bar0;
>> +use crate::falcon::{FalconBromParams, FalconEngine};
>> +use crate::gpu::Chipset;
>> +use crate::timer::Timer;
>> +
>> +mod ga102;
>> +
>> +/// Hardware Abstraction Layer for Falcon cores.
>> +///
>> +/// Implements chipset-specific low-level operations. The trait is generic against [`FalconEngine`]
>> +/// so its `BASE` parameter can be used in order to avoid runtime bound checks when accessing
>> +/// registers.
>> +pub(crate) trait FalconHal<E: FalconEngine>: Sync {
>> +    // Activates the Falcon core if the engine is a risvc/falcon dual engine.
>> +    fn select_core(&self, _bar: &Devres<Bar0>, _timer: &Timer) -> Result<()> {
>> +        Ok(())
>> +    }
>> +
>> +    fn get_signature_reg_fuse_version(
>> +        &self,
>> +        bar: &Devres<Bar0>,
>> +        engine_id_mask: u16,
>> +        ucode_id: u8,
>> +    ) -> Result<u32>;
>> +
>> +    // Program the BROM registers prior to starting a secure firmware.
>> +    fn program_brom(&self, bar: &Devres<Bar0>, params: &FalconBromParams) -> Result<()>;
>> +}
>> +
>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> +///
>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> +/// requested HAL.
>
> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
> relevant to FalconHal impls?
>
> Can't we do something like I do in the following example [1]?
>
> ```
> use std::marker::PhantomData;
> use std::ops::Deref;
>
> trait Engine {
>     const BASE: u32;
> }
>
> trait Hal<E: Engine> {
>     fn access(&self);
> }
>
> struct Gsp;
>
> impl Engine for Gsp {
>     const BASE: u32 = 0x1;
> }
>
> struct Sec2;
>
> impl Engine for Sec2 {
>     const BASE: u32 = 0x2;
> }
>
> struct GA100<E: Engine>(PhantomData<E>);
>
> impl<E: Engine> Hal<E> for GA100<E> {
>     fn access(&self) {
>         println!("Base: {}", E::BASE);
>     }
> }
>
> impl<E: Engine> GA100<E> {
>     fn new() -> Self {
>         Self(PhantomData)
>     }
> }
>
> //struct Falcon<E: Engine>(GA100<E>);
>
> struct Falcon<H: Hal<E>, E: Engine>(H, PhantomData<E>);
>
> impl<H: Hal<E>, E: Engine> Falcon<H, E> {
>     fn new(hal: H) -> Self {
>         Self(hal, PhantomData)
>     }
> }
>
> impl<H: Hal<E>, E: Engine> Deref for Falcon<H, E> {
>     type Target = H;
>
>     fn deref(&self) -> &Self::Target {
>         &self.0
>     }
> }
>
> fn main() {
>     let gsp = Falcon::new(GA100::<Gsp>::new());
>     let sec2 = Falcon::new(GA100::<Sec2>::new());
>
>     gsp.access();
>     sec2.access();
> }
> ```
>
> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2

So are you have noticed there are two dimensions from which the falcons
can be instantiated:

- The engine, which determines its register BASE,
- The HAL, which is determined by the chipset.

For the engine, I want to keep things static for the main reason that if
BASE was dynamic, we would have to do all our IO using
try_read()/try_write() and check for an out-of-bounds error at each
register access. The cost of monomorphization is limited as there are
only a handful of engines.

But the HAL introduces a second dimension to this, and if we support N
engines then the amount of monomorphized code would then increase by N
for each new HAL we add. Chipsets are released at a good cadence, so
this is the dimension that risks growing the most.

It is also the one that makes use of methods to abstract things (vs.
fixed parameters), so it is a natural candidate for using virtual
methods. I am not a fan of having ever-growing boilerplate match
statements for each method that needs to be abstracted, especially since
this is that virtual methods do without requiring extra code, and for a
runtime penalty that is completely negligible in our context and IMHO
completely balanced by the smaller binary size that results from their
use.

Cheers,
Alex.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ