lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <69893efa-fd7b-4fd9-933a-4e418a7634ef@nvidia.com>
Date: Sat, 1 Feb 2025 11:43:51 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: airlied@...il.com, simona@...ll.ch, corbet@....net,
 maarten.lankhorst@...ux.intel.com, mripard@...nel.org, tzimmermann@...e.de,
 ajanulgu@...hat.com, lyude@...hat.com, pstanner@...hat.com, zhiw@...dia.com,
 cjia@...dia.com, bskeggs@...dia.com, acurrid@...dia.com, ojeda@...nel.org,
 alex.gaynor@...il.com, boqun.feng@...il.com, gary@...yguo.net,
 bjorn3_gh@...tonmail.com, benno.lossin@...ton.me, a.hindborg@...nel.org,
 aliceryhl@...gle.com, tmgross@...ch.edu, dri-devel@...ts.freedesktop.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 nouveau@...ts.freedesktop.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH 1/2] gpu: nova-core: add initial driver stub

On 2/1/25 5:52 AM, Danilo Krummrich wrote:
> On Fri, Jan 31, 2025 at 08:01:00PM -0800, John Hubbard wrote:
>> On 1/31/25 2:04 PM, Danilo Krummrich wrote:
...
>>> +/// Enum representation of the GPU generation.
>>> +#[derive(Debug)]
>>> +pub(crate) enum CardType {
>>> +    /// Turing
>>> +    TU100 = 0x160,
>>> +    /// Ampere
>>> +    GA100 = 0x170,
>>> +    /// Ada Lovelace
>>> +    AD100 = 0x190,
>>> +}
>>> +
>>> +/// Structure holding the metadata of the GPU.
>>> +#[allow(dead_code)]
>>> +pub(crate) struct GpuSpec {
>>> +    /// Contents of the boot0 register.
>>> +    boot0: u64,
>>
>> It is redundant to store boot0, when all of the following fields
>> are deduced from boot0.
> 
> Yes, I think we can probably remove it, I only use it to print it in Gpu::new()
> as a sign of life and because I don't know if boot0 contains any other useful
> information than chipset and chiprev.
> 
> But maybe you can help me out here? :) That is, share the register layout and
> field names. This way I could also get rid of those magic numbers, and put in
> proper naming for fields, masks and shifts.
> 

We've open sourced those, starting around 2017, and for example the Ampere
fields for boot0 are here:

https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/ampere/ga100/dev_boot.ref.txt

If more information is needed in this area, let me know and we might be able
to open source it, depending of course on what it is. It's just 
explaining what
the chip is, after all.

...
>>> +// TODO replace with something like derive(FromPrimitive)
>>> +impl CardType {
>>> +    fn from_u32(value: u32) -> Option<CardType> {
>>> +        match value {
>>> +            0x160 => Some(CardType::TU100),
>>> +            0x170 => Some(CardType::GA100),
>>> +            0x190 => Some(CardType::AD100),
>>
>> Is this how nouveau does it too? I mean, classifying cards as GA100,
>> and variants as TU102. It feels wrong to me, because we have for example
>> GA100 GPUs. I mean, GA100 is the same kind of thing as a GA102: each is
>> a GPU.
> 
> Yes, that's what nouveau came up with and it's meant as e.g. 'GA1xx'. But yes,
> I agree it's a bit confusing.
> 
> OOC, what about the first digit in this example? For Blackwell it would seem to
> be 'GB2xx'. Can you shed some light on this?

I'll take a homework assignment to go dig that up. After GSP and Open RM 
came
out, we stopped explicitly publishing to 
https://github.com/NVIDIA/open-gpu-doc ,
because the same information was being published in
https://github.com/NVIDIA/open-gpu-kernel-modules .


> 
>>
>> If I were naming card types, I'd calling them by their architecture names:
>> Turing, Ampere, Ada.
> 
> Yeah, that is probably less cryptic. Plus, we should probably name it something
> around the term "architecture" rather than "CardType".

Yes, "architecture" is accurate.

> 
>>
>>> +            _ => None,
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +impl GpuSpec {
>>> +    fn new(bar: &Devres<Bar0>) -> Result<GpuSpec> {
>>> +        let bar = bar.try_access().ok_or(ENXIO)?;
>>> +        let boot0 = u64::from_le(bar.readq(0));
>>> +        let chip = ((boot0 & 0x1ff00000) >> 20) as u32;
>>> +
>>> +        if boot0 & 0x1f000000 == 0 {
>>> +            return Err(ENODEV);
>>> +        }
>>> +
>>> +        let Some(chipset) = Chipset::from_u32(chip) else {
>>> +            return Err(ENODEV);
>>> +        };
>>> +
>>> +        let Some(card_type) = CardType::from_u32(chip & 0x1f0) else {
>>> +            return Err(ENODEV);
>>> +        };
>>> +
>>> +        Ok(Self {
>>> +            boot0,
>>> +            card_type,
>>> +            chipset,
>>> +            chiprev: (boot0 & 0xff) as u8,
>>> +        })
>>> +    }
>>> +}
>>> +
>>> +impl Firmware {
>>> +    fn new(dev: &device::Device, spec: &GpuSpec, ver: &str) -> Result<Firmware> {
>>> +        let mut chip_name = CString::try_from_fmt(fmt!("{:?}", spec.chipset))?;
>>> +        chip_name.make_ascii_lowercase();
>>> +
>>> +        let fw_booter_load_path =
>>> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_load-{}.bin", &*chip_name, ver))?;
>>> +        let fw_booter_unload_path =
>>> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_unload-{}.bin", &*chip_name, ver))?;
>>> +        let fw_gsp_path =
>>> +            CString::try_from_fmt(fmt!("nvidia/{}/gsp/gsp-{}.bin", &*chip_name, ver))?;
>>> +
>>> +        let booter_load = firmware::Firmware::request(&fw_booter_load_path, dev)?;
>>> +        let booter_unload = firmware::Firmware::request(&fw_booter_unload_path, dev)?;
>>> +        let gsp = firmware::Firmware::request(&fw_gsp_path, dev)?;
>>> +
>>> +        Ok(Firmware {
>>> +            booter_load,
>>> +            booter_unload,
>>> +            gsp,
>>> +        })
>>> +    }
>>> +}
>>> +
>>> +impl Gpu {
>>> +    pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
>>> +        let spec = GpuSpec::new(&bar)?;
>>> +        let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
>>
>> lol there it is: our one, "stable" set of GSP firmware. Maybe a one line comment
>> above might be appropriate, to mention that this is hardcoded, but new firmware
>> versions will not be. On the other hand, that's obvious. :)
> 
> Well, I guess we'll have to probe what the distribution provides us with and see
> if that's supported and sufficient for the chipset we try to initialize.

Oh, I think we can leave it alone for a moment. We're working hard on a
firmware plan to make things better.


thanks,
-- 
John Hubbard


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ