lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47c05bcf-7591-4148-8783-0c107b0c3c9d@nvidia.com>
Date: Sat, 6 Dec 2025 21:26:12 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Zhi Wang <zhiw@...dia.com>, rust-for-linux@...r.kernel.org,
 linux-pci@...r.kernel.org, nouveau@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org
Cc: airlied@...il.com, dakr@...nel.org, aliceryhl@...gle.com,
 bhelgaas@...gle.com, kwilczynski@...nel.org, ojeda@...nel.org,
 alex.gaynor@...il.com, boqun.feng@...il.com, gary@...yguo.net,
 bjorn3_gh@...tonmail.com, lossin@...nel.org, a.hindborg@...nel.org,
 tmgross@...ch.edu, markus.probst@...teo.de, helgaas@...nel.org,
 cjia@...dia.com, alex@...zbot.org, smitra@...dia.com, ankita@...dia.com,
 aniketa@...dia.com, kwankhede@...dia.com, targupta@...dia.com,
 acourbot@...dia.com, jhubbard@...dia.com, zhiwang@...nel.org
Subject: Re: [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU
 support is enabled

Hi Zhi,

On 12/6/2025 7:42 AM, Zhi Wang wrote:
> To support the maximum vGPUs on the device that support vGPU, a larger
> WPR2 heap size is required. By setting the WPR2 heap size larger than 256MB
> the scrubber ucode image is required to scrub the FB memory before any
> other ucode image is executed.
> 
> If not, the GSP firmware hangs when booting.
> 
> When vGPU support is enabled, execute the scrubber ucode image to scrub the
> FB memory before executing any other ucode images.
> 
[..]
>      pub(crate) const fn create(
> diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
> index f107f753214a..f622c9b960de 100644
> --- a/drivers/gpu/nova-core/firmware/booter.rs
> +++ b/drivers/gpu/nova-core/firmware/booter.rs
> @@ -269,6 +269,7 @@ fn new_booter(dev: &device::Device<device::Bound>, data: &[u8]) -> Result<Self>
>  
>  #[derive(Copy, Clone, Debug, PartialEq)]
>  pub(crate) enum BooterKind {
> +    Scrubber,
>      Loader,
>      #[expect(unused)]
>      Unloader,
> @@ -286,6 +287,7 @@ pub(crate) fn new(
>          bar: &Bar0,
>      ) -> Result<Self> {
>          let fw_name = match kind {
> +            BooterKind::Scrubber => "scrubber",
>              BooterKind::Loader => "booter_load",
>              BooterKind::Unloader => "booter_unload",
>          };
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index ec006c26f19f..8ef79433f017 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -151,6 +151,33 @@ pub(crate) fn boot(
>  
>          Self::run_fwsec_frts(dev, gsp_falcon, bar, &bios, &fb_layout)?;

Could you elaborate on how the timeout below works? See comment below.

>  
> +        if vgpu_support {
> +            let scrubber = BooterFirmware::new(
> +                dev,
> +                BooterKind::Scrubber,
> +                chipset,
> +                FIRMWARE_VERSION,
> +                sec2_falcon,
> +                bar,
> +            )?;
> +
> +            sec2_falcon.reset(bar)?;
> +            sec2_falcon.dma_load(bar, &scrubber)?;
> +
> +            let (mbox0, mbox1) = sec2_falcon.boot(bar, None, None)?;

boot() already returns -ETIMEDOUT via wait_till_halted()->read_poll_timeout().

The wait there is 2 seconds. I assume the scrubber would have completed by then.

> +
> +            dev_dbg!(
> +                pdev.as_ref(),
> +                "SEC2 MBOX0: {:#x}, MBOX1{:#x}\n",
> +                mbox0,
> +                mbox1
> +            );
> +
> +            if !regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed() {
> +                return Err(ETIMEDOUT);

So under which situation do you get to this point (!scrubber_completed) ?
Basically I am not sure if ETIMEDOUT is the right error to return here, because
boot() already returns ETIMEDOUT by waiting for the halt.

If you still want return ETIMEDOUT here, then it sounds like you're waiting for
scrubbing beyond the waiting already done by boot(). If so, then shouldn't you
need to use read_poll_timeout() here?

perhaps something like:

 read_poll_timeout(
     || Ok(regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed()),
     |val: &bool| *val,
     Delta::from_millis(10),
     Delta::from_secs(5),
 )?;

Thanks.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ