lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DADD4T7HM4G7.3Q0CFL6W8J64U@nvidia.com>
Date: Wed, 04 Jun 2025 10:37:45 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Timur Tabi" <ttabi@...dia.com>, "dakr@...nel.org" <dakr@...nel.org>,
 "a.hindborg@...nel.org" <a.hindborg@...nel.org>, "ojeda@...nel.org"
 <ojeda@...nel.org>, "boqun.feng@...il.com" <boqun.feng@...il.com>,
 "simona@...ll.ch" <simona@...ll.ch>, "tmgross@...ch.edu"
 <tmgross@...ch.edu>, "alex.gaynor@...il.com" <alex.gaynor@...il.com>,
 "tzimmermann@...e.de" <tzimmermann@...e.de>, "mripard@...nel.org"
 <mripard@...nel.org>, "maarten.lankhorst@...ux.intel.com"
 <maarten.lankhorst@...ux.intel.com>, "benno.lossin@...ton.me"
 <benno.lossin@...ton.me>, "bjorn3_gh@...tonmail.com"
 <bjorn3_gh@...tonmail.com>, "airlied@...il.com" <airlied@...il.com>,
 "aliceryhl@...gle.com" <aliceryhl@...gle.com>, "gary@...yguo.net"
 <gary@...yguo.net>
Cc: "Alistair Popple" <apopple@...dia.com>, "John Hubbard"
 <jhubbard@...dia.com>, "rust-for-linux@...r.kernel.org"
 <rust-for-linux@...r.kernel.org>, "dri-devel@...ts.freedesktop.org"
 <dri-devel@...ts.freedesktop.org>, "nouveau@...ts.freedesktop.org"
 <nouveau@...ts.freedesktop.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, "Joel Fernandes" <joelagnelf@...dia.com>,
 "Ben Skeggs" <bskeggs@...dia.com>
Subject: Re: [PATCH v4 20/20] gpu: nova-core: load and run FWSEC-FRTS

On Fri May 30, 2025 at 6:30 AM JST, Timur Tabi wrote:
> On Wed, 2025-05-21 at 15:45 +0900, Alexandre Courbot wrote:
>
> I noticed something interesting in this change to Gpu::new().
>
>> +        // Check that the WPR2 region does not already exists - if it does, the GPU needs to be
>> +        // reset.
>> +        if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
>> +            dev_err!(
>> +                pdev.as_ref(),
>> +                "WPR2 region already exists - GPU needs to be reset to proceed\n"
>> +            );
>> +            return Err(EBUSY);
>> +        }
>
> You have a lot of checks in this code that display an error message and then return an Err().
>
> But then ...
>
>> +
>> +        // Reset falcon, load FWSEC-FRTS, and run it.
>> +        gsp_falcon.reset(bar)?;
>> +        gsp_falcon.dma_load(bar, &fwsec_frts)?;
>> +        let (mbox0, _) = gsp_falcon.boot(bar, Some(0), None)?;
>> +        if mbox0 != 0 {
>> +            dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", mbox0);
>> +            return Err(EINVAL);
>> +        }
>
> There are several lines where you just terminate them with "?".  This means that no error message is
> displays. 
>
> I think all of these ? should be replaced with something like:
>
> 	gsp_falcon.reset(bar).inspect_err(|e| {
>             dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e);
>         })?;
>
> This feels like something that would benefit from a macro, but I can't imagine what that would look
> like.

This is because we are checking the cause of the error (unexpected value
after firmware runs) in this file, so it is the correct place to display
an error message. If the falcon reset fails, the error happens within
the `reset()` method which can display an error message if needed, so I
thought it was adequate to just propagate the error here.

Now doing so would not tell us *which* falcon failed, and this sequence
is so important that it is a good idea to understand where is fails
precicely, so I've added a few `inspect_err` as you suggested for
clarity.

Ideally we would have something like the user-space `thiserror` crate to
manage errors nicely and have custom error types like Lyude suggested.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ