Message-ID: <E8245EE2-887A-447A-8576-DC845FD57DC1@nvidia.com>
Date: Thu, 11 Dec 2025 01:24:49 +0000
From: Joel Fernandes <joelagnelf@...dia.com>
To: Zhi Wang <zhiw@...dia.com>
CC: "rust-for-linux@...r.kernel.org" <rust-for-linux@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"nouveau@...ts.freedesktop.org" <nouveau@...ts.freedesktop.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"airlied@...il.com" <airlied@...il.com>, "dakr@...nel.org" <dakr@...nel.org>,
"aliceryhl@...gle.com" <aliceryhl@...gle.com>, "bhelgaas@...gle.com"
<bhelgaas@...gle.com>, "kwilczynski@...nel.org" <kwilczynski@...nel.org>,
"ojeda@...nel.org" <ojeda@...nel.org>, "alex.gaynor@...il.com"
<alex.gaynor@...il.com>, "boqun.feng@...il.com" <boqun.feng@...il.com>,
"gary@...yguo.net" <gary@...yguo.net>, "bjorn3_gh@...tonmail.com"
<bjorn3_gh@...tonmail.com>, "lossin@...nel.org" <lossin@...nel.org>,
"a.hindborg@...nel.org" <a.hindborg@...nel.org>, "tmgross@...ch.edu"
<tmgross@...ch.edu>, "markus.probst@...teo.de" <markus.probst@...teo.de>,
"helgaas@...nel.org" <helgaas@...nel.org>, Neo Jia <cjia@...dia.com>,
"alex@...zbot.org" <alex@...zbot.org>, Surath Mitra <smitra@...dia.com>,
Ankit Agrawal <ankita@...dia.com>, Aniket Agashe <aniketa@...dia.com>, Kirti
Wankhede <kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)"
<targupta@...dia.com>, Alexandre Courbot <acourbot@...dia.com>, John Hubbard
<jhubbard@...dia.com>, "zhiwang@...nel.org" <zhiwang@...nel.org>
Subject: Re: [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU
support is enabled
> On Dec 9, 2025, at 11:05 PM, Zhi Wang <zhiw@...dia.com> wrote:
> [..]
>>> +
>>> + dev_dbg!(
>>> + pdev.as_ref(),
>>> + "SEC2 MBOX0: {:#x}, MBOX1{:#x}\n",
>>> + mbox0,
>>> + mbox1
>>> + );
>>> +
>>> + if !regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed() {
>>> + return Err(ETIMEDOUT);
>>
>> So under which situation do you get to this point
>> (!scrubber_completed) ? Basically I am not sure if ETIMEDOUT is the
>> right error to return here, because boot() already returns ETIMEDOUT
>> by waiting for the halt.
>>
>> If you still want to return ETIMEDOUT here, then it sounds like you're
>> waiting for scrubbing beyond the waiting already done by boot(). If
>> so, shouldn't you use read_poll_timeout() here?
>>
>> perhaps something like:
>>
>> read_poll_timeout(
>>     || Ok(regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed()),
>>     |val: &bool| *val,
>>     Delta::from_millis(10),
>>     Delta::from_secs(5),
>> )?;
>>
>
> This is identical to the OpenRM implementation [1]. According to that
> part of the code, I think the scrubber runs during the binary's boot
> process. By the time it signals that the firmware booted successfully,
> the scrubbing should be done. Let me change to another errno.
>
> [1] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/a5bfb10e75a4046c5d991c65f49b5d29151e68cf/src/nvidia/src/kernel/gpu/gsp/arch/ada/kernel_gsp_ad102.c#L49

Sure. It was just misleading that the patch returns a timeout error
when the actual failure is something else (e.g. the scrubber failed).
Thanks for correcting it.
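
To illustrate the distinction, here is a rough standalone sketch in plain
Rust. It is not nova-core code: ScrubError, poll_until() and
check_scrubber() are made-up names for illustration, and Failed merely
stands in for whichever non-timeout errno ends up being chosen. A timeout
error only makes sense where the code actually waits; a one-shot status
check after boot() has already waited should report a different failure.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum ScrubError {
    /// Stands in for ETIMEDOUT: we waited and the condition never held.
    TimedOut,
    /// Stands in for a non-timeout errno: the status says "not completed".
    Failed,
}

/// Repeatedly call `read` until `done` accepts the value or `timeout`
/// elapses. Only this path can legitimately report a timeout.
fn poll_until<T>(
    mut read: impl FnMut() -> T,
    done: impl Fn(&T) -> bool,
    sleep: Duration,
    timeout: Duration,
) -> Result<T, ScrubError> {
    let start = Instant::now();
    loop {
        let val = read();
        if done(&val) {
            return Ok(val);
        }
        if start.elapsed() >= timeout {
            return Err(ScrubError::TimedOut);
        }
        std::thread::sleep(sleep);
    }
}

/// One-shot check performed after the wait is already over. Nothing is
/// waited on here, so a failure is reported as `Failed`, not `TimedOut`.
fn check_scrubber(completed: bool) -> Result<(), ScrubError> {
    if completed {
        Ok(())
    } else {
        Err(ScrubError::Failed)
    }
}
```

With this split, a caller that sees TimedOut knows a wait expired, while
Failed means the completion flag was simply not set when it was read.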
- Joel