Message-Id: <DCJ5ZOH6DO2S.8GGF9FABSVNT@nvidia.com>
Date: Wed, 03 Sep 2025 21:29:51 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
<alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
<gary@...yguo.net>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Andreas
Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl" <aliceryhl@...gle.com>,
"Trevor Gross" <tmgross@...ch.edu>, "David Airlie" <airlied@...il.com>,
"Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
<maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
"Thomas Zimmermann" <tzimmermann@...e.de>, "John Hubbard"
<jhubbard@...dia.com>, "Alistair Popple" <apopple@...dia.com>, "Joel
Fernandes" <joelagnelf@...dia.com>, "Timur Tabi" <ttabi@...dia.com>,
<rust-for-linux@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<nouveau@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v3 02/11] gpu: nova-core: move GSP boot code out of
`Gpu` constructor
On Wed Sep 3, 2025 at 8:05 PM JST, Danilo Krummrich wrote:
> On Wed Sep 3, 2025 at 12:44 PM CEST, Alexandre Courbot wrote:
>> On Wed Sep 3, 2025 at 5:26 PM JST, Danilo Krummrich wrote:
>>> On Wed Sep 3, 2025 at 9:08 AM CEST, Alexandre Courbot wrote:
>>>> On Wed Sep 3, 2025 at 4:53 AM JST, Danilo Krummrich wrote:
>>>>> On Tue Sep 2, 2025 at 4:31 PM CEST, Alexandre Courbot wrote:
>>>>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>>>>> index 274989ea1fb4a5e3e6678a08920ddc76d2809ab2..1062014c0a488e959379f009c2e8029ffaa1e2f8 100644
>>>>>> --- a/drivers/gpu/nova-core/driver.rs
>>>>>> +++ b/drivers/gpu/nova-core/driver.rs
>>>>>> @@ -6,6 +6,8 @@
>>>>>>
>>>>>> #[pin_data]
>>>>>> pub(crate) struct NovaCore {
>>>>>> + // Placeholder for the real `Gsp` object once it is built.
>>>>>> + pub(crate) gsp: (),
>>>>>> #[pin]
>>>>>> pub(crate) gpu: Gpu,
>>>>>> _reg: auxiliary::Registration,
>>>>>> @@ -40,8 +42,14 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>>>> )?;
>>>>>>
>>>>>> let this = KBox::pin_init(
>>>>>> - try_pin_init!(Self {
>>>>>> + try_pin_init!(&this in Self {
>>>>>> gpu <- Gpu::new(pdev, bar)?,
>>>>>> + gsp <- {
>>>>>> + // SAFETY: `this.gpu` is initialized to a valid value.
>>>>>> + let gpu = unsafe { &(*this.as_ptr()).gpu };
>>>>>> +
>>>>>> + gpu.start_gsp(pdev)?
>>>>>> + },
>>>>>
>>>>> Please use pin_chain() [1] for this.
>>>>
>>>> Sorry, but I couldn't figure out how I can use pin_chain here (and
>>>> couldn't find any relevant example in the kernel code either). Can you
>>>> elaborate a bit?
>>>
>>> I thought of just doing the following, which I think should be equivalent (diff
>>> against current nova-next).
>>>
>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>> index 274989ea1fb4..6d62867f7503 100644
>>> --- a/drivers/gpu/nova-core/driver.rs
>>> +++ b/drivers/gpu/nova-core/driver.rs
>>> @@ -41,7 +41,9 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self
>>>
>>> let this = KBox::pin_init(
>>> try_pin_init!(Self {
>>> - gpu <- Gpu::new(pdev, bar)?,
>>> + gpu <- Gpu::new(pdev, bar)?.pin_chain(|gpu| {
>>> + gpu.start_gsp(pdev)
>>> + }),
>>> _reg: auxiliary::Registration::new(
>>> pdev.as_ref(),
>>> c_str!("nova-drm"),
>>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>>> index 8caecaf7dfb4..211bc1a5a5b3 100644
>>> --- a/drivers/gpu/nova-core/gpu.rs
>>> +++ b/drivers/gpu/nova-core/gpu.rs
>>> @@ -266,7 +266,7 @@ fn run_fwsec_frts(
>>> pub(crate) fn new(
>>> pdev: &pci::Device<device::Bound>,
>>> devres_bar: Arc<Devres<Bar0>>,
>>> - ) -> Result<impl PinInit<Self>> {
>>> + ) -> Result<impl PinInit<Self, Error>> {
>>> let bar = devres_bar.access(pdev.as_ref())?;
>>> let spec = Spec::new(bar)?;
>>> let fw = Firmware::new(pdev.as_ref(), spec.chipset, FIRMWARE_VERSION)?;
>>> @@ -302,11 +302,16 @@ pub(crate) fn new(
>>>
>>> Self::run_fwsec_frts(pdev.as_ref(), &gsp_falcon, bar, &bios, &fb_layout)?;
>>>
>>> - Ok(pin_init!(Self {
>>> + Ok(try_pin_init!(Self {
>>> spec,
>>> bar: devres_bar,
>>> fw,
>>> sysmem_flush,
>>> }))
>>> }
>>> +
>>> + pub(crate) fn start_gsp(&self, _pdev: &pci::Device<device::Core>) -> Result {
>>> + // noop
>>> + Ok(())
>>> + }
>>> }
>>>
>>> But maybe it doesn't capture your intent?
>>
>> The issue is that `start_gsp` returns a value (currently a placeholder
>> `()`, but it will change into a real type) that needs to be stored into
>> the newly-introduced `gsp` member of `NovaCore`. I could not figure out
>> how `pin_chain` could help with this (and this is the same problem for
>> the other `unsafe` statements in `firmware/gsp.rs`).
>
> Ok, I see, I think Benno is already working on a solution to access previously
> initialized fields from subsequent initializers.
>
> @Benno: What's the status of this? I haven't seen an issue for that in the
> pin-init GitHub repo, should we create one?
>
> However, in this case I'm a bit confused why we want Gsp next to Gpu? Why not
> just make Gsp a member of Gpu then?
To be honest I am not completely sure about the best layout yet and will
need more visibility to understand whether this is optimal. But
considering that we want to run the GSP boot process over a built `Gpu`
instance, we cannot store the result of said process inside `Gpu` unless
we put it inside e.g. an `Option`. But then the variant will always be
`Some` after `probe` returns, and yet we will have to perform a match
every time we want to access it.
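To make the trade-off concrete, here is a minimal standalone sketch in plain Rust (std only, no pin-init, no kernel crates). All types, fields, and values are stand-ins invented for illustration; only the names `Gpu`, `Gsp`, `NovaCore`, and `start_gsp` come from the discussion. Layout A stores the GSP inside the GPU behind an `Option`, layout B keeps them as siblings.

```rust
// Stand-in for the GSP runtime state; the field is purely illustrative.
struct Gsp {
    cmdq_len: usize,
}

// Layout A: `Gsp` lives inside `Gpu`. Booting needs an already-built `Gpu`,
// so the field has to start out as `None`.
struct GpuWithGsp {
    gsp: Option<Gsp>,
}

impl GpuWithGsp {
    fn new() -> Self {
        Self { gsp: None }
    }

    fn start_gsp(&mut self) {
        self.gsp = Some(Gsp { cmdq_len: 16 });
    }

    // Every accessor must handle `None`, even though it can no longer
    // occur once probing has succeeded.
    fn cmdq_len(&self) -> Option<usize> {
        self.gsp.as_ref().map(|gsp| gsp.cmdq_len)
    }
}

// Layout B: `Gpu` and `Gsp` are siblings, so no `Option` is needed.
struct Gpu;

impl Gpu {
    fn start_gsp(&self) -> Gsp {
        Gsp { cmdq_len: 16 }
    }
}

struct NovaCore {
    gpu: Gpu,
    gsp: Gsp,
}

fn probe() -> NovaCore {
    let gpu = Gpu;
    let gsp = gpu.start_gsp();
    NovaCore { gpu, gsp }
}
```

In layout B, `probe().gsp.cmdq_len` is a plain field access, whereas layout A forces callers through `Option<usize>` on every use.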

The current separation sounds reasonable to me for the time being, with
`Gpu` containing purely hardware resources obtained without help from
user-space, while `Gsp` is the result of running a bunch of firmwares.
An alternative design would be to store `Gpu` inside `Gsp`, but `Gsp`
inside `Gpu` is trickier due to the build order. No matter what we do,
switching the layout later should be trivial if we don't choose the
best one now.

There is also an easy workaround to the sibling initialization issue,
which is to store `Gpu` and `Gsp` behind `Pin<KBox>` - that way we can
initialize both outside `try_pin_init!`, at the cost of two more heap
allocations over the whole lifetime of the device. If we don't have a
proper solution to the problem now, this might be better than using
`unsafe` as a temporary solution.
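A rough sketch of that workaround, using `std`'s `Pin<Box<T>>` as a stand-in for the kernel's `Pin<KBox<T>>`. The fields `chipset` and `booted_on`, the `String` error type, and the body of `start_gsp` are assumptions made up for the example; the point is only that each object gets its own allocation, so both can be built in sequence without `try_pin_init!` or `unsafe`.

```rust
use std::pin::Pin;

// Stand-in types; the fields are invented for illustration.
struct Gpu {
    chipset: u32,
}

struct Gsp {
    booted_on: u32,
}

impl Gpu {
    // Runs over an already pinned, fully built `Gpu`.
    fn start_gsp(self: Pin<&Self>) -> Result<Gsp, String> {
        Ok(Gsp {
            booted_on: self.chipset,
        })
    }
}

// Both members sit behind their own pinned heap allocation.
struct NovaCore {
    gpu: Pin<Box<Gpu>>,
    gsp: Pin<Box<Gsp>>,
}

fn probe(chipset: u32) -> Result<NovaCore, String> {
    // First allocation: the GPU is pinned before anything uses it.
    let gpu = Box::pin(Gpu { chipset });
    // Second allocation: the GSP is built from a pinned reference to the GPU.
    let gsp = Box::pin(gpu.as_ref().start_gsp()?);
    Ok(NovaCore { gpu, gsp })
}
```

The cost is the two extra allocations mentioned above; in exchange, `NovaCore` itself needs no in-place initialization of sibling fields.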

The same workaround could also be used for `GspFirmware` and its page
tables - since `GspFirmware` is temporary and can apparently be
discarded after the GSP is booted, this shouldn't be a big issue. This
will allow the driver to probe, and we can add TODO items to fix that
later if a solution is in sight.
>
> I thought the intent was to keep temporary values local to start_gsp() and not
> store them next to Gpu in the same allocation?
It is not visible in the current patchset, but `start_gsp` will
eventually return the runtime data of the GSP - notably its log buffers
and command queue, which are needed to operate it. All the rest (notably
the loaded firmwares) will be local to `start_gsp` and discarded upon
its return.
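The intended ownership split can be sketched in standalone Rust as follows. This is only an illustration under assumed names: `Firmware`, its `image` field, the `dropped` flag, and the buffer sizes are hypothetical and exist solely to make the drop observable; nothing here reflects the real GSP data structures.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical temporary firmware image; the flag records when it is dropped.
struct Firmware {
    image: Vec<u8>,
    dropped: Rc<Cell<bool>>,
}

impl Drop for Firmware {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Runtime data that must outlive the boot process (contents are made up).
struct Gsp {
    log_buffer: Vec<u8>,
    cmdq: Vec<u32>,
}

fn start_gsp(dropped: Rc<Cell<bool>>) -> Gsp {
    // The firmware is only needed while booting.
    let fw = Firmware {
        image: vec![0u8; 64],
        dropped,
    };

    // Only the runtime data is returned to the caller.
    Gsp {
        log_buffer: vec![0u8; fw.image.len()],
        cmdq: Vec::new(),
    }
    // `fw` goes out of scope here and is dropped.
}
```

After `start_gsp` returns, the caller owns only the `Gsp` value, and the flag shows that the firmware has already been released.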