lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D9KDYOVU0EG3.2TA8UJHMW66Q6@nvidia.com>
Date: Thu, 01 May 2025 09:09:28 +0900
From: "Alexandre Courbot" <acourbot@...dia.com>
To: "Danilo Krummrich" <dakr@...nel.org>, "Joel Fernandes"
 <joelagnelf@...dia.com>
Cc: "Miguel Ojeda" <ojeda@...nel.org>, "Alex Gaynor"
 <alex.gaynor@...il.com>, "Boqun Feng" <boqun.feng@...il.com>, "Gary Guo"
 <gary@...yguo.net>, Björn Roy Baron
 <bjorn3_gh@...tonmail.com>, "Benno Lossin" <benno.lossin@...ton.me>,
 "Andreas Hindborg" <a.hindborg@...nel.org>, "Alice Ryhl"
 <aliceryhl@...gle.com>, "Trevor Gross" <tmgross@...ch.edu>, "David Airlie"
 <airlied@...il.com>, "Simona Vetter" <simona@...ll.ch>, "Maarten Lankhorst"
 <maarten.lankhorst@...ux.intel.com>, "Maxime Ripard" <mripard@...nel.org>,
 "Thomas Zimmermann" <tzimmermann@...e.de>, "Jonathan Corbet"
 <corbet@....net>, "John Hubbard" <jhubbard@...dia.com>, "Ben Skeggs"
 <bskeggs@...dia.com>, "Timur Tabi" <ttabi@...dia.com>, "Alistair Popple"
 <apopple@...dia.com>, <linux-kernel@...r.kernel.org>,
 <rust-for-linux@...r.kernel.org>, <nouveau@...ts.freedesktop.org>,
 <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions
 and base code

On Thu May 1, 2025 at 3:16 AM JST, Danilo Krummrich wrote:
> On Wed, Apr 30, 2025 at 10:38:11AM -0400, Joel Fernandes wrote:
>> On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
>> > On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
>> 
>> >>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
>> >>> +///
>> >>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
>> >>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
>> >>> +/// requested HAL.
>> >>
>> >> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
>> >> relevant to FalconHal impls?
>> >>
>> >> Can't we do something like I do in the following example [1]?
>> >>
>> >> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
>> > 
>> > So are you have noticed there are two dimensions from which the falcons
>> > can be instantiated:
>> > 
>> > - The engine, which determines its register BASE,
>> > - The HAL, which is determined by the chipset.
>> > 
>> > For the engine, I want to keep things static for the main reason that if
>> > BASE was dynamic, we would have to do all our IO using
>> > try_read()/try_write() and check for an out-of-bounds error at each
>> > register access. The cost of monomorphization is limited as there are
>> > only a handful of engines.
>> > 
>> > But the HAL introduces a second dimension to this, and if we support N
>> > engines then the amount of monomorphized code would then increase by N
>> > for each new HAL we add. Chipsets are released at a good cadence, so
>> > this is the dimension that risks growing the most.
>
> I agree, avoiding the dynamic dispatch is probably not worth in this case
> considering the long term. However, I wanted to point out an alternative with
> [2].
>
>> > It is also the one that makes use of methods to abstract things (vs.
>> > fixed parameters), so it is a natural candidate for using virtual
>> > methods. I am not a fan of having ever-growing boilerplate match
>> > statements for each method that needs to be abstracted, especially since
>> > this is that virtual methods do without requiring extra code, and for a
>> > runtime penalty that is completely negligible in our context and IMHO
>> > completely balanced by the smaller binary size that results from their
>> > use.
>>
>> Adding to what Alex said, note that the runtime cost is still there even without
>> using dyn. Because at runtime, the match conditionals need to route function
>> calls to the right place.
>
> Honestly, I don't know how dynamic dispatch scales compared to static dispatch
> with conditionals.
>
> OOC, I briefly looked for a benchmark and found [3], which doesn't look
> unreasonable at a first glance.
>
> I modified it real quick to have more than 2 actions. [4]
>
> 2 Actions
> ---------
> Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
>  Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]
>
> 20 Actions
> ----------
> Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
>  Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]
>
> 100 Actions
> -----------
> Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
>  Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]
>
> Absolutely take it with a grain of salt, I neither spend a lot of brain power
> nor time on this, which usually is not a great combination with benchmarking
> things. :)
>
> However, I think it's probably not too important here. Hence, feel free to go
> with dynamic dispatch for this.

Indeed, it looks like the cost of dispatch will be completely shadowed
by the IO behind it anyway. And these HAL calls are like a few here and
there anyway, it's not like they are on a critical path.

>
>> I am just not seeing the benefits of not using dyn for
>> this use case and only drawbacks. IMHO, we should try to not be doing the
>> compiler's job.
>> 
>> Maybe the only benefit is you don't need an Arc or Kbox wrapper?
>
> That's not a huge concern for me, it's only one single allocation per Engine,
> correct?

Correct. Note that for other engines we will be able to store the HALs as
static singletons instead of building them on the heap like I am
currently doing. The reason for doing this on falcon is that the
dual-dimension of the instances makes it more complex to build and look
them up.

... or maybe I could just use a macro? Let me try that and see whether
it works.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ