lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aBJo9qNDn8xDEwlk@pollux>
Date: Wed, 30 Apr 2025 20:16:22 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: Alexandre Courbot <acourbot@...dia.com>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Jonathan Corbet <corbet@....net>,
	John Hubbard <jhubbard@...dia.com>, Ben Skeggs <bskeggs@...dia.com>,
	Timur Tabi <ttabi@...dia.com>, Alistair Popple <apopple@...dia.com>,
	linux-kernel@...r.kernel.org, rust-for-linux@...r.kernel.org,
	nouveau@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH 11/16] gpu: nova-core: add falcon register definitions
 and base code

On Wed, Apr 30, 2025 at 10:38:11AM -0400, Joel Fernandes wrote:
> On 4/30/2025 9:25 AM, Alexandre Courbot wrote:
> > On Tue Apr 22, 2025 at 11:44 PM JST, Danilo Krummrich wrote:
> 
> >>> +/// Returns a boxed falcon HAL adequate for the passed `chipset`.
> >>> +///
> >>> +/// We use this function and a heap-allocated trait object instead of statically defined trait
> >>> +/// objects because of the two-dimensional (Chipset, Engine) lookup required to return the
> >>> +/// requested HAL.
> >>
> >> Do we really need the dynamic dispatch? AFAICS, there's only E::BASE that is
> >> relevant to FalconHal impls?
> >>
> >> Can't we do something like I do in the following example [1]?
> >>
> >> [1] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=bf7035a07e79a4047fb6834eac03a9f2
> > 
> > So are you have noticed there are two dimensions from which the falcons
> > can be instantiated:
> > 
> > - The engine, which determines its register BASE,
> > - The HAL, which is determined by the chipset.
> > 
> > For the engine, I want to keep things static for the main reason that if
> > BASE was dynamic, we would have to do all our IO using
> > try_read()/try_write() and check for an out-of-bounds error at each
> > register access. The cost of monomorphization is limited as there are
> > only a handful of engines.
> > 
> > But the HAL introduces a second dimension to this, and if we support N
> > engines then the amount of monomorphized code would then increase by N
> > for each new HAL we add. Chipsets are released at a good cadence, so
> > this is the dimension that risks growing the most.

I agree, avoiding the dynamic dispatch is probably not worth in this case
considering the long term. However, I wanted to point out an alternative with
[2].

> > It is also the one that makes use of methods to abstract things (vs.
> > fixed parameters), so it is a natural candidate for using virtual
> > methods. I am not a fan of having ever-growing boilerplate match
> > statements for each method that needs to be abstracted, especially since
> > this is that virtual methods do without requiring extra code, and for a
> > runtime penalty that is completely negligible in our context and IMHO
> > completely balanced by the smaller binary size that results from their
> > use.
>
> Adding to what Alex said, note that the runtime cost is still there even without
> using dyn. Because at runtime, the match conditionals need to route function
> calls to the right place.

Honestly, I don't know how dynamic dispatch scales compared to static dispatch
with conditionals.

OOC, I briefly looked for a benchmark and found [3], which doesn't look
unreasonable at a first glance.

I modified it real quick to have more than 2 actions. [4]

2 Actions
---------
Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
 Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]

20 Actions
----------
Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
 Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]

100 Actions
-----------
Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
 Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]

Absolutely take it with a grain of salt, I neither spend a lot of brain power
nor time on this, which usually is not a great combination with benchmarking
things. :)

However, I think it's probably not too important here. Hence, feel free to go
with dynamic dispatch for this.

> I am just not seeing the benefits of not using dyn for
> this use case and only drawbacks. IMHO, we should try to not be doing the
> compiler's job.
> 
> Maybe the only benefit is you don't need an Arc or Kbox wrapper?

That's not a huge concern for me, it's only one single allocation per Engine,
correct?

[2] https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=99ce0f12542488f78e35356c99a1e23f
[3] https://github.com/tailcallhq/rust-benchmarks
[4] https://pastebin.com/k0PqtQnq

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ