Date:   Sat, 25 Feb 2023 00:42:34 +0900
From:   Asahi Lina <lina@...hilina.net>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Miguel Ojeda <ojeda@...nel.org>,
        Alex Gaynor <alex.gaynor@...il.com>,
        Wedson Almeida Filho <wedsonaf@...il.com>,
        Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
        Björn Roy Baron <bjorn3_gh@...tonmail.com>,
        Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>,
        Joerg Roedel <joro@...tes.org>,
        Hector Martin <marcan@...can.st>,
        Sven Peter <sven@...npeter.dev>, Arnd Bergmann <arnd@...db.de>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Alyssa Rosenzweig <alyssa@...enzweig.io>,
        Neal Gompa <neal@...pa.dev>, rust-for-linux@...r.kernel.org,
        linux-kernel@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 25/02/2023 00.24, Greg Kroah-Hartman wrote:
>> What do you recommend for things that want to print device-associated
>> messages, if not holding a reference to the device?
> 
> If you aren't holding a reference to the device, that means you aren't
> associated to it at all, so you better not be printing out anything
> related to any device as that pointer could be invalid at any point in
> time.

The RTKit code talks to the firmware coprocessor that is part of the
device, so it definitely is associated with it. Among other things, it
prints out firmware logs and crash logs from the device, manages memory
buffers (which have default implementations but can be overridden by the
user driver), and more. It's essentially library code shared by all
device drivers that interact with devices with these coprocessors.

>> Or did I
>> misunderstand what you meant? Just pr_foo() isn't great because we have
>> a lot of instances of rtkit and then you wouldn't know which device the
>> messages are about...
> 
> Then the rtkit code needs to be changed to properly grab the reference
> and actually use it for something other than just a log message.  If it
> only wants it for a log message, then let's just drop it and have the
> rtkit code go quiet, as when kernel code is working properly, it should
> be quiet.  If something goes wrong, the code that called into rtkit can
> print out a message based on the error return values.

Keep in mind rtkit does things like print out crash logs, and you
wouldn't want the caller to be responsible for that. We definitely want
those in dmesg, since these coprocessors are non-recoverable once they
crash. It's almost as bad as a kernel panic: you will have to reboot to
be able to use the machine properly again. I find those crash logs very
useful to figure out what went wrong with the GPU, especially if
combined with a memory dump (which we don't expose to regular users
right now, but which I have ideas for). Even without that, just assert
messages from the coprocessor or fault instruction pointers that I can
correlate with the firmware are very useful on their own.

Right now rtkit also prints out syslogs from the coprocessors. That's
noisy for some, but I think very useful, since we're dealing with
reverse-engineered drivers. We'll probably want to silence those for
some noisy coprocessors at some point, but I don't think we want to do
that until things are all upstream, stable, and with a larger user base.
Until then I think we'd much rather be spammy and have a better chance
of debugging rare issues, which often happen with these coprocessors
running big firmware blobs. There are a lot of subtleties in getting
the interfaces right, never mind cache coherence issues!

> I have no idea what "rtkit" is, if it's an interface to hardware, why
> doesn't it have its own struct device that it creates and manages and
> uses instead?  In my quick glance, that feels like the real solution
> here instead of just "I hope this pointer is going to be valid" like it
> lives with today.  Odds are you can't remove a rtkit device at runtime,
> so no one has noticed this yet...

Well, they're all embedded into the SoC, yes.

RTKit is Apple's firmware RTOS, and also the name for the
semi-standardized mailbox/shared-memory interface shared by the
different firmwares using it. The Linux code to drive it doesn't create
its own "struct device" because the rtkit code is just library code that
is extended by the downstream drivers (like mine). How each driver
interacts with rtkit varies widely. NVMe barely uses it at all, other
than for power management. There is a downstream "rtkit-helper" driver
that is a proper standalone device wrapper for one case (MTP) where the
driver really doesn't need to interact with it at all. In my case with
the GPU, almost everything is shared memory and doorbells over the
mailbox. Other drivers like DCP actually send pointers over multiple
mailbox endpoints, while others (SMC) do most of their data exchange
directly over such messages.

So in a way, if we consider it driver library code, it's not
unreasonable for RTKit to require that the device you pass it outlives
it. Certainly, if the device is getting unbound from your driver, you'd
need to tear down RTKit as part of that in any reasonable situation.
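
The two positions in this thread (the library requires the device to
outlive it, versus the library grabbing its own reference) can be
sketched roughly as below. This is a plain userspace Rust illustration,
not kernel code and not the actual rtkit or RawDevice API: `Device`,
`BorrowingRtkit`, `OwningRtkit`, `log_line` and `crash_log` are all
hypothetical names made up for the example, and `Arc` merely stands in
for a device reference count.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a kernel `struct device`; it only carries a name.
struct Device {
    name: String,
}

impl Device {
    fn new(name: &str) -> Self {
        Device { name: name.to_string() }
    }

    /// Build a device-prefixed log line, so messages from several
    /// instances can be told apart.
    fn log_line(&self, msg: &str) -> String {
        format!("{}: {}", self.name, msg)
    }
}

/// Option A: the library borrows the device. The lifetime `'a` makes the
/// compiler enforce "the device you pass in outlives the library object".
struct BorrowingRtkit<'a> {
    dev: &'a Device,
}

impl<'a> BorrowingRtkit<'a> {
    fn new(dev: &'a Device) -> Self {
        BorrowingRtkit { dev }
    }

    fn crash_log(&self, msg: &str) -> String {
        self.dev.log_line(msg)
    }
}

/// Option B: the library takes its own counted reference, so the device
/// stays alive for as long as the library object exists.
struct OwningRtkit {
    dev: Arc<Device>,
}

impl OwningRtkit {
    fn new(dev: &Arc<Device>) -> Self {
        // Cloning the Arc bumps the reference count; it does not copy the device.
        OwningRtkit { dev: Arc::clone(dev) }
    }

    fn crash_log(&self, msg: &str) -> String {
        self.dev.log_line(msg)
    }
}
```

With option A, dropping the device while a `BorrowingRtkit` still uses
it is rejected at compile time. With option B, the caller may drop its
own handle and the `OwningRtkit` can still print device-prefixed
messages, at the cost of holding a reference count.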

>> I know it's hard to review without examples, but I also can't just post
>> the driver and everything else as one series now, there's still a lot to
>> be improved and fixed and I'm working with the Rust folks on figuring
>> out a roadmap for that... and waiting until "everything" is ready and
>> perfect would mean we don't get anything done in the meantime and fall
>> into a pit of endless rebasing and coordinating downstream trees, which
>> also isn't good...
> 
> Yeah, it's a chicken and egg issue right now, no worries, I understand.
> This is going to take some cycles to get right.
> 

Thank you ^^

~~ Lina
