[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y/jWtVr7JK/YKcsE@kroah.com>
Date: Fri, 24 Feb 2023 16:24:37 +0100
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Asahi Lina <lina@...hilina.net>
Cc: Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>,
Wedson Almeida Filho <wedsonaf@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Joerg Roedel <joro@...tes.org>,
Hector Martin <marcan@...can.st>,
Sven Peter <sven@...npeter.dev>, Arnd Bergmann <arnd@...db.de>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Alyssa Rosenzweig <alyssa@...enzweig.io>,
Neal Gompa <neal@...pa.dev>, rust-for-linux@...r.kernel.org,
linux-kernel@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait
On Fri, Feb 24, 2023 at 11:43:54PM +0900, Asahi Lina wrote:
>
>
> On 2023/02/24 23:11, Greg Kroah-Hartman wrote:
> > Thanks for the detailed rust explainations, I'd like to just highlight
> > one thing:
> >
> > On Fri, Feb 24, 2023 at 10:15:12PM +0900, Asahi Lina wrote:
> >> On 24/02/2023 20.23, Greg Kroah-Hartman wrote:
> >>> And again, why are bindings needed for a "raw" struct device at all?
> >>> Shouldn't the bus-specific wrappings work better?
> >>
> >> Because lots of kernel subsystems need to be able to accept "any" device
> >> and don't care about the bus! That's what this is for.
> >
> > That's great, but:
> >
> >> All the bus
> >> wrappers would implement this so they can be used as an argument for all
> >> those subsystems (plus a generic one when you just need to pass around
> >> an actual owned generic reference and no longer need bus-specific
> >> operations - you can materialize that out of a RawDevice impl, which is
> >> when get_device() would be called). That's why I'm introducing this now,
> >> because both io_pgtable and rtkit need to take `struct device` pointers
> >> on the C side so we need some "generic struct device" view on the Rust side.
> >
> > In looking at both ftkit and io_pgtable, those seem to be good examples
> > of how "not to use a struct device", so trying to make safe bindings
> > from Rust to these frameworks is very ironic :)
>
> And this is why I want to use Rust, and why writing the abstractions for
> C code is so difficult... Rust encodes all these rules in the type
> system, but C doesn't, and so many kernel APIs don't document any of
> this or what the requirements are...
I totally agree, and is why I'm wanting to review this stuff.
> > rtkit takes a struct device pointer and then never increments it,
> > despite saving it off, which is unsafe. It then only uses it to print
> > out messages if things go wrong (or right in some cases), which is odd.
> > So it can get away from using a device pointer entirely, except for the
> > devm_apple_rtkit_init() call, which I doubt you want to call from rust
> > code, right?
>
> That sounds like we need to fix the C side to grab a reference ^^
I agree. Or remove the reference entirely.
> We do need to pass the device to the init function though
> (apple_rtkit_init(), this is in the SoC tree which I mentioned as a
> prequisite and already on the way to 6.3-rc1), since at the very least
> it has to pick up the mailbox and all that to initialize.
As I said, the rtkit code does not actually use the device pointer at
all except for some (I would say pointless) log messages. So we could
change the rtkit code either way, I would go for the removal.
> Alternatively we could say that the C API contract is that the user of
> rtkit has to own a reference, and then the Rust abstraction would have
> to take that reference to make a safe abstraction, but that doesn't
> sound like the better option.
Yeah, that feels odd.
> What do you recommend for things that want to print device-associated
> messages, if not holding a reference to the device?
If you aren't holding a reference to the device, that means you aren't
associated to it at all, so you better not be printing out anything
related to any device as that pointer could be invalid at any point in
time.
> Or did I
> misunderstand what you meant? Just pr_foo() isn't great because we have
> a lot of instances of rtkit and then you wouldn't know which device the
> messages are about...
Then the rtkit code needs to be changed to properly grab the reference
and actually use it for something other than just a log message. If it
only wants it for a log message, then let's just drop it and have the
rtkit code go quiet, as when kernel code is working properly, it should
be quiet. If something goes wrong, the code that called into rtkit can
print out a message based on the error return values.
I have no idea what "rtkit" is, if it's an interface to hardware, why
doesn't it have its own struct device that it creates and manages and
uses instead? In my quick glance, that feels like the real solution
here instead of just "I hope this pointer is going to be valid" like it
lives with today. Odds are you can't remove a rtkit device at runtime,
so no one has noticed this yet...
> > for io_pgtable, that's a bit messier, you want to pass in a device that
> > io_pgtable treats as a "device" but again, it is NEVER properly
> > reference counted, AND, it is only needed to try to figure out the bus
> > operations that dma memory should be allocated from for this device. So
> > what would be better to save off there would be a pointer to the bus,
> > which is constant and soon will be read-only so there are no lifetime
> > rules needed at all (see the major struct bus_type changes going into
> > 6.3-rc1 that will enable that to happen).
> >
> > So the two subsystems you want to call from rust code don't properly
> > handle the reference count of the object you are going to pass into it,
> > and only need it for debugging and iommu stuff, which is really only the
> > bus that the device is on, not good examples to start out with :)
>
> Well, they're two examples that are dependencies for the driver I wrote,
> and I don't think you want me picking easy examples with zero known
> upcoming users... ^^;;
Hey, this is good, I like it, no complaints!
> > Yeah, this is yack-shaving, sorry, but it's how we clean up core
> > subsystems for apis and implementations that are not really correct and
> > were not noticed at the time.
>
> I'm fine with helping fix all this, and I don't expect all the
> underlying C code to be perfect already either! I already fixed one
> locking bug in DRM and spent a lot of time trying to figure out lifetime
> rules there, but I didn't dig into rtkit/io_pgtable and didn't realize
> they don't take references properly...
The iomem code is under heavy-flux right now so let's see what 6.3-rc1
looks like to determine if this still needs to be resolved or not. I
hope the latest set of fixes/reworks for that subsystem land. Either
way, I'll look into what needs to be done there as obviously, the
current code is not correct.
> > Can we see some users of this code posted so I can see how struct device
> > is going to work in a rust driver? That's the thing I worry most about
> > the rust/C interaction here as we have two different ways of thinking
> > about reference counts from the two worlds and putting them together is
> > going to be "interesting", as can be seen here already.
>
> I linked a tree with everything in the cover letter ([4]), look in
> drivers/gpu/drm/asahi for the actual driver. But there are a lot of
> other dependencies that have to go in before that will compile
> (everything else in that branch...)
Sorry, it's hard to notice random github repos and branches, especially
when traveling, that's why we use email for review.
> I know it's hard to review without examples, but I also can't just post
> the driver and everything else as one series now, there's still a lot to
> be improved and fixed and I'm working with the Rust folks on figuring
> out a roadmap for that... and waiting until "everything" is ready and
> perfect would mean we don't get anything done in the meantime and fall
> into a pit of endless rebasing and coordinating downstream trees, which
> also isn't good...
Yeah, it's a chicken and egg issue right now, no worries, I understand.
This is going to take some cycles to get right.
thanks,
greg k-h
Powered by blists - more mailing lists