lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a00dcb1-8368-c3ce-6fcb-5932f5eb50a5@asahilina.net>
Date:   Sat, 25 Feb 2023 01:23:11 +0900
From:   Asahi Lina <lina@...hilina.net>
To:     Robin Murphy <robin.murphy@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Miguel Ojeda <ojeda@...nel.org>,
        Alex Gaynor <alex.gaynor@...il.com>,
        Wedson Almeida Filho <wedsonaf@...il.com>,
        Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
        Björn Roy Baron <bjorn3_gh@...tonmail.com>,
        Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
        Hector Martin <marcan@...can.st>,
        Sven Peter <sven@...npeter.dev>, Arnd Bergmann <arnd@...db.de>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Alyssa Rosenzweig <alyssa@...enzweig.io>,
        Neal Gompa <neal@...pa.dev>, rust-for-linux@...r.kernel.org,
        linux-kernel@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH 2/5] rust: device: Add a minimal RawDevice trait

On 25/02/2023 00.14, Robin Murphy wrote:
> On 2023-02-24 14:48, Asahi Lina wrote:
>>
>>
>> On 2023/02/24 23:32, Robin Murphy wrote:
>>> FWIW the DMA API *has* to know which specific device it's operating
>>> with, since the relevant properties can and do vary even between
>>> different devices within a single bus_type (e.g. DMA masks).
>>>
>>> In the case of io-pgtable at least, there's no explicit refcounting
>>> since the struct device must be the one representing the physical
>>> platform/PCI/etc. device consuming the pagetable, so if that were to
>>> disappear from underneath its driver while the pagetable is still in
>>> use, things would already have gone very very wrong indeed :)
>>
>> There's no terribly good way to encode this relationship in safe Rust as
>> far as I know. So although it might be "obvious" (and I think my driver
>> can never violate it as it is currently designed), this means the Rust
>> abstraction will have to take the device reference if the C side does
>> not, because safe rust abstractions have to actually make these bugs
>> impossible and nothing stops a Rust driver from, say, stashing an
>> io_pgtable reference into a global and letting the device go away.
> 
> If someone did that, then simply holding a struct device reference 
> wouldn't guarantee much, since it only prevents the pointer itself from 
> becoming invalid - it still doesn't say any of the data *in* the 
> structure is still valid and "safe" for what a DMA API call might do 
> with it.
> 
> At the very least you'd probably have to somehow also guarantee that the 
> device has a driver bound (which is the closest thing to a general 
> indication of valid DMA ops across all architectures) and block it from 
> unbinding for the lifetime of the reference, but that would then mean a 
> simple driver which expects to tear down its io-pgtable from its .remove 
> callback could never be unbound due to the circular dependency :/

I'd like to hear from the other Rust folks about ideas on how to solve
this ^^. I think it might fit into the Revocable model for device
resources, but it's going to be a bit of an issue for my driver because
for me, the io_pgtable objects are deep in the Vm object that has its
own refcounting, and those are created and destroyed dynamically at
runtime, not just during device probe...

But it's also possible that we have to resign ourselves to a somewhat
leaky/unsound abstraction under the understanding that the C API has
requirements that are difficult to encode in the Rust type system...

There's actually a much longer story to tell here though... I've done a
lot of thinking about what "safe" means in the context of the kernel,
what the useful safety boundaries are, and things like that. Especially
within complex drivers, the answer is a lot greyer than what you tend to
get with Rust userspace code. But at the end of the day, Rust lets you
make things much safer and reduce the chances of bugs creeping in
dramatically, even if you have some slightly unsound abstractions and
other issues. Maybe I should write a long form blog post about this,
would people find that useful?

For whatever this anecdote is worth, my GPU driver is about 16000 lines
of Rust and has ~100 unsafe blocks, and has been in a few downstream
distro repos since December with actual users and nobody has ever
managed to oops it as far as I know, including other Mesa developers who
have been feeding it all kinds of broken command buffers and things like
that. It has survived OOMs leading to untested error paths, UAPI
fuzzing, and endless GPU faults/timeouts... and I definitely am not a
genius who writes bug-free code! The firmware itself is also very easy
to crash (raw unsafe pointers everywhere in shared memory), and crashes
are unrecoverable (you need to reboot), but I try to use Rust's type
system and ownership model to extend safety to that interface as much as
possible and I think people other than me have only managed to get the
GPU firmware to crash twice (both cache coherency related, I *think*
I've finally eliminated that issue at the root now). The uptime on my M2
streaming machine running a decent GPU workload for 20 hours or so
weekly is 44 days and counting... I think that old kernel even has some
minor memory leaks that I've since fixed (you *can* get those in Rust
with circular references), but it's so minor it's going to be at least a
few more months before it becomes a problem.

So I guess what I'm saying is that at the end of the day, if we can't
get an interface to be 100% safe and sound and usable, that's probably
okay. We're still getting a lot of safe mileage out of the other 99%! ^^

~~ Lina

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ