netdev - Re: [PATCH net-next v7 1/5] rust: core abstractions for network PHY drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <61f93419-396d-4592-b28b-9c681952a873@lunn.ch>
Date: Fri, 17 Nov 2023 14:34:10 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: fujita.tomonori@...il.com, benno.lossin@...ton.me,
	miguel.ojeda.sandonis@...il.com, netdev@...r.kernel.org,
	rust-for-linux@...r.kernel.org, tmgross@...ch.edu,
	wedsonaf@...il.com
Subject: Re: [PATCH net-next v7 1/5] rust: core abstractions for network PHY
 drivers

> And you would also update the struct invariant accordingly:
> 
> /// # Invariants
> ///
> /// Referencing a `phy_device` using this struct asserts that the X
> /// mutex is held.
> #[repr(transparent)]
> pub struct Device(Opaque<bindings::phy_device>);
> 
> 
> 
> 
> 
> > +// During the calls to most functions in [`Driver`], the C side (`PHYLIB`) holds a lock that is
> > +// unique for every instance of [`Device`]. `PHYLIB` uses a different serialization technique for
> > +// [`Driver::resume`] and [`Driver::suspend`]: `PHYLIB` updates `phy_device`'s state with
> > +// the lock held, thus guaranteeing that [`Driver::resume`] has exclusive access to the instance.
> > +// [`Driver::resume`] and [`Driver::suspend`] also are called where only one thread can access
> > +// to the instance.
> 
> I used "X mutex" as an example for the synchronization mechanism in the
> above snippets, but it sounds like its more complicated than that? Here
> are some possible alternatives I could come up with:

So X would be phy_device->lock.

Suspend and resume don't have this lock held. I don't actually
remember the details, but there is an email discussion with Russell
King which does explain the problem we had, and why we think it is
safe to not hold the lock.

> /// # Invariants
> ///
> /// Referencing a `phy_device` using this struct asserts that the X
> /// mutex is held, or that the reference has exclusive access to the
> /// entire `phy_device`.
> #[repr(transparent)]
> pub struct Device(Opaque<bindings::phy_device>);

You can never have exclusive access to the entire phy_device, because
it contains a mutex. Other threads can block on that mutex, which
involves changing the linked list in the mutex.

But that is also a pretty common pattern, put the mutex inside the
structure it protects. So when you say 'exclusive access to the entire
`phy_device`' you actually mean excluding mutex, spinlocks, atomic
variables, etc?

> /// Referencing a `phy_device` using this struct asserts exclusive
> /// access to the following fields: phy_id, state, speed, duplex. And
> /// read access to the following fields: link, autoneg_complete,
> /// autoneg.
> #[repr(transparent)]
> pub struct Device(Opaque<bindings::phy_device>);

For the Rust code, you can maybe do this. But the Rust code calls into
C helpers to get the real work done, and they expect to have access to
pretty much everything in phy_device.

> /// # Invariants
> ///
> /// Referencing a `phy_device` using this struct asserts that the user
> /// is inside a Y scope as defined in Documentation/foo/bar.
> #[repr(transparent)]
> pub struct Device(Opaque<bindings::phy_device>);

There is no such documentation that i know of, except it does get
repeated again and again on the mailling lists. Its tribal knowledge.

> But I don't know how these things are actually synchronized. Maybe
> it is some sixth option. I would be happy to help draft these safety
> comments once the actual synchronization mechanism is clear to me.

The model is: synchronisation is not the drivers problem.

With a few years of experience reviewing drivers, you notice that
there are a number of driver writers who have no idea about
locking. They don't even think about it. So where possible, its best
to hide all the locking from them in the core. The core is in theory
developed by developers who do mostly understand locking and have a
better chance of getting it right. Its also exercised a lot more,
since its shared by all drivers.

My experience with this one Rust driver so far is that Rust developers
have problems accepting that its not the drivers problem.

	Andrew