netdev - Re: [PATCH net-next v7 1/5] rust: core abstractions for network PHY drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231117154246.2571219-1-aliceryhl@google.com>
Date: Fri, 17 Nov 2023 15:42:46 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: andrew@...n.ch
Cc: aliceryhl@...gle.com, benno.lossin@...ton.me, fujita.tomonori@...il.com, 
	miguel.ojeda.sandonis@...il.com, netdev@...r.kernel.org, 
	rust-for-linux@...r.kernel.org, tmgross@...ch.edu, wedsonaf@...il.com
Subject: Re: [PATCH net-next v7 1/5] rust: core abstractions for network PHY drivers

Andrew Lunn <andrew@...n.ch> writes:
>> I used "X mutex" as an example for the synchronization mechanism in the
>> above snippets, but it sounds like its more complicated than that? Here
>> are some possible alternatives I could come up with:
> 
> So X would be phy_device->lock.
> 
> Suspend and resume don't have this lock held. I don't actually
> remember the details, but there is an email discussion with Russell
> King which does explain the problem we had, and why we think it is
> safe to not hold the lock.

So, what I would prefer to see as the struct invariant would be:

 * Either phy_device->lock is held,
 * or, we are in whatever situation you are in when you call suspend and
   resume.

The five suggestions I gave were my guesses as to what that situation
might be.

>> /// # Invariants
>> ///
>> /// Referencing a `phy_device` using this struct asserts that the X
>> /// mutex is held, or that the reference has exclusive access to the
>> /// entire `phy_device`.
>> #[repr(transparent)]
>> pub struct Device(Opaque<bindings::phy_device>);
> 
> You can never have exclusive access to the entire phy_device, because
> it contains a mutex. Other threads can block on that mutex, which
> involves changing the linked list in the mutex.
> 
> But that is also a pretty common pattern, put the mutex inside the
> structure it protects. So when you say 'exclusive access to the entire
> `phy_device`' you actually mean excluding mutex, spinlocks, atomic
> variables, etc?

No, I really meant exclusive access to everything. This suggestion is
where I guessed that the situation might be "we just created the
phy_device, and haven't yet shared it with anyone, so it's okay to
access it without the lock". But it sounds like that's not the case.

>> /// Referencing a `phy_device` using this struct asserts exclusive
>> /// access to the following fields: phy_id, state, speed, duplex. And
>> /// read access to the following fields: link, autoneg_complete,
>> /// autoneg.
>> #[repr(transparent)]
>> pub struct Device(Opaque<bindings::phy_device>);
> 
> For the Rust code, you can maybe do this. But the Rust code calls into
> C helpers to get the real work done, and they expect to have access to
> pretty much everything in phy_device.

Yeah, agreed, this is probably the suggestion I disliked the most.

>> /// # Invariants
>> ///
>> /// Referencing a `phy_device` using this struct asserts that the user
>> /// is inside a Y scope as defined in Documentation/foo/bar.
>> #[repr(transparent)]
>> pub struct Device(Opaque<bindings::phy_device>);
> 
> There is no such documentation that i know of, except it does get
> repeated again and again on the mailling lists. Its tribal knowledge.

Then, my suggestion would be to write down that tribal knowledge in the
safety comments.

>> But I don't know how these things are actually synchronized. Maybe
>> it is some sixth option. I would be happy to help draft these safety
>> comments once the actual synchronization mechanism is clear to me.
> 
> The model is: synchronisation is not the drivers problem.
> 
> With a few years of experience reviewing drivers, you notice that
> there are a number of driver writers who have no idea about
> locking. They don't even think about it. So where possible, its best
> to hide all the locking from them in the core. The core is in theory
> developed by developers who do mostly understand locking and have a
> better chance of getting it right. Its also exercised a lot more,
> since its shared by all drivers.
> 
> My experience with this one Rust driver so far is that Rust developers
> have problems accepting that its not the drivers problem.

I agree that locking should not be the driver's problem. If there's any
comment about locking in patch 5 of this patchset, then something has
gone wrong.

However, I don't see this file as part of the driver. It's a wrapper
around the core, which makes it part of the core. Like the C core, the
Rust abstractions will be shared by all Rust drivers. The correctness of
the unsafe code here is what ensures that drivers are not able to mess
up the locking in safe code.


Anyway. If you don't want to write down the tribal knowledge here, then
I suggest this simpler wording:

/// # Invariants
///
/// Referencing a `phy_device` using this struct asserts that you are in
/// a context where all methods defined on this struct are safe to call.
#[repr(transparent)]
pub struct Device(Opaque<bindings::phy_device>);

This invariant is much less precise, but at least it is correct.

Other safety comments may then be:

	/// Gets the id of the PHY.
	pub fn phy_id(&self) -> u32 {
	    let phydev = self.0.get();
	    // SAFETY: The struct invariant ensures that we may access
	    // this field without additional synchronization.
	    unsafe { (*phydev).phy_id }
	}

And:

	unsafe extern "C" fn soft_reset_callback(
	    phydev: *mut bindings::phy_device,
	) -> core::ffi::c_int {
	    from_result(|| {
	        // SAFETY: This callback is called only in contexts
		// where we hold `phy_device->lock`, so the accessors on
		// `Device` are okay to call.
	        let dev = unsafe { Device::from_raw(phydev) };
	        T::soft_reset(dev)?;
	        Ok(0)
	    })
	}

And:

	unsafe extern "C" fn resume_callback(phydev: *mut bindings::phy_device) -> core::ffi::c_int {
	    from_result(|| {
	        // SAFETY: The C core code ensures that the accessors on
		// `Device` are okay to call even though `phy_device->lock`
		// might not be held.
	        let dev = unsafe { Device::from_raw(phydev) };
	        T::resume(dev)?;
	        Ok(0)
	    })
	}

Alice