linux-kernel - Re: [PATCH v1 2/2] rust: Add read_poll_timeout


Message-Id: <DCD51BP7YXJV.3BLY6YJKGC58W@kernel.org>
Date: Wed, 27 Aug 2025 12:29:00 +0200
From: "Danilo Krummrich" <dakr@...nel.org>
To: "FUJITA Tomonori" <fujita.tomonori@...il.com>
Cc: <a.hindborg@...nel.org>, <alex.gaynor@...il.com>, <ojeda@...nel.org>,
 <aliceryhl@...gle.com>, <anna-maria@...utronix.de>,
 <bjorn3_gh@...tonmail.com>, <boqun.feng@...il.com>, <frederic@...nel.org>,
 <gary@...yguo.net>, <jstultz@...gle.com>, <linux-kernel@...r.kernel.org>,
 <lossin@...nel.org>, <lyude@...hat.com>, <rust-for-linux@...r.kernel.org>,
 <sboyd@...nel.org>, <tglx@...utronix.de>, <tmgross@...ch.edu>,
 <acourbot@...dia.com>, <daniel.almeida@...labora.com>
Subject: Re: [PATCH v1 2/2] rust: Add read_poll_timeout_atomic function

On Wed Aug 27, 2025 at 11:00 AM CEST, Danilo Krummrich wrote:
> On Wed Aug 27, 2025 at 2:14 AM CEST, FUJITA Tomonori wrote:
>> On Tue, 26 Aug 2025 16:12:44 +0200
>> "Danilo Krummrich" <dakr@...nel.org> wrote:
>>
>>> On Thu Aug 21, 2025 at 5:57 AM CEST, FUJITA Tomonori wrote:
>>>> +pub fn read_poll_timeout_atomic<Op, Cond, T>(
>>>> +    mut op: Op,
>>>> +    mut cond: Cond,
>>>> +    delay_delta: Delta,
>>>> +    timeout_delta: Delta,
>>>> +) -> Result<T>
>>>> +where
>>>> +    Op: FnMut() -> Result<T>,
>>>> +    Cond: FnMut(&T) -> bool,
>>>> +{
>>>> +    let mut left_ns = timeout_delta.as_nanos();
>>>> +    let delay_ns = delay_delta.as_nanos();
>>>> +
>>>> +    loop {
>>>> +        let val = op()?;
>>>> +        if cond(&val) {
>>>> +            // Unlike the C version, we immediately return.
>>>> +            // We know the condition is met so we don't need to check again.
>>>> +            return Ok(val);
>>>> +        }
>>>> +
>>>> +        if left_ns < 0 {
>>>> +            // Unlike the C version, we immediately return.
>>>> +            // We have just called `op()` so we don't need to call it again.
>>>> +            return Err(ETIMEDOUT);
>>>> +        }
>>>> +
>>>> +        if !delay_delta.is_zero() {
>>>> +            udelay(delay_delta);
>>>> +            left_ns -= delay_ns;
>>>> +        }
>>>> +
>>>> +        cpu_relax();
>>>> +        left_ns -= 1;
>>> 
>>> How do we know that each iteration costs 1ns? To make it even more obvious, we
>>> don't control the implementation of cond(). Shouldn't we use ktime for this?
>>
>> The C version used to use ktime but it has been changed not to:
>>
>> 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")
>
> Ick! That's pretty unfortunate -- no ktime then.
>
> But regardless of that, the current implementation (this and the C one) lacks
> clarity.
>
> The nanosecond decrement is rather negligible, the real timeout reduction comes
> from the delay_delta. Given that, and the fact that we can't use ktime, this
> function shouldn't take a raw timeout value, since we can't guarantee the
> timeout anyways.

Actually, let me put it in other words:

	let val = read_poll_timeout_atomic(
	    || {
	        // Fetch the offset to read from from the HW.
	        let offset = io.read32(0x1000);
	
	        // HW needs a break for some odd reason.
	        udelay(100);
	
	        // Read the actual value.
	        io.try_read32(offset)
	    },
	    |val: &u32| *val == HW_READY,
	    Delta::from_micros(0),      // No delay, keep spinning.
	    Delta::from_millis(10),     // Timeout after 10ms.
	)?;

Seems like a fairly reasonable usage without knowing the implementation details
of read_poll_timeout_atomic(), right?

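Except that if the hardware does not become ready, this will spin for 16.67
*minutes* -- in atomic context, instead of the 10ms the user would expect. With
no delay, the 10ms budget only shrinks by 1ns per iteration, so it takes about
10,000,000 iterations of at least 100us each (~1000s) to time out.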
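
To make the arithmetic explicit, here is a rough user-space model of the quoted
loop's bookkeeping. It is only a sketch: the function names are made up for
illustration, and the 100us per-call cost of op() is the assumption taken from
the udelay(100) in the example above; the cost of cond() and cpu_relax() is
ignored.

```rust
/// Number of times `op()` runs before the quoted loop returns ETIMEDOUT,
/// assuming the condition never becomes true and `timeout_ns >= 0`.
/// (Hypothetical helper, not part of the patch.)
fn op_calls_until_timeout(timeout_ns: i64, delay_ns: i64) -> i64 {
    // Each pass subtracts `delay_ns + 1` from the budget, and the loop only
    // gives up on the first pass that finds the budget below zero.
    timeout_ns / (delay_ns + 1) + 2
}

/// Modelled time spent spinning, in nanoseconds, for an assumed per-call
/// cost of `op()`. (Hypothetical helper, not part of the patch.)
fn worst_case_ns(timeout_ns: i64, delay_ns: i64, op_cost_ns: i64) -> i64 {
    let calls = op_calls_until_timeout(timeout_ns, delay_ns);
    // The delay runs between calls, so one time fewer than `op()` itself.
    calls * op_cost_ns + (calls - 1) * delay_ns
}
```

With a 10ms timeout, zero delay and an op() that takes 100us, this model gives
10,000,002 calls to op() and roughly 1000 seconds of spinning, which is where
the 16.67 minutes come from.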

This would be way less error prone if we do not provide a timeout value, but a
retry count.

> Instead, I think it makes much more sense to provide a retry count as function
> argument, such that the user can specify "I want a delay of 100us, try it 100
> times".
>
> This way it is transparent to the caller that the timeout may be significantly
> more than 10ms depending on the user's implementation.
>
> As for doing this in C vs Rust: I don't think things have to align in every
> implementation detail. If we can improve things on the Rust side from the
> get-go, we should not stop ourselves from doing so, just because a similar C
> implementation is hard to refactor, due to having a lot of users already.
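
For illustration, a retry-count based variant could look roughly like the
following. This is a plain user-space sketch and not a proposed kernel API:
poll_with_tries, PollError and the `delay` closure (standing in for udelay())
are all hypothetical names, and the error type is generic rather than the
kernel's Result.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum PollError<E> {
    /// `op()` itself failed.
    Op(E),
    /// The condition was not met within the given number of tries.
    TriesExhausted,
}

/// Calls `op()` at most `max_tries` times, running `delay()` between
/// consecutive tries, and returns the first value for which `cond` holds.
fn poll_with_tries<Op, Cond, Delay, T, E>(
    mut op: Op,
    mut cond: Cond,
    mut delay: Delay,
    max_tries: u32,
) -> Result<T, PollError<E>>
where
    Op: FnMut() -> Result<T, E>,
    Cond: FnMut(&T) -> bool,
    Delay: FnMut(),
{
    for attempt in 0..max_tries {
        let val = op().map_err(PollError::Op)?;
        if cond(&val) {
            return Ok(val);
        }
        // No point in delaying after the last try.
        if attempt + 1 < max_tries {
            delay();
        }
    }
    Err(PollError::TriesExhausted)
}
```

The worst case is then bounded by max_tries times the cost of op() plus the
delays in between, which the caller can reason about without knowing anything
about the internal bookkeeping.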