Message-ID: <4dd93603-04fa-4da4-b867-bd12ece4b391@gmail.com>
Date: Sat, 12 Oct 2024 09:50:00 +0200
From: Dirk Behme <dirk.behme@...il.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Andreas Hindborg <a.hindborg@...nel.org>, Lyude Paul <lyude@...hat.com>,
Dirk Behme <dirk.behme@...bosch.com>, Miguel Ojeda <ojeda@...nel.org>,
Alex Gaynor <alex.gaynor@...il.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>, Alice Ryhl <aliceryhl@...gle.com>,
rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/14] hrtimer Rust API
On 12.10.24 09:41, Boqun Feng wrote:
> On Sat, Oct 12, 2024 at 07:19:41AM +0200, Dirk Behme wrote:
>> On 12.10.24 01:21, Boqun Feng wrote:
>>> On Fri, Oct 11, 2024 at 05:43:57PM +0200, Dirk Behme wrote:
>>>> Hi Andreas,
>>>>
>>>> On 11.10.24 at 16:52, Andreas Hindborg wrote:
>>>>>
>>>>> Dirk, thanks for reporting!
>>>>
>>>> :)
>>>>
>>>>> Boqun Feng <boqun.feng@...il.com> writes:
>>>>>
>>>>>> On Tue, Oct 01, 2024 at 02:37:46PM +0200, Dirk Behme wrote:
>>>>>>> On 18.09.2024 00:27, Andreas Hindborg wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> This series adds support for using the `hrtimer` subsystem from Rust code.
>>>>>>>>
>>>>>>>> I tried breaking up the code in some smaller patches, hopefully that will
>>>>>>>> ease the review process a bit.
>>>>>>>
>>>>>>> Just fyi, having all 14 patches applied I get [1] on the first (doctest)
>>>>>>> Example from hrtimer.rs.
>>>>>>>
>>>>>>> This is from lockdep:
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/locking/lockdep.c#n4785
>>>>>>>
>>>>>>> Having just a quick look I'm not sure what the root cause is. Maybe mutex in
>>>>>>> interrupt context? Or a more subtle one?
>>>>>>
>>>>>> I think it's calling mutex inside an interrupt context as shown by the
>>>>>> callstack:
>>>>>>
>>>>>> ] __mutex_lock+0xa0/0xa4
>>>>>> ] ...
>>>>>> ] hrtimer_interrupt+0x1d4/0x2ac
>>>>>>
>>>>>> This is because:
>>>>>>
>>>>>> +//! struct ArcIntrusiveTimer {
>>>>>> +//! #[pin]
>>>>>> +//! timer: Timer<Self>,
>>>>>> +//! #[pin]
>>>>>> +//! flag: Mutex<bool>,
>>>>>> +//! #[pin]
>>>>>> +//! cond: CondVar,
>>>>>> +//! }
>>>>>>
>>>>>> has a Mutex<bool>, which actually should be a SpinLockIrq [1]. Note that
>>>>>> irq-off is needed for the lock, because otherwise we will hit a self
>>>>>> deadlock due to interrupts:
>>>>>>
>>>>>> spin_lock(&a);
>>>>>> > timer interrupt
>>>>>> spin_lock(&a);
>>>>>>
>>>>>> Also notice that the IrqDisabled<'_> token can simply be created by
>>>>>> ::new(), because irq contexts should guarantee that interrupts are
>>>>>> disabled (i.e. we don't support nested interrupts*).
>>>>>
>>>>> I updated the example based on the work in [1]. I think we need to
>>>>> update `CondVar::wait` to support waiting with irq disabled.
>>>>
>>>> Yes, I agree. This answers one of the open questions I had in the discussion
>>>> with Boqun :)
>>>>
>>>> What do you think about the other open question: in this *special* case,
>>>> could we go *without* any lock? The 'while *guard != 5' loop in the main
>>>> thread only reads guard, so it doesn't matter whether it *reads* the old or
>>>> the new value. The read/modify/write of guard in the callback runs in
>>>> interrupt context, i.e. with interrupts disabled, so it can't be
>>>> interrupted (excluding nested interrupts). So this modification of guard
>>>> doesn't need a lock to protect it from being interrupted, as long as there
>>>> is no modification of guard "outside" the interrupt-disabled context.
>>>>
>>>> What do you think?
>>>>
>>>
>>> Reading while another CPU is writing is a data race, which is UB.
>>
>> Could you help me understand where exactly you see UB in Andreas' 'while
>> *guard != 5' loop in case no locking is used? As mentioned I'm under the
>
> Sure, but could you provide the exact code you have in mind? If you
> don't use a lock here, you cannot have a guard. I need the exact code
> to point out where the compiler may "mis-compile" (a result of the
> UB).
I thought we were talking about something like
 #[pin_data]
 struct ArcIntrusiveTimer {
     #[pin]
     timer: Timer<Self>,
     #[pin]
-    flag: SpinLockIrq<u64>,
+    flag: u64,
     #[pin]
     cond: CondVar,
 }
?
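For comparison, a lock-free variant that avoids the data race would presumably
need an atomic instead of a plain `u64`. Below is a minimal userspace sketch of
that idea. It uses plain `std`, not the kernel crate. `Counter` and `run_until`
are made-up names, and a spawned thread stands in for the timer callback, so
this is an illustration under those assumptions, not a proposal for the doctest:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the timer struct: the counter is an atomic,
/// not a plain `u64`, so concurrent read and write is not a data race.
struct Counter {
    flag: AtomicU64,
}

/// A spawned thread plays the role of the timer callback and increments
/// the counter `target` times; the caller polls until it sees `target`.
fn run_until(target: u64) -> u64 {
    let shared = Arc::new(Counter {
        flag: AtomicU64::new(0),
    });

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for _ in 0..target {
                // The read/modify/write is a single atomic operation.
                shared.flag.fetch_add(1, Ordering::Release);
                thread::sleep(Duration::from_millis(1));
            }
        })
    };

    // Every iteration performs an atomic load of the shared value.
    while shared.flag.load(Ordering::Acquire) != target {
        thread::yield_now();
    }

    writer.join().unwrap();
    shared.flag.load(Ordering::Acquire)
}
```

Calling `run_until(5)` mirrors the 'while *guard != 5' loop. Note that without a
lock there is no guard to hand to `CondVar::wait`, so this variant polls instead
of sleeping on the condition variable.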
Best regards
Dirk
>> impression that it doesn't matter if the old or new guard value is read in
>> this special case.
>>
>
> For one thing, if the compiler believes no one is accessing the value
> because the code uses an immutable reference, it can "optimize" the loop
> away:
>
> while *var != 5 {
> do_something();
> }
>
> into
>
> if *var != 5 {
> loop { do_something(); }
> }
>
> But as I said, I need to see the exact code to suggest a relevant
> mis-compile. Note that even when a mis-compile seems impossible
> at the moment, UB is UB: compilers are free to do anything they
> want (or don't want). So a "mis-compile" example only helps us
> understand the potential result of UB.
>
> Regards,
> Boqun
>
>> Best regards
>>
>> Dirk
>>
>>
>>> Regards,
>>> Boqun
>>>
>>>> Thanks
>>>>
>>>> Dirk
>>>>
>>>>
>>>>> Without
>>>>> this, when we get back from `bindings::schedule_timeout` in
>>>>> `CondVar::wait_internal`, interrupts are enabled:
>>>>>
>>>>> ```rust
>>>>> use kernel::{
>>>>>     hrtimer::{Timer, TimerCallback, TimerPointer, TimerRestart},
>>>>>     impl_has_timer, new_condvar, new_spinlock, new_spinlock_irq,
>>>>>     irq::IrqDisabled,
>>>>>     prelude::*,
>>>>>     sync::{Arc, ArcBorrow, CondVar, SpinLock, SpinLockIrq},
>>>>>     time::Ktime,
>>>>> };
>>>>>
>>>>> #[pin_data]
>>>>> struct ArcIntrusiveTimer {
>>>>>     #[pin]
>>>>>     timer: Timer<Self>,
>>>>>     #[pin]
>>>>>     flag: SpinLockIrq<u64>,
>>>>>     #[pin]
>>>>>     cond: CondVar,
>>>>> }
>>>>>
>>>>> impl ArcIntrusiveTimer {
>>>>>     fn new() -> impl PinInit<Self, kernel::error::Error> {
>>>>>         try_pin_init!(Self {
>>>>>             timer <- Timer::new(),
>>>>>             flag <- new_spinlock_irq!(0),
>>>>>             cond <- new_condvar!(),
>>>>>         })
>>>>>     }
>>>>> }
>>>>>
>>>>> impl TimerCallback for ArcIntrusiveTimer {
>>>>>     type CallbackTarget<'a> = Arc<Self>;
>>>>>     type CallbackTargetParameter<'a> = ArcBorrow<'a, Self>;
>>>>>
>>>>>     fn run(this: Self::CallbackTargetParameter<'_>, irq: IrqDisabled<'_>) -> TimerRestart {
>>>>>         pr_info!("Timer called\n");
>>>>>         let mut guard = this.flag.lock_with(irq);
>>>>>         *guard += 1;
>>>>>         this.cond.notify_all();
>>>>>         if *guard == 5 {
>>>>>             TimerRestart::NoRestart
>>>>>         } else {
>>>>>             TimerRestart::Restart
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> impl_has_timer! {
>>>>>     impl HasTimer<Self> for ArcIntrusiveTimer { self.timer }
>>>>> }
>>>>>
>>>>>
>>>>> let has_timer = Arc::pin_init(ArcIntrusiveTimer::new(), GFP_KERNEL)?;
>>>>> let _handle = has_timer.clone().schedule(Ktime::from_ns(200_000_000));
>>>>>
>>>>> kernel::irq::with_irqs_disabled(|irq| {
>>>>>     let mut guard = has_timer.flag.lock_with(irq);
>>>>>
>>>>>     while *guard != 5 {
>>>>>         pr_info!("Not 5 yet, waiting\n");
>>>>>         has_timer.cond.wait(&mut guard); // <-- we arrive back here with interrupts enabled!
>>>>>     }
>>>>> });
>>>>> ```
>>>>>
>>>>> I think an update of `CondVar::wait` should be part of the patch set [1].
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Andreas
>>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/rust-for-linux/20240916213025.477225-1-lyude@redhat.com/
>>>>>
>>>>>
>>>>
>>