Message-ID: <20230816145818.GA989936@hirez.programming.kicks-ass.net>
Date: Wed, 16 Aug 2023 16:58:18 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: tglx@...utronix.de, linux-kernel@...r.kernel.org,
bsegall@...gle.com, boqun.feng@...il.com, swood@...hat.com,
bristot@...hat.com, dietmar.eggemann@....com, mingo@...hat.com,
jstultz@...gle.com, juri.lelli@...hat.com, mgorman@...e.de,
rostedt@...dmis.org, vschneid@...hat.com,
vincent.guittot@...aro.org, longman@...hat.com, will@...nel.org
Subject: Re: [PATCH 0/6] locking/rtmutex: Avoid PI state recursion through sched_submit_work()

On Wed, Aug 16, 2023 at 03:46:30PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-08-16 12:19:04 [+0200], To Peter Zijlstra wrote:
> > On 2023-08-16 11:42:57 [+0200], Peter Zijlstra wrote:
> > > Not the same -- this is namespace_lock(), right? That's a regular rwsem
> > > afaict and that *should* be good. Clearly I messed something up.
> >
> > Most likely. I do see it also from inode_lock() which does down_write().
> > I only see it originating from rwbase_write_lock().
>
> I've been looking at what you did and what we had.
> I'm not sure if your additional debug/assert code figured it out or me
> looking at it, but in rwbase_write_lock() for down_write(), we had this
> beauty with a comment that you made go away:
>
> | * Take the rtmutex as a first step. For rwsem this will also
> | * invoke sched_submit_work() to flush IO and workers.
> | */
> | if (rwbase_rtmutex_lock_state(rtm, state))
>

Yeah, I can't quite remember why I killed that comment. I think it was
because it seemed either 'obvious' or confusing at the time. Or perhaps
I was just too lazy to type... :-)

> for rw_semaphore we don't have any explicit rwbase_sched_submit_work()
> but relied on this one. Now that I look at it again,
> rwbase_rtmutex_lock_state() can succeed in the fast path so we don't
> flush / invoke rwbase_pre_schedule().
> So you rightfully removed the comment as it was misleading but we do
> need that rwbase_pre_schedule() thingy before
> raw_spin_lock_irqsave(&rtm->wait_lock).

Right, it's both the fast path and the fact that rt_mutex_slowlock()
will also do post_schedule() and reset the flag.

I've ended up with the below, which is quite horrible, but let me go
stare at the futex wreckage before trying to clean things up.

--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -270,6 +270,7 @@ static int __sched rwbase_write_lock(str
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+	rt_mutex_post_schedule();
 	return 0;
 }
 
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1412,8 +1412,30 @@ static inline void __downgrade_write(str
 #define rwbase_restore_current_state() \
 	__set_current_state(TASK_RUNNING)
 
-#define rwbase_rtmutex_lock_state(rtm, state) \
-	__rt_mutex_lock(rtm, state)
+/*
+ * Variant of __rt_mutex_lock() that unconditionally does
+ * rt_mutex_pre_schedule() and keeps it on success.
+ */
+static __always_inline int
+rwbase_rtmutex_lock_state(struct rt_mutex_base *lock, unsigned int state)
+{
+	unsigned long flags;
+	int ret;
+
+	rt_mutex_pre_schedule();
+
+	if (likely(rt_mutex_try_acquire(lock)))
+		return 0;
+
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
+	ret = __rt_mutex_slowlock_locked(lock, NULL, state);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+
+	if (ret)
+		rt_mutex_post_schedule();
+
+	return ret;
+}
 
 #define rwbase_rtmutex_slowlock_locked(rtm, state) \
 	__rt_mutex_slowlock_locked(rtm, NULL, state)
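
For anyone following along, here is a rough userspace sketch of the pairing
the patch above is after. It is not kernel code: every toy_* name is made up
for illustration, and it assumes that pre_schedule sets a per-task flag and
flushes pending work while post_schedule clears that flag. The point it tries
to show is that the flush happens exactly once per write lock, whether the
fast path or the slow path takes the rtmutex, and that the flag is cleared
again on both the success and the failure exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state (not struct task_struct). */
struct toy_task {
	bool sched_rt_mutex;	/* set between pre_schedule and post_schedule */
	int submit_calls;	/* how many times pending work was flushed */
};

/* Hypothetical stand-in for the rtmutex inside the rwsem. */
struct toy_lock {
	bool held;
};

static void toy_pre_schedule(struct toy_task *t)
{
	assert(!t->sched_rt_mutex);	/* must not nest */
	t->sched_rt_mutex = true;
	t->submit_calls++;		/* stands in for flushing IO and workers */
}

static void toy_post_schedule(struct toy_task *t)
{
	assert(t->sched_rt_mutex);	/* must pair with a pre_schedule */
	t->sched_rt_mutex = false;
}

/* Fast path: succeeds only when the lock is uncontended. */
static bool toy_try_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Slow path stand-in: 0 on success, -1 when the wait is "interrupted". */
static int toy_slowlock_locked(struct toy_lock *l, bool interrupted)
{
	if (interrupted)
		return -1;
	l->held = true;	/* pretend the previous owner went away while we waited */
	return 0;
}

/* Mirrors the shape of the new helper: flush first, keep the flag on success. */
static int toy_lock_state(struct toy_task *t, struct toy_lock *l, bool interrupted)
{
	int ret;

	toy_pre_schedule(t);

	if (toy_try_acquire(l))
		return 0;

	ret = toy_slowlock_locked(l, interrupted);
	if (ret)
		toy_post_schedule(t);	/* failure: undo before returning */

	return ret;
}

/* Mirrors the shape of the write-lock path: the caller drops the flag at the end. */
static int toy_write_lock(struct toy_task *t, struct toy_lock *l, bool interrupted)
{
	if (toy_lock_state(t, l, interrupted))
		return -1;

	/* The reader-draining wait loop would sit here, flag still set. */

	toy_post_schedule(t);
	return 0;
}
```

Calling toy_write_lock() on a free lock, on a held lock, and on a held lock
with the wait interrupted should each bump submit_calls by one and leave
sched_rt_mutex clear afterwards. The real write-lock path has more exits than
this model covers, so treat it as a picture of the invariant only.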