linux-kernel - Re: [RFC][PATCH v14 2/7] locking/mutex: Rework task_struct::blocked

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241217084643.GG35539@noisy.programming.kicks-ass.net>
Date: Tue, 17 Dec 2024 09:46:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelaf@...gle.com>,
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Valentin Schneider <vschneid@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
	Boqun Feng <boqun.feng@...il.com>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Metin Kaya <Metin.Kaya@....com>,
	Xuewen Yan <xuewen.yan94@...il.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Daniel Lezcano <daniel.lezcano@...aro.org>, kernel-team@...roid.com,
	Connor O'Brien <connoro@...gle.com>
Subject: Re: [RFC][PATCH v14 2/7] locking/mutex: Rework
 task_struct::blocked_on

On Mon, Dec 16, 2024 at 09:01:24PM -0800, John Stultz wrote:

> > +struct task_struct *proxy_handoff(struct mutex *lock);
> > +{
> > +       struct task_struct *next;
> > +
> > +       if (!sched_proxy_exec())
> > +               return NULL;
> > +
> > +       /*
> > +        * current->blocked_donor can't change if we can't schedule
> > +        * caller needs to do this, since its needs stabiliy of return value
> > +        */
> > +       lockdep_assert_preemption_disabled();
> > +       next = current->blocked_donor;
> > +       if (!next)
> > +               return NULL;
> > +
> > +       scoped_guard (task_rq_lock, next) {
> > +               /*
> > +                * current->blocked_donor had better be on the same CPU as current
> > +                */
> > +               WARN_ON_ONCE(scope.rq != this_rq());
> > +
> > +               scoped_guard (raw_spin_lock, next->blocked_lock) {
> > +                       /*
> > +                        * WARN_ON on this? How can this happen
> > +                        */
> > +                       if (next->blocked_on != lock)
> > +                               return NULL;
> > +               }
> > +
> > +               /*
> > +                * blocked_on relation is stable, since we hold both
> > +                * next->pi_lock and it's rq->lock
> > +                *
> > +                * OK -- we have a donor, it is blocked on the lock we're about
> > +                * to release and it cannot run on this CPU -- fixies are
> > +                * required.
> > +                *
> > +                * Dequeue the task, such that ttwu() can fix up the placement thing.
> > +                */
> > +               if (!is_cpu_allowed(next, cpu_of(scope.rq)))
> 
> nit, we'd want to check its not the wake_cpu so we try to return it so
> proxy migrations don't upset the tasks' original placement

I don't think that really matters, not doing this migration is probably
faster and then load balance will try and fix things again.

The thing is, you want this task to take over the lock and start running
ASAP, and we know *this* CPU is about to release the lock and then the
power of the block-chain is likely to make the next task run.

So less migration is more better, otherwise you then get to pull the
entire block chain over to whatever -- and you know how much 'fun' that
is.

> > +                       deactivate_task(scope.rq, next, DEQUEUE_SLEEP);
> > +       }
> > +
> > +       return next;
> > +}
> > +
> 
> Ok. I'll stare at all this a bit and see if I can give it a try.  I
> fret that this doesn't handle the case if wakeups on the task occur
> through other code paths? (So we still need BO_WAKING/NEEDS_RETURN to
> prevent us from running until we migrate back). I don't really have a
> specific case I can articulate, but my gut is telling me the problem
> will be w/ ww_mutexes as that was a major source of problems with the
> early versions of the patches that I believe tried to use logic
> similar to this.

Right, so I've not looked at ww_mutex specifically yet, but I thought to
have understood it was more or less the same problem there. If you've
got more details, please do share :-)