lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e17e3aa2c9ac1d6e410f66986da3c41efa9f7462.camel@redhat.com>
Date:   Tue, 09 May 2023 17:14:38 -0500
From:   Crystal Wood <swood@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, Ben Segall <bsegall@...gle.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>,
        John Stultz <jstultz@...gle.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>
Subject: Re: [PATCH v2 1/4] sched/core: Provide sched_rtmutex() and expose
 sched work helpers

On Wed, 2023-05-03 at 15:20 +0200, Peter Zijlstra wrote:
> On Thu, Apr 27, 2023 at 01:19:34PM +0200, Sebastian Andrzej Siewior wrote:
> > From: Thomas Gleixner <tglx@...utronix.de>
> > 
> > schedule() invokes sched_submit_work() before scheduling and
> > sched_update_worker() afterwards to ensure that queued block requests
> > are
> > flushed and the (IO)worker machineries can instantiate new workers if
> > required. This avoids deadlocks and starvation.
> > 
> > With rt_mutexes this can lead to subtle problem:
> > 
> >   When rtmutex blocks current::pi_blocked_on points to the rtmutex it
> >   blocks on. When one of the functions in sched_submit/resume_work()
> >   contends on a rtmutex based lock then that would corrupt
> >   current::pi_blocked_on.
> > 
> > Make it possible to let rtmutex issue the calls outside of the slowpath,
> > i.e. when it is guaranteed that current::pi_blocked_on is NULL, by:
> > 
> >   - Exposing sched_submit_work() and moving the task_running() condition
> >     into schedule()
> > 
> >   - Renamimg sched_update_worker() to sched_resume_work() and exposing
> > it
> >     too.
> > 
> >   - Providing sched_rtmutex() which just does the inner loop of
> > scheduling
> >     until need_resched() is not longer set. Split out the loop so this
> > does
> >     not create yet another copy.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> 
> Urgh, so I really don't like this.
> 
> The end result is something like:
> 
>         rt_mutex_lock()
>           sched_submit_work();
>             // a nested rt_mutex_lock() here will not clobber
>             // ->pi_blocked_on because it's not set yet.
> 
>           task_blocks_on_rt_mutex();
>             tsk->pi_blocked_on = waiter;
>             rt_mutex_enqueue(lock, waiter); <-- the real problem
> 
>           rt_mutex_slowlock_block();
>             schedule_rtmutex();
> 
>           sched_resume_work();
> 
> And all of this it not just because tsk->pi_blocked_on, but mostly
> because of task_blocks_on_rt_mutex() enqueueing the waiter. The whole
> enqueue thing is what makes the 'simple' solution of saving/restoring
> tsk->pi_blocked_on not work.
> 
> Basically the pi_blocked_on curruption is a side effect, not the
> fundamental issue. One task having two waiters registered is the bigger
> issue.

Where do you see pi_blocked_on being saved/restored?  The whole point of
this patchset is to deal with sched_submit_work() before anything has
been done on the "outer" lock acquisition (not just pi_blocked_on, but
also enqueuing) other than failing the fast path.

> Now, sched_submit_work() could also use (regular) mutex -- after all
> it's a fully preemptible context. And then we're subject to the 'same'
> problem but with tsk->blocked_on (DEBUG_MUTEXES=y).

It's fully preemptible but it still shouldn't be doing things that would
block on non-RT.  That'd already be broken for a number of reasons (task
state corruption, infinite recursion if current->plug isn't cleared
before doing whatever causes another standard schedule(), etc).

-Crystal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ