linux-kernel - Re: [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160614090934.GE5981@e106622-lin>
Date:	Tue, 14 Jun 2016 10:09:34 +0100
From:	Juri Lelli <juri.lelli@....com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	mingo@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
	xlpang@...hat.com, linux-kernel@...r.kernel.org,
	mathieu.desnoyers@...icios.com, jdesfossez@...icios.com,
	bristot@...hat.com, Ingo Molnar <mingo@...hat.com>
Subject: Re: [RFC][PATCH 1/8] rtmutex: Deboost before waking up the top waiter

Hi,

I've got only nitpicks for the changelog. Otherwise the patch looks good
to me (and yes, without it bw inheritance would be a problem).

On 07/06/16 21:56, Peter Zijlstra wrote:
> From: Xunlei Pang <xlpang@...hat.com>
> 
> We should deboost before waking the high-prio task, such that
> we don't run two tasks with the same "state"(priority, deadline,
                                              ^
                                            space

> sched_class, etc) during the period between the end of wake_up_q()
> and the end of rt_mutex_adjust_prio().
> 
> As "Peter Zijlstra" said:
> Its semantically icky to have the two tasks running off the same

s/Its/It's/

> state and practically icky when you consider bandwidth inheritance --
> where the boosted task wants to explicitly modify the state of the
> booster. In that latter case you really want to unboost before you
> let the booster run again.
> 
> But this however can lead to prio-inversion if current would get
> preempted after the deboost but before waking our high-prio task,
> hence we disable preemption before doing deboost, and enabling it

s/enabling/re-enable/

> after the wake up is over.
> 
> The patch fixed the logic, and introduced rt_mutex_postunlock()

s/The/This/
s/fixed/fixes/
s/introduced/introduces/

> to do some code refactor.
> 
> Most importantly however; this change ensures pointer stability for
> the next patch, where we have rt_mutex_setprio() cache a pointer to
> the top-most waiter task. If we, as before this change, do the wakeup
> first and then deboost, this pointer might point into thin air.
> 
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Juri Lelli <juri.lelli@....com>
> Suggested-by: Peter Zijlstra <peterz@...radead.org>
> [peterz: Changelog]
> Signed-off-by: Xunlei Pang <xlpang@...hat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Link: http://lkml.kernel.org/r/1461659449-19497-1-git-send-email-xlpang@redhat.com

Do we have any specific tests for this set? I'm running mine.

Best,

- Juri

> ---
> 
>  kernel/futex.c                  |    5 ++---
>  kernel/locking/rtmutex.c        |   28 ++++++++++++++++++++++++----
>  kernel/locking/rtmutex_common.h |    1 +
>  3 files changed, 27 insertions(+), 7 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1336,9 +1336,8 @@ static int wake_futex_pi(u32 __user *uad
>  	 * scheduled away before the wake up can take place.
>  	 */
>  	spin_unlock(&hb->lock);
> -	wake_up_q(&wake_q);
> -	if (deboost)
> -		rt_mutex_adjust_prio(current);
> +
> +	rt_mutex_postunlock(&wake_q, deboost);
>  
>  	return 0;
>  }
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1390,12 +1390,32 @@ rt_mutex_fastunlock(struct rt_mutex *loc
>  	} else {
>  		bool deboost = slowfn(lock, &wake_q);
>  
> -		wake_up_q(&wake_q);
> +		rt_mutex_postunlock(&wake_q, deboost);
> +	}
> +}
> +
>  
> -		/* Undo pi boosting if necessary: */
> -		if (deboost)
> -			rt_mutex_adjust_prio(current);
> +/*
> + * Undo pi boosting (if necessary) and wake top waiter.
> + */
> +void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost)
> +{
> +	/*
> +	 * We should deboost before waking the top waiter task such that
> +	 * we don't run two tasks with the 'same' priority. This however
> +	 * can lead to prio-inversion if we would get preempted after
> +	 * the deboost but before waking our high-prio task, hence the
> +	 * preempt_disable.
> +	 */
> +	if (deboost) {
> +		preempt_disable();
> +		rt_mutex_adjust_prio(current);
>  	}
> +
> +	wake_up_q(wake_q);
> +
> +	if (deboost)
> +		preempt_enable();
>  }
>  
>  /**
> --- a/kernel/locking/rtmutex_common.h
> +++ b/kernel/locking/rtmutex_common.h
> @@ -111,6 +111,7 @@ extern int rt_mutex_finish_proxy_lock(st
>  extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
>  extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
>  				  struct wake_q_head *wqh);
> +extern void rt_mutex_postunlock(struct wake_q_head *wake_q, bool deboost);
>  extern void rt_mutex_adjust_prio(struct task_struct *task);
>  
>  #ifdef CONFIG_DEBUG_RT_MUTEXES
> 
>