linux-kernel - Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

Message-ID: <20190726211623.GP29109@jcartwri.amer.corp.natinst.com>
Date:   Fri, 26 Jul 2019 21:16:24 +0000
From:   Julia Cartwright <julia@...com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     LKML <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Siewior <bigeasy@...utronix.de>,
        Anna-Maria Gleixner <anna-maria@...utronix.de>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Juergen Gross <jgross@...e.com>
Subject: Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for
 hrtimer sleepers on RT

On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> 
> On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for
> hard interrupt expiry mode are moved into soft interrupt context either for
> latency reasons or because the hrtimer callback takes regular spinlocks or
> invokes other functions which are not suitable for hard interrupt context
> on PREEMPT_RT.
> 
> The hrtimer_sleeper callback is RT compatible in hard interrupt context,
> but there is a latency concern: Untrusted userspace can spawn many threads
> which arm timers for the same expiry time on the same CPU. On expiry that
> causes a latency spike due to the wakeup of a gazillion threads.
> 
> OTOH, privileged real-time user space applications rely on the low latency
> of hard interrupt wakeups. These syscall related wakeups are all based on
> hrtimer sleepers.
> 
> If the current task is in a real-time scheduling class, mark the mode for
> hard interrupt expiry.
> 
> [ tglx: Split out of a larger combo patch. Added changelog ]
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>  kernel/time/hrtimer.c |   24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
>  static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
>  				   clockid_t clock_id, enum hrtimer_mode mode)
>  {
> +	/*
> +	 * On PREEMPT_RT enabled kernels hrtimers which are not explicitly
> +	 * marked for hard interrupt expiry mode are moved into soft
> +	 * interrupt context either for latency reasons or because the
> +	 * hrtimer callback takes regular spinlocks or invokes other
> +	 * functions which are not suitable for hard interrupt context on
> +	 * PREEMPT_RT.
> +	 *
> +	 * The hrtimer_sleeper callback is RT compatible in hard interrupt
> +	 * context, but there is a latency concern: Untrusted userspace can
> +	 * spawn many threads which arm timers for the same expiry time on
> +	 * the same CPU. That causes a latency spike due to the wakeup of
> +	 * a gazillion threads.
> +	 *
> +	 * OTOH, privileged real-time user space applications rely on the
> +	 * low latency of hard interrupt wakeups. If the current task is in
> +	 * a real-time scheduling class, mark the mode for hard interrupt
> +	 * expiry.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> +			mode |= HRTIMER_MODE_HARD;

Because this ends up sampling the task's scheduling parameters only at
the time of enqueue, it doesn't take into consideration whether or not
the task may be holding a PI lock and later be boosted if contended by
an RT thread.

Am I correct in assuming that this induces a priority inversion in that
case, because the deferred wakeup mechanism isn't part of the PI chain?

If so, is this just to be an accepted limitation at this point?  Is the
intent to argue this away as bad RT application design? :)

   Julia
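
The concern raised above can be sketched in a small userspace model.
This is not kernel code: every sketch_* name is hypothetical, the
priority scale (lower number is more urgent, values below 100 count as
real-time) is an assumption of the model, and the real-time check here
looks at a single effective priority, which may differ from what the
kernel's task_is_realtime() actually inspects. The only point it tries
to show is that an expiry mode chosen once at sleeper init stays fixed
when the task is boosted afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
enum sketch_mode { SKETCH_MODE_SOFT, SKETCH_MODE_HARD };

#define SKETCH_MAX_RT_PRIO 100	/* assumption: prio < 100 means real-time */

struct sketch_task {
	int effective_prio;	/* lower value = more urgent; PI boosting lowers it */
};

struct sketch_sleeper {
	enum sketch_mode mode;	/* decided once, at init time */
};

static bool sketch_task_is_realtime(const struct sketch_task *t)
{
	return t->effective_prio < SKETCH_MAX_RT_PRIO;
}

/* Models the patch's decision: sample the current task when the sleeper is set up. */
static void sketch_init_sleeper(struct sketch_sleeper *sl,
				const struct sketch_task *cur)
{
	sl->mode = sketch_task_is_realtime(cur) ? SKETCH_MODE_HARD
						: SKETCH_MODE_SOFT;
}

/* Models a PI boost: the lock owner inherits the waiter's priority if it is more urgent. */
static void sketch_pi_boost(struct sketch_task *owner, int waiter_prio)
{
	if (waiter_prio < owner->effective_prio)
		owner->effective_prio = waiter_prio;
}

/* Walks through the scenario and returns the mode the sleeper ends up with. */
static enum sketch_mode sketch_scenario(void)
{
	struct sketch_task task = { .effective_prio = 120 };	/* non-RT task */
	struct sketch_sleeper sl;

	sketch_init_sleeper(&sl, &task);	/* sampled while non-RT */
	assert(sl.mode == SKETCH_MODE_SOFT);

	sketch_pi_boost(&task, 10);		/* RT waiter contends on a PI lock */
	assert(sketch_task_is_realtime(&task));	/* task now runs at RT priority */

	return sl.mode;				/* mode was never re-evaluated */
}
```

In this model sketch_scenario() returns SKETCH_MODE_SOFT even though the
task is real-time by the time the timer would expire, so the wakeup of
the boosted lock owner would still wait on softirq processing.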