linux-kernel - Re: [PATCH v4.4] x86/kconfig: Fall back to ticket spinlocks

Message-ID: <20181101091806.GB3178@hirez.programming.kicks-ass.net>
Date:   Thu, 1 Nov 2018 10:18:06 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Daniel Wagner <wagi@...om.org>
Cc:     stable@...r.kernel.org, linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Daniel Wagner <daniel.wagner@...mens.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v4.4] x86/kconfig: Fall back to ticket spinlocks

On Wed, Oct 31, 2018 at 09:14:58AM +0100, Daniel Wagner wrote:
> From: Daniel Wagner <daniel.wagner@...mens.com>
> 
> Sebastian writes:
> 
> """
> We reproducibly observe cache line starvation on a Core2Duo E6850 (2
> cores), an i5-6400 SKL (4 cores) and on an NXP LS2044A ARM Cortex-A72 (4
> cores).
> 
> The problem can be triggered with a v4.9-RT kernel by starting
> 
>     cyclictest -S -p98 -m  -i2000 -b 200
> 
> and as "load"
> 
>     stress-ng --ptrace 4
> 
> The reported maximal latency is usually less than 60us. If the problem
> triggers, values around 400us, 800us or even more are reported. The
> upper limit is the -i parameter.
> 
> Reproduction with 4.9-RT is almost immediate on Core2Duo, ARM64 and SKL,
> but it took 7.5 hours to trigger on v4.14-RT on the Core2Duo.
> 
> Instrumentation always shows the same picture:
> 
> CPU0                                         CPU1
> => do_syscall_64                              => do_syscall_64
> => SyS_ptrace                                   => syscall_slow_exit_work
> => ptrace_check_attach                          => ptrace_do_notify / rt_read_unlock
> => wait_task_inactive                              rt_spin_lock_slowunlock()
>    -> while task_running()                         __rt_mutex_unlock_common()
>   /   check_task_state()                           mark_wakeup_next_waiter()
>  |     raw_spin_lock_irq(&p->pi_lock);             raw_spin_lock(&current->pi_lock);
>  |     .                                               .
>  |     raw_spin_unlock_irq(&p->pi_lock);               .
>   \  cpu_relax()                                       .
>    -                                                   .
>     *IRQ*                                          <lock acquired>
> 
> In the error case we observe that the while() loop is repeated more than
> 5000 times, which indicates that CPU0 can acquire the pi_lock each time.
> CPU1, on the other hand, makes no progress while waiting for the same lock
> with interrupts disabled.
> 
> This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ
> the other CPU is able to acquire pi_lock and the situation relaxes.
> """
> 
> This matches the observation for v4.4-rt on a Core2Duo E6850:
> 
> CPU 0:
> 
> - no progress for a very long time in rt_mutex_dequeue_pi():
> 
> stress-n-1931    0d..11  5060.891219: function:             __try_to_take_rt_mutex
> stress-n-1931    0d..11  5060.891219: function:                rt_mutex_dequeue
> stress-n-1931    0d..21  5060.891220: function:                rt_mutex_enqueue_pi
> stress-n-1931    0....2  5060.891220: signal_generate:      sig=17 errno=0 code=262148 comm=stress-ng-ptrac pid=1928 grp=1 res=1
> stress-n-1931    0d..21  5060.894114: function:             rt_mutex_dequeue_pi
> stress-n-1931    0d.h11  5060.894115: local_timer_entry:    vector=239
> 
> CPU 1:
> 
> - IRQ at 5060.894114 on CPU 1 followed by the IRQ on CPU 0
> 
> stress-n-1928    1....0  5060.891215: sys_enter:            NR 101 (18, 78b, 0, 0, 17, 788)
> stress-n-1928    1d..11  5060.891216: function:             __try_to_take_rt_mutex
> stress-n-1928    1d..21  5060.891216: function:                rt_mutex_enqueue_pi
> stress-n-1928    1d..21  5060.891217: function:             rt_mutex_dequeue_pi
> stress-n-1928    1....1  5060.891217: function:             rt_mutex_adjust_prio
> stress-n-1928    1d..11  5060.891218: function:                __rt_mutex_adjust_prio
> stress-n-1928    1d.h10  5060.894114: local_timer_entry:    vector=239
> 
> Thomas writes:
> 
> """
> This has nothing to do with RT. RT is merely exposing the
> problem in an observable way. The same issue happens with upstream, it's
> harder to trigger and it's harder to observe for obvious reasons.
> 
> If you read through the discussions [see the links below] then you
> really see that there is an upstream issue with the x86 qspinlock
> implementation and Peter has posted fixes which resolve it, both at
> the practical and the theoretical level.
> """
> 
> Backporting all qspinlock related patches is very likely to introduce
> regressions on v4.4. Therefore, the recommended solution by Peter and
> Thomas is to drop back to ticket spinlocks for v4.4.
> 
> Link: https://lkml.kernel.org/r/20180921120226.6xjgr4oiho22ex75@linutronix.de
> Link: https://lkml.kernel.org/r/20180926110117.405325143@infradead.org
> Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Daniel Wagner <daniel.wagner@...mens.com>

Acked-by: Peter Zijlstra (Intel) <peterz@...radead.org>
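
For readers unfamiliar with the fallback: below is a minimal user-space
sketch of a ticket lock using C11 atomics. It is not the kernel's
implementation, and the names (struct ticket_lock, ticket_lock_take() and
so on) are hypothetical, chosen only for illustration. The property it
tries to show is that waiters are served strictly in arrival order, so a
CPU that keeps re-taking the lock in a loop has to queue behind a CPU
that is already waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space ticket lock; NOT the kernel's arch_spinlock_t. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket that may currently hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* Take a ticket; the number fixes this caller's place in the queue. */
static unsigned int ticket_lock_take(struct ticket_lock *l)
{
	return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

/* True once the given ticket is the one being served. */
static bool ticket_lock_is_served(struct ticket_lock *l, unsigned int ticket)
{
	return atomic_load_explicit(&l->owner, memory_order_acquire) == ticket;
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket = ticket_lock_take(l);

	while (!ticket_lock_is_served(l, ticket))
		;	/* spin; real code would pause the CPU here */
}

/* Hand the lock to the next ticket in line, in FIFO order. */
static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

In this sketch, a thread that releases the lock and immediately calls
ticket_lock_acquire() again receives a ticket numbered after every
waiter that is already queued, so it cannot overtake them.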