Message-ID: <alpine.DEB.2.21.1809261002510.2974@nanos.tec.linutronix.de>
Date: Wed, 26 Sep 2018 10:04:38 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>
cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org,
Daniel Wagner <daniel.wagner@...mens.com>,
Will Deacon <will.deacon@....com>, x86@...nel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>,
Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [Problem] Cache line starvation

On Wed, 26 Sep 2018, Peter Zijlstra wrote:
> On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> > Instrumentation always shows the same picture:
> >
> > CPU0                                      CPU1
> > => do_syscall_64                          => do_syscall_64
> > => SyS_ptrace                             => syscall_slow_exit_work
> > => ptrace_check_attach                    => ptrace_do_notify / rt_read_unlock
> > => wait_task_inactive                        rt_spin_lock_slowunlock()
> >    -> while task_running()                   __rt_mutex_unlock_common()
> >   /   check_task_state()                     mark_wakeup_next_waiter()
> >  |     raw_spin_lock_irq(&p->pi_lock);         raw_spin_lock(&current->pi_lock);
> >  |     .                                       .
> >  |     raw_spin_unlock_irq(&p->pi_lock);       .
> >   \   cpu_relax()                              .
> >    -                                           .
> >     *IRQ*                                    <lock acquired>
> >
> > In the error case we observe that the while() loop is repeated more than
> > 5000 times, which indicates that the pi_lock can be acquired. CPU1, on the
> > other side, does not make progress while waiting for the same lock with
> > interrupts disabled.
>
> I've tried really hard to reproduce this in userspace, but so far have
> not had any luck. Looks to be a real tricky thing to make happen.

It's probably equally tricky to write a reproducer as it was to instrument
the thing. I assume it's a combination of code sequences on both CPUs which
involve other (unrelated) lock instructions on the way.

Thanks,
tglx