lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 28 Sep 2018 17:26:46 +0200
From:   Kurt Kanzenbach <kurt.kanzenbach@...utronix.de>
To:     Will Deacon <will.deacon@....com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Daniel Wagner <daniel.wagner@...mens.com>,
        Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Mark Rutland <mark.rutland@....com>
Subject: Re: [Problem] Cache line starvation

On Fri, Sep 28, 2018 at 11:05:21AM +0200, Kurt Kanzenbach wrote:
> Hi Thomas,
>
> On Thu, Sep 27, 2018 at 04:47:47PM +0200, Thomas Gleixner wrote:
> > On Thu, 27 Sep 2018, Kurt Kanzenbach wrote:
> > > On Thu, Sep 27, 2018 at 04:25:47PM +0200, Kurt Kanzenbach wrote:
> > > > However, the issue still triggers fine. With stress-ng we're able to
> > > > generate latency in millisecond range. The only workaround we've found
> > > > so far is to add a "delay" in cpu_relax().
> > >
> > > It might interesting for you, how we added the delay. We've used:
> > >
> > > static inline void cpu_relax(void)
> > > {
> > > 	volatile int i = 0;
> > >
> > > 	asm volatile("yield" ::: "memory");
> > > 	while (i++ <= 1000);
> > > }
> > >
> > > Of course it's not efficient, but it works.
> >
> > I wonder if it's just the store on the stack which makes it work. I've seen
> > that when instrumenting x86. When the careful instrumentation just stayed
> > in registers it failed. Once it was too much and stack got involved it
> > vanished away.
>
> I've performed more tests: Adding a store to a global variable just
> before calling cpu_relax() doesn't help. Furthermore, adding up to 20
> yield instructions (just like you did on x86) didn't work either.

In addition, the stress-ng test triggers on v4.14-rt and v4.18-rt as
well.

As v4.18-rt still uses the old spin lock implementation, I've backported
the qspinlock implementation to v4.18-rt. The commits I've identified
are:

 - 598865c5f32d ("arm64: barrier: Implement smp_cond_load_relaxed")
 - c11090474d70 ("arm64: locking: Replace ticket lock implementation with qspinlock")

Using these commits it's still possible to trigger the issue. But it
takes longer.

Did I miss anything?

Thanks,
Kurt

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ