Message-ID: <20260126204914.GB217302@noisy.programming.kicks-ass.net>
Date: Mon, 26 Jan 2026 21:49:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: David Matlack <dmatlack@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Michael Jeanson <mjeanson@...icios.com>,
	Jens Axboe <axboe@...nel.dk>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	"Paul E. McKenney" <paulmck@...nel.org>, X86 ML <x86@...nel.org>,
	Sean Christopherson <seanjc@...gle.com>,
	Wei Liu <wei.liu@...nel.org>
Subject: Re: SIGSEGVs after 39a167560a61 ("rseq: Optimize event setting")

On Mon, Jan 26, 2026 at 09:47:45PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 26, 2026 at 11:46:27AM -0800, David Matlack wrote:
> > Hi Thomas,
> > 
> > I started seeing SIGSEGVs in Google's remote test executor when
> > running on hosts at v6.19-rc6. Bisecting led me to this commit:
> > 
> >   39a167560a61 ("rseq: Optimize event setting")
> > 
> > I discovered this issue while running VFIO selftests against v6.19-rc6,
> > but realized the issue has nothing to do with the selftests themselves.
> > Even running "sleep" as the test is enough to trigger this issue in the
> > executor.
> > 
> > I know that Google uses rseq in its userspace software stack, so I
> > assume this is some bad interaction between that implementation and
> > commit 39a167560a61.
> > 
> > Unfortunately, the remote test executor that is receiving the SIGSEGV is
> > not open source so I don't have a repro I can share. But I can easily
> > reproduce the issue with my setup so I'd be happy to help with testing
> > any fixes or debug patches.
> > 
> > I've attached the .config that I used when reproducing this issue. The
> > host I am using is an Intel server with EMR CPUs in case that matters.
> 
> Is this using tcmalloc? If so, that is somewhat expected because
> tcmalloc is known to violate upstream rseq ABI. IIRC you should get a
> nice splat if you enable rseq debug mode (echo 1 > /debug/rseq/debug).
> 
> Perhaps this is the nudge Google needs to go fix this.

FWIW, if that is indeed the issue, it should already be generating
noise if you build the old kernels with CONFIG_DEBUG_RSEQ=y.
