Date: Thu, 21 Dec 2023 01:06:28 +0000
From: Mark Brown <broonie@...nel.org>
To: Daniel Díaz <daniel.diaz@...aro.org>
Cc: Naresh Kamboju <naresh.kamboju@...aro.org>,
	Linux ARM <linux-arm-kernel@...ts.infradead.org>,
	open list <linux-kernel@...r.kernel.org>,
	lkft-triage@...ts.linaro.org, linux-stable <stable@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Marc Zyngier <maz@...nel.org>
Subject: Re: selftests: ftrace: Internal error: Oops: sve_save_state

On Wed, Dec 20, 2023 at 06:06:53PM -0600, Daniel Díaz wrote:

> We have been seeing this problem in other instances, specifically on
> the following kernels:
> * 5.15.132, 5.15.134-rc1, 5.15.135, 5.15.136-rc1, 5.15.142, 5.15.145-rc1
> * 6.1.42, 6.1.43, 6.1.51-rc1, 6.1.56-rc1, 6.1.59-rc1, 6.1.63
> * 6.3.10, 6.3.11
> * 6.4.7
> * 6.5.2, 6.5.10-rc2

This is a huge range of kernels with some substantial reworkings of
the FP code, and I do note that v5.15 appears to have backported only
one change there (an incidental one related to ESR handling).  This
makes me think this is likely to be something that's been sitting there
for a very long time and is unrelated to those versions and any changes
that went into them.  I see you're still testing back to v4.19 and none
of the reports are from v5.10 or earlier, which suggests an issue
introduced between v5.10 and v5.15.  My change cccb78ce89c45a4
("arm64/sve: Rework SVE access trap to convert state in registers") does
jump out there, though I don't immediately see what the issue would be.

Looking at the list of versions you've posted, the earliest is from the
very end of June, with others in July.  Was there something that changed
in your test environment at broadly that time?  I see that the logs you
and Naresh posted are both using a Debian 12/Bookworm based root
filesystem, and that was released a couple of weeks before this started
appearing.  Bookworm introduced glibc usage of SVE, which makes SVE usage
much more common.  Is this perhaps tied to you upgrading your root
filesystems to Bookworm, or were you tracking Debian testing before then?

> Most recent case is for the current 5.15 RC. Decoded stack trace is here:
> -----8<-----
>   <4>[   29.297166] ------------[ cut here ]------------
>   <4>[   29.298039] WARNING: CPU: 1 PID: 220 at arch/arm64/kernel/fpsimd.c:950 do_sve_acc (/builds/linux/arch/arm64/kernel/fpsimd.c:950 (discriminator 1))

That's an assert that we shouldn't take an SVE trap when SVE is
already enabled for the thread.  The backtrace Naresh originally
supplied was a NULL pointer dereference attempting to save SVE state
(indicating that we think we're trying to save SVE state but don't have
any storage allocated for it) during thread switch.  It's very plausible
that the two are the same underlying issue, but it's not a given.  Could
you please double check exactly how similar the various issues you are
seeing are?

Coincidentally, I have been chasing some other issues over the past week
or two which might be different manifestations of the same underlying
issue with current code, broadly in the area of the register state and
the task state getting out of sync.
