linux-kernel - Re: [regression] segfault in Qt apps running on Linux kernel 6.10.8 ARM with LPAE

Message-ID: <ZvaRJK8GQR7GYHnZ@arm.com>
Date: Fri, 27 Sep 2024 12:04:04 +0100
From: Catalin Marinas <catalin.marinas@....com>
To: Linux regressions mailing list <regressions@...ts.linux.dev>
Cc: Linus Walleij <linus.walleij@...aro.org>,
	"Russell King (Oracle)" <rmk+kernel@...linux.org.uk>,
	linux-arm-kernel@...ts.infradead.org,
	LKML <linux-kernel@...r.kernel.org>, Andrew <quark@...root.org>
Subject: Re: [regression] segfault in Qt apps running on Linux kernel 6.10.8
 ARM with LPAE

On Wed, Sep 25, 2024 at 02:23:35PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Catalin, Linus, I noticed a report about a regression in
> bugzilla.kernel.org that appears to be caused by a change of yours:
> 
> 7af5b901e84743 ("ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table
> walks disablement") [v6.10-rc1]
> 
> As many (most?) kernel developers don't keep an eye on the bug tracker,
> I decided to write this mail. To quote from
> https://bugzilla.kernel.org/show_bug.cgi?id=219247 :
> 
> > Trying to run LxQt on a Chromebook XE303C12 with Devuan 4 and Linux
> > kernel 6.10.8 results in a segmentation fault (for LxQt). There are
> > no such problems with Linux kernel 6.9.12 or earlier. With Linux
> > kernel 6.10.8 it is possible to run Xfce4, but trying to run for
> > example Kate ends in a segmentation fault. Mesa 20.3.5, patched for
> > partial hardware acceleration, preserves this acceleration in Xfce4.
> > The mpv works using acceleration regardless of the Linux kernel
> > version. dmesg does not give anything significantly new compared to
> > previous kernel version.
> 
> See the ticket for more details and the bisection log. The reporter is CCed.

I had a quick look and the fault seems to be a level 2 translation fault
taken while in user space (code 0x206). I can't tell whether the fault
address is valid and we just messed up the pairing of user access
enable/disable, or whether something else happened. Having TTBCR.EPD0 == 1
does lead to translation faults, though I'm not sure what the DFSR
register says in that case. Anyway, if the vma were valid, I'd normally
expect do_page_fault() to get stuck in a continuous fault loop rather
than end up with SIGSEGV.
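
For reference, the 0x206 code can be decoded by hand. The sketch below is
only an illustration and is not kernel code: it assumes the long-descriptor
(LPAE) fault status layout, where bit 9 marks the LPAE format, bits [5:0]
hold the status, and the low two status bits give the lookup level for
translation, access flag and permission faults. The macro and function
names are made up for this example.

```c
#include <stdint.h>

/* Assumed long-descriptor (LPAE) fault status layout, for illustration. */
#define EX_FSR_LPAE         (1u << 9)   /* long-descriptor format in use */
#define EX_FSR_WRITE        (1u << 11)  /* WnR: fault caused by a write  */
#define EX_FSR_STATUS_MASK  0x3fu       /* STATUS field, bits [5:0]      */

/* Name the fault class encoded in status bits [5:2]. */
static const char *ex_fsr_fault_class(uint32_t fsr)
{
	switch ((fsr & EX_FSR_STATUS_MASK) >> 2) {
	case 1:
		return "translation fault";
	case 2:
		return "access flag fault";
	case 3:
		return "permission fault";
	default:
		return "other";
	}
}

/* Lookup level, meaningful only for the three classes named above. */
static unsigned int ex_fsr_level(uint32_t fsr)
{
	return fsr & 0x3u;
}

/* Non-zero if the value is in the long-descriptor format. */
static int ex_fsr_is_lpae(uint32_t fsr)
{
	return (fsr & EX_FSR_LPAE) != 0;
}

/* Non-zero if the faulting access was a write. */
static int ex_fsr_is_write(uint32_t fsr)
{
	return (fsr & EX_FSR_WRITE) != 0;
}
```

Feeding 0x206 through these helpers gives an LPAE-format value, a
translation fault at level 2, and a read rather than a write access.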

Andrew, could you please share the .config file you have, maybe attach
it to the bugzilla report? Also, could you try the kernel with
CONFIG_CPU_TTBR0_PAN disabled without any patches reverted?

Another thing to try is to invalidate the TLBs before returning to user
space, just in case those TTBCR bits are cached in the TLB in a way we
did not envisage. Untested diff below:

----------------8<--------------------------------
diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h
index 4bccd895d954..c00b400b7f4d 100644
--- a/arch/arm/include/asm/uaccess-asm.h
+++ b/arch/arm/include/asm/uaccess-asm.h
@@ -91,6 +91,9 @@
 	bic	\tmp, \tmp, #TTBCR_EPD0 | TTBCR_T0SZ_MASK
 	bic	\tmp, \tmp, #TTBCR_A1
 	mcr	p15, 0, \tmp, c2, c0, 2		@ write TTBCR
+	isb
+	mcr	p15, 0, \tmp, c8, c7, 0		@ invalidate TLBs
+	dsb
 	.if	\isb
 	instr_sync
 	.endif
----------------8<--------------------------------
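
To make the bit manipulation in the hunk easier to follow, here is a small
C model of what the three context instructions do to the TTBCR value. It
is a sketch only: the EX_ constants are hypothetical names that mirror the
architectural long-descriptor TTBCR fields (T0SZ in bits [2:0], EPD0 in
bit 7, A1 in bit 22), not the kernel's own definitions, and it assumes the
hunk sits on the path that re-enables user access. It does not model the
TLB invalidation or the barriers.

```c
#include <stdint.h>

/* Hypothetical names for the architectural LPAE TTBCR fields. */
#define EX_TTBCR_T0SZ_MASK  0x7u        /* TTBR0 region size, bits [2:0] */
#define EX_TTBCR_EPD0       (1u << 7)   /* disable TTBR0 table walks     */
#define EX_TTBCR_A1         (1u << 22)  /* ASID taken from TTBR1         */

/*
 * Model of the two "bic" instructions: clear EPD0 and T0SZ, then A1,
 * leaving every other TTBCR bit unchanged.
 */
static uint32_t ex_ttbcr_enable_ttbr0(uint32_t ttbcr)
{
	ttbcr &= ~(EX_TTBCR_EPD0 | EX_TTBCR_T0SZ_MASK);
	ttbcr &= ~EX_TTBCR_A1;
	return ttbcr;
}

/* Non-zero while TTBR0 walks are disabled, i.e. user accesses fault. */
static int ex_ttbcr_ttbr0_walks_disabled(uint32_t ttbcr)
{
	return (ttbcr & EX_TTBCR_EPD0) != 0;
}
```

If a stale copy of the old value were still in effect after the write,
ex_ttbcr_ttbr0_walks_disabled() would in effect remain true for user
accesses, which is the situation the extra invalidation is meant to rule
out.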

Thanks.

-- 
Catalin