Message-ID: <20250829181007.GA468030@ax162>
Date: Fri, 29 Aug 2025 11:10:07 -0700
From: Nathan Chancellor <nathan@...nel.org>
To: Jinghao Jia <jinghao7@...inois.edu>,
	Wentao Zhang <wentaoz5@...inois.edu>,
	Sasha Levin <sashal@...nel.org>
Cc: Matt.Kelly2@...ing.com, akpm@...ux-foundation.org,
	andrew.j.oppelt@...ing.com, anton.ivanov@...bridgegreys.com,
	ardb@...nel.org, arnd@...db.de, bhelgaas@...gle.com, bp@...en8.de,
	chuck.wolber@...ing.com, dave.hansen@...ux.intel.com,
	dvyukov@...gle.com, hpa@...or.com, johannes@...solutions.net,
	jpoimboe@...nel.org, justinstitt@...gle.com, kees@...nel.org,
	kent.overstreet@...ux.dev, linux-arch@...r.kernel.org,
	linux-efi@...r.kernel.org, Wentao Zhang <wentaoz5@...inois.edu>,
	linux-kbuild@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org, linux-um@...ts.infradead.org,
	llvm@...ts.linux.dev, luto@...nel.org, marinov@...inois.edu,
	masahiroy@...nel.org, maskray@...gle.com,
	mathieu.desnoyers@...icios.com, matthew.l.weber3@...ing.com,
	mhiramat@...nel.org, mingo@...hat.com, morbo@...gle.com,
	ndesaulniers@...gle.com, oberpar@...ux.ibm.com, paulmck@...nel.org,
	peterz@...radead.org, richard@....at, rostedt@...dmis.org,
	samitolvanen@...gle.com, samuel.sarkisian@...ing.com,
	steven.h.vanderleest@...ing.com, tglx@...utronix.de,
	tingxur@...inois.edu, tyxu@...inois.edu, x86@...nel.org
Subject: Re: [PATCH v2 0/4] Enable measuring the kernel's Source-based Code
 Coverage and MC/DC with Clang

Hi Jinghao and Wentao,

On Thu, Nov 21, 2024 at 11:05:14PM -0600, Jinghao Jia wrote:
...
> On 10/3/24 6:29 PM, Nathan Chancellor wrote:
> > I seem to have narrowed it down to a few different configurations on top
> > of x86_64_defconfig but I will include the full bad configuration as an
> > attachment just in case anything else is relevant.
...
> > $ qemu-system-x86_64 \
> >     -display none \
> >     -nodefaults \
> >     -M q35 \
> >     -d unimp,guest_errors \
> >     -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> >     -kernel arch/x86/boot/bzImage \
> >     -initrd rootfs.cpio \
> >     -cpu host \
> >     -enable-kvm \
> >     -m 8G \
> >     -smp 8 \
> >     -serial mon:stdio
> > <hangs with no output>
> 
> This hang is caused by an early boot exception -- gdb shows the execution
> reaches the halt loop in early_fixup_exception().  Dumping regs->ip associated
> with this exception points us to the following instruction:
> 
> ffffffff89b58074:       48 ff 05 85 7f 4a 76    incq   0x764a7f85(%rip)        # 0 <fixed_percpu_data>
> 
> This is apparently an incorrect access to the per-cpu variable (the cpu offset
> in %gs is needed) and triggers a null-ptr-deref. Without CONFIG_AMD_MEM_ENCRYPT
> (one of the bad configs), it turns out the instruction is actually accessing
> the llvm prof-counter of strscpy():
> 
> ffffffff89b85a04:       48 ff 05 6d 94 7d fa    incq   -0x5826b93(%rip)        # ffffffff8435ee78 <__profc__Z13sized_strscpyPcU25pass_dynamic_object_size1PKcU25pass_dynamic_object_size1m>
> 
> This symbol is left undefined in the bad vmlinux, which explains why the
> faulting instruction is accessing address 0.  Tracing through the kernel
> linking process shows that the symbol is still defined (as a weak symbol) in
> vmlinux.a and vmlinux.o, but becomes undefined after the first round of linking
> of the kernel image (.tmp_vmlinux1).
> 
> After playing with it a little bit, we found the creation of vmlinux.o to be
> the problem. Specifically, if we use mold[1] instead of lld to create the
> object and pass it to the later stages of kernel linking, the symbol will be
> properly defined as a data symbol (and the kernel can boot).
> 
> It seems that the issue does not reproduce with LLVM-20. Nevertheless we have
> reported[2] this to upstream llvm.
> 
> [1]: https://github.com/rui314/mold
> [2]: https://github.com/llvm/llvm-project/issues/116575

Sasha pinged me on IRC earlier this week about this series and this
issue, noting that he was unable to reproduce it with a similar
toolchain version and the instructions above. I was able to confirm that
at 6.17-rc1 with this patch set applied (after fixing a couple of minor
conflicts), the boot issue no longer occurs, but it is still
reproducible on 6.12.
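
For anyone re-checking this on an affected tree, here is a minimal sketch of
how the symbol state described in the quoted analysis could be inspected at
each link stage. The helper name classify_sym and the NM override are
hypothetical, not part of the series; the file names and the symbol name are
the ones mentioned above. It only reads 'nm' output, so any nm-compatible
tool should do.

```shell
#!/bin/sh
# Classify how a symbol appears in nm-style output read from stdin.
# Prints one of: undefined, weak-undefined, weak, defined, absent.
# Only the first matching line is considered (for an archive such as
# vmlinux.a, that is the first member that mentions the symbol).
classify_sym() {
    awk -v sym="$1" '
        $NF == sym {
            type = $(NF - 1)            # nm type letter precedes the name
            if (type == "U")            state = "undefined"
            else if (type ~ /^[wv]$/)   state = "weak-undefined"
            else if (type ~ /^[WV]$/)   state = "weak"
            else                        state = "defined"
            found = 1
            exit
        }
        END { print found ? state : "absent" }
    '
}

# Symbol taken from the disassembly quoted above.
sym='__profc__Z13sized_strscpyPcU25pass_dynamic_object_size1PKcU25pass_dynamic_object_size1m'

# Walk the link stages named in the analysis; skip any that do not exist.
for obj in vmlinux.a vmlinux.o .tmp_vmlinux1 vmlinux; do
    [ -e "$obj" ] || continue
    printf '%s: %s\n' "$obj" "$("${NM:-llvm-nm}" "$obj" 2>/dev/null | classify_sym "$sym")"
done
```

Run from the top of the build directory. If the analysis above holds on a bad
build, the state should change from a weak definition in vmlinux.a and
vmlinux.o to undefined in .tmp_vmlinux1.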

In attempting to narrow my bisect window to find what patch fixes this
issue, I noticed that this configuration actually fails to build with

  Absolute reference to symbol '__llvm_prf_cnts' not permitted in .head.text

in 6.15 and 6.14 as a result of Ard's commit faf0ed487415 ("x86/boot:
Reject absolute references in .head.text"). Bisecting between 6.15 and
6.16 reveals Ard's commit a3cbbb4717e1 ("x86/boot: Move SEV startup code
into startup/") resolves the build error and that kernel boots, which
seems to make sense to me given what code was involved here. It is
possible that arch/x86/boot/startup will want 'LLVM_COV_PROFILE := n'
since all other instrumentation is disabled there.

I built v6.17-rc1 + this series with a fuller distribution configuration
and CONFIG_LLVM_COV_PROFILE_ALL=y. That kernel boots fine in QEMU but I
have done no further evaluation.

Cheers,
Nathan
