linux-kernel - Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts


Message-ID: <5803d2623278c7516406534b035a641abfdecee6.camel@redhat.com>
Date: Tue, 29 Jul 2025 16:06:17 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>, 
	linux-trace-kernel@...r.kernel.org, linux-doc@...r.kernel.org, Ingo Molnar
	 <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Tomas Glozar
	 <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>, Clark Williams
	 <williams@...hat.com>, John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more
 complete sts

On Tue, 2025-07-29 at 11:37 +0200, Nam Cao wrote:
> On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> > Kernel:
> >   - base: ftrace/for-next

I assume you mean rv/for-next? That is, the branch that includes all
changes as of yesterday.

> >   - config: defconfig + mod2noconfig + PREEMPT_RT + monitors
> > 
> > Hardware:
> > 	qemu-system-riscv64 -machine virt \
> > 	-kernel ../linux/arch/riscv/boot/Image \
> > 	-append "console=ttyS0 root=/dev/vda rw" \
> > 	-nographic \
> > 	-drive if=virtio,format=raw,file=riscv64.img \
> > 	-smp 4 -m 4G
> > 
> > 	riscv64.img is a Debian trixie image from debootstrap
> > 
> > Test:
> > 	echo 0 > /proc/sys/debug/exception-trace
> > 	./testall # see attached
> 
> I should note that this takes a few tries before something shows up.
> 

Thanks for all the details, but I still can neither reproduce the issue
nor understand what could be triggering it.

I tried enabling sts with panic as the reactor (to avoid missing the
error among all the rubbish that gets printed to dmesg) and ran
testall. I still cannot see the error.

What might help would be to see the trace with irq_enable and
irq_disable around the error, something like (not tested):

  trace-cmd stream -e irq_enable -e irq_disable -e error_sts \
      -e irq_handler_entry -- sh testall | grep -B 10 error

The interesting part here is not the moment the error occurs, but a
couple of events earlier (where I am possibly missing something that
looks like an interrupt).
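
Since the guest runs with -smp 4, events from other CPUs can interleave
with the ones leading up to the error, so a plain grep -B 10 may show
mostly unrelated lines. Below is a minimal sketch of a per-CPU filter. It
assumes every event line carries its CPU as a bracketed field such as
[000] (as in the backtrace quoted in this thread); same_cpu_context is a
hypothetical helper name, not part of trace-cmd, and it has not been tried
against real trace-cmd output.

```shell
#!/bin/sh
# Hypothetical helper: for each line containing "error_sts", print up to
# N earlier lines that were recorded on the same CPU, then the error line.
# Assumption: the CPU appears as the first bracketed number, e.g. "[000]".
same_cpu_context() {
    awk -v n="${1:-10}" '
    {
        # Extract the CPU field; skip lines without one (e.g. stack frames).
        if (!match($0, /\[[0-9]+\]/))
            next
        cpu = substr($0, RSTART, RLENGTH)

        if ($0 ~ /error_sts/) {
            start = count[cpu] - n + 1
            if (start < 1)
                start = 1
            for (i = start; i <= count[cpu]; i++)
                print hist[cpu, i]
            print $0
            print "--"
        }

        # Remember this line and keep only the last n lines per CPU.
        count[cpu]++
        hist[cpu, count[cpu]] = $0
        delete hist[cpu, count[cpu] - n]
    }'
}
```

It reads the trace from standard input, so it would take the place of
grep at the end of the pipeline, e.g. "... | same_cpu_context 10".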

Thanks,
Gabriele

> Below is the backtrace, in case it helps:
> 
> illegal    3246 [000]  1020.132675: rv:error_sts: event sched_switch not expected in the state enable_to_exit
>         ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
>         ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
>         ffffffff80138aa4 da_event_sts+0x198 ([kernel.kallsyms])
>         ffffffff80138cf0 handle_sched_switch+0x46 ([kernel.kallsyms])
>         ffffffff80aaf222 __schedule+0x4ba ([kernel.kallsyms])
>         ffffffff80aafb80 preempt_schedule_irq+0x32 ([kernel.kallsyms])
>         ffffffff80aac714 irqentry_exit+0x76 ([kernel.kallsyms])
>         ffffffff80aac1dc do_irq+0x38 ([kernel.kallsyms])
>         ffffffff80ab7da6 __lock_text_end+0x12e ([kernel.kallsyms])
>         ffffffff80a93e50 mas_find+0x0 ([kernel.kallsyms])
>         ffffffff8021ea60 vms_clear_ptes+0xe8 ([kernel.kallsyms])
>         ffffffff8021f81a vms_complete_munmap_vmas+0x58 ([kernel.kallsyms])
>         ffffffff80220706 do_vmi_align_munmap+0x15c ([kernel.kallsyms])
>         ffffffff802207d0 do_vmi_munmap+0xa6 ([kernel.kallsyms])
>         ffffffff80221f3c __vm_munmap+0xa2 ([kernel.kallsyms])
>         ffffffff8020be7c vm_munmap+0xe ([kernel.kallsyms])
>         ffffffff802bbdbe elf_load+0x14c ([kernel.kallsyms])
>         ffffffff802bc1f4 load_elf_binary+0x36e ([kernel.kallsyms])
>         ffffffff80264426 bprm_execve+0x254 ([kernel.kallsyms])
>         ffffffff8026570c do_execveat_common.isra.0+0x11e ([kernel.kallsyms])
>         ffffffff802664de __riscv_sys_execve+0x32 ([kernel.kallsyms])
>         ffffffff80aabf84 do_trap_ecall_u+0x1bc ([kernel.kallsyms])
>         ffffffff80ab7dc8 __lock_text_end+0x150 ([kernel.kallsyms])