Message-ID: <5803d2623278c7516406534b035a641abfdecee6.camel@redhat.com>
Date: Tue, 29 Jul 2025 16:06:17 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
linux-trace-kernel@...r.kernel.org, linux-doc@...r.kernel.org, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Tomas Glozar
<tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>, Clark Williams
<williams@...hat.com>, John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more
complete sts
On Tue, 2025-07-29 at 11:37 +0200, Nam Cao wrote:
> On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> > Kernel:
> > - base: ftrace/for-next
I assume you mean rv/for-next? That is the one that includes all changes
as of yesterday.
> > - config: defconfig + mod2noconfig + PREEMPT_RT + monitors
> >
> > Hardware:
> > qemu-system-riscv64 -machine virt \
> > -kernel ../linux/arch/riscv/boot/Image \
> > -append "console=ttyS0 root=/dev/vda rw" \
> > -nographic \
> > -drive if=virtio,format=raw,file=riscv64.img \
> > -smp 4 -m 4G
> >
> > riscv64.img is a Debian trixie image from debootstrap
> >
> > Test:
> > echo 0 > /proc/sys/debug/exception-trace
> > ./testall # see attached
>
> I should note that this takes a few tries before something shows up.
>
Thanks for all the details, but I still can neither reproduce the issue
nor understand what could be triggering it.

I tried enabling sts with panic as the reactor (to avoid missing the
error among all the noise that gets printed to dmesg) and ran testall.
I still cannot see the error.
What might help would be to see the trace with irq_enable and
irq_disable around the error, something like (not tested):

  trace-cmd stream -e irq_enable -e irq_disable -e error_sts \
    -e irq_handler_entry -- sh testall | grep -B 10 error
The interesting part is not the point where the error occurs, but a
couple of events earlier (where I am possibly missing something that
looks like an interrupt).
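
If the trace is saved to a file rather than piped through grep, the same
"show the events leading up to the error" filter can be sketched in
Python. This is only a rough illustration and has not been run against
real trace-cmd output; the function name events_before and the default
window of 10 lines are assumptions chosen to mirror `grep -B 10 error`.

```python
from collections import deque


def events_before(lines, needle="error", window=10):
    """Yield, for every line containing `needle`, a list made of up to
    `window` preceding lines followed by the matching line itself.

    Unlike grep -B, the context is reset after each match, so two
    matches in a row do not share context lines.
    """
    recent = deque(maxlen=window)  # sliding window of the latest lines
    for line in lines:
        line = line.rstrip("\n")
        if needle in line:
            yield list(recent) + [line]
            recent.clear()
        else:
            recent.append(line)


# Possible use on a saved trace (the file name is hypothetical):
#   with open("trace.txt") as f:
#       for block in events_before(f):
#           print("\n".join(block))
#           print("--")
```

The window size may need to be larger than 10 if many irq_enable and
irq_disable events are logged between the missed event and the error.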
Thanks,
Gabriele
> Below is the backtrace, in case it helps:
>
> illegal 3246 [000] 1020.132675: rv:error_sts: event sched_switch
> not expected in the state enable_to_exit
> ffffffff8013231c __traceiter_error_sts+0x28
> ([kernel.kallsyms])
> ffffffff8013231c __traceiter_error_sts+0x28
> ([kernel.kallsyms])
> ffffffff80138aa4 da_event_sts+0x198 ([kernel.kallsyms])
> ffffffff80138cf0 handle_sched_switch+0x46 ([kernel.kallsyms])
> ffffffff80aaf222 __schedule+0x4ba ([kernel.kallsyms])
> ffffffff80aafb80 preempt_schedule_irq+0x32
> ([kernel.kallsyms])
> ffffffff80aac714 irqentry_exit+0x76 ([kernel.kallsyms])
> ffffffff80aac1dc do_irq+0x38 ([kernel.kallsyms])
> ffffffff80ab7da6 __lock_text_end+0x12e ([kernel.kallsyms])
> ffffffff80a93e50 mas_find+0x0 ([kernel.kallsyms])
> ffffffff8021ea60 vms_clear_ptes+0xe8 ([kernel.kallsyms])
> ffffffff8021f81a vms_complete_munmap_vmas+0x58
> ([kernel.kallsyms])
> ffffffff80220706 do_vmi_align_munmap+0x15c
> ([kernel.kallsyms])
> ffffffff802207d0 do_vmi_munmap+0xa6 ([kernel.kallsyms])
> ffffffff80221f3c __vm_munmap+0xa2 ([kernel.kallsyms])
> ffffffff8020be7c vm_munmap+0xe ([kernel.kallsyms])
> ffffffff802bbdbe elf_load+0x14c ([kernel.kallsyms])
> ffffffff802bc1f4 load_elf_binary+0x36e ([kernel.kallsyms])
> ffffffff80264426 bprm_execve+0x254 ([kernel.kallsyms])
> ffffffff8026570c do_execveat_common.isra.0+0x11e
> ([kernel.kallsyms])
> ffffffff802664de __riscv_sys_execve+0x32 ([kernel.kallsyms])
> ffffffff80aabf84 do_trap_ecall_u+0x1bc ([kernel.kallsyms])
> ffffffff80ab7dc8 __lock_text_end+0x150 ([kernel.kallsyms])