Message-ID: <20200429105941.GQ30814@suse.de>
Date: Wed, 29 Apr 2020 12:59:41 +0200
From: Joerg Roedel <jroedel@...e.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Shile Zhang <shile.zhang@...ux.alibaba.com>,
Andy Lutomirski <luto@...capital.net>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Tzvetomir Stoyanov <tz.stoyanov@...il.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Hi Steven,

On Wed, Apr 29, 2020 at 05:48:57AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt (VMware) <rostedt@...dmis.org>
>
> Tzvetomir was adding a feature to trace-cmd that would allow the user
> to specify filtering on process IDs within a tracing instance (or
> buffer). When he added this feature and tested it on tracing PIDs 1 and
> 2, it caused his kernel to hang.
>
> He sent me his code and I was able to reproduce the hang as well. I
> bisected it down to this commit 763802b53a42 ("x86/mm: split
> vmalloc_sync_all()"). It was 100% reproducible. With the commit it
> would hang, and reverting the commit, it would work.
>
> Adding a bunch of printk()s, I found where it locked up. It was after
> the recording was finished, and a write of "0" to
> tracefs/instance/foo/events/enable. And in the code, it was:
>
> (you may skip to the end of the chain)
>
> system_enable_write() {
>   __ftrace_set_clr_event() {
>     __ftrace_set_clr_event_nolock() {
>       ftrace_event_enable_disable() {
>         __ftrace_event_enable_disable() {
>           call->class->reg() <trace_event_reg()> {
>             trace_point_probe_unregister() {
>               tracepoint_remove_func() {
>                 static_key_slow_dec() {
>                   __static_key_slow_dec() {
>
> <continued>
>
> __static_key_slow_dec_cpus_locked() {
>   jump_label_update() {
>     __jump_label_update()
>       arch_jump_label_transform() {
>         jump_label_transform() {
>           __jump_label_transform() {
>             text_poke_bp() {
>               text_poke_bp_batch() {
>                 text_poke() {
>                   __text_poke() {
>
> <continued> (This is where you want to see)
>
> use_temporary_mm() {
>   switch_mm_irqs_off() {
>     load_new_mm_cr3() {
>       write_cr3() <<--- Lock up!

I don't see how it could lock up in write_cr3(), at least on bare metal.
In what environment does this happen: 32-bit or 64-bit, in a VM or on
bare metal?

I think it is more likely that your lockup is actually a page-fault
loop, where the #PF handler does not map the faulting address correctly.
But I have to take a closer look at how text_poke() works before I can
say more.

Btw, if this happens on x86-64, does it also happen without
vmalloc-stacks (i.e. with CONFIG_VMAP_STACK disabled)?

Regards,

	Joerg