Message-Id: <1232963730.4921.12.camel@kulgan.wumi.org.au>
Date: Mon, 26 Jan 2009 20:25:30 +1030
From: Kevin Shanahan <kmshanah@...b.org.au>
To: Ingo Molnar <mingo@...e.hu>
Cc: Avi Kivity <avi@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Kernel Testers List <kernel-testers@...r.kernel.org>,
Mike Galbraith <efault@....de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Frédéric Weisbecker <fweisbec@...il.com>,
bugme-daemon@...zilla.kernel.org
Subject: Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@...hat.com> wrote:
> > It means a scheduling problem. Can you run the latency tracer (which
> > only works with realtime priority), so we can tell if it is (a) kvm
> > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > vcpu from running.
>
> Could we please get an ftrace capture of the incident?
>
> Firstly, it makes sense to simplify the tracing environment as much as
> possible: for example single-CPU traces are much easier to interpret.
>
> Can you reproduce it with just one CPU online? I.e. if you offline all the
> other cores via:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [etc.]
>
> and keep CPU#0 only, do the latencies still occur?
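>
> A sketch of offlining them all in one go, assuming a hotplug-capable
> kernel and a root shell (cpu0 usually has no 'online' file, so the glob
> skips it):
>
> for cpu in /sys/devices/system/cpu/cpu[1-9]*; do
>   echo 0 > $cpu/online   # take each secondary CPU offline
> done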
>
> If they do still occur, then please do the traces that way.
>
> [ If they do not occur, then switch all CPUs back on - we'll sort out the
> traces ;-) ]
>
> Then please build a function tracer kernel, by enabling:
>
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
>
> Once you boot into such a kernel, you can switch on function tracing via:
>
> cd /debug/tracing/
>
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-proc > trace_options
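>
> These paths assume debugfs is mounted at /debug; if it is not, mount it
> there first (some distributions use /sys/kernel/debug instead - adjust
> the paths accordingly):
>
> mount -t debugfs nodev /debug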
>
> The tracer is not running yet; first, find a suitable set of functions to
> trace. For example, this will be a pretty good starting point for
> scheduler+KVM problems:
>
> echo '' > set_ftrace_filter # clear filter functions
> echo '*sched*' >> set_ftrace_filter
> echo '*wake*' >> set_ftrace_filter
> echo '*kvm*' >> set_ftrace_filter
> echo 1 > tracing_enabled # let the tracer go
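>
> A quick way to confirm the tracer is armed before reproducing the bug
> (same mount point assumed):
>
> cat current_tracer    # should print: function_graph
> cat tracing_enabled   # should print: 1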
>
> You can see your current selection of functions to trace via
> 'cat set_ftrace_filter', and you can see all available functions via
> 'cat available_filter_functions'.
>
> You can also trace all functions via:
>
> echo '*' > set_ftrace_filter
>
> Tracer output can be captured from the 'trace' file. It should look like
> this:
>
> 15) cc1-28106 | 0.263 us | page_evictable();
> 15) cc1-28106 | | lru_cache_add_lru() {
> 15) cc1-28106 | 0.252 us | __lru_cache_add();
> 15) cc1-28106 | 0.738 us | }
> 15) cc1-28106 | + 74.026 us | }
> 15) cc1-28106 | | up_read() {
> 15) cc1-28106 | 0.257 us | _spin_lock_irqsave();
> 15) cc1-28106 | 0.253 us | _spin_unlock_irqrestore();
> 15) cc1-28106 | 1.329 us | }
>
> To capture a continuous stream of all trace data you can do:
>
> cat trace_pipe > /tmp/trace.txt
>
> (this will also drain the trace ringbuffers.)
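>
> One way to bracket the incident, assuming the stall reproduces within a
> fixed window (the guest address below is just a placeholder):
>
> cat trace_pipe > /tmp/trace.txt &
> ping -c 30 <guest-ip>   # reproduce the stall
> kill %1                 # stop draining the ring buffers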
>
> Note that this can be quite expensive if a lot of functions are being
> traced - so it makes sense to trim the set of traced functions down to
> only the interesting ones. Which functions are interesting can be
> determined by looking at the traces. You should see your KVM threads
> becoming active every second as the ping happens.
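>
> With funcgraph-proc set as above, every trace line carries the process
> name, so the KVM threads' activity can be pulled out with something
> like:
>
> grep qemu /tmp/trace.txt | less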
>
> If you are losing events, you can increase the trace buffer size via the
> buffer_size_kb control - the default is around 1.4 MB.
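>
> For example, to raise it to 8 MB (the value is in kilobytes, per CPU):
>
> echo 8192 > buffer_size_kb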
>
> Let me know if any of these steps is causing problems or if interpreting
> the traces is difficult.
Carrying out the steps was fine, but I don't really know what I'm
looking at. I've uploaded the trace here (about 10 seconds' worth, I
think):
http://disenchant.net/tmp/bug-12465/trace-1/
The guest being pinged is process 4353:
kmshanah@...xo:~$ pstree -p 4353
qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
├─{qemu-system-x86}(4355)
└─{qemu-system-x86}(4772)
I guess the larger overhead/duration values are what we are looking for.
Grepping for durations with six or more digits (i.e. at least 100,000 us,
or 0.1 s) turns up these:
kmshanah@...xo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2
0) ksoftir-4 | ! 3010470 us | }
0) qemu-sy-4354 | ! 250406.2 us | }
0) qemu-sy-4354 | ! 250407.0 us | }
0) qemu-sy-4354 | ! 362946.3 us | }
0) qemu-sy-4354 | ! 362947.0 us | }
0) qemu-sy-4177 | ! 780480.3 us | }
0) qemu-sy-4354 | ! 117685.7 us | }
0) qemu-sy-4354 | ! 117686.5 us | }
That ksoftirqd value is a bit strange (> 3 seconds - or is the formatting
wrong?). I guess I still need some guidance on what I'm looking at in
this trace and/or what to do next.
Cheers,
Kevin.