Message-ID: <20090318001955.GB5143@nowhere>
Date:	Wed, 18 Mar 2009 01:20:00 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Kevin Shanahan <kmshanah@...b.org.au>
Cc:	Avi Kivity <avi@...hat.com>, "Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, Mike Galbraith <efault@....de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)

On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > I've looked a bit at your traces.
> > I think it's probably too wide to find something inside.
> > Latest -tip is provided with a new set of events tracing, meaning
> > that you will be able to produce function graph traces with various
> > sched events included.
> > 
> > Another thing, is it possible to reproduce it with only one ping?
> > Or testing periodic pings and keep only one that raised a relevant
> > threshold of latency? I think we could do a script that can do that.
> > It would make the trace much clearer.
> 
> Yeah, I think that should be possible. If you can come up with such a
> script, that would be great.

Ok, I've made a small script, based on yours, that should do the job.
You just have to set the latency threshold above which you consider
a ping buggy. I don't remember the latency you observed.
About 5 seconds, right?

It's the "thres" variable in the script.

The resulting trace should be a mix of function graph traces
and scheduler events, which look like this:

 gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
  xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
  xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
            Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>

+ is a wakeup and ==> is a context switch.
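
If the trace is long, a small filter can pull out just these events. The
following is only a sketch: the function name sched_events is made up for
illustration, and it assumes the event lines have exactly the layout shown
above (task-pid first, target task name last, no spaces in task names). It
anchors on the pid:prio:state field so that other lines that happen to
contain a "+" are not picked up.

```shell
#!/bin/bash
# Hypothetical helper: read a trace on stdin and print one short line
# per scheduler event, in the form "<kind> <current task> -> <target task>".
sched_events() {
    awk '
        # pid:prio:state ==> [cpu] ... is a context switch
        /[0-9]+:[0-9]+:[A-Z] +==> +\[[0-9]+\]/ { print "switch", $1, "->", $NF; next }
        # pid:prio:state + [cpu] ... is a wakeup
        /[0-9]+:[0-9]+:[A-Z] +[+] +\[[0-9]+\]/ { print "wakeup", $1, "->", $NF }
    '
}
```

It would be used as "sched_events < trace-file" on a saved copy of the trace.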


The script loops, sending one ping per iteration, and keeps only the trace
of the ping that exceeds the latency threshold you defined.

Tell me if the following script works for you.

You will need to pull the latest -tip tree and enable the following:

CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y

Thanks!

Ah, and you will need python too: bash can't do floating point
operations, so the script uses python for the comparison.
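
If python is not at hand, awk could do the same floating point comparison.
This is an alternative sketch, not what the script below uses. The helper
name over_threshold is hypothetical, and it assumes the rtt is given in
milliseconds (the unit ping prints) and the threshold in seconds.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the rtt given in
# milliseconds ($1) is greater than the threshold given in seconds ($2).
over_threshold() {
    awk -v ms="$1" -v sec="$2" 'BEGIN { exit !(ms / 1000 > sec) }'
}
```

It can be used directly in a test, e.g.
"if over_threshold "$lat" "$thres"; then found=True; fi".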

#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the kvm IP addr
addr=""

# Set here a threshold of latency in sec
thres="5"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

while [ "$found" != "True" ]
do
	# Flush the previous buffer
	echo nop > $prefix/current_tracer

	# Reset the function_graph tracer
	echo function_graph > $prefix/current_tracer

	echo 1 > $prefix/tracing_enabled
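	# ping reports the rtt in milliseconds
	lat=$(ping -c 1 "$addr" | grep rtt | grep -Eo " [0-9]+\.[0-9]+")
	echo 0 > $prefix/tracing_enabled

	# Convert to seconds before comparing against the threshold
	found=$(python -c "print(float('$lat') / 1000.0 > $thres)")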
	sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat ms"
echo "Please send the trace you will find in $prefix/trace"



> 
> > Just wait a bit, I'm looking at which event could be relevant to enable
> > and I come back to you with a set of commands to test.
> 
> Excellent! Thanks for looking into this.
> 
> Cheers,
> Kevin.
> 
> 
