Message-ID: <20110612235555.GD11580@hexapodia.org>
Date:	Sun, 12 Jun 2011 16:55:55 -0700
From:	Andy Isaacson <adi@...apodia.org>
To:	"Paul E. McKenney" <paulmck@...ibm.com>
Cc:	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	linux-pm@...ts.linux-foundation.org
Subject: Re: rcu_sched_state detected stall on CPU 0, 3.0-rc2

Let's CC netdev and linux-pm since this is obviously a suspend issue,
and may have something to do with ethtool.

On Sun, Jun 12, 2011 at 04:11:43PM -0700, Andy Isaacson wrote:
> On Sun, Jun 12, 2011 at 12:58:56PM -0700, Andy Isaacson wrote:
> > My Thinkpad x201s threw some errors (?) a few minutes after resuming
> > from suspend-to-ram this morning.
> > 
> > [56415.672140] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies)
> > 
> > Nothing jumps out of the backtraces at me.  Full dmesg and config
> > attached.  This was my first StR since upgrading from 2.6.39, let's see
> > if it fails again when I suspend after sending this email. :)
> 
> I haven't had a fully successful StR cycle yet (in 5 tries), although I
> can't pin them all on RCU.  On try 2 it hung completely about 10 seconds
> after I unlocked the screensaver, on try 3 it came back to a black
> console, and on try 4 it didn't suspend at all (blinking moon LED but
> battery LED and CPU fan still on).

Of course now that I'm trying to debug, I am seeing many successful
suspend-resume cycles.  I don't see any signs of difference between the
cases that hung and the cases that are now succeeding.

CCing netdev, because I suspend by running pm-suspend, and in at least
one failure, an ethtool running under pm-suspend seemed to be the
problem:

root 11558 pts/8    S+ \_ /bin/sh /usr/lib/pm-utils/sleep.d/00powers
root 11559 pts/8    S+     \_ /bin/sh /usr/sbin/pm-powersave
root 11576 pts/8    S+         \_ /bin/sh /usr/lib/pm-utils/power.d/
root 11577 pts/8    D+             \_ ethtool -s eth0 wol g

Many processes were stuck in D (uninterruptible sleep) state:

USER     PID    VSZ   RSS STAT START COMMAND
root   11493      0     0 D    16:11  \_ [kworker/u:15]
nobody  1707  21472   992 D    14:31 dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
adi    11606  41004  2424 D+   16:13  |       \_ ssh hex
root   11577   4092   324 D+   16:11  |                               \_ ethtool -s eth0 wol g
root   11595  22108   892 D+   16:12  |       \_ sudo cat /proc/11577/stack
root   11604  22108   900 D+   16:13  |       \_ sudo cat /proc/11577/stack

==> /proc/11577/wchan <==
synchronize_sched
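
For anyone trying to reproduce this, the listing above (tasks in D state plus their wait channel) can be collected in one pass. The following is a minimal sketch, not the commands used for the output above: the function names parse_stat and blocked_tasks are hypothetical, and it assumes a Linux /proc where /proc/<pid>/wchan is readable.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]

def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every task currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # task exited, or the file is not readable
        yield pid, comm, wchan

if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Run as root during a hang, this would print one line per blocked task; whether wchan shows a symbol name depends on the kernel configuration.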

-andy

Download attachment "trace.gz" of type "application/octet-stream" (27327 bytes)

Download attachment "config-trim.gz" of type "application/octet-stream" (8350 bytes)
