Date:	Wed, 18 Apr 2012 10:51:19 +0600
From:	"Alexander E. Patrakov" <patrakov@...il.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org
Subject: Re: BUG: sleeping function called from invalid context at
 kernel/mutex.c:271

Paul E. McKenney wrote:
> On Tue, Apr 17, 2012 at 10:56:27PM -0400, Steven Rostedt wrote:
> > On Mon, Apr 16, 2012 at 01:27:24PM +0600, Alexander E. Patrakov wrote:
> > > Hello.
> > > 
> > > With linux-3.3.0, my computer at work is rather unstable. The kernel
> > 
> > Have you tried other kernels? The leak just started with 3.3?

Yes, I have found a 3.3.1-plus-ath9k-revert configuration that works for
me if I don't use systemd. I have not tested this with systemd on the
work computer (i.e. the most affected one), but both my home desktop and
the laptop are stable with 3.3.1 even with systemd.

As for older kernels, it is difficult to say. The current setup is
essentially a full reinstall of Gentoo, together with a move from XFCE
to GNOME 3, from OpenRC to systemd and then back (because systemd was a
primary suspect for the bug), and with NetworkManager added, so a
different set of code paths is regularly exercised. The bug might
therefore exist in the older kernels too, without my having noticed it.

> > 
> > > seems to leak memory, however, kmemleak finds nothing significant.
> > > 
> > > Finally, the computer started swapping heavily and responded only via
> > > ssh. In dmesg, I found this (repeated every two seconds):
> > > 
> > > [ 6709.483956] BUG: sleeping function called from invalid context at
> > > kernel/mutex.c:271
> > > [ 6709.483968] in_atomic(): 0, irqs_disabled(): 0, pid: 1210, name:
> > 
> > I'm a little baffled here, as preempt_count() is zero (in_atomic) and
> > irqs are not disabled.
> > 
> > > NetworkManager
> > 
> > Ah there's your problem! (just kidding)
> > 
> > 
> > > [ 6709.483974] INFO: lockdep is turned off.
> > > [ 6709.483981] Pid: 1210, comm: NetworkManager Tainted: G          I
> > > 3.3.0-gentoo #4
> > > [ 6709.483987] Call Trace:
> > > [ 6709.484006]  [<ffffffff810683fc>] __might_sleep+0xff/0x103
> > > [ 6709.484019]  [<ffffffff81548783>] mutex_lock_nested+0x2a/0x2ff
> > > [ 6709.484031]  [<ffffffff811261b5>] ? fget_light+0x6a/0x118
> > > [ 6709.484043]  [<ffffffff8115a738>] inotify_poll+0x35/0x53
> > > [ 6709.484052]  [<ffffffff81135eb1>] do_sys_poll+0x266/0x3f2
> > > [ 6709.484060]  [<ffffffff81134e33>] ? poll_freewait+0x8f/0x8f
> > > [ 6709.484069]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484076]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484084]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484092]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484099]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484107]  [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484118]  [<ffffffff8112fde0>] ? putname+0x2d/0x36
> > > [ 6709.484127]  [<ffffffff8112fde0>] ? putname+0x2d/0x36
> > > [ 6709.484138]  [<ffffffff81045cb7>] ? timespec_add_safe+0x32/0x5f
> > > [ 6709.484146]  [<ffffffff81018078>] ? read_tsc+0x9/0x1b
> > > [ 6709.484155]  [<ffffffff8113509a>] ? poll_select_set_timeout+0x61/0x75
> > > [ 6709.484163]  [<ffffffff811360d8>] sys_poll+0x4e/0xb7
> > > [ 6709.484173]  [<ffffffff81552896>] sysenter_dispatch+0x7/0x21
> > > 
> > 
> > Hmm, this can also be reported if you have an rcu leak. Which would also
> > explain your memory leak. RCU is the kernel's "garbage collector" and if
> > it gets stuck, then you will definitely start seeing memory leaks, as
> > memory won't be freed.
> 
> If RCU is stuck, you should see RCU CPU stall warnings, which can
> give clues as to what is causing RCU to get stuck.

Yes, there were such warnings, but all of them said that the stall ended
before the dump started. So there is nothing reportable here, but I have
just found that https://bugzilla.novell.com/show_bug.cgi?id=754186 looks
quite similar, in the sense that it (and its duplicates) also shows
undumpable stalls of increasing length, and IPv6 is in use both at home
and at work.

> 
> > Paul, know of any fixes in RCU that could have caused this?
> > 
> > -- Steve
> > 
> > > The taint is due to nouveau. I am sure that this
> > > "sleeping-in-invalid-context" report is a consequence of the memory leak
> > > that I could not convert into something reportable. But still, it is
> > > something that the kernel wants me to report, that's why this e-mail.
> > > 
> > > The kernel is configured as CONFIG_PREEMPT=y, if this is relevant.
> 
> Could you please send along your full .config?

Yes, two configs are attached. The bad one is from 3.3.0 and has a lot
of debug options that I added to catch the lockups before attributing
them to the leak. The good one is from 3.3.1 with c1afdaff (ath9k)
reverted, and without debug options. I have not tested 3.3.1 with debug
options, because I (incorrectly) assumed that they were at fault. Both
configs have a lot of unused modules. So here is the lsmod from the work
computer:

$ lsmod
Module                  Size  Used by
fuse                   65161  2 
ip6table_filter         1308  0 
ip6_tables             17694  1 ip6table_filter
ebtable_nat             1732  0 
ebtables               23823  1 ebtable_nat
ipt_MASQUERADE          1618  3 
iptable_nat             3904  1 
nf_nat                 13956  2 iptable_nat,ipt_MASQUERADE
nf_conntrack_ipv4      11013  4 nf_nat,iptable_nat
nf_defrag_ipv4          1219  1 nf_conntrack_ipv4
xt_state                1175  1 
nf_conntrack           57810  5 xt_state,nf_conntrack_ipv4,nf_nat,iptable_nat,ipt_MASQUERADE
ipt_REJECT              2193  2 
iptable_mangle          1496  0 
xt_tcpudp               2447  4 
iptable_filter          1336  1 
ip_tables              16565  3 iptable_filter,iptable_mangle,iptable_nat
x_tables               16735  11 ip_tables,iptable_filter,xt_tcpudp,iptable_mangle,ipt_REJECT,xt_state,iptable_nat,ipt_MASQUERADE,ebtables,ip6_tables,ip6table_filter
bridge                 73858  0 
stp                     1520  1 bridge
llc                     3665  2 stp,bridge
dm_mod                 67510  0 
kvm_intel             123137  0 
kvm                   343689  1 kvm_intel
fschmd                 16123  0 
nvram                   5621  0 
fujitsu_ts              1922  0 
i2c_dev                 5491  0 
tun                    14408  1 
coretemp                5406  0 
snd_hda_codec_hdmi     23217  4 
snd_hda_codec_realtek   112191  1 
snd_hda_intel          22833  5 
snd_hda_codec          86026  3 snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec_hdmi
snd_hwdep               5894  1 snd_hda_codec
nouveau               763731  3 
arc4                    1322  2 
snd_pcsp                7549  0 
snd_pcm                71074  4 snd_pcsp,snd_hda_codec,snd_hda_intel,snd_hda_codec_hdmi
ath9k                  88564  0 
snd_page_alloc          6873  2 snd_pcm,snd_hda_intel
ath9k_common            1944  1 ath9k
ath9k_hw              360585  2 ath9k_common,ath9k
ath                    14756  3 ath9k_hw,ath9k_common,ath9k
mac80211              343816  1 ath9k
cfg80211              168639  3 mac80211,ath,ath9k
snd_timer              18281  1 snd_pcm
snd                    56538  18 snd_timer,snd_pcm,snd_pcsp,snd_hwdep,snd_hda_codec,snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec_hdmi
rfkill                 14919  1 cfg80211
iTCO_wdt               12749  0 
mxm_wmi                 1337  1 nouveau
iTCO_vendor_support     1809  1 iTCO_wdt
r8169                  47792  0 
mii                     3907  1 r8169
i2c_i801                8062  0 
wmi                     8099  2 mxm_wmi,nouveau
usbhid                 35069  0 
serio_raw               4437  0 
xhci_hcd               78370  0 
ehci_hcd               38892  0 
usbcore               140844  4 ehci_hcd,xhci_hcd,usbhid
usb_common               866  1 usbcore
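
In case anyone wants to compare this listing against another machine, here
is a minimal sketch (not something I ran as part of the report) that reads
lsmod output into a dictionary. The function name parse_lsmod is
hypothetical, and the handling of continuation lines is an assumption about
how a mail client may wrap the "Used by" column onto the next line.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (size, use_count, [users])}."""
    modules = {}
    last = None  # most recently parsed module, for continuation lines
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            # Regular row: name, size, use count, optional user list
            users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
            modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
            last = fields[0]
        elif last is not None and len(fields) == 1:
            # Assumed mail-wrapped continuation of the previous user list
            size, count, users = modules[last]
            extra = [u for u in fields[0].split(",") if u]
            modules[last] = (size, count, users + extra)
    return modules
```

Note that a use count of 0 only means that no other module or open handle
holds a reference; it does not by itself show that the module is unused.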


-- 
Alexander E. Patrakov

View attachment "config-3.3.0-bad" of type "text/x-mpsub" (114336 bytes)

View attachment "config-3.3.1-good" of type "text/x-mpsub" (111831 bytes)
