Message-ID: <1334724679.2019.20.camel@aep-desktop>
Date: Wed, 18 Apr 2012 10:51:19 +0600
From: "Alexander E. Patrakov" <patrakov@...il.com>
To: paulmck@...ux.vnet.ibm.com
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org
Subject: Re: BUG: sleeping function called from invalid context at kernel/mutex.c:271
Paul E. McKenney wrote:
> On Tue, Apr 17, 2012 at 10:56:27PM -0400, Steven Rostedt wrote:
> > On Mon, Apr 16, 2012 at 01:27:24PM +0600, Alexander E. Patrakov wrote:
> > > Hello.
> > >
> > > With linux-3.3.0, my computer at work is rather unstable. The kernel
> >
> > Have you tried other kernels? The leak just started with 3.3?
Yes, I have found a configuration that works for me: 3.3.1 with the
ath9k commit reverted, as long as I don't use systemd. I have not tested
this with systemd on the work computer (i.e. the most affected one), but
both my home desktop and the laptop are stable with 3.3.1 even with
systemd.

As for older kernels, it is difficult to say. This is essentially a full
reinstall of Gentoo, together with a move from XFCE to GNOME 3, from
OpenRC to systemd and then back (because systemd was a primary suspect
for the bug), and with NetworkManager added, so a different set of code
paths is now regularly exercised. The bug might therefore exist in the
old kernels too, without my noticing it.
> >
> > > seems to leak memory, however, kmemleak finds nothing significant.
> > >
> > > Finally, the computer started swapping heavily and responded only via
> > > ssh. In dmesg, I found this (repeated every two seconds):
> > >
> > > [ 6709.483956] BUG: sleeping function called from invalid context at
> > > kernel/mutex.c:271
> > > [ 6709.483968] in_atomic(): 0, irqs_disabled(): 0, pid: 1210, name:
> >
> > I'm a little baffled here, as preempt_count() is zero (in_atomic) and
> > irqs are not disabled.
> >
> > > NetworkManager
> >
> > Ah there's your problem! (just kidding)
> >
> >
> > > [ 6709.483974] INFO: lockdep is turned off.
> > > [ 6709.483981] Pid: 1210, comm: NetworkManager Tainted: G I
> > > 3.3.0-gentoo #4
> > > [ 6709.483987] Call Trace:
> > > [ 6709.484006] [<ffffffff810683fc>] __might_sleep+0xff/0x103
> > > [ 6709.484019] [<ffffffff81548783>] mutex_lock_nested+0x2a/0x2ff
> > > [ 6709.484031] [<ffffffff811261b5>] ? fget_light+0x6a/0x118
> > > [ 6709.484043] [<ffffffff8115a738>] inotify_poll+0x35/0x53
> > > [ 6709.484052] [<ffffffff81135eb1>] do_sys_poll+0x266/0x3f2
> > > [ 6709.484060] [<ffffffff81134e33>] ? poll_freewait+0x8f/0x8f
> > > [ 6709.484069] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484076] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484084] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484092] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484099] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484107] [<ffffffff81134efa>] ? __pollwait+0xc7/0xc7
> > > [ 6709.484118] [<ffffffff8112fde0>] ? putname+0x2d/0x36
> > > [ 6709.484127] [<ffffffff8112fde0>] ? putname+0x2d/0x36
> > > [ 6709.484138] [<ffffffff81045cb7>] ? timespec_add_safe+0x32/0x5f
> > > [ 6709.484146] [<ffffffff81018078>] ? read_tsc+0x9/0x1b
> > > [ 6709.484155] [<ffffffff8113509a>] ? poll_select_set_timeout+0x61/0x75
> > > [ 6709.484163] [<ffffffff811360d8>] sys_poll+0x4e/0xb7
> > > [ 6709.484173] [<ffffffff81552896>] sysenter_dispatch+0x7/0x21
> > >
> >
> > Hmm, this can also be reported if you have an rcu leak. Which would also
> > explain your memory leak. RCU is the kernel's "garbage collector" and if
> > it gets stuck, then you will definitely start seeing memory leaks, as
> > memory won't be freed.
>
> If RCU is stuck, you should see RCU CPU stall warnings, which can
> give clues as to what is causing RCU to get stuck.
Yes, there were such warnings, but all of them said that the stall ended
before the state dump started, so they contain nothing reportable.
However, I have just found
https://bugzilla.novell.com/show_bug.cgi?id=754186, which looks quite
similar: it (and its duplicates) also shows undumpable stalls of
increasing length, and IPv6 is in use both at home and at work.
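For anyone comparing logs from similar reports: the "sleeping function"
report quoted earlier has a fixed two-line header, so it is easy to pull
out of a saved dmesg and check how often it repeats. The following is
only a minimal sketch. It assumes raw, unwrapped dmesg text with
timestamps enabled, and the names SleepBug, parse_sleep_bugs and
intervals are made up for illustration.

```python
import re
from typing import List, NamedTuple


class SleepBug(NamedTuple):
    timestamp: float   # seconds since boot, from the dmesg prefix
    location: str      # e.g. "kernel/mutex.c:271"
    in_atomic: int
    irqs_disabled: int
    pid: int
    name: str


# Matches the BUG line and the in_atomic() line that follows it.
BUG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: sleeping function called from invalid "
    r"context at\s+(?P<loc>\S+)\s+"
    r"\[\s*\d+\.\d+\]\s+in_atomic\(\):\s*(?P<atomic>\d+),\s*"
    r"irqs_disabled\(\):\s*(?P<irqs>\d+),\s*pid:\s*(?P<pid>\d+),\s*"
    r"name:\s*(?P<name>\S+)"
)


def parse_sleep_bugs(dmesg_text: str) -> List[SleepBug]:
    """Return every sleeping-in-invalid-context report found in the text."""
    return [
        SleepBug(
            float(m.group("ts")),
            m.group("loc"),
            int(m.group("atomic")),
            int(m.group("irqs")),
            int(m.group("pid")),
            m.group("name"),
        )
        for m in BUG_RE.finditer(dmesg_text)
    ]


def intervals(bugs: List[SleepBug]) -> List[float]:
    """Seconds between consecutive reports, in log order."""
    return [b.timestamp - a.timestamp for a, b in zip(bugs, bugs[1:])]
```

Feeding it the output of "dmesg" saved to a file and printing
intervals(parse_sleep_bugs(text)) gives the spacing between reports.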
>
> > Paul, know of any fixes in RCU that could have caused this?
> >
> > -- Steve
> >
> > > The taint is due to nouveau. I am sure that this
> > > "sleeping-in-invalid-context" report is a consequence of the memory leak
> > > that I could not convert into something reportable. But still, it is
> > > something that the kernel wants me to report, that's why this e-mail.
> > >
> > > The kernel is configured as CONFIG_PREEMPT=y, if this is relevant.
>
> Could you please send along your full .config?
Yes, two configs are attached. The bad one is from 3.3.0 and has a lot
of debug options that I added to catch the lockups before I attributed
them to the leak. The good one is from 3.3.1 with c1afdaff (ath9k)
reverted, and without debug options. I have not tested 3.3.1 with the
debug options, because I (incorrectly) assumed that they were at fault.
Both configs build a lot of modules that are not used, so here is the
lsmod output from the work computer:
$ lsmod
Module                  Size  Used by
fuse                   65161  2
ip6table_filter         1308  0
ip6_tables             17694  1 ip6table_filter
ebtable_nat             1732  0
ebtables               23823  1 ebtable_nat
ipt_MASQUERADE          1618  3
iptable_nat             3904  1
nf_nat                 13956  2 iptable_nat,ipt_MASQUERADE
nf_conntrack_ipv4      11013  4 nf_nat,iptable_nat
nf_defrag_ipv4          1219  1 nf_conntrack_ipv4
xt_state                1175  1
nf_conntrack           57810  5 xt_state,nf_conntrack_ipv4,nf_nat,iptable_nat,ipt_MASQUERADE
ipt_REJECT              2193  2
iptable_mangle          1496  0
xt_tcpudp               2447  4
iptable_filter          1336  1
ip_tables              16565  3 iptable_filter,iptable_mangle,iptable_nat
x_tables               16735  11 ip_tables,iptable_filter,xt_tcpudp,iptable_mangle,ipt_REJECT,xt_state,iptable_nat,ipt_MASQUERADE,ebtables,ip6_tables,ip6table_filter
bridge                 73858  0
stp                     1520  1 bridge
llc                     3665  2 stp,bridge
dm_mod                 67510  0
kvm_intel             123137  0
kvm                   343689  1 kvm_intel
fschmd                 16123  0
nvram                   5621  0
fujitsu_ts              1922  0
i2c_dev                 5491  0
tun                    14408  1
coretemp                5406  0
snd_hda_codec_hdmi     23217  4
snd_hda_codec_realtek 112191  1
snd_hda_intel          22833  5
snd_hda_codec          86026  3 snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec_hdmi
snd_hwdep               5894  1 snd_hda_codec
nouveau               763731  3
arc4                    1322  2
snd_pcsp                7549  0
snd_pcm                71074  4 snd_pcsp,snd_hda_codec,snd_hda_intel,snd_hda_codec_hdmi
ath9k                  88564  0
snd_page_alloc          6873  2 snd_pcm,snd_hda_intel
ath9k_common            1944  1 ath9k
ath9k_hw              360585  2 ath9k_common,ath9k
ath                    14756  3 ath9k_hw,ath9k_common,ath9k
mac80211              343816  1 ath9k
cfg80211              168639  3 mac80211,ath,ath9k
snd_timer              18281  1 snd_pcm
snd                    56538  18 snd_timer,snd_pcm,snd_pcsp,snd_hwdep,snd_hda_codec,snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec_hdmi
rfkill                 14919  1 cfg80211
iTCO_wdt               12749  0
mxm_wmi                 1337  1 nouveau
iTCO_vendor_support     1809  1 iTCO_wdt
r8169                  47792  0
mii                     3907  1 r8169
i2c_i801                8062  0
wmi                     8099  2 mxm_wmi,nouveau
usbhid                 35069  0
serio_raw               4437  0
xhci_hcd               78370  0
ehci_hcd               38892  0
usbcore               140844  4 ehci_hcd,xhci_hcd,usbhid
usb_common               866  1 usbcore
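
In case it helps with comparing module lists between the affected and
unaffected machines, here is a minimal sketch that turns lsmod-style
text into a dictionary. It is an illustration only: the names Module
and parse_lsmod are made up, and it assumes that a line holding a single
token is a "Used by" list that a mail client wrapped onto its own line.

```python
from typing import Dict, List, NamedTuple


class Module(NamedTuple):
    name: str
    size: int
    use_count: int
    used_by: List[str]


def parse_lsmod(text: str) -> Dict[str, Module]:
    """Parse lsmod-style text into a dict keyed by module name."""
    modules: Dict[str, Module] = {}
    last = None  # name of the most recently parsed module
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            users = fields[3].split(",") if len(fields) > 3 else []
            modules[fields[0]] = Module(
                fields[0], int(fields[1]), int(fields[2]),
                [u for u in users if u],
            )
            last = fields[0]
        elif len(fields) == 1 and last is not None:
            # Assumption: a lone token continues the previous "Used by" list.
            modules[last].used_by.extend(u for u in fields[0].split(",") if u)
    return modules
```

With two such dictionaries, set(a) - set(b) lists the modules loaded on
one machine but not on the other.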
--
Alexander E. Patrakov
View attachment "config-3.3.0-bad" of type "text/x-mpsub" (114336 bytes)
View attachment "config-3.3.1-good" of type "text/x-mpsub" (111831 bytes)