linux-kernel - Re: kernel panic - not syncing: out of memory and no killable processes

Message-ID: <alpine.DEB.1.00.0909181246410.27556@chino.kir.corp.google.com>
Date:	Fri, 18 Sep 2009 12:58:01 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Eric Paris <eparis@...hat.com>
cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Greg KH <greg@...ah.com>, linux-hotplug@...r.kernel.org
Subject: Re: kernel panic - not syncing: out of memory and no killable
 processes

On Fri, 18 Sep 2009, Eric Paris wrote:

> > Isolating udevd down to an interactivity scheduling change isn't _that_ 
> > bizarre.  I think the setting of UDEVD_PRIORITY is already mostly 
> > arbitrary anyway and it'll allow 192 children on your 512M machine by 
> > default unless you changed UDEVD_MAX_CHILDS for uid 0.
> > 
> > The default timeout for idle workers is 3 seconds, which may just happen 
> > to be long enough to panic your machine because of low memory.  If that's 
> > the case, I don't believe that it's a scheduler issue but rather a root 
> > abuse of setting all udevd threads to be OOM_DISABLE.
> > 
> > What is your udevd --version?  The latest is udev-146 released last month.
> 
> 145
> 
> Let me try and clone the vm so I don't break my reproducer.  I'll see
> if adding more memory fixes it.  Doesn't look like Fedora has built a
> -146 yet, I'll see if I can get one of those as well.
> 
> udev bug, configuration issue, or whatever it is, it's a regression: I
> used to be able to boot, and updating my kernel leaves me unable to
> boot.  I think we all agree that when 512M of memory isn't enough to
> boot to runlevel 3, we've got a problem   :)
> 

I totally agree, and my hypothesis is that the idle child workers are not 
being killed quickly enough, so they accumulate until they approach 
UDEVD_MAX_CHILDS.  When the oom killer is then called because of a write 
to shared memory, it can't kill any of these threads either, since udevd 
sets them all to OOM_DISABLE and everything else is an unkillable kthread.
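To make that failure mode concrete, here is a toy model of OOM victim 
selection.  It is a simplified sketch, not the kernel's actual selection 
code: the struct and the RSS-only "badness" score are hypothetical.  It 
skips kernel threads and anything marked OOM_DISABLE (-17 in the oom_adj 
interface of this era); when nothing is left to pick, the real kernel 
gives up with the panic quoted in the subject line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* OOM_DISABLE was -17 in the oom_adj interface of kernels from this era. */
#define OOM_DISABLE (-17)

/* Hypothetical, heavily simplified task descriptor for illustration only. */
struct toy_task {
	const char *comm;     /* process name */
	bool is_kthread;      /* kernel threads are never OOM victims */
	int oom_adj;          /* OOM_DISABLE exempts the task entirely */
	unsigned long rss_kb; /* hypothetical "badness": resident memory only */
};

/*
 * Return the index of the task this toy OOM killer would pick, or -1
 * when every task is exempt.  The -1 case corresponds to the
 * "out of memory and no killable processes" panic.
 */
static int toy_select_victim(const struct toy_task *tasks, size_t n)
{
	int victim = -1;
	unsigned long worst = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].is_kthread || tasks[i].oom_adj == OOM_DISABLE)
			continue; /* not a candidate */
		if (victim < 0 || tasks[i].rss_kb > worst) {
			victim = (int)i;
			worst = tasks[i].rss_kb;
		}
	}
	return victim;
}
```

With only OOM_DISABLE udevd workers and kthreads in the list, 
toy_select_victim() returns -1; add a single ordinary process and it 
becomes the victim.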

That the bisection points at a scheduler change suggests each udevd 
thread isn't returning from its poll() timeout fast enough; there's 
essentially a race between udevd killing off its own threads once the 
poll timeout is exceeded, and all of your memory being used up and the 
machine panicking.  The scheduling change seems to have slowed down the 
former.
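A toy discrete-time model of that race may help.  All parameters are 
hypothetical and a "tick" is an arbitrary unit, not a real scheduler 
quantum: workers are forked at a steady rate up to max_childs, and each 
one exits after its lifetime, i.e. the idle poll() timeout plus whatever 
delay the scheduler adds before the thread actually runs and exits.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TICKS 1024

/*
 * Toy model: each tick, up to spawn_per_tick workers are forked (never
 * more than max_childs alive at once); a worker exits lifetime_ticks
 * after it was forked.  Returns the peak number of live workers.
 */
static int peak_workers(int spawn_per_tick, int lifetime_ticks,
			int max_childs, int total_ticks)
{
	int births[MAX_TICKS] = {0};
	int live = 0, peak = 0;

	assert(total_ticks > 0 && total_ticks <= MAX_TICKS);
	assert(lifetime_ticks > 0 && spawn_per_tick >= 0 && max_childs >= 0);
	for (int t = 0; t < total_ticks; t++) {
		/* Workers forked lifetime_ticks ago reach their timeout and exit. */
		if (t >= lifetime_ticks)
			live -= births[t - lifetime_ticks];
		/* Fork new workers, but never exceed max_childs. */
		int room = max_childs - live;
		births[t] = spawn_per_tick < room ? spawn_per_tick : room;
		live += births[t];
		if (live > peak)
			peak = live;
	}
	return peak;
}

/* Would `workers` OOM-exempt workers of `worker_kb` each exceed `free_kb`? */
static bool exhausts_memory(int workers, long worker_kb, long free_kb)
{
	return (long long)workers * worker_kb > free_kb;
}
```

With 10 forks per tick, a 3-tick lifetime peaks at 30 live workers and a 
6-tick lifetime at 60; at 50 forks per tick the count saturates at the cap 
of 192.  Since the steady-state population is roughly rate times lifetime, 
any slowdown in returning from the timeout translates directly into more 
OOM_DISABLE memory pinned at once.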

UDEVD_MAX_CHILDS defaults to 192 on your 512M machine unless overridden by 
an environment variable of the same name, so you may find it helpful to 
reduce this to a saner value.  I'd suggest a value lower than the number 
of udevd threads that were shown in your latest oom killer dump.
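A sketch of how such an override might be resolved (a hypothetical 
helper, not udevd's actual code): use UDEVD_MAX_CHILDS from the 
environment when it parses as a positive integer, otherwise keep the 
memory-derived default.

```c
/* Expose POSIX environment functions such as setenv()/unsetenv(). */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return UDEVD_MAX_CHILDS from the environment if it
 * is a well-formed positive integer, else fall back to default_max.
 */
static int effective_max_childs(int default_max)
{
	const char *s = getenv("UDEVD_MAX_CHILDS");
	char *end;
	long v;

	assert(default_max > 0);
	if (s == NULL || *s == '\0')
		return default_max;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
		return default_max; /* ignore malformed or non-positive values */
	return (int)v;
}
```

For example, with UDEVD_MAX_CHILDS=32 in the daemon's environment, 
effective_max_childs(192) yields 32; an unset, malformed, or zero value 
leaves the default of 192.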

If that turns out to fix the issue for you, perhaps max_childs needs to be 
calculated more conservatively in the userspace package, since every 
thread it spawns is set to OOM_DISABLE.
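One possible shape for a more conservative calculation, sketched under 
explicit assumptions: linear_max_childs() uses 128 + MB/8, which is merely 
consistent with the 192-on-512M figure above and not necessarily the 
formula udev uses, and conservative_max_childs() additionally bounds the 
total footprint of the OOM_DISABLE workers to a percentage of RAM given 
an estimated per-worker size.  The budget percentage and per-worker size 
are hypothetical tuning inputs.

```c
#include <assert.h>

/*
 * Assumed default: consistent with 192 children on a 512M machine
 * (128 + 512/8), but not necessarily udev's exact code.
 */
static int linear_max_childs(int mem_mb)
{
	assert(mem_mb > 0);
	return 128 + mem_mb / 8;
}

/*
 * Hypothetical conservative variant: since every worker is OOM_DISABLE,
 * cap the workers' combined footprint at budget_pct percent of RAM, given
 * an estimated worker_kb per worker.  The result is at least 1 and never
 * more than the linear default.
 */
static int conservative_max_childs(int mem_mb, int worker_kb, int budget_pct)
{
	int cap = linear_max_childs(mem_mb);
	long long budget_kb, n;

	assert(worker_kb > 0 && budget_pct > 0 && budget_pct <= 100);
	budget_kb = (long long)mem_mb * 1024 * budget_pct / 100;
	n = budget_kb / worker_kb;
	if (n < 1)
		n = 1;
	return n < cap ? (int)n : cap;
}
```

With 512 MB, an assumed 2 MB per worker and a 25% budget, the limit drops 
from 192 to 64; on a 4 GB machine with 1 MB workers and a 50% budget, the 
memory bound is loose and the linear default of 640 still applies.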