Date:	Mon, 15 Oct 2007 07:40:03 -0400
From:	Theodore Tso <tytso@....edu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Rob Landley <rob@...dley.net>,
	James Bottomley <James.Bottomley@...eleye.com>,
	Matthew Wilcox <matthew@....cx>, linux-kernel@...r.kernel.org,
	linux-scsi@...r.kernel.org, Jens Axboe <axboe@...e.de>,
	Suparna Bhattacharya <suparna@...ibm.com>,
	Nick Piggin <piggin@...erone.com.au>
Subject: Re: OOM killer gripe (was Re: What still uses the block layer?)

On Mon, Oct 15, 2007 at 11:37:44PM +1000, Nick Piggin wrote:
> I hate to go completely offtopic here, but disks are so incredibly
> slow when compared to RAM that there is really nothing the kernel
> can do about this. Presumably the job will finish, given infinite
> time.

About 6 weeks ago, on a 2.6.23-rc kernel, I accidentally typed "make
-j", leaving off the 4 before I hit the return key.  About 2-3
minutes later, the box locked up pretty tight.  I managed to switch to
a VT console before I lost total control of X (the switch itself took
many, many minutes), and after many more minutes I managed to log in
on the console, but I wasn't able to get a ps command to complete so
that I could start killing processes.  (I probably should have just
done a "killall make" right away, but hindsight is 20/20.)

The console was showing that the OOM killer was attempting to kill
processes, but apparently not fast enough to stem the tide of all of
the new processes getting generated by the make -j.  (I'm guessing
that it was killing the gcc processes and not the make processes.)

> Would an oom-kill-someone-now sysrq be of help, I wonder?

I tried sysrq-f (oom_kill), but no dice.  Given that the oom killer
was active and apparently triggering on its own, this wasn't all that
surprising.

The interesting thing is that I tried a sysrq-e (send SIGTERM to all
processes except init), waited 5 minutes or so, then tried an
alt-sysrq-i (send SIGKILL to all processes except init), and the system
was still thrashing itself to death, even after giving it plenty of
time to try to recover.
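
For reference, the three SysRq actions mentioned above can also be
requested without the keyboard by writing the key letter to
/proc/sysrq-trigger as root.  The following is only a minimal sketch:
the helper name trigger_sysrq and the SYSRQ_ACTIONS table are made up
for illustration, and on a box thrashing as badly as the one described
here there is no guarantee that a shell or interpreter would get far
enough to run it.

```python
# Hypothetical helper, not part of any kernel tooling.
# Key letters and meanings are the ones named in this message.
SYSRQ_ACTIONS = {
    "f": "invoke the OOM killer",
    "e": "send SIGTERM to all processes except init",
    "i": "send SIGKILL to all processes except init",
}


def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write one SysRq key letter to `path` and return its description.

    Writing to the real /proc/sysrq-trigger requires root.  The `path`
    parameter exists only so the helper can be exercised against an
    ordinary file.
    """
    if key not in SYSRQ_ACTIONS:
        raise ValueError("unsupported SysRq key: %r" % (key,))
    with open(path, "w") as trigger:
        trigger.write(key)
    return SYSRQ_ACTIONS[key]
```

Calling trigger_sysrq("e") and then, some minutes later,
trigger_sysrq("i") would mirror the sequence tried above.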

I finally gave up and held down the power button.  This was on a box
with 4 gigs of memory (but only 3 gigs visible thanks to a cheap
BIOS/chipset) and 4 gigs of swap (mainly intended for suspend/resume).

I chalked it up to me being stupid (I should have noticed and
Ctrl-C'ed the make -j much more quickly, or if I were a sysadmin on a
time-sharing system with users I didn't trust, configured RLIMIT_NPROC
and/or per-user container resource limits) and the OOM killer not
being aggressive enough in such a situation.  But having better things
to do, I didn't go whining on LKML about it, although I have to say
that the kernel behavior isn't exactly ideal.  One of these days when
I have time, I'll try investigating it with a few memlocked processes
running at real-time priorities and Systemtap and figure out what the
heck was going on....
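
As an illustration of the RLIMIT_NPROC idea mentioned above, here is a
minimal sketch that lowers the per-user process limit in a child before
it execs a command, so a runaway "make -j" starts hitting fork failures
instead of growing without bound.  The function names and the cap value
are assumptions chosen for the example, not anything from this thread.
Note that RLIMIT_NPROC counts all processes of the real user ID, and it
is not enforced for root.

```python
import resource
import subprocess


def capped_limits(current, cap):
    """Return (soft, hard) with each value lowered to at most `cap`.

    `current` is a (soft, hard) pair as returned by resource.getrlimit.
    `cap` is assumed to be a finite positive integer.  A limit is only
    ever lowered, never raised.
    """
    def lower(value):
        if value == resource.RLIM_INFINITY:
            return cap
        return min(value, cap)

    soft, hard = current
    return lower(soft), lower(hard)


def run_with_nproc_cap(argv, cap):
    """Run `argv` with RLIMIT_NPROC capped at `cap`; return its exit code."""
    def apply_cap():
        # Runs in the child after fork and before exec, so the limit
        # applies to the command and everything it spawns.
        current = resource.getrlimit(resource.RLIMIT_NPROC)
        resource.setrlimit(resource.RLIMIT_NPROC, capped_limits(current, cap))

    return subprocess.run(argv, preexec_fn=apply_cap).returncode
```

For example, run_with_nproc_cap(["make", "-j"], 512) would start the
build with the limit in place.  The right cap depends on how many
processes the user already has running.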

I suppose I should just configure suspending to a file instead of a
swap partition, but I've just historically trusted suspend/resume to a
swap partition much more than to a file.  Or maybe I should hack in a
sysctl to prevent any swapping even though the swap partition is
configured (so only suspend/resume will use it).

					- Ted