Message-ID: <20111202015921.GZ7046@dastard>
Date:	Fri, 2 Dec 2011 12:59:21 +1100
From:	Dave Chinner <david@...morbit.com>
To:	David Rientjes <rientjes@...gle.com>
Cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [3.2-rc3] OOM killer doesn't kill the obvious memory hog

On Thu, Dec 01, 2011 at 02:35:31PM -0800, David Rientjes wrote:
> On Thu, 1 Dec 2011, Dave Chinner wrote:
> 
> > > /*
> > >  * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
> > >  * pid.
> > >  */
> > > #define OOM_SCORE_ADJ_MIN       (-1000)
> > > 
> > >  
> > > IIUC, this task cannot be killed by the oom-killer because of its oom_score_adj setting.
> > 
> > It's not me or the test suite that's setting this, so it's something
> > the kernel must be doing automagically.
> > 
> 
> The kernel never sets oom_score_adj to disable oom killing for a
> thread.  The only time the kernel touches oom_score_adj is when it sets
> it to "1000" in ksm and swap, to actually prefer the allocating task for
> oom killing.
> 
> It's also possible to change this value via the deprecated 
> /proc/pid/oom_adj interface until it is removed next year.  Check your 
> dmesg for warnings about using the deprecated oom_adj interface or change 
> the printk_once() in oom_adjust_write() to a normal printk() to catch it.

No warnings at all, as I've already said. If it is userspace,
whatever is doing it is using the oom_score_adj interface correctly.

Hmmm, Google is finding reports of sshd randomly inheriting -17
(OOM_DISABLE on the legacy oom_adj scale) at startup on Debian systems,
depending on which modules are loaded. Except I'm not using a modular
kernel, and it's running in a VM, so there's no firmware being loaded.

Yup, all my systems end up with a random value for sessions logged
in via ssh:

$ ssh -X test-2
Linux test-2 3.2.0-rc3-dgc+ #114 SMP Thu Dec 1 22:14:55 EST 2011 x86_64
No mail.
Last login: Fri Dec  2 11:34:44 2011 from deranged
$ cat /proc/self/oom_adj
-17
$ sudo reboot;exit
[sudo] password for dave:

Broadcast message from root@...t-2 (pts/0) (Fri Dec  2 12:39:39 2011):

The system is going down for reboot NOW!
logout
Connection to test-2 closed.
$ ssh -X test-2
Linux test-2 3.2.0-rc3-dgc+ #114 SMP Thu Dec 1 22:14:55 EST 2011 x86_64
No mail.
Last login: Fri Dec  2 12:40:15 2011 from deranged
$ cat /proc/self/oom_adj 
0
$ 

That'll be the root cause of the problem - I just caused an OOM
panic with test 019...

<sigh>

The reports all cycle around this loop:

	linux-mm says userspace/distro problem
	distro says openssh problem
	openssh says kernel problem

And there doesn't appear to be any resolution in any of the reports,
just circular finger pointing and frustrated users.

I can't find anything in the distro startup or udev scripts that
modifies the oom parameters, and the openssh guys say they only
pass on the value inherited from ssh's parent process, so it's clearly
not obvious where the bug lies at this point. It's been around for
some time, though...

More digging to do...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com