Message-ID: <alpine.DEB.2.00.0907170225570.15373@chino.kir.corp.google.com>
Date:	Fri, 17 Jul 2009 02:34:09 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Paul Menage <menage@...gle.com>
cc:	Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, mel@....ul.ie, npiggin@...e.de
Subject: Re: [PATCH] copy over oom_adj value at fork time

On Thu, 16 Jul 2009, Paul Menage wrote:

> How about if instead of having the oom_adj be per-mm, we kept an array
> of counters in the mm, tracking how many users were at each oom_adj
> level; the OOM killer could then use the level of the mm's highest
> oom_adj user when deciding how to calculate the badness of a thread
> using that mm.
> 
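
If I'm reading you right, that's something like the following sketch
(purely illustrative; the field and helper names are invented, and
OOM_ADJUST_MIN/OOM_ADJUST_MAX are the existing -16..+15 bounds from
include/linux/oom.h):

    #define OOM_ADJ_LEVELS  (OOM_ADJUST_MAX - OOM_ADJUST_MIN + 1)

    struct mm_struct {
        ...
        /* number of users at each oom_adj level */
        atomic_t oom_adj_users[OOM_ADJ_LEVELS];
    };

    static int mm_effective_oom_adj(struct mm_struct *mm)
    {
        int i;

        /* highest populated level wins */
        for (i = OOM_ADJ_LEVELS - 1; i >= 0; i--)
            if (atomic_read(&mm->oom_adj_users[i]))
                return i + OOM_ADJUST_MIN;
        return OOM_ADJUST_MIN;
    }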

That would lead to the same inconsistencies that we had before: consider 
two tasks, taskA and taskB, sharing the same mm_struct.  It was 
previously possible for taskA to have an oom_adj value of -15 and taskB 
an oom_adj value of +15, which made /proc/pid/oom_score very small for 
taskA and very large for taskB.  With your proposal, the badness score 
actually used for taskB would implicitly be very small, since it is 
derived from the mm-wide level rather than from taskB's own value, yet 
taskB's oom_score is reported to userspace as very high.

The only way to work around that is to use the mm_struct's effective 
oom_adj from the array when reporting /proc/pid/oom_score as well.  But 
then a task's own /proc/pid/oom_adj would not affect its oom_score at 
all, which isn't consistent either.
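
The scale of the discrepancy is worth spelling out: badness() applies 
oom_adj as a bit shift on the accumulated point total, roughly

    /*
     * Adjust the score by oom_adj (from badness() in mm/oom_kill.c).
     */
    if (oom_adj) {
        if (oom_adj > 0) {
            if (!points)
                points = 1;
            points <<= oom_adj;
        } else
            points >>= -(oom_adj);
    }

so with a base score of 100000, taskA would report 100000 >> 15 = 3 
while taskB would report 100000 << 15, over three billion, for the same 
memory.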

I think you'll find that making oom_adj purely an attribute of the 
memory it controls is the cleanest solution, since that most accurately 
reflects how the oom killer uses the value when deciding which task to 
kill.
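
Rik's patch already has that shape: the knob lives in the mm_struct 
itself, so every task sharing the mm necessarily reads and writes the 
same value (simplified sketch of the patched layout, locking omitted):

    struct mm_struct {
        ...
        int oom_adj;    /* one value per address space */
    };

    /*
     * /proc/pid/oom_adj then operates on task->mm->oom_adj directly,
     * and badness() reads the same field, so the reported oom_score
     * always matches what the oom killer will actually use.
     */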

> That would preserve the previous semantics of letting a spawned child
> inherit a per-thread oom_adj value, while avoiding the specific
> problem of the OOM killer getting livelocked (that David's patch
> originally addressed) and the more general case of the inconsistency
> in determining the oom_adj level of an mm depending on which thread
> you look at.
> 

Right, it's still a little strange that changing /proc/pid/oom_adj for 
one thread changes it for another when they share memory, even if they 
are in different thread groups.  But that shouldn't be surprising once 
the admin understands that the oom killer must kill _all_ tasks sharing 
memory with the target for any memory to actually be freed; see the 
sketch below.
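
To make that explicit, "killing the target" necessarily means something 
like the following (illustrative only, not a quote of the current oom 
killer; caller holds tasklist_lock):

    /*
     * Freeing the memory requires SIGKILLing every task that maps
     * it, not just the selected victim.
     */
    struct task_struct *q;

    for_each_process(q)
        if (q->mm == victim->mm && !is_global_init(q))
            force_sig(SIGKILL, q);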

Rik's patch fixes the inheritance issue, with one exception: the 
sequence vfork -> write /proc/pid-of-child/oom_adj -> execve.  Scripts 
written against the old behavior will have to be changed to write 
oom_adj _after_ the execve; before it, the write also changes the 
oom_adj of the vfork parent, since parent and child still share an mm at 
that point.  If there is no execve, or the child was created with 
CLONE_VM, then the child shares memory with the parent and their oom_adj 
values are necessarily the same.
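
For such scripts, the safe ordering is straightforward, because vfork() 
blocks the parent until the child execve()s or exits, so by the time the 
parent resumes the child already has its own mm.  A minimal userspace 
sketch (hypothetical /bin/server, error handling trimmed):

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        char path[64];
        FILE *f;
        pid_t pid = vfork();

        if (pid == 0) {
            /* child: exec first, adjust oom_adj later */
            execl("/bin/server", "server", (char *)NULL);
            _exit(1);
        }

        /*
         * The parent resumes only after the child's execve(), so
         * this write hits the child's new mm, not our shared one.
         */
        snprintf(path, sizeof(path), "/proc/%d/oom_adj", (int)pid);
        f = fopen(path, "w");
        if (f) {
            fprintf(f, "-15\n");
            fclose(f);
        }
        return 0;
    }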