Date: Thu, 25 Jan 2024 08:36:06 -0800
From: "Zach O'Keefe" <zokeefe@...gle.com>
To: Charan Teja Kalla <quic_charante@...cinc.com>
Cc: Michal Hocko <mhocko@...e.com>, akpm@...ux-foundation.org, mgorman@...hsingularity.net, 
	david@...hat.com, vbabka@...e.cz, hannes@...xchg.org, 
	quic_pkondeti@...cinc.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Axel Rasmussen <axelrasmussen@...gle.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH V3 3/3] mm: page_alloc: drain pcp lists before oom kill

Thanks for the patch, Charan, and thanks to Yosry for pointing me towards it.

I took a look at data from our fleet, and there are many cases on
high-cpu-count machines where we find multi-GiB worth of free pages
sitting on pcp free lists at the time of a system OOM kill, while free
memory in the relevant zones is below the min watermarks. These are
clear cases where this patch could have prevented the OOM.
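The situation described above can be sketched with a toy model, assuming that pages cached on per-CPU (pcp) lists are not counted as free until a drain hands them back to the zone's shared pool. The function `should_oom_kill` and every number below are hypothetical; this is not the patch's code, only an illustration of why draining before the final watermark check could avert a kill.

```python
def should_oom_kill(shared_free: int, pcp_free: list[int], min_wmark: int,
                    drain_first: bool) -> bool:
    """Toy OOM decision for a single zone (hypothetical, not kernel code).

    shared_free: pages in the zone's shared free pool
    pcp_free:    pages cached on each CPU's pcp list for this zone
    min_wmark:   the zone's min watermark, in pages
    drain_first: whether pcp lists are drained before the final check
    """
    free = shared_free
    if drain_first:
        # In this model, pcp pages only count as free once a drain
        # returns them to the shared pool.
        free += sum(pcp_free)
    # Kill only if free memory is still below the min watermark.
    return free < min_wmark


# Hypothetical numbers: 64 CPUs each caching 512 pages, 1000 pages in the
# shared pool, and a min watermark of 4096 pages.
pcp = [512] * 64
print(should_oom_kill(1000, pcp, 4096, drain_first=False))  # True
print(should_oom_kill(1000, pcp, 4096, drain_first=True))   # False
```

Without the drain the zone looks exhausted (1000 < 4096); with it, the 32768 cached pages become visible and the kill is skipped.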

This kind of issue scales with the number of CPUs, so this patch will
presumably become increasingly valuable to datacenters and desktops
alike. Can we revive it as a standalone patch?
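A back-of-the-envelope sketch of why this scales with CPU count, assuming every CPU caches the same number of free pages for one zone and 4 KiB base pages. The helper `pcp_bytes` and the 2048-pages-per-CPU figure are illustrative assumptions, not values from the fleet data or from the kernel's pcp sizing logic.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB base pages


def pcp_bytes(nr_cpus: int, pages_per_cpu: int, page_size: int = PAGE_SIZE) -> int:
    """Free memory cached on one zone's pcp lists if every CPU holds
    pages_per_cpu pages (hypothetical model)."""
    return nr_cpus * pages_per_cpu * page_size


# Hypothetical per-CPU cache of 2048 pages (8 MiB with 4 KiB pages).
for cpus in (8, 64, 256):
    gib = pcp_bytes(cpus, pages_per_cpu=2048) / 2**30
    print(f"{cpus:4d} CPUs -> {gib:.2f} GiB on pcp lists")
```

Because the total is linear in `nr_cpus`, the same per-CPU cache that holds 64 MiB across 8 CPUs holds 2 GiB across 256 CPUs.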

Thanks,
Zach


On Tue, Nov 14, 2023 at 8:37 AM Charan Teja Kalla
<quic_charante@...cinc.com> wrote:
>
> Thanks Michal!!
>
> On 11/14/2023 4:18 PM, Michal Hocko wrote:
> >> At least in my particular stress test case, it just delayed the OOM, as I
> >> can see that at the time of the OOM kill there are no free pcp pages. My
> >> understanding is that the OOM kill should be the last resort, done only
> >> after enough reclaim retries. Correct me if I am wrong here.
> > Yes, it is a last resort, but it is also a heuristic. So the real
> > question is whether this makes any practical difference outside of
> > artificial workloads. I do not see anything particularly worrying about
> > draining the pcp cache, but it should be noted that this won't be 100%
> > reliable either, as memory freed concurrently will land on pcp lists first.
>
> Okay, I don't have any practical scenario where this helped me avoid
> the OOM. I will come back if I ever encounter this issue in a practical
> scenario.
>
> Also, any comments on [PATCH V2 2/3] mm: page_alloc: correct
> high atomic reserve calculations would help me.
>
> Thanks.
>
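Michal's caveat that a drain is not airtight can be illustrated with a toy model, assuming frees always go to the freeing CPU's pcp list before reaching the shared pool. `ZoneModel` and its methods are hypothetical names for this sketch, not kernel APIs.

```python
class ZoneModel:
    """Toy model of one zone: a shared free pool plus per-CPU (pcp) caches.

    Hypothetical illustration only; not the kernel's data structure.
    """

    def __init__(self, nr_cpus: int) -> None:
        self.shared_free = 0
        self.pcp = [0] * nr_cpus

    def free_pages(self, cpu: int, n: int) -> None:
        # A free lands on the freeing CPU's pcp list, not the shared pool.
        self.pcp[cpu] += n

    def drain_all(self) -> None:
        # Return every CPU's cached pages to the shared pool.
        self.shared_free += sum(self.pcp)
        self.pcp = [0] * len(self.pcp)


zone = ZoneModel(nr_cpus=4)
for cpu in range(4):
    zone.free_pages(cpu, 100)
zone.drain_all()        # shared_free == 400, every pcp list empty
zone.free_pages(2, 10)  # a free racing with the OOM path, after the drain
print(zone.shared_free, zone.pcp)
```

After the drain, the 10 pages freed on CPU 2 are again invisible to anything that only looks at the shared pool, so a drain narrows the window rather than closing it.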
