Date: Fri, 26 Jan 2024 16:17:04 +0530
From: Charan Teja Kalla <quic_charante@...cinc.com>
To: Zach O'Keefe <zokeefe@...gle.com>
CC: Michal Hocko <mhocko@...e.com>, <akpm@...ux-foundation.org>,
        <mgorman@...hsingularity.net>, <david@...hat.com>, <vbabka@...e.cz>,
        <hannes@...xchg.org>, <quic_pkondeti@...cinc.com>,
        <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH V3 3/3] mm: page_alloc: drain pcp lists before oom kill

Hi Michal/Zach,

On 1/25/2024 10:06 PM, Zach O'Keefe wrote:
> Thanks for the patch, Charan, and thanks to Yosry for pointing me towards it.
> 
> I took a look at data from our fleet, and there are many cases on
> high-cpu-count machines where we find multi-GiB worth of data sitting
> on pcpu free lists at the time of a system oom-kill, while free memory
> for the relevant zones is below the min watermarks, i.e. clear cases
> where this patch could have prevented the OOM.
> 
> This kind of issue scales with the number of cpus, so presumably this
> patch will only become increasingly valuable to both datacenters and
> desktops alike going forward. Can we revamp it as a standalone patch?
> 

Glad to see a real-world use case for this. We too have observed an OOM
every now and then with a relatively significant PCP cache, but in all
such cases the OOM was imminent anyway.

AFAICS, your use case looks like a premature OOM scenario despite a lot
of free memory sitting on the pcp lists, which is exactly where this
patch should have helped.

@Michal: This use case seems to be the practical scenario you were
asking about below.
Regarding the other concern, that racing frees can still land on the pcp
lists first -- will that really be a big issue? This patch at least
drains the current pcp lists, which can avoid the OOM altogether. If the
racing frees are a major concern, should that be taken up as a separate
discussion?
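For reference, the core idea boils down to roughly the sketch below.
This is only an illustration, not the actual diff: drain_all_pages()
and get_page_from_freelist() are the existing mm/page_alloc.c helpers,
but the wrapper name, its exact signature and where it would be called
from in the OOM path are hypothetical here, and the real signatures
vary across kernel versions.

static struct page *drain_pcp_and_retry(gfp_t gfp_mask, unsigned int order,
					int alloc_flags,
					const struct alloc_context *ac)
{
	/*
	 * Pages recently freed by other CPUs may still sit on their
	 * per-cpu lists and are invisible to the zone watermark checks
	 * that drive the OOM decision.  Spill them back into the buddy
	 * free lists (NULL means "all zones").
	 */
	drain_all_pages(NULL);

	/* One more allocation attempt before resorting to the OOM killer. */
	return get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
}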

I will revamp this as a standalone patch if there are no further
concerns here.

> Thanks,
> Zach
> 
> 
> On Tue, Nov 14, 2023 at 8:37 AM Charan Teja Kalla
> <quic_charante@...cinc.com> wrote:
>>
>> Thanks Michal!!
>>
>> On 11/14/2023 4:18 PM, Michal Hocko wrote:
>>>> At least in my particular stress-test case it just delayed the OOM, as I
>>>> can see that at the time of the OOM kill there are no free pcp pages. My
>>>> understanding of the OOM is that it should be the last resort, used only
>>>> after doing enough reclaim retries. CMIW here.
>>> Yes, it is a last resort, but it is a heuristic as well. So the real
>>> question is whether this makes any practical difference outside of
>>> artificial workloads. I do not see anything particularly worrying about
>>> draining the pcp cache, but it should be noted that this won't be 100%
>>> either, as racing freeing of memory will end up on the pcp lists first.
>>
>> Okay, I don't have any practical scenario where this helped me in
>> avoiding the OOM. I will come back if I ever encounter this issue in a
>> practical scenario.
>>
>> Also, if you have any comments on [PATCH V2 2/3] mm: page_alloc: correct
>> high atomic reserve calculations, they will help me.
>>
>> Thanks.
>>
