linux-kernel - Re: [PATCHv5] mm: skip CMA pages when they are not available

Message-ID: <a562bae0-d779-620a-98bc-6102468aecae@redhat.com>
Date:   Mon, 12 Jun 2023 11:29:18 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        "zhaoyang.huang" <zhaoyang.huang@...soc.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Zhaoyang Huang <huangzhaoyang@...il.com>, ke.wang@...soc.com
Subject: Re: [PATCHv5] mm: skip CMA pages when they are not available

On 10.06.23 00:35, Andrew Morton wrote:
> On Wed, 31 May 2023 10:51:01 +0800 "zhaoyang.huang" <zhaoyang.huang@...soc.com> wrote:
> 
>> From: Zhaoyang Huang <zhaoyang.huang@...soc.com>
>>
>> This patch fixes unproductive reclaiming of CMA pages by skipping them when they
>> are not available for the current context. It arises from the OOM issue below, which
>> is caused by a large proportion of MIGRATE_CMA pages among free pages.
>>
>> [   36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
>> [   36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
>> [   36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
>> ...
>> [   36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
>> [   36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
>> [   36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0
>>
> 
> We saw plenty of feedback for earlier versions, but now silence.  Does
> this mean we're all OK with v5?

The logic kind-of makes sense to me (but the kswapd special-casing
already shows that it might be a bit fragile for future use), but I have
not yet figured out whether this actually fixes something or is a pure
performance improvement.

The code comment phrases it as "It is waste of effort", but the patch
description says "This patch fixes unproductive reclaiming" and adds a
scary dmesg.

Am I correct that this is a pure performance optimization (and the issue
revealed itself in that OOM report), or does this actually *fix* something?

If it's a performance improvement, it would be good to show that it is
an actual improvement worth the churn ...

-- 
Cheers,

David / dhildenb
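
A rough sketch for reading the free-list lines in the quoted log, not part
of the patch. It assumes the "(C)" tag marks MIGRATE_CMA free lists, as in
the kernel's free-area dump, and the names summarize, DMA32 and NORMAL are
made up for illustration. It sums each zone's free memory and the share
that sits on CMA-only lists:

```python
import re

# Free-list lines copied from the quoted log (timestamps dropped).
DMA32 = ("0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) "
         "70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB "
         "0*4096kB = 35848kB")
NORMAL = ("231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) "
          "2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB "
          "0*4096kB = 3236kB")

# One entry: "<count>*<size>kB", optionally followed by " (<flags>)".
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")

def summarize(line):
    """Return (total_free_kB, kB_on_CMA_only_lists) for one zone line."""
    body = line.split("=")[0]  # drop the trailing "= <total>kB"
    total = cma_only = 0
    for count, size, flags in ENTRY.findall(body):
        kb = int(count) * int(size)
        total += kb
        if flags == "C":  # assumption: "C" alone means a CMA-only free list
            cma_only += kb
    return total, cma_only

for name, line in (("DMA32", DMA32), ("Normal", NORMAL)):
    total, cma_only = summarize(line)
    print(f"{name}: {total} kB free, {cma_only} kB on CMA-only lists")
```

Under that assumption, every free page reported for DMA32 is on a CMA
list, while none of the Normal zone's free lists is tagged CMA-only.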