Message-ID: <8459b884-5877-41bd-a882-546e046b9dad@suse.cz>
Date: Sun, 27 Oct 2024 21:36:39 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>,
 Mel Gorman <mgorman@...hsingularity.net>,
 Matt Fleming <mfleming@...udflare.com>, David Rientjes
 <rientjes@...gle.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Link Lin <linkl@...gle.com>
Subject: Re: [PATCH mm-unstable v2] mm/page_alloc: keep track of free
 highatomic

On 10/27/24 21:17, Yu Zhao wrote:
> On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@...e.cz> wrote:
>>
>> On 10/26/24 05:36, Yu Zhao wrote:
>> > OOM kills due to vastly overestimated free highatomic reserves were
>> > observed:
>> >
>> >   ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
>> >   Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
>> >   Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
>> >
>> > The second line above shows that the OOM kill was due to the following
>> > condition:
>> >
>> >   free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
>> >
>> > And the third line shows there were no free pages in any
>> > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
>> > 'H'. Therefore __zone_watermark_unusable_free() underestimated the
>> > usable free memory by over 1GB, which resulted in the unnecessary OOM
>> > kill above.
>> >
>> > The comment in __zone_watermark_unusable_free() warns about the
>> > potential risk, i.e.,
>> >
>> >   If the caller does not have rights to reserves below the min
>> >   watermark then subtract the high-atomic reserves. This will
>> >   over-estimate the size of the atomic reserve but it avoids a search.
>> >
>> > However, it is possible to keep track of free pages in reserved
>> > highatomic pageblocks with a new per-zone counter nr_free_highatomic
>> > protected by the zone lock, to avoid a search when calculating the
>>
>> It's only possible to track this reliably since the "mm: page_alloc:
>> freelist migratetype hygiene" patchset was merged, which explains why
>> nr_reserved_highatomic was used until now, even if it's imprecise.
> 
> I just refreshed my memory by quickly going through the discussion
> around that series and didn't find anything that helps me understand
> the above. More pointers please?

For example:

- a page is on a pcplist, in the MIGRATE_MOVABLE list
- we reserve its pageblock as highatomic, which does nothing to the page on
  the pcplist
- the page above is flushed from the pcplist to the zone freelist, but it
  remembers it was MIGRATE_MOVABLE and merges with another buddy/buddies from
  the now-highatomic list; the resulting order-X page ends up on the movable
  freelist despite being in a highatomic pageblock. The counter of free
  highatomic pages is now wrong wrt the freelist reality

The series has addressed various scenarios like that, where a page can end up
on the wrong freelist.
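
The three steps above can be replayed in a toy model. This is only a sketch in
Python, not kernel code: ToyZone, its fields and its accounting rules are
hypothetical and only approximate the behaviour described in the example. In
particular, it assumes the counter is adjusted according to the migratetype the
free path believes the page has, and that exactly one free buddy is merged.

```python
# Toy model only: names and accounting rules are hypothetical, not kernel code.
MOVABLE, HIGHATOMIC = "movable", "highatomic"


class ToyZone:
    """One pageblock, two freelists, and a free-highatomic counter."""

    def __init__(self, free_buddies=1):
        self.pageblock_type = MOVABLE
        # Number of free base pages sitting on each freelist.
        self.freelist = {MOVABLE: free_buddies, HIGHATOMIC: 0}
        self.nr_free_highatomic = 0

    def reserve_highatomic(self):
        """Retype the pageblock and move its free pages to the highatomic list.

        A page cached on a pcplist is not touched, as in the example above.
        """
        moved = self.freelist[MOVABLE]
        self.freelist[MOVABLE] = 0
        self.freelist[HIGHATOMIC] += moved
        self.nr_free_highatomic += moved
        self.pageblock_type = HIGHATOMIC

    def flush_pcp_page(self, cached_type, trust_cached_type):
        """Free one pcplist page and merge it with one free buddy."""
        # Assumed old behaviour: act on the type remembered at pcplist time.
        # Assumed fixed behaviour: look up the pageblock's current type.
        mt = cached_type if trust_cached_type else self.pageblock_type

        # The buddy is unlinked from the list it really sits on ...
        self.freelist[self.pageblock_type] -= 1
        # ... but the counter is adjusted according to `mt`.
        if mt == HIGHATOMIC:
            self.nr_free_highatomic -= 1

        # The merged page (2 base pages) goes on the list chosen by `mt`.
        self.freelist[mt] += 2
        if mt == HIGHATOMIC:
            self.nr_free_highatomic += 2


def run(trust_cached_type):
    zone = ToyZone(free_buddies=1)
    zone.reserve_highatomic()  # the free buddy moves to the highatomic list
    zone.flush_pcp_page(MOVABLE, trust_cached_type)
    return zone
```

With run(True) the merged page lands on the movable list while the counter
still claims one free highatomic page; with run(False) the counter and the
highatomic freelist agree. The real series covers more cases than this single
merge.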
