lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufbHVXNZpW1mVhuF+4p8PbPq44w4chQX7Q6QYVDCjSqa1Q@mail.gmail.com>
Date: Sun, 27 Oct 2024 14:51:42 -0600
From: Yu Zhao <yuzhao@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>, 
	Zi Yan <ziy@...dia.com>, Mel Gorman <mgorman@...hsingularity.net>, 
	Matt Fleming <mfleming@...udflare.com>, David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Link Lin <linkl@...gle.com>
Subject: Re: [PATCH mm-unstable v2] mm/page_alloc: keep track of free highatomic

On Sun, Oct 27, 2024 at 2:36 PM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 10/27/24 21:17, Yu Zhao wrote:
> > On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >>
> >> On 10/26/24 05:36, Yu Zhao wrote:
> >> > OOM kills due to vastly overestimated free highatomic reserves were
> >> > observed:
> >> >
> >> >   ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
> >> >   Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
> >> >   Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
> >> >
> >> > The second line above shows that the OOM kill was due to the following
> >> > condition:
> >> >
> >> >   free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
> >> >
> >> > And the third line shows there were no free pages in any
> >> > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
> >> > 'H'. Therefore __zone_watermark_unusable_free() underestimated the
> >> > usable free memory by over 1GB, which resulted in the unnecessary OOM
> >> > kill above.
> >> >
> >> > The comments in __zone_watermark_unusable_free() warns about the
> >> > potential risk, i.e.,
> >> >
> >> >   If the caller does not have rights to reserves below the min
> >> >   watermark then subtract the high-atomic reserves. This will
> >> >   over-estimate the size of the atomic reserve but it avoids a search.
> >> >
> >> > However, it is possible to keep track of free pages in reserved
> >> > highatomic pageblocks with a new per-zone counter nr_free_highatomic
> >> > protected by the zone lock, to avoid a search when calculating the
> >>
> >> It's only possible to track this reliably since the "mm: page_alloc:
> >> freelist migratetype hygiene" patchset was merged, which explains why
> >> nr_reserved_highatomic was used until now, even if it's imprecise.
> >
> > I just refreshed my memory by quickly going through the discussion
> > around that series and didn't find anything that helps me understand
> > the above. More pointers please?
>
> For example:
>
> - a page is on pcplist in MIGRATE_MOVABLE list
> - we reserve its pageblock as highatomic, which does nothing to the page on
> the pcplist
> - page above is flushed from pcplist to zone freelist, but it remembers it
> was MIGRATE_MOVABLE, merges with another buddy/buddies from the
> now-highatomic list, the resulting order-X page ends up on the movable
> freelist despite being in highatomic pageblock. The counter of free
> highatomic is now wrong wrt the freelist reality

This is the part I don't follow: how is it wrong w.r.t. the freelist
reality? The new nr_free_highatomic should reflect how many pages are
exactly on free_list[MIGRATE_HIGHATOMIC], because it's updated
accordingly.

(My current understanding is that, in this case, the reservation
itself is messed up, i.e., under-reserved.)

> The series has addressed various scenarios like that, where page can end up
> on the wrong freelist.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ