lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufZB8zCwO4nT3aeZWxJO99SD9vUJxhkGedWrmmz3J-Sczw@mail.gmail.com>
Date: Mon, 28 Oct 2024 11:54:27 -0600
From: Yu Zhao <yuzhao@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>, 
	Zi Yan <ziy@...dia.com>, Mel Gorman <mgorman@...hsingularity.net>, 
	Matt Fleming <mfleming@...udflare.com>, David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Link Lin <linkl@...gle.com>
Subject: Re: [PATCH mm-unstable v2] mm/page_alloc: keep track of free highatomic

On Mon, Oct 28, 2024 at 5:01 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 10/28/24 1:24 AM, Yu Zhao wrote:
> > On Sun, Oct 27, 2024 at 3:05 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >>
> >> On 10/27/24 21:51, Yu Zhao wrote:
> >>> On Sun, Oct 27, 2024 at 2:36 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >>>>
> >>>> On 10/27/24 21:17, Yu Zhao wrote:
> >>>>> On Sun, Oct 27, 2024 at 1:53 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> >>>>>>
> >>>>
> >>>> For example:
> >>>>
> >>>> - a page is on pcplist in MIGRATE_MOVABLE list
> >>>> - we reserve its pageblock as highatomic, which does nothing to the page on
> >>>> the pcplist
> >>>> - page above is flushed from pcplist to zone freelist, but it remembers it
> >>>> was MIGRATE_MOVABLE, merges with another buddy/buddies from the
> >>>> now-highatomic list, the resulting order-X page ends up on the movable
> >>>> freelist despite being in highatomic pageblock. The counter of free
> >>>> highatomic is now wrong wrt the freelist reality
> >>>
> >>> This is the part I don't follow: how is it wrong w.r.t. the freelist
> >>> reality? The new nr_free_highatomic should reflect how many pages are
> >>> exactly on free_list[MIGRATE_HIGHATOMIC], because it's updated
> >>> accordingly.
> >>
> >> You'd have to try implementing your change in the kernel without that
> >> migratetype hygiene series, and see how it would either not work, or you'd
> >> end up implementing the series as part of that.
> >
> > A proper backport would need to track the MT of the free_list a page
> > is deleted from, and I see no reason why in such a proper backport
> > "the counter could drift easily" or "the counter of free highatomic is
> > now wrong wrt the freelist reality". So I assume you actually mean
> > this patch can't be backported cleanly? (Which I do agree.)
>
> Yes you're right. But since we don't plan to backport it beyond 6.12,
> sorry for sidetracking the discussion unnecessarily. More importantly,
> is it possible to change the implementation as I suggested?

The only reason I didn't fold account_highatomic_freepages() into
account_freepages() is because the former must be called under the
zone lock, which is also how the latter is called but not as a
requirement.

I understand where you come from when suggesting a new per-cpu counter
for free highatomic. I have to disagree with that because 1) free
highatomic is relatively small and drifting might defeat its purpose;
2) per-cpu memory is among the top kernel memory overhead in our fleet
-- it really adds up. So I prefer not to use per-cpu counters unless
necessary.

So if it's ok with you, I'll just fold account_highatomic_freepages()
into account_freepages(), but keep the counter as per zone, not per
cpu.

> [1] Hooking
> to __del_page_from_free_list() and __add_to_free_list() means extra work
> in every loop iteration in expand() and __free_one_page(). The
> migratetype hygiene should ensure it's not necessary to intercept every
> freelist add/move and hooking to account_freepages() should be
> sufficient and in line with the intended design.

Agreed.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ