linux-kernel - Re: [RFC PATCH 0/1] Buddy allocator like folio split

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHbLzkqDo7ADnwfvQsdKVqE18sUuz+98SVuk1nerW+vsE=nFSQ@mail.gmail.com>
Date: Fri, 18 Oct 2024 12:44:13 -0700
From: Yang Shi <shy828301@...il.com>
To: Zi Yan <ziy@...dia.com>
Cc: David Hildenbrand <david@...hat.com>, linux-mm@...ck.org, 
	"Matthew Wilcox (Oracle)" <willy@...radead.org>, Ryan Roberts <ryan.roberts@....com>, 
	Hugh Dickins <hughd@...gle.com>, "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>, 
	Yang Shi <yang@...amperecomputing.com>, Miaohe Lin <linmiaohe@...wei.com>, 
	Kefeng Wang <wangkefeng.wang@...wei.com>, Yu Zhao <yuzhao@...gle.com>, 
	John Hubbard <jhubbard@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/1] Buddy allocator like folio split

On Fri, Oct 18, 2024 at 12:11 PM Zi Yan <ziy@...dia.com> wrote:
>
> On 18 Oct 2024, at 14:42, David Hildenbrand wrote:
>
> > On 09.10.24 00:37, Zi Yan wrote:
> >> Hi all,
> >
> > Hi!
> >
> >>
> >> Matthew and I have discussed about a different way of splitting large
> >> folios. Instead of split one folio uniformly into the same order smaller
> >> ones, doing buddy allocator like split can reduce the total number of
> >> resulting folios, the amount of memory needed for multi-index xarray
> >> split, and keep more large folios after a split. In addition, both
> >> Hugh[1] and Ryan[2] had similar suggestions before.
> >>
> >> The patch is an initial implementation. It passes simple order-9 to
> >> lower order split tests for anonymous folios and pagecache folios.
> >> There are still a lot of TODOs to make it upstream. But I would like to gather
> >> feedbacks before that.
> >
> > Interesting, but I don't see any actual users besides the debug/test interface wired up.
>
> Right. I am working on it now, since two potential users, anon large folios
> and truncate, might need more sophisticated implementation to fully take
> advantage of this new split.
>
> For anon large folios, this might be open to debate, if only a subset of
> orders are enabled, I assume folio_split() can only split to smaller
> folios with the enabled orders. For example, to get one order-0 from
> an order-9, and only order-4 (64KB on x86) is enabled, folio_split()
> can only split the order-9 to 16 order-0s, 31 order-4s, unless we are
> OK with anon large folios with not enabled orders appear in the system.

For anon large folios, deferred split may be a problem too. The
deferred split is typically used to free the unmapped subpages by, for
example, MADV_DONTNEED. But we don't know which subpages are unmapped
without reading their _mapcount by iterating every subpages.

>
> For truncate, the example you give below is an easy one. For cases like
> punching from 3rd to 5th order-0 of a order-3, [O0, O0, __, __, __, O0, O0, O0],
> I am thinking which approach is better:
>
> 1. two folio_split()s,
>   1) split second order-1 from order-3, 2) split order-0 from the second order-2;
>
> 2. one folio_split() by making folio_split() to support arbitrary range split,
> so two steps in 1 can be done in one shot, which saves unmapping and remapping
> cost.
>
> Maybe I should go for 1 first as an easy route, but I still need an algorithm
> in truncate to figure out the way of calling folio_split()s.
>
> >
> > I assume ftruncate() / fallocate(PUNCH_HOLE) might be good use cases? For example, when punching 1M of a 2M folio, we can just leave a 1M folio in the pagecache.
>
> Yes, I am trying to make this work.
>
> >
> > Any other obvious users you have in mind?
>
> Presumably, folio_split() should replace all split_huge*() to reduce total
> number of folios after a split. But for swapcache folios, I need to figure
> out if swap system works well with buddy allocator like splits.
>
>
>
> Best Regards,
> Yan, Zi