linux-kernel - Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for zRAM-like swapfile

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGsJ_4yqLVvCUFpHjWmNAYvPRMzGK8JJWYMXJLR7d9UhKp+QDA@mail.gmail.com>
Date: Tue, 30 Jul 2024 08:03:05 +1200
From: Barry Song <21cnbao@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, ying.huang@...el.com, 
	baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com, 
	hannes@...xchg.org, hughd@...gle.com, kaleshsingh@...gle.com, 
	kasong@...cent.com, linux-kernel@...r.kernel.org, mhocko@...e.com, 
	minchan@...nel.org, nphamcs@...il.com, ryan.roberts@....com, 
	senozhatsky@...omium.org, shakeel.butt@...ux.dev, shy828301@...il.com, 
	surenb@...gle.com, v-songbaohua@...o.com, xiang@...nel.org, 
	yosryahmed@...gle.com, Chuanhua Han <hanchuanhua@...o.com>
Subject: Re: [PATCH v5 3/4] mm: support large folios swapin as a whole for
 zRAM-like swapfile

On Tue, Jul 30, 2024 at 3:13 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Tue, Jul 30, 2024 at 01:11:31AM +1200, Barry Song wrote:
> > for this zRAM case, it is a new allocated large folio, only
> > while all conditions are met, we will allocate and map
> > the whole folio. you can check can_swapin_thp() and
> > thp_swap_suitable_orders().
>
> YOU ARE DOING THIS WRONGLY!
>
> All of you anonymous memory people are utterly fixated on TLBs AND THIS
> IS WRONG.  Yes, TLB performance is important, particularly with crappy
> ARM designs, which I know a lot of you are paid to work on.  But you
> seem to think this is the only consideration, and you're making bad
> design choices as a result.  It's overly complicated, and you're leaving
> performance on the table.
>
> Look back at the results Ryan showed in the early days of working on
> large anonymous folios.  Half of the performance win on his system came
> from using larger TLBs.  But the other half came from _reduced software
> overhead_.  The LRU lock is a huge problem, and using large folios cuts
> the length of the LRU list, hence LRU lock hold time.
>
> Your _own_ data on how hard it is to get hold of a large folio due to
> fragmentation should be enough to convince you that the more large folios
> in the system, the better the whole system runs.  We should not decline to
> allocate large folios just because they can't be mapped with a single TLB!

I am not convinced. for a new allocated large folio, even alloc_anon_folio()
of do_anonymous_page() does the exactly same thing

alloc_anon_folio()
{
        /*
         * Get a list of all the (large) orders below PMD_ORDER that are enabled
         * for this vma. Then filter out the orders that can't be allocated over
         * the faulting address and still be fully contained in the vma.
         */
        orders = thp_vma_allowable_orders(vma, vma->vm_flags,
                        TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
        orders = thp_vma_suitable_orders(vma, vmf->address, orders);

}

you are not going to allocate a mTHP for an unaligned address for a new
PF.
Please point out where it is wrong.

Thanks
Barry