linux-kernel - Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise


Message-ID: <CAGsJ_4xWyA1qtTR=iNukJ7=RqqJAxc2hXZSX6LtP3WbcEK5g9Q@mail.gmail.com>
Date: Mon, 11 Mar 2024 18:01:14 +0800
From: Barry Song <21cnbao@...il.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: David Hildenbrand <david@...hat.com>, Lance Yang <ioworker0@...il.com>, 
	Vishal Moola <vishal.moola@...il.com>, akpm@...ux-foundation.org, zokeefe@...gle.com, 
	shy828301@...il.com, mhocko@...e.com, fengwei.yin@...el.com, 
	xiehuan09@...il.com, wangkefeng.wang@...wei.com, songmuchun@...edance.com, 
	peterx@...hat.com, minchan@...nel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free

On Mon, Mar 11, 2024 at 5:55 PM Ryan Roberts <ryan.roberts@....com> wrote:
>
> [...]
>
> >>>>> we don't want reclamation overhead later. and we want memories immediately
> >>>>> available to others.
> >>>>
> >>>> But by that logic, you also don't want to leave the large folio partially mapped
> >>>> all the way until the last subpage is CoWed. Surely you would want to reclaim it
> >>>> when you reach partial map status?
> >>>
> >>> To some extent, I agree. But then we will have too many copies. The last
> >>> subpage is small, and a safe place to copy instead.
> >>>
> >>> We actually had to tune userspace to decrease partial map as too much
> >>> partial map both unfolded CONT-PTE and wasted too much memory. if a
> >>> vma had too much partial map, we disabled mTHP on this VMA.
> >>
> >> I actually had a whacky idea around introducing selectable page size ABI
> >> per-process that might help here. I know Android is doing work to make the
> >> system 16K page compatible. You could run most of the system processes with 16K
> >> ABI on top of 4K kernel. Then those processes don't even have the ability to
> >> madvise/munmap/mprotect/mremap anything less than 16K alignment so that acts as
> >> an anti-fragmentation mechanism while allowing non-16K capable processes to run
> >> side-by-side. Just a passing thought...
> >
> > Right, this project faces a challenge in supporting legacy 4KiB-aligned
> > applications. But I don't find it will be an issue to run 16KiB-aligned
> > applications on a kernel whose page size is 4KiB.
>
> Yes, agreed that a 16K-aligned (or 64K-aligned) app will work without issue on
> 4K kernel, but it will also use getpagesize() and know what the page size is.
> I'm suggesting you could actually run these apps on a 4K kernel but with a 16K
> ABI and potentially get close to the native 16K performance out of them. It's
> just a thought though - I don't have any data that actually shows this is better
> than just running on a 4K kernel with a 4K ABI, and using 16K or 64K mTHP
> opportunistically.

I fully agree with this. My Ubuntu filesystem can run on 4KiB, 16KiB and
64KiB base page sizes because its ELF files are 64KiB-aligned. So I would
expect new Android apps/middleware to move to a 64KiB ABI, even though
Android might want to change the base page size to 16KiB instead.
I believe this is the case.
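
To make the "ELF files are 64KiB aligned" point concrete, here is a rough
sketch (not something taken from this patch or from any existing tool) that
reads the p_align field of each PT_LOAD program header in a little-endian
ELF64 image. The helper names load_segment_alignments() and
runs_on_page_size() are made up for illustration, and treating "p_align is a
multiple of the page size" as "can run on that page size" is a simplifying
assumption. The real loader rules are stricter.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segment_alignments(image: bytes) -> list[int]:
    """Return p_align of every PT_LOAD segment in a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # Elf64_Ehdr: e_phoff at byte 32, e_phentsize at byte 54, e_phnum at byte 56.
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    aligns = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type at byte 0, p_align at byte 48.
        (p_type,) = struct.unpack_from("<I", image, off)
        (p_align,) = struct.unpack_from("<Q", image, off + 48)
        if p_type == PT_LOAD:
            aligns.append(p_align)
    return aligns


def runs_on_page_size(image: bytes, page_size: int) -> bool:
    """Assumed criterion: every PT_LOAD p_align is a nonzero multiple of page_size."""
    aligns = load_segment_alignments(image)
    return bool(aligns) and all(
        a >= page_size and a % page_size == 0 for a in aligns
    )
```

With a binary whose loadable segments all have p_align = 0x10000,
runs_on_page_size() returns True for 4096, 16384 and 65536, which matches
what I see with the Ubuntu filesystem above. A binary linked with 4KiB
alignment would fail the 16384 and 65536 checks.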

Thanks
Barry