linux-kernel - Re: [PATCH v5 0/2] mm/madvise: enhance lazyfreeing with mTHP in madvise


Message-Id: <20240410145033.5cdb8a41f3a6894a62191f42@linux-foundation.org>
Date: Wed, 10 Apr 2024 14:50:33 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Lance Yang <ioworker0@...il.com>
Cc: ryan.roberts@....com, david@...hat.com, 21cnbao@...il.com,
 mhocko@...e.com, fengwei.yin@...el.com, zokeefe@...gle.com,
 shy828301@...il.com, xiehuan09@...il.com, wangkefeng.wang@...wei.com,
 songmuchun@...edance.com, peterx@...hat.com, minchan@...nel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] mm/madvise: enhance lazyfreeing with mTHP in
 madvise_free

On Mon,  8 Apr 2024 12:24:35 +0800 Lance Yang <ioworker0@...il.com> wrote:

> Hi All,
> 
> This patchset adds support for lazyfreeing multi-size THP (mTHP) without
> needing to first split the large folio via split_folio(). However, we
> still need to split a large folio that is not fully mapped within the
> target range.
> 
> If a large folio is locked or shared, or if we fail to split it, we just
> leave it in place and advance to the next PTE in the range. But note that
> the behavior is changed; previously, any failure of this sort would cause
> the entire operation to give up. As large folios become more common,
> sticking to the old way could result in wasted opportunities.
> 
> Performance Testing
> ===================
> 
> On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
> the same size results in the following runtimes for madvise(MADV_FREE)
> in seconds (shorter is better):
> 
> Folio Size |   Old    |   New    | Change
> ------------------------------------------
>       4KiB | 0.590251 | 0.590259 |    0%
>      16KiB | 2.990447 | 0.185655 |  -94%
>      32KiB | 2.547831 | 0.104870 |  -95%
>      64KiB | 2.457796 | 0.052812 |  -97%
>     128KiB | 2.281034 | 0.032777 |  -99%
>     256KiB | 2.230387 | 0.017496 |  -99%
>     512KiB | 2.189106 | 0.010781 |  -99%
>    1024KiB | 2.183949 | 0.007753 |  -99%
>    2048KiB | 0.002799 | 0.002804 |    0%

That looks nice, but punting work to another thread can slightly
increase overall system load and can skew utilization accounting by
attributing the work to threads which didn't initiate it.

And there's a corner-case risk: if the thread running madvise() has
realtime policy (SCHED_RR/SCHED_FIFO) on a single-CPU system, it can
prevent any other threads from executing, so the freeing is deferred
indefinitely, leading to memory squeezes or even OOM conditions.

It would be good if the changelog(s) were to show some consideration of
such matters and some demonstration that the benefits exceed the risks
and costs.