Message-ID: <f0c7f061-6284-4fe5-8cbf-93281070895b@arm.com>
Date: Tue, 30 Jul 2024 09:36:39 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Matthew Wilcox <willy@...radead.org>, Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, ying.huang@...el.com,
 baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com,
 hannes@...xchg.org, hughd@...gle.com, kaleshsingh@...gle.com,
 kasong@...cent.com, linux-kernel@...r.kernel.org, mhocko@...e.com,
 minchan@...nel.org, nphamcs@...il.com, senozhatsky@...omium.org,
 shakeel.butt@...ux.dev, shy828301@...il.com, surenb@...gle.com,
 v-songbaohua@...o.com, xiang@...nel.org, yosryahmed@...gle.com
Subject: Re: [PATCH v5 4/4] mm: Introduce per-thpsize swapin control policy

On 29/07/2024 04:52, Matthew Wilcox wrote:
> On Fri, Jul 26, 2024 at 09:46:18PM +1200, Barry Song wrote:
>> A user space interface can be implemented to select different swap-in
>> order policies, similar to the mTHP allocation order policy. We need
>> a distinct policy because the performance characteristics of memory
>> allocation differ significantly from those of swap-in. For example,
>> SSD read speeds can be much slower than memory allocation. With
>> policy selection, I believe we can implement mTHP swap-in for
>> non-SWAP_SYNCHRONOUS scenarios as well. However, users need to understand
>> the implications of their choices. I think that it's better to start
>> with at least always and never. I believe that we will add auto in the
>> future to tune automatically, which can be used as default finally.
> 
> I strongly disagree.  Use the same sysctl as the other anonymous memory
> allocations.

I vaguely recall arguing in the past that just because the user has requested 2M
THP, that doesn't mean it's the right thing to do for performance to swap in the
whole 2M in one go. That's potentially a pretty huge latency, depending on where
the backend is, and it could be a waste of IO if the application never touches
most of the 2M. That said, the fact that the application hinted for a 2M THP in
the first place hopefully means that it is storing objects that need to be
accessed at similar times. Today it will be swapped in page-by-page then
eventually collapsed by khugepaged.
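
As a rough illustration of the IO-waste point above, here is a minimal sketch
of the arithmetic. It assumes a 4K base page size, which is not stated in this
mail, and the function name and the "one page touched" workload are
hypothetical. It only counts pages; it says nothing about real device latency.

```python
BASE_PAGE_SIZE = 4 * 1024  # assumption: 4K base pages


def swapin_waste(folio_size, pages_touched, base_page_size=BASE_PAGE_SIZE):
    """Return (pages_read, pages_wasted, wasted_fraction) when a whole
    folio of folio_size bytes is swapped in but only pages_touched of its
    base pages are ever accessed afterwards."""
    pages_read = folio_size // base_page_size
    if not 0 <= pages_touched <= pages_read:
        raise ValueError("pages_touched must be between 0 and pages_read")
    pages_wasted = pages_read - pages_touched
    return pages_read, pages_wasted, pages_wasted / pages_read


# Hypothetical worst case: the application touches a single page per folio.
for size in (16 * 1024, 64 * 1024, 2 * 1024 * 1024):
    read, wasted, frac = swapin_waste(size, 1)
    print(f"{size // 1024:5d}K folio: read {read:3d} pages, "
          f"{wasted:3d} unused ({frac:.1%})")
```

Under these assumptions the absolute amount of unused IO grows with the folio
size (3 pages for 16K, 15 for 64K, 511 for 2M), which is one way to see why the
concern weakens for the smaller sizes.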

But I think those arguments become weaker as the THP size gets smaller. 16K/64K
swap-in will likely yield significant performance improvements, and I think
Barry has numbers for this?

So I guess we have a few options:

 - Just use the same sysfs interface as for anon allocation, and see if anyone
reports performance regressions. Investigate one of the options below if an
issue is raised. That's the simplest and cleanest approach, I think.

 - New sysfs interface as Barry has implemented; nobody really wants more
controls if it can be helped.

 - Hardcode a size limit (e.g. 64K); I've tried this in a few different contexts
and never got any traction.

 - Secret option 4: Can we allocate a full-size folio but only choose to swap in
to it bit-by-bit? You would need a way to mark which pages of the folio are
valid (e.g. a per-page flag), but I guess that's a non-starter given the
strategy to remove per-page flags?
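
To make option 4 concrete, here is a toy user-space model, not kernel code. It
tracks validity with a single bitmap held on the folio object rather than a
flag on each page; that choice, the class name and its methods are all
hypothetical and only illustrate the bookkeeping that "swap in bit-by-bit"
would need.

```python
class PartialFolio:
    """Toy model of a folio whose pages are swapped in on demand."""

    def __init__(self, nr_pages):
        if nr_pages <= 0:
            raise ValueError("nr_pages must be positive")
        self.nr_pages = nr_pages
        self.valid = 0   # bit i set => page i has been read from swap
        self.reads = 0   # number of simulated single-page swap reads

    def _check(self, idx):
        if not 0 <= idx < self.nr_pages:
            raise IndexError("page index out of range")

    def is_valid(self, idx):
        self._check(idx)
        return bool((self.valid >> idx) & 1)

    def fault(self, idx):
        """Simulate an access to page idx. Returns True if a swap read
        was needed, False if the page was already valid."""
        if self.is_valid(idx):
            return False
        self.valid |= 1 << idx
        self.reads += 1
        return True

    def fully_valid(self):
        return self.valid == (1 << self.nr_pages) - 1


# A 2M folio with 4K pages (assumed) has 512 pages; touch only two of them.
folio = PartialFolio(512)
folio.fault(0)
folio.fault(7)
folio.fault(7)  # already valid, no extra read
print(folio.reads, folio.fully_valid())
```

In this model only the touched pages cost a read, and the folio could be
treated as an ordinary large folio once fully_valid() returns True.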

Thanks,
Ryan

