Message-ID: <8734oqhr4c.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 04 Jul 2024 09:40:03 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org,  linux-mm@...ck.org,  chrisl@...nel.org,
  david@...hat.com,  hannes@...xchg.org,  kasong@...cent.com,
  linux-kernel@...r.kernel.org,  mhocko@...e.com,  nphamcs@...il.com,
  ryan.roberts@....com,  shy828301@...il.com,  surenb@...gle.com,
  kaleshsingh@...gle.com,  hughd@...gle.com,  v-songbaohua@...o.com,
  willy@...radead.org,  xiang@...nel.org,  yosryahmed@...gle.com,
  baolin.wang@...ux.alibaba.com,  shakeel.butt@...ux.dev,
  senozhatsky@...omium.org,  minchan@...nel.org
Subject: Re: [PATCH RFC v4 0/2] mm: support mTHP swap-in for zRAM-like swapfile

Barry Song <21cnbao@...il.com> writes:

> On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying <ying.huang@...el.com> wrote:
>>
>
> Ying, thanks!
>
>> Barry Song <21cnbao@...il.com> writes:

[snip]

>> > This patch introduces mTHP swap-in support. For now, we limit mTHP
>> > swap-ins to contiguous swaps that were likely swapped out from mTHP as
>> > a whole.
>> >
>> > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
>> > case. This is the simplest and most common use case, benefiting millions
>>
>> I admit that Android is an important target platform of the Linux kernel.
>> But I would not claim that it is the MOST common ...
>
> Okay, I understand that there are still many embedded systems similar
> to Android, even if they are not Android :-)
>
>>
>> > of Android phones and similar devices with minimal implementation
>> > cost. In this straightforward scenario, large folios are always exclusive,
>> > eliminating the need to handle complex rmap and swapcache issues.
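To make the restriction above concrete: one plausible reading of "contiguous swaps that were likely swapped out from mTHP as a whole" is that every PTE in a naturally aligned 2^order range holds a swap entry on the same swap device, with consecutive offsets, and that the first offset is aligned to the folio size. The Python sketch below models that check in user space under those assumptions. `can_swapin_as_mthp`, `pick_swapin_order` and the `(swap_type, offset)` tuple encoding are hypothetical and are not the patch's actual code.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical encoding of a swap PTE: (swap_type, offset). None models a PTE
# that is not a swap entry (empty, or already mapped to a page).
SwapEntry = Tuple[int, int]


def can_swapin_as_mthp(ptes: Sequence[Optional[SwapEntry]], order: int) -> bool:
    """True if the PTEs of one naturally aligned 2**order range look like a
    single folio that was swapped out as a whole."""
    nr = 1 << order
    if len(ptes) != nr or ptes[0] is None:
        return False
    swap_type, first_offset = ptes[0]
    # The swap offset must be aligned to the folio size, like the address range.
    if first_offset % nr != 0:
        return False
    # Every PTE must be a swap entry on the same device with consecutive offsets.
    return all(entry == (swap_type, first_offset + i) for i, entry in enumerate(ptes))


def pick_swapin_order(page_table: Dict[int, Optional[SwapEntry]],
                      fault_vpn: int, enabled_orders: Sequence[int]) -> int:
    """Try the largest enabled order first; fall back to order 0 (a small folio)."""
    for order in sorted(enabled_orders, reverse=True):
        nr = 1 << order
        start = fault_vpn & ~(nr - 1)  # naturally aligned range containing the fault
        ptes = [page_table.get(start + i) for i in range(nr)]
        if can_swapin_as_mthp(ptes, order):
            return order
    return 0
```

For example, if virtual pages 16..31 map to swap offsets 64..79 on device 0, a fault at page 20 can be served as an order-4 folio; if page 25 instead points at an unrelated slot, the same fault falls back to order 2 (pages 20..23 are still contiguous), and a fault at page 25 falls back to a small folio.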
>> >
>> > It offers several benefits:
>> > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
>> >    swap-out and swap-in.
>> > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
>> >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
>> >    THP swap allocation optimization, aligned swap-in plays a crucial role
>> >    in the success of THP_SWPOUT.
>> > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
>> >    and enhancing compression ratios significantly. We have another patchset
>> >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
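The compression-ratio claim in point 3 can be illustrated with a rough experiment: compress a 64 KiB region as one unit versus as sixteen independent 4 KiB pages. The sketch below uses Python's `zlib` purely as a stand-in (zRAM typically uses compressors such as LZ4, LZO or zstd), and the synthetic data produced by the hypothetical `make_sample` helper is an assumption. It shows the direction of the effect, not zRAM's actual ratios or CPU cost.

```python
import random
import zlib

PAGE_SIZE = 4096
MTHP_SIZE = 16 * PAGE_SIZE  # a 64 KiB mTHP made of 16 base pages


def make_sample(size: int, seed: int = 0) -> bytes:
    """Deterministic, moderately redundant bytes standing in for anonymous memory."""
    rng = random.Random(seed)
    words = [f"token{i:02d} ".encode() for i in range(50)]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words)
    return bytes(out[:size])


def compressed_per_page(data: bytes) -> int:
    """Total compressed size when every 4 KiB page is compressed independently."""
    return sum(len(zlib.compress(data[i:i + PAGE_SIZE]))
               for i in range(0, len(data), PAGE_SIZE))


def compressed_whole(data: bytes) -> int:
    """Compressed size when the whole region is compressed as one unit."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    sample = make_sample(MTHP_SIZE)
    print(f"16 x 4 KiB pages: {compressed_per_page(sample)} bytes")
    print(f"one 64 KiB mTHP:  {compressed_whole(sample)} bytes")
```

Compressing the larger unit avoids per-page stream overhead and lets the compressor find matches across page boundaries.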
>> >
>> > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
>> > to be an optimal approach. There's a critical distinction between pagecache
>> > and anonymous pages: pagecache can be evicted and later retrieved from disk,
>> > potentially becoming an mTHP upon retrieval, whereas anonymous pages must
>> > always reside in memory or in the swapfile. If we swap in small folios and only
>> > later identify adjacent memory suitable for swapping in as mTHP, the pages that
>> > have already been converted to small folios may never transition back to mTHP:
>> > converting mTHP into small folios is irreversible. This introduces the risk of
>> > losing all mTHP over several swap-out and swap-in cycles, not to mention
>> > losing the benefits of defragmentation, improved compression ratios, and
>> > reduced CPU usage from mTHP compression/decompression.
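The irreversibility argument above can be shown with a toy model: once a folio is swapped in as small folios, its pages are treated as never being swapped in as one mTHP again, so any policy that sometimes picks small folios can only lose mTHPs over time. In the sketch, `run_cycles` and the `swap_in_large` policy hook are hypothetical, and the model deliberately ignores any later collapse of small folios back into larger ones.

```python
from typing import Callable, List

MTHP_ORDER = 4  # e.g. 64 KiB folios with 4 KiB base pages (illustrative)


def run_cycles(num_folios: int, cycles: int,
               swap_in_large: Callable[[int, int], bool]) -> List[int]:
    """Simulate swap-out/swap-in cycles and return, after each cycle, how many
    folios are still mTHPs.

    swap_in_large(folio_index, cycle) is a hypothetical policy hook: True means
    the folio comes back as one mTHP, False means it comes back as small folios.
    """
    orders = [MTHP_ORDER] * num_folios
    history: List[int] = []
    for cycle in range(cycles):
        for i, order in enumerate(orders):
            # An already-split folio is skipped: its pages are swapped out one by
            # one and, in this model, never become an mTHP again.
            if order == MTHP_ORDER and not swap_in_large(i, cycle):
                orders[i] = 0
        history.append(sum(1 for o in orders if o == MTHP_ORDER))
    return history


def sometimes_small(i: int, cycle: int) -> bool:
    """Falls back to small folios for a different quarter of the folios each cycle."""
    return (i + cycle) % 4 != 0


if __name__ == "__main__":
    print(run_cycles(100, 4, sometimes_small))
    print(run_cycles(100, 4, lambda i, c: True))
```

With 100 folios, `sometimes_small` leaves `[75, 50, 25, 0]` mTHPs after the four cycles, while always swapping in large folios keeps all 100.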
>>
>> I understand that the optimal policy for your use cases may be to
>> always swap in mTHP at the highest order.  But it may not be optimal in
>> other use cases, for example with relatively slow swap devices, or when
>> non-faulting sub-pages are swapped out again before being used.
>>
>> So, IMO, the default policy should be one that adapts to the
>> requirements automatically.  For example, if most non-faulting sub-pages
>> will be read/written before being swapped out again, we should swap in
>> at a larger order, otherwise at a smaller order.  Swap readahead is one
>> possible way to do that.  But I admit that this may not work perfectly
>> in your use cases.
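The adaptive idea above can be sketched as a feedback loop: measure how many of the non-faulting sub-pages brought in by a large swap-in are actually accessed before the folio is swapped out again, and step the order down when that fraction drops. The sketch below uses an exponentially weighted moving average rather than swap readahead. `SwapinOrderTuner`, the 0.5 threshold and the 0.25 smoothing weight are illustrative assumptions, not an existing kernel mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class SwapinOrderTuner:
    """Toy adaptive swap-in order policy (hypothetical, not a kernel interface).

    For each order it keeps an exponentially weighted moving average of the
    fraction of non-faulting sub-pages that were accessed before the folio was
    swapped out again.
    """
    orders: Sequence[int]
    threshold: float = 0.5  # assumed cut-off for "most sub-pages are used"
    alpha: float = 0.25     # assumed weight given to each new sample
    used_ratio: Dict[int, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start optimistic so that every order is tried before being demoted.
        for order in self.orders:
            self.used_ratio.setdefault(order, 1.0)

    def record(self, order: int, nonfault_pages: int, used_pages: int) -> None:
        """Feed back how many speculatively read sub-pages were actually used."""
        if nonfault_pages <= 0:
            return
        sample = used_pages / nonfault_pages
        old = self.used_ratio[order]
        self.used_ratio[order] = (1 - self.alpha) * old + self.alpha * sample

    def choose(self) -> int:
        """Largest order whose sub-pages are mostly used; otherwise order 0."""
        for order in sorted(self.orders, reverse=True):
            if self.used_ratio[order] >= self.threshold:
                return order
        return 0
```

Starting from the optimistic estimate, three consecutive order-4 swap-ins in which none of the 15 non-faulting sub-pages are used lower that order's ratio to about 0.42, so `choose()` falls back from order 4 to order 2.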
>>
>> Previously I hoped that we could start with this automatic policy, which
>> helps everyone, and then check whether it satisfies your requirements
>> before implementing the optimal policy for you.  But it appears that you
>> don't agree with this.
>>
>> Based on the above, IMO, we should not use your policy as the default,
>> at least for now.  A user space interface can be implemented to select
>> the swap-in order policy, similar to the existing mTHP allocation order
>> policy.  We need a separate policy because the performance
>> characteristics of memory allocation are quite different from those of
>> swap-in.  For example, reading from an SSD could be much slower than
>> allocating memory.  With policy selection in place, I think that we can
>> implement mTHP swap-in for the non-SWAP_SYNCHRONOUS case too.  Users
>> need to know what they are doing.
>
> Agreed. Ryan also suggested something similar before.
> Could we add this user policy via:
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/swapin_enabled
>
> which could be 0 or 1? I assume we don't need the full set of "always
> inherit madvise never" values here.
>
> Do you have any suggestions regarding the user interface?

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/swapin_enabled

looks good to me.  To be consistent with "enabled" in the same
directory, and more importantly to be extensible, I think it's better
to start with at least "always" and "never".  I believe that we will
add "auto" in the future to tune the order automatically, which could
eventually become the default.
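The string-valued interface suggested here can be modelled on how the existing THP `enabled` files display their choices, with the active value in brackets. The sketch below is a user-space model of the proposed, not yet existing, `hugepages-<size>/swapin_enabled` knob. The class name, the `never` default and the error handling are assumptions. It also illustrates the extensibility argument: adding `auto` later is just one more entry in the choice list, which a 0/1 knob would not accommodate cleanly.

```python
from typing import Sequence

# Values proposed in the thread; "auto" is only a possible future addition.
SWAPIN_POLICIES = ("always", "never")


class SwapinEnabled:
    """User-space model of a per-size swapin_enabled knob (hypothetical)."""

    def __init__(self, choices: Sequence[str] = SWAPIN_POLICIES,
                 default: str = "never") -> None:
        if default not in choices:
            raise ValueError(f"default {default!r} not in {tuple(choices)}")
        self.choices = tuple(choices)
        self.value = default

    def show(self) -> str:
        # Same display style as the existing THP "enabled" files:
        # every choice listed, the active one in brackets.
        return " ".join(f"[{c}]" if c == self.value else c for c in self.choices)

    def store(self, buf: str) -> None:
        # Writes such as `echo always > ...` arrive with a trailing newline.
        val = buf.strip()
        if val not in self.choices:
            # A real sysfs store handler would typically return -EINVAL here.
            raise ValueError(f"invalid value {val!r}")
        self.value = val
```

In this model, `store("always\n")` switches the displayed value from `always [never]` to `[always] never`, and constructing the knob with `("always", "never", "auto")` is all that adding the future `auto` mode would take.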

--
Best Regards,
Huang, Ying
