lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8545b4d8-ba21-4607-8217-2b7b02ccb4d8@linux.dev>
Date: Fri, 30 Aug 2024 12:52:06 +0800
From: Chengming Zhou <chengming.zhou@...ux.dev>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>,
 Nhat Pham <nphamcs@...il.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "hannes@...xchg.org" <hannes@...xchg.org>,
 "yosryahmed@...gle.com" <yosryahmed@...gle.com>,
 "ryan.roberts@....com" <ryan.roberts@....com>,
 "Huang, Ying" <ying.huang@...el.com>, "21cnbao@...il.com"
 <21cnbao@...il.com>, "akpm@...ux-foundation.org"
 <akpm@...ux-foundation.org>, "Zou, Nanhai" <nanhai.zou@...el.com>,
 "Feghali, Wajdi K" <wajdi.k.feghali@...el.com>,
 "Gopal, Vinodh" <vinodh.gopal@...el.com>, Usama Arif <usamaarif642@...il.com>
Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> Hi Nhat,
> 
>> -----Original Message-----
>> From: Nhat Pham <nphamcs@...il.com>
>> Sent: Thursday, August 29, 2024 10:11 AM
>> To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
>> Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
>> hannes@...xchg.org; yosryahmed@...gle.com; ryan.roberts@....com;
>> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com; akpm@...ux-
>> foundation.org; Zou, Nanhai <nanhai.zou@...el.com>; Feghali, Wajdi K
>> <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>;
>> Usama Arif <usamaarif642@...il.com>; Chengming Zhou
>> <chengming.zhou@...ux.dev>
>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>>
>> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
>> <kanchana.p.sridhar@...el.com> wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Nhat Pham <nphamcs@...il.com>
>>>> Sent: Wednesday, August 28, 2024 2:35 PM
>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
>>>> Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
>>>> hannes@...xchg.org; yosryahmed@...gle.com;
>> ryan.roberts@....com;
>>>> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com; akpm@...ux-
>>>> foundation.org; Zou, Nanhai <nanhai.zou@...el.com>; Feghali, Wajdi K
>>>> <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>
>>>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>>>>
>>>> On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
>>>> <kanchana.p.sridhar@...el.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> This patch-series enables zswap_store() to accept and store mTHP
>>>>> folios. The most significant contribution in this series is from the
>>>>> earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
>>>>> migrated to v6.11-rc3 in patch 2/4 of this series.
>>>>>
>>>>> [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
>>>>>       https://lore.kernel.org/linux-mm/20231019110543.3284654-1-
>>>> ryan.roberts@....com/T/#u
>>>>>
>>>>> Additionally, there is an attempt to modularize some of the functionality
>>>>> in zswap_store(), to make it more amenable to supporting any-order
>>>>> mTHPs. For instance, the function zswap_store_entry() stores a
>>>> zswap_entry
>>>>> in the xarray. Likewise, zswap_delete_stored_offsets() can be used to
>>>>> delete all offsets corresponding to a higher order folio stored in zswap.
>>>>>
>>>>
>>>> Will this have any conflict with mTHP swap work? Especially with mTHP
>>>> swap-in and zswap writeback.
>>>>
>>>> My understanding is from zswap's perspective, the large folio is
>>>> broken apart into independent subpages, correct? What happens when
>> we
>>>> have partially written back mTHP (i.e some subpages are in zswap
>>>> still, whereas others are written back to swap). Would this
>>>> automatically prevent mTHP swapin?
>>>
>>> That is a good point. To begin with, this patch-series would make the default
>>> behavior for mTHP swapout/storage and swapin for ZSWAP to be on par
>> with
>>> ZRAM. From zswap's perspective, imo this is a significant step forward
>> towards
>>> realizing cold memory storage with mTHP folios. However, it is only a
>> starting
>>> point that makes the behavior uniform across zswap/zram. Initially,
>> workloads
>>> would see a one-time benefit with reclaim being able to swapout mTHP
>>> folios without splitting, to zswap. If the mTHPs were cold memory, then we
>>> would have derived latency gains towards memory savings (with zswap).
>>>
>>> However, if the mTHP were part of "not so cold" memory, this would result
>>> in a one-way mTHP conversion to 4K folios. Depending on workloads and
>> their
>>> access patterns, we could either see individual 4K folios being swapped in,
>>> or entire chunks if not the entire (original) mTHP needing to be swapped in.
>>>
>>> It should be noted that this is more of a performance vs. cold memory
>>> preservation trade-off that needs to drive mTHP reclaim, storage, swapin
>> and
>>> writeback policy. Different workloads could require different policies.
>> However,
>>> even though this patch is only a starting point, it is still functionally correct
>>> by being equivalent to zram-mTHP, and compatible with the rest of mm and
>>> swap as far as mTHP. Another important functionality/data consistency
>> decision
>>> I made in this patch series is error handling during zswap_store() of mTHP:
>>> in case of any errors, all swap offsets for the mTHP are deleted from the
>>> zswap xarray/zpool, since we know that the mTHP will now have to be
>> stored
>>> in the backing swap device. IOW, an mTHP is either entirely stored in zswap,
>>> or entirely not stored in zswap.
>>>
>>> To answer your question, we would need to come up with what the
>> semantics
>>> would need to be for zswap zpool storage granularity, swapin granularity,
>>> readahead granularity and writeback wrt mTHP and how the overall swap
>>> sub-system needs to "preserve" mTHP vs. splitting mTHP into 4K/lower-
>> order
>>> folios during swapout. Once we have a good understanding of these policies,
>>> we could implement them in zswap. Alternately, develop an abstraction that
>> is
>>> one level above zswap/zram and makes things easier and shareable
>> between
>>> zswap and zram. By this, I mean fundamental assumptions such as
>> consecutive
>>> swap offsets (for instance). To some extent, this implies that an mTHP as a
>>> swap entity is defined by consecutiveness of swap offsets. Maybe the policy
>>> to keep mTHPs in the system over extended duration might be to assemble
>>> them dynamically based on swapin_readahead() decisions (which is based
>> on
>>> workload access patterns). In other words, mTHPs could be a useful
>> abstraction
>>> that can be static or even dynamic based on working set characteristics, and
>>> cold memory preservation. This is quite a complex topic imho.
>>>
>>> As we know, Barry Song and Chuanhua Han have started the discussion on
>>> this in their zram mTHP swapin series [1].
>>
>> Yeah I'm a bit more concerned with the correctness aspect. As long as
>> it's not buggy, then we can implement mTHP zswapout first, and force
>> individual subpage (z)swapin for now (since we cannot control
>> writeback from writing individual subpages).
> 
> Absolutely, this sounds like the way to go!
> 
>>
>> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
>> we go along.
> 
> Sounds great :)
> 
>>
>> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
>> not working properly... Let me manually add him in - please include
>> him in future submission and responses, as he is also a zswap reviewer
>> :)
> 
> I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
> Chengming in future submissions and responses :)

Maybe a little late for the party, will take a look ASAP.
It's an interesting and great work.

Thanks!

> 
>>
>> Also cc-ing Usama who is interested in this work.
> 
> Sounds great.
> 
> Thanks,
> Kanchana
> 
>>
>>>
>>> [1] https://lore.kernel.org/all/20240821074541.516249-3-
>> hanchuanhua@...o.com/T/#u
>>>
>>> Thanks,
>>> Kanchana

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ