linux-kernel - Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order

Message-ID: <87wmmw6w9e.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 11 Jun 2024 10:36:29 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Chris Li <chrisl@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,  Kairui Song
 <kasong@...cent.com>,  Ryan Roberts <ryan.roberts@....com>,
  linux-kernel@...r.kernel.org,  linux-mm@...ck.org,  Barry Song
 <baohua@...nel.org>
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster
 order

Chris Li <chrisl@...nel.org> writes:

> On Wed, Jun 5, 2024 at 7:02 PM Huang, Ying <ying.huang@...el.com> wrote:
>>
>> Chris Li <chrisl@...nel.org> writes:
>>
>
>> > On the page allocation side, we have hugetlbfs, which reserves some
>> > memory for high order pages.
>> > We should have something similar to allow reserving some high order swap
>> > entries without them getting polluted by low order ones.
>>
>> TBH, I don't like the idea of high order swap entries reservation.
> May I know more about why you don't like the idea? I understand this can be
> controversial, because previously we liked to treat THP as a best
> effort approach. If there is some reason we can't make a THP, we use
> order 0 as the fallback.
>
> For discussion purposes, I want to break it down into smaller steps:
>
> First, can we agree that the following use case is reasonable:
> The use case is that, as Barry has shown, zsmalloc can compress units
> larger than 4K and get both a better compression ratio and a CPU
> performance gain.
> https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
>
> So the goal is to give THP/mTHP a reasonable success rate when
> running with mixed size swap allocation, even after either low order or
> high order swap requests have overflowed the swap file size. The allocator
> can still recover from that after some swap entries are freed.
>
> Please let me know if you think the above use case and goal are not
> reasonable for the kernel.

I think that it's reasonable to improve the success rate of high-order
swap entry allocation.  I just think that it's hard to use a
reservation based method.  For example, how much should be reserved?
Why should the system OOM when there's still swap space available?  And
so forth.  So, I prefer transparent methods, just like THP vs. hugetlbfs.
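
To make the "why OOM when there's still swap space available" concern
concrete, here is a toy model in Python.  It is only a sketch under
stated assumptions, not kernel code: ToySwap, CLUSTER_SLOTS and
reserved_high are hypothetical names, and the policy (the trailing
clusters are usable only by order > 0 requests) is just one possible
reading of "reservation".

```python
from dataclasses import dataclass, field

CLUSTER_SLOTS = 512  # swap slots per cluster (toy value, an assumption)


@dataclass
class ToySwap:
    nr_clusters: int
    reserved_high: int = 0  # trailing clusters usable only by order > 0
    clusters: dict = field(default_factory=dict)  # index -> (order, used slots)

    def free_clusters(self, order):
        # Order-0 requests may not touch the reserved trailing clusters.
        limit = self.nr_clusters - (self.reserved_high if order == 0 else 0)
        return [i for i in range(limit) if i not in self.clusters]

    def alloc(self, order):
        nr = 1 << order
        # Reuse a partially filled cluster already dedicated to this order.
        for idx, (c_order, used) in self.clusters.items():
            if c_order == order and used + nr <= CLUSTER_SLOTS:
                self.clusters[idx] = (order, used + nr)
                return idx
        free = self.free_clusters(order)
        if not free:
            return None  # allocation failure for this order
        # High orders take clusters from the tail (the reserved area) first.
        idx = free[-1] if order > 0 else free[0]
        self.clusters[idx] = (order, nr)
        return idx


swap = ToySwap(nr_clusters=4, reserved_high=2)
filled = sum(swap.alloc(0) is not None for _ in range(2 * CLUSTER_SLOTS))
next_order0 = swap.alloc(0)           # fails: only reserved clusters remain
still_free = swap.free_clusters(4)    # clusters a high-order request can use
```

In this model the order-0 request fails while half of the device is
still empty, which is the sizing problem: any fixed reserved_high is
either too small to help high orders or large enough to starve order 0.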

>> that's really important for you, I think that it's better to design
>> something like hugetlbfs vs core mm, that is, be separated from the
>> normal swap subsystem as much as possible.
>
> I am citing hugetlbfs just to make the point of using reservation, or
> isolation of the resource, to prevent the fragmentation caused by mixing
> orders, which exists in core mm.
> I am not suggesting copying the hugetlbfs implementation to the swap
> system. Unlike hugetlbfs, swap allocation is typically done from
> the kernel; it is transparent to the application. I don't think
> separating it from the swap subsystem is a good way to go.
>
> This comes down to why you don't like the reservation. E.g. if we use
> two swapfiles, with one swapfile allocated purely for high order, would
> that be better?

Sorry, my words weren't accurate.  Personally, I just think that it's
better to make the reservation related code not too intrusive.

And, before reservation, we need to consider something else first.
Is it generally good to swap in with the swap-out order?  Should we
consider memory wastage too?  One static policy doesn't fit all; we may
need either a dynamic policy or a configurable policy.

In general, I think that we need to do this step by step.

>> >> > Do you see another way to protect the high order clusters from being
>> >> > polluted by lower order ones?
>> >>
>> >> If we use high-order page allocation as a reference, we need something
>> >> like compaction to eventually guarantee high-order allocation.  But we
>> >> are too far from that.
>> >
>> > We should consider reservation for high-order swap entry allocation
>> > similar to hugetlbfs for memory.
>> > Swap compaction will be very complicated because it needs to scan the
>> > PTEs to migrate the swap entries. It might be easier to support writing
>> > a folio out to compound discontiguous swap entries. That is another way
>> > to address the fragmentation issue. We are also too far from that
>> > right now.
>>
>> It's not easy to write out compound discontiguous swap entries either.
>> For example, how would the folios be put in the swap cache?
>
> I proposed the idea in the recent LSF/MM discussion; the last few
> slides are about discontiguous swap, and it keeps the discontiguous
> entries in the swap cache.
> https://drive.google.com/file/d/10wN4WgEekaiTDiAx2AND97CYLgfDJXAD/view
>
> Agreed, it is not an easy change. The swap cache would have to change
> the assumption that all offsets are contiguous.
> For swap, we already have some in-memory data associated with each
> offset, so it might provide an opportunity to combine the
> offset related data structures for swap together. Another alternative
> might be using an xarray without the multi entry property, just treating
> each offset like a single entry. I haven't dug deep into this
> direction yet.

Thanks!  I will study your idea.
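
For reference while reading, the per-offset alternative quoted above
(one entry per swap offset instead of one multi-index entry per folio)
could be modeled roughly as below.  This is a sketch under assumptions:
a Python dict stands in for the xarray, and ToySwapCache, add_folio,
lookup and delete_folio are hypothetical names, not existing kernel
interfaces.

```python
class ToySwapCache:
    """Swap cache keyed by single offsets, so a folio's slots need not be contiguous."""

    def __init__(self):
        self.slots = {}  # swap offset -> (folio, subpage index)

    def add_folio(self, folio, offsets):
        # Refuse duplicate offsets or offsets that already back another folio.
        if len(set(offsets)) != len(offsets):
            raise ValueError("duplicate offset in request")
        if any(off in self.slots for off in offsets):
            raise ValueError("offset already in swap cache")
        # One entry per subpage; the index records which subpage lives there.
        for i, off in enumerate(offsets):
            self.slots[off] = (folio, i)

    def lookup(self, offset):
        # Returns (folio, subpage index), or None if the offset is not cached.
        return self.slots.get(offset)

    def delete_folio(self, folio):
        # Without contiguity, removal has to visit every entry of the folio.
        for off in [o for o, (f, _) in self.slots.items() if f == folio]:
            del self.slots[off]


cache = ToySwapCache()
cache.add_folio("folio-A", [7, 3, 42, 8])  # order-2 folio, scattered offsets
hit = cache.lookup(42)
miss = cache.lookup(4)
```

The cost visible even in this toy is that every operation on a folio
becomes per-subpage work, where a contiguous range needs only a base
offset and an order.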

> We can have more discussion, maybe arrange an upstream alignment
> meeting if there is interest.

Sure.

--
Best Regards,
Huang, Ying