Message-ID: <87o76k3dkt.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 26 Jul 2024 13:52:02 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Chris Li <chrisl@...nel.org>
Cc: Ryan Roberts <ryan.roberts@....com>, Andrew Morton
<akpm@...ux-foundation.org>, Kairui Song <kasong@...cent.com>, Hugh
Dickins <hughd@...gle.com>, Kalesh Singh <kaleshsingh@...gle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, Barry Song
<baohua@...nel.org>
Subject: Re: [PATCH v4 2/3] mm: swap: mTHP allocate swap entries from
nonfull list
Chris Li <chrisl@...nel.org> writes:
> On Thu, Jul 25, 2024 at 7:07 PM Huang, Ying <ying.huang@...el.com> wrote:
>> > If the freeing of swap entries follows a random distribution, you
>> > need 16 contiguous swap entries free at the same time at an aligned
>> > 16-entry base location. The total amount of order-4 free swap space,
>> > added up, is much lower than the order-0 allocatable swap space.
>> > If each entry is free with 50% probability (swapfile half full), then
>> > the chance that 16 aligned swap entries are all free is 0.5^16 ≈ 1.5e-5.
>> > If the swapfile is 80% full, that number drops to 0.2^16 ≈ 6.5e-12.
>>
>> This depends on the workload. Quite a few workloads show some degree
>> of spatial locality. For a workload with no spatial locality at all,
>> as above, mTHP may not be a good choice in the first place.
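
For reference, the probability arithmetic quoted above checks out; a short
sketch (plain Python, not kernel code) reproduces the numbers:

```python
# Probability that 16 aligned swap entries are all free at once,
# assuming each entry is free independently with probability p.
def prob_order4_free(p):
    return p ** 16

print(prob_order4_free(0.5))  # swapfile 50% full -> ~1.5e-5
print(prob_order4_free(0.2))  # swapfile 80% full (20% free) -> ~6.5e-12
```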
>
> The fragmentation comes from the order-0 entries, not from the mTHP.
> mTHPs have their own valid use cases, which should be kept separate
> from how the order-0 entries are used. That is why I consider this
> kind of strategy to work only in the lucky case. I would much prefer a
> strategy that is guaranteed to work and does not depend on luck.
It seems that you have some perfect solution in mind. I will learn
about it when you post it.
>> >> - Order-4 pages need to be swapped out, but not enough order-4
>> >> non-full clusters are available.
>> >
>> > Exactly.
>> >
>> >>
>> >> So, we need a way to migrate non-full clusters among orders to adjust to
>> >> the various situations automatically.
>> >
>> > There is no easy way to migrate swap entries to different locations.
>> > That is why I would like to have discontiguous swap entry allocation
>> > for mTHP.
>>
>> We suggest migrating non-full swap clusters among different lists, not
>> swap entries.
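
A toy model of this idea (hypothetical names; the actual kernel code uses
`struct swap_cluster_info` and per-order lists inside
`struct swap_info_struct`): what moves between per-order lists is the
cluster itself, never the swap entries stored in it.

```python
from collections import defaultdict

class Cluster:
    """Toy stand-in for a swap cluster; tracks which order's list holds it."""
    def __init__(self, cid, order):
        self.cid = cid
        self.order = order

# order -> list of non-full clusters currently reserved for that order
nonfull = defaultdict(list)

def add_cluster(c):
    nonfull[c.order].append(c)

def migrate_cluster(c, new_order):
    """Move the cluster between per-order lists; no swap entry is
    copied or relocated on disk."""
    nonfull[c.order].remove(c)
    c.order = new_order
    nonfull[new_order].append(c)
```

The cost of such a migration is a couple of list operations, independent
of how many entries the cluster holds.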
>
> Then you have the downside of reducing the total number of high-order
> clusters. It is much easier to fragment a cluster than to anti-fragment
> one. The orders of clusters have a natural tendency to move down rather
> than up, given a long enough period of random access. We will likely
> run out of high-order clusters in the long run if we don't have any
> separation of orders.
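
The scarcity of high-order free space under random occupancy can be
illustrated with a small Monte Carlo sketch (a toy model, not the real
allocator):

```python
import random

def count_free_order4_slots(num_entries, fill_ratio, seed=0):
    """Occupy each entry independently with probability fill_ratio, then
    count aligned 16-entry runs that are entirely free."""
    rng = random.Random(seed)
    occupied = [rng.random() < fill_ratio for _ in range(num_entries)]
    return sum(
        1 for base in range(0, num_entries, 16)
        if not any(occupied[base:base + 16])
    )

# With ~64k aligned slots, a half-full swapfile leaves only a handful of
# order-4 slots free; at 80% full, essentially none survive.
print(count_free_order4_slots(1 << 20, 0.5))
print(count_free_order4_slots(1 << 20, 0.8))
```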
As in my example above, you may have almost zero high-order clusters
forever. So your solution only works for very specific use cases; it's
not a general solution.
>> >> But yes, data is needed for any performance related change.
>>
>> BTW: I don't think "non-full cluster" is a good name. "Partial
>> cluster" is much better and follows the same convention as the partial
>> slab.
>
> I am not opposed to it. The only reason I am holding off on the rename
> is that there are patches from Kairui that I am testing which depend
> on it. Let's finish up the v5 patch with the swap cache reclaim code
> path, then do the renaming as one batch job. We actually have more
> than one list holding clusters that are partially full. That helps
> reduce repeated scans of clusters that are not full but still cannot
> satisfy an allocation of this order. Naming just one of them "partial"
> would not be precise either, because the other lists are also
> partially full. We'd better give them precise names systematically.
I don't think it's hard to do a search-and-replace before the next
version.
--
Best Regards,
Huang, Ying