Message-ID: <f9f105d9-77ba-427c-9958-92710f70716b@arm.com>
Date: Wed, 19 Jun 2024 10:17:08 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: "Huang, Ying" <ying.huang@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>,
Kalesh Singh <kaleshsingh@...gle.com>, Barry Song <baohua@...nel.org>,
Hugh Dickins <hughd@...gle.com>, David Hildenbrand <david@...hat.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 0/5] Alternative mTHP swap allocator improvements
On 19/06/2024 08:19, Huang, Ying wrote:
> Hi, Ryan,
>
> Ryan Roberts <ryan.roberts@....com> writes:
>
>> Hi All,
>>
>> Chris has been doing great work at [1] to clean up my mess in the mTHP swap
>> entry allocator.
>
> I don't think the original behavior is a mess. It's just
> the first step in the correct direction. It's straightforward and
> obviously correct. Then, we can optimize it step by step with data to
> justify the increased complexity.
OK, perhaps I was over-egging it by calling it a "mess". What you're describing
was my initial opinion too, but I saw Andrew complaining that we shouldn't be
merging a feature if it doesn't work. This series fixes the problem in a minimal
way - if you ignore the last patch, which really is just a performance
optimization and could be dropped.

If we can ultimately get Chris's series to 0% fallback like this one, and
everyone is happy with the current state for v6.10, then agreed - let's
concentrate on Chris's series for v6.11.

Thanks,
Ryan
>
>> But Barry posted a test program and results at [2] showing that
>> even with Chris's changes, there are still some fallbacks (around 5% - 25% in
>> some cases). I was interested in why that might be and ended up putting this PoC
>> patch set together to try to get a better understanding. This series ends up
>> achieving 0% fallback, even with small folios ("-s") enabled. I haven't done
>> much testing beyond that (yet) but thought it was worth posting on the strength
>> of that result alone.
>>
>> At a high level this works in a similar way to Chris's series; it marks a
>> cluster as being for a particular order and if a new cluster cannot be allocated
>> then it scans through the existing non-full clusters. But it does it by scanning
>> through the clusters rather than assembling them into a list. Cluster flags are
>> used to mark clusters that have been scanned and are known not to have enough
>> contiguous space, so the efficiency should be similar in practice.
>>
> >> Because it's not based around a linked list, there is less churn and I'm
> >> wondering if this is perhaps easier to review and potentially even get into
> >> v6.10-rcX to fix up what's already there, rather than having to wait until v6.11
> >> for Chris's series? I know Chris has a larger roadmap of improvements, so at
> >> best I see this as a tactical fix that will ultimately be superseded by Chris's
> >> work.
>
> I don't think we need any mTHP swap entry allocation optimization to go
> into v6.10-rcX. There's no functionality or performance regression.
> Per my understanding, we merge optimizations when they are ready.
>
> Hi, Andrew,
>
> Please correct me if you don't agree.
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
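
For readers following along, the scheme described in the quoted cover letter
(mark a cluster for one order, take a new cluster when possible, otherwise scan
the existing clusters of that order and flag the ones found to have no
contiguous space) can be sketched roughly as below. This is a toy userspace
model, not code from the series: every name (struct allocator, alloc_entry(),
the no_space flag, the per-order "current cluster" fast path) and every size is
a hypothetical chosen for illustration, and locking and per-CPU state are
ignored entirely.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CLUSTERS   2   /* hypothetical sizes, kept tiny for illustration */
#define CLUSTER_SLOTS 8
#define MAX_ORDER     3
#define NO_ORDER    (-1)

struct cluster {
	int  order;                /* order this cluster serves, or NO_ORDER */
	bool no_space;             /* scanned: no aligned free run for its order */
	bool used[CLUSTER_SLOTS];
};

struct allocator {
	struct cluster cl[NR_CLUSTERS];
	int cur[MAX_ORDER + 1];    /* current cluster per order, or -1 */
};

static void allocator_init(struct allocator *a)
{
	memset(a, 0, sizeof(*a));
	for (int i = 0; i < NR_CLUSTERS; i++)
		a->cl[i].order = NO_ORDER;
	for (int o = 0; o <= MAX_ORDER; o++)
		a->cur[o] = -1;
}

/* Claim a naturally aligned run of (1 << order) free slots in cluster idx.
 * Returns the global slot index, or -1 after flagging the cluster. */
static long try_cluster(struct allocator *a, int idx, int order)
{
	struct cluster *c = &a->cl[idx];
	int n = 1 << order;

	for (int base = 0; base + n <= CLUSTER_SLOTS; base += n) {
		int i = 0;

		while (i < n && !c->used[base + i])
			i++;
		if (i < n)
			continue;          /* this run is partly in use */
		for (i = 0; i < n; i++)
			c->used[base + i] = true;
		return (long)idx * CLUSTER_SLOTS + base;
	}
	c->no_space = true;            /* remember the failed scan */
	return -1;
}

static long alloc_entry(struct allocator *a, int order)
{
	long slot;

	/* 1. Fast path: the cluster currently assigned to this order. */
	if (a->cur[order] >= 0 && !a->cl[a->cur[order]].no_space) {
		slot = try_cluster(a, a->cur[order], order);
		if (slot >= 0)
			return slot;
	}
	/* 2. Claim a completely free cluster and mark it for this order. */
	for (int i = 0; i < NR_CLUSTERS; i++) {
		if (a->cl[i].order == NO_ORDER) {
			a->cl[i].order = order;
			a->cur[order] = i;
			return try_cluster(a, i, order);
		}
	}
	/* 3. Scan existing clusters of this order, skipping flagged ones. */
	for (int i = 0; i < NR_CLUSTERS; i++) {
		if (a->cl[i].order != order || a->cl[i].no_space)
			continue;
		slot = try_cluster(a, i, order);
		if (slot >= 0) {
			a->cur[order] = i;
			return slot;
		}
	}
	return -1;                     /* caller would fall back to a smaller order */
}

static void free_entry(struct allocator *a, long slot, int order)
{
	int idx = (int)(slot / CLUSTER_SLOTS);
	int base = (int)(slot % CLUSTER_SLOTS);
	struct cluster *c = &a->cl[idx];

	for (int i = 0; i < (1 << order); i++)
		c->used[base + i] = false;
	c->no_space = false;           /* space came back; worth scanning again */

	for (int i = 0; i < CLUSTER_SLOTS; i++)
		if (c->used[i])
			return;
	if (a->cur[c->order] == idx)   /* cluster is empty: release it */
		a->cur[c->order] = -1;
	c->order = NO_ORDER;
}
```

The point of the flag in this model is that a cluster which has already been
scanned and found full is skipped on later allocations until something in it is
freed, which is why the cover letter expects efficiency similar to keeping the
non-full clusters on a list.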