[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <80c53bfc-891d-af09-9b69-2954d7fe625e@linux.alibaba.com>
Date: Wed, 12 Apr 2023 11:15:13 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: akpm@...ux-foundation.org, osalvador@...e.de, vbabka@...e.cz,
william.lam@...edance.com, mike.kravetz@...cle.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm: compaction: consider the number of scanning
compound pages in isolate fail path
On 4/5/2023 6:31 PM, Mel Gorman wrote:
> On Thu, Mar 16, 2023 at 07:06:46PM +0800, Baolin Wang wrote:
>> The commit b717d6b93b54 ("mm: compaction: include compound page count
>> for scanning in pageblock isolation") had added compound page statistics
>> for scanning in pageblock isolation, to make sure the number of scanned
>> pages are always larger than the number of isolated pages when isolating
>> mirgratable or free pageblock.
>>
>> However, when failed to isolate the pages when scanning the mirgratable or
>> free pageblock, the isolation failure path did not consider the scanning
>> statistics of the compound pages, which can show the incorrect number of
>> scanned pages in tracepoints or the vmstats to make people confusing about
>> the page scanning pressure in memory compaction.
>>
>> Thus we should take into account the number of scanning pages when failed
>> to isolate the compound pages to make the statistics accurate.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>
> Acked-by: Mel Gorman <mgorman@...hsingularity.net>
Thanks Mel.
>
> However, the patch highlights weakeness in the tracepoints and how
> useful they are.
>
> Minimally, I think that the change might be misleading when comparing
> tracepoints across kernel versions as it'll be necessary to check the exact
> meaning of nr_scanned for a given kernel version. That's not a killer problem
> as such, just a hazard if using an analysis tool comparing kernel versions.
>
> As an example, consider this
>
> if (PageCompound(page)) {
> const unsigned int order = compound_order(page);
>
> if (likely(order < MAX_ORDER)) {
> blockpfn += (1UL << order) - 1;
> cursor += (1UL << order) - 1;
> nr_scanned += compound_nr(page) - 1; <<< patch adds
> }
> goto isolate_fail;
> }
>
> Only the head page is "scanned", the tail pages are not scanned so
> accounting for them as "scanned" is not an accurate reflection of the
> amount of work done. Isolation is different because the compound pages
> isolated is a prediction of how much work is necessary to migrate that
> page as it's obviously more work to copy 2M of data than 4K. The migrated
> pages combined with isolation then can measure efficiency of isolation
> vs migration although imperfectly as isolation is a span while migration
> probably fails at the head page.
>
> The same applies when skipping buddies, the tail pages are not scanned
> so skipping them is not additional work.
>
> Everything depends on what the tracepoint is being used for. If it's a
> measure of work done, then accounting for skipped tail pages over-estimates
> the amount of work. However, if the intent is to measure efficiency of
> isolation vs migration then the "span" scanned is more useful.
Yes, we are more concered about the efficiency of isolation vs migration.
> None of this kills the patch, it only notes that the tracepoints as-is
> probably cannot answer all relevant questions, most of which are only
> relevant when making a modification to compaction in general. The patch
> means that an unspecified pressure metric can be derived (maybe interesting
> to sysadmins) but loses a metric about time spent on scanning (maybe
> interesting to developers writing a patch). Of those concerns, sysadmins
> are probably more common so the patch is acceptable but some care will be
> need if modifying the tracepoints further if it enables one type of
> analysis at the cost of another.
I learned, and thanks for your excellent explaination.
Powered by blists - more mailing lists