lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <80c53bfc-891d-af09-9b69-2954d7fe625e@linux.alibaba.com>
Date:   Wed, 12 Apr 2023 11:15:13 +0800
From:   Baolin Wang <baolin.wang@...ux.alibaba.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     akpm@...ux-foundation.org, osalvador@...e.de, vbabka@...e.cz,
        william.lam@...edance.com, mike.kravetz@...cle.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm: compaction: consider the number of scanning
 compound pages in isolate fail path



On 4/5/2023 6:31 PM, Mel Gorman wrote:
> On Thu, Mar 16, 2023 at 07:06:46PM +0800, Baolin Wang wrote:
>> The commit b717d6b93b54 ("mm: compaction: include compound page count
>> for scanning in pageblock isolation") had added compound page statistics
>> for scanning in pageblock isolation, to make sure the number of scanned
>> pages are always larger than the number of isolated pages when isolating
>> mirgratable or free pageblock.
>>
>> However, when failed to isolate the pages when scanning the mirgratable or
>> free pageblock, the isolation failure path did not consider the scanning
>> statistics of the compound pages, which can show the incorrect number of
>> scanned pages in tracepoints or the vmstats to make people confusing about
>> the page scanning pressure in memory compaction.
>>
>> Thus we should take into account the number of scanning pages when failed
>> to isolate the compound pages to make the statistics accurate.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
> 
> Acked-by: Mel Gorman <mgorman@...hsingularity.net>

Thanks Mel.

> 
> However, the patch highlights weakeness in the tracepoints and how
> useful they are.
> 
> Minimally, I think that the change might be misleading when comparing
> tracepoints across kernel versions as it'll be necessary to check the exact
> meaning of nr_scanned for a given kernel version. That's not a killer problem
> as such, just a hazard if using an analysis tool comparing kernel versions.
> 
> As an example, consider this
> 
>                  if (PageCompound(page)) {
>                          const unsigned int order = compound_order(page);
>                                  
>                          if (likely(order < MAX_ORDER)) {
>                                  blockpfn += (1UL << order) - 1;
>                                  cursor += (1UL << order) - 1;
> 				nr_scanned += compound_nr(page) - 1;		 <<< patch adds
>                          }
>                          goto isolate_fail;
>                  }
> 
> Only the head page is "scanned", the tail pages are not scanned so
> accounting for them as "scanned" is not an accurate reflection of the
> amount of work done. Isolation is different because the compound pages
> isolated is a prediction of how much work is necessary to migrate that
> page as it's obviously more work to copy 2M of data than 4K. The migrated
> pages combined with isolation then can measure efficiency of isolation
> vs migration although imperfectly as isolation is a span while migration
> probably fails at the head page.
> 
> The same applies when skipping buddies, the tail pages are not scanned
> so skipping them is not additional work.
> 
> Everything depends on what the tracepoint is being used for. If it's a
> measure of work done, then accounting for skipped tail pages over-estimates
> the amount of work. However, if the intent is to measure efficiency of
> isolation vs migration then the "span" scanned is more useful.

Yes, we are more concered about the efficiency of isolation vs migration.

> None of this kills the patch, it only notes that the tracepoints as-is
> probably cannot answer all relevant questions, most of which are only
> relevant when making a modification to compaction in general. The patch
> means that an unspecified pressure metric can be derived (maybe interesting
> to sysadmins) but loses a metric about time spent on scanning (maybe
> interesting to developers writing a patch). Of those concerns, sysadmins
> are probably more common so the patch is acceptable but some care will be
> need if modifying the tracepoints further if it enables one type of
> analysis at the cost of another.

I learned, and thanks for your excellent explaination.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ