Message-ID: <e9061a5c-bbef-e818-94f7-95e21a73a948@intel.com>
Date: Mon, 24 May 2021 08:52:02 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Linux-MM <linux-mm@...ck.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events

On 5/24/21 2:07 AM, Mel Gorman wrote:
> On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
>> On 5/21/21 3:28 AM, Mel Gorman wrote:
>>> The PCP high watermark is based on the number of online CPUs so the
>>> watermarks must be adjusted during CPU hotplug. At the time of
>>> hot-remove, the number of online CPUs is already adjusted but during
>>> hot-add, a delta needs to be applied to update PCP to the correct
>>> value. After this patch is applied, the high watermarks are adjusted
>>> correctly.
>>>
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 649
>>> # echo 0 > /sys/devices/system/cpu/cpu4/online
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 664
>>> # echo 1 > /sys/devices/system/cpu/cpu4/online
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 649
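
For anyone skimming the thread, my mental model of the change is
something like the below. This is only a sketch; the callback name and
the call to zone_pcp_update() are guesses on my part, not lifted from
the patch:

static int page_alloc_cpu_online(unsigned int cpu)
{
	struct zone *zone;

	/* Recompute pcp->high/batch now that another CPU is online */
	for_each_populated_zone(zone)
		zone_pcp_update(zone);

	return 0;
}
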
>> This is actually a comment more about the previous patch, but it doesn't
>> really become apparent until the example above.
>>
>> In your example, you mentioned increased exit() performance by using
>> "vm.percpu_pagelist_fraction to increase the pcp->high value". That's
>> presumably because of the increased batching effects and fewer lock
>> acquisitions.
>>
> Yes
>
>> But, logically, doesn't that mean that, the more CPUs you have in a
>> node, the *higher* you want pcp->high to be? If we took this to the
>> extreme and had an absurd number of CPUs in a node, we could end up with
>> a too-small pcp->high value.
>>
> I see your point but I don't think increasing pcp->high for larger
> numbers of CPUs is the right answer because then reclaim can be
> triggered simply because too many PCPs have pages.
>
> To address your point requires much deeper surgery.
...
> There is value to doing something like this but it's beyond what this
> series is trying to do and doing the work without introducing regressions
> would be very difficult.

Agreed, such a solution is outside the scope of what this set is
trying to do.
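
To put a rough number on why blindly scaling pcp->high with CPU count
is unattractive, here is a back-of-the-envelope helper. The numbers in
the comment are invented for illustration, not measurements from this
series:

static unsigned long pcp_idle_pages_worst_case(unsigned int online_cpus,
					       unsigned long pcp_high)
{
	/* Pages that can sit idle in one zone's per-cpu lists */
	return (unsigned long)online_cpus * pcp_high;
}

/* e.g. 128 CPUs * 649 pages/CPU = 83072 pages, roughly 324MB of 4K pages */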

It would be nice to touch on this counter-intuitive property in the
changelog, and *maybe* add a WARN_ON_ONCE() if we hit an edge case,
for example if pcp->high ends up below pcp->batch*SOMETHING.
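
As a sketch of what I mean (the helper name and the factor of 4 are
placeholders for SOMETHING, not a proposal for actual values):

static void pcp_high_sanity_check(struct per_cpu_pages *pcp)
{
	/* Flag configurations where pcp->high leaves little room for batching */
	WARN_ON_ONCE(pcp->high < pcp->batch * 4);
}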