lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 25 Nov 2023 10:22:10 +0800
From:   Kefeng Wang <>
To:     Dmytro Maluka <>,
        Michal Hocko <>
CC:     Liu Shixin <>,
        Andrew Morton <>,
        Greg Kroah-Hartman <>,
        huang ying <>,
        Aaron Lu <>,
        Dave Hansen <>,
        Jesper Dangaard Brouer <>,
        Vlastimil Babka <>,
        Kemi Wang <>,
        <>, <>
Subject: Re: [PATCH -next v2] mm, proc: collect percpu free pages into the
 free pages

On 2023/11/25 1:54, Dmytro Maluka wrote:
> On Tue, Aug 23, 2022 at 03:37:52PM +0200, Michal Hocko wrote:
>> On Tue 23-08-22 20:46:43, Liu Shixin wrote:
>>> On 2022/8/23 15:50, Michal Hocko wrote:
>>>> On Mon 22-08-22 14:12:07, Andrew Morton wrote:
>>>>> On Mon, 22 Aug 2022 11:33:54 +0800 Liu Shixin <> wrote:
>>>>>> The page on pcplist could be used, but not counted into memory free or
>>>>>> avaliable, and pcp_free is only showed by show_mem() for now. Since commit
>>>>>> d8a759b57035 ("mm, page_alloc: double zone's batchsize"), there is a
>>>>>> significant decrease in the display of free memory, with a large number
>>>>>> of cpus and zones, the number of pages in the percpu list can be very
>>>>>> large, so it is better to let user to know the pcp count.
>>>>>> On a machine with 3 zones and 72 CPUs. Before commit d8a759b57035, the
>>>>>> maximum amount of pages in the pcp lists was theoretically 162MB(3*72*768KB).
>>>>>> After the patch, the lists can hold 324MB. It has been observed to be 114MB
>>>>>> in the idle state after system startup in practice(increased 80 MB).
>>>>> Seems reasonable.
>>>> I have asked in the previous incarnation of the patch but haven't really
>>>> received any answer[1]. Is this a _real_ problem? The absolute amount of
>>>> memory could be perceived as a lot but is this really noticeable wrt
>>>> overall memory on those systems?
> Let me provide some other numbers, from the desktop side. On a low-end
> chromebook with 4GB RAM and a dual-core CPU, after commit b92ca18e8ca5
> (mm/page_alloc: disassociate the pcp->high from pcp->batch) the max
> amount of PCP pages increased 56x times: from 2.9MB (1.45 per CPU) to
> 165MB (82.5MB per CPU).
> On such a system, memory pressure conditions are not a rare occurrence,
> so several dozen MB make a lot of difference.

And with mm: PCP high auto-tuning merged in v6.7, the pcp could be more 
bigger than before.

> (The reason it increased so much is because it now corresponds to the
> low watermark, which is 165MB. And the low watermark, in turn, is so
> high because of khugepaged, which bumps up min_free_kbytes to 132MB
> regardless of the total amount of memory.)
>>> This may not obvious when the memory is sufficient. However, as products monitor the
>>> memory to plan it. The change has caused warning.
>> Is it possible that the said monitor is over sensitive and looking at
>> wrong numbers? Overall free memory doesn't really tell much TBH.
>> MemAvailable is a very rough estimation as well.
>> In reality what really matters much more is whether the memory is
>> readily available when it is required and none of MemFree/MemAvailable
>> gives you that information in general case.
>>> We have also considered using /proc/zoneinfo to calculate the total
>>> number of pcplists. However, we think it is more appropriate to add
>>> the total number of pcplists to free and available pages. After all,
>>> this part is also free pages.
>> Those free pages are not generally available as exaplained. They are
>> available to a specific CPU, drained under memory pressure and other
>> events but still there is no guarantee a specific process can harvest
>> that memory because the pcp caches are replenished all the time.
>> So in a sense it is a semi-hidden memory.
> I was intuitively assuming that per-CPU pages should be always available
> for allocation without resorting to paging out allocated pages (and thus
> it should be non-controversially a good idea to include per-CPU pages in
> MemFree, to make it more accurate).
> But looking at the code in __alloc_pages() and around, I see you are
> right: we don't try draining other CPUs' PCP lists *before* resorting to
> direct reclaim, compaction etc.
> BTW, why not? Shouldn't draining PCP lists be cheaper than pageout() in
> any case?

Same question here, could drain pcp before direct reclaim?

>> That being said, I am still not convinced this is actually going to help
>> all that much. You will see a slightly different numbers which do not
>> tell much one way or another and if the sole reason for tweaking these
>> numbers is that some monitor is complaining because X became X-epsilon
>> then this sounds like a weak justification to me. That epsilon happens
>> all the time because there are quite some hidden caches that are
>> released under memory pressure. I am not sure it is maintainable to
>> consider each one of them and pretend that MemFree/MemAvailable is
>> somehow precise. It has never been and likely never will be.
>> -- 
>> Michal Hocko
>> SUSE Labs

Powered by blists - more mailing lists