Message-ID: <31718e40-51e9-c14d-f7f5-526ba26d0db8@huawei.com>
Date: Fri, 19 Aug 2022 17:53:27 +0800
From: Liu Shixin <liushixin2@...wei.com>
To: Aaron Lu <aaron.lu@...el.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>
CC: huang ying <huang.ying.caritas@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
Huang Ying <ying.huang@...el.com>
Subject: Re: [PATCH RFC] mm, proc: add PcpFree to meminfo
On 2022/8/19 15:40, Aaron Lu wrote:
> On Tue, Aug 16, 2022 at 05:24:07PM +0800, Kefeng Wang wrote:
>> On 2022/8/16 16:48, huang ying wrote:
>>> On Tue, Aug 16, 2022 at 4:38 PM Kefeng Wang <wangkefeng.wang@...wei.com> wrote:
>>>> From: Liu Shixin <liushixin2@...wei.com>
>>>>
>>>> The pages on the pcplist can still be used, but they are not counted
>>>> as free or available memory; the pcp free count is only shown by
>>>> show_mem(). Since commit d8a759b57035 ("mm, page_alloc: double zone's
>>>> batchsize"), there has been a significant decrease in the reported
>>>> free memory: with a large number of CPUs and nodes, the number of
>>>> pages on the percpu lists can be very large, so it is better to let
>>>> the user know the pcp count.
>>> Can you show some data?
>> 80M+ with 72 CPUs and 2 nodes.
>>
> 80M+ for a 2 node system doesn't sound like a significant number.
>
>>> Another choice is to count PCP free pages in MemFree. Is that OK for
>>> your use case too?
>> Yes, the user will make policy decisions according to MemFree. We think
>> counting PCP free pages in MemFree is better, but we don't know whether
>> it is the right way.
>>
> Is there a real problem where the user makes a sub-optimal policy due to
> the unaccounted 80M+ of free memory?
I need to explain that the 80M+ is the increase after commit d8a759b57035.
Actually, in my test the pcplist held about 114M after system startup, and
under high load the pcplist memory can reach 300M+.
Downstream users have noticed the memory change after the kernel update,
which has a real impact on them. That's why I sent this patch: to ask
whether we should count this part of memory.
> Counting PCP pages as free seems natural, since they are indeed free
> pages. One concern is that there might be many more calls to
> __mod_zone_freepage_state() if you do free page accounting for PCP
> pages; I am not sure whether that would hurt performance. Also, you
> will need to differentiate in __free_one_page() whether counting free
> pages is still needed, since some pages are freed through the PCP (and
> thus already counted) while others are not.
I prefer to add this part of memory to the free count only when calculating
MemFree and MemAvailable, without modifying other statistics, so as not to
directly hurt performance or cause other performance problems. How about
this approach? Something like the sketch below:
> BTW, since commit b92ca18e8ca59 ("mm/page_alloc: disassociate the
> pcp->high from pcp->batch"), the pcp size is no longer associated with
> the batch size. Are you testing on an older kernel?
I hit the problem on stable-5.10, and I don't think that patch fixes the
problem I met. Further, that series makes pcp->high even greater, so the
problem becomes even more acute in mainline.
Thanks,
>
> Thanks,
> Aaron
>
>>>> Signed-off-by: Liu Shixin <liushixin2@...wei.com>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@...wei.com>
>>>> ---
>>>> drivers/base/node.c | 14 +++++++++++++-
>>>> fs/proc/meminfo.c | 9 +++++++++
>>>> 2 files changed, 22 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>>>> index eb0f43784c2b..846864e45db6 100644
>>>> --- a/drivers/base/node.c
>>>> +++ b/drivers/base/node.c
>>>> @@ -375,6 +375,9 @@ static ssize_t node_read_meminfo(struct device *dev,
>>>>          struct sysinfo i;
>>>>          unsigned long sreclaimable, sunreclaimable;
>>>>          unsigned long swapcached = 0;
>>>> +        unsigned long free_pcp = 0;
>>>> +        struct zone *zone;
>>>> +        int cpu;
>>>>
>>>>          si_meminfo_node(&i, nid);
>>>>          sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
>>>> @@ -382,9 +385,17 @@ static ssize_t node_read_meminfo(struct device *dev,
>>>>  #ifdef CONFIG_SWAP
>>>>          swapcached = node_page_state_pages(pgdat, NR_SWAPCACHE);
>>>>  #endif
>>>> +        for_each_populated_zone(zone) {
>>>> +                if (zone_to_nid(zone) != nid)
>>>> +                        continue;
>>>> +                for_each_online_cpu(cpu)
>>>> +                        free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
>>>> +        }
>>>> +
>>>>          len = sysfs_emit_at(buf, len,
>>>>                              "Node %d MemTotal:       %8lu kB\n"
>>>>                              "Node %d MemFree:        %8lu kB\n"
>>>> +                            "Node %d PcpFree:        %8lu kB\n"
>>>>                              "Node %d MemUsed:        %8lu kB\n"
>>>>                              "Node %d SwapCached:     %8lu kB\n"
>>>>                              "Node %d Active:         %8lu kB\n"
>>>> @@ -397,7 +408,8 @@ static ssize_t node_read_meminfo(struct device *dev,
>>>>                              "Node %d Mlocked:        %8lu kB\n",
>>>>                              nid, K(i.totalram),
>>>>                              nid, K(i.freeram),
>>>> -                            nid, K(i.totalram - i.freeram),
>>>> +                            nid, K(free_pcp),
>>>> +                            nid, K(i.totalram - i.freeram - free_pcp),
>>>>                              nid, K(swapcached),
>>>>                              nid, K(node_page_state(pgdat, NR_ACTIVE_ANON) +
>>>>                                     node_page_state(pgdat, NR_ACTIVE_FILE)),
>>>> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
>>>> index 6e89f0e2fd20..672c784dfc8a 100644
>>>> --- a/fs/proc/meminfo.c
>>>> +++ b/fs/proc/meminfo.c
>>>> @@ -38,6 +38,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>>>          unsigned long pages[NR_LRU_LISTS];
>>>>          unsigned long sreclaimable, sunreclaim;
>>>>          int lru;
>>>> +        unsigned long free_pcp = 0;
>>>> +        struct zone *zone;
>>>> +        int cpu;
>>>>
>>>>          si_meminfo(&i);
>>>>          si_swapinfo(&i);
>>>> @@ -55,8 +58,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>>>          sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
>>>>          sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);
>>>>
>>>> +        for_each_populated_zone(zone) {
>>>> +                for_each_online_cpu(cpu)
>>>> +                        free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
>>>> +        }
>>>> +
>>>>          show_val_kb(m, "MemTotal:       ", i.totalram);
>>>>          show_val_kb(m, "MemFree:        ", i.freeram);
>>>> +        show_val_kb(m, "PcpFree:        ", free_pcp);
>>>>          show_val_kb(m, "MemAvailable:   ", available);
>>>>          show_val_kb(m, "Buffers:        ", i.bufferram);
>>>>          show_val_kb(m, "Cached:         ", cached);
>>>> --
>>>> 2.35.3
>>>>
>>>>
>>> .
> .
>