[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1729574046-3392-1-git-send-email-mengensun@tencent.com>
Date: Tue, 22 Oct 2024 13:14:06 +0800
From: MengEn Sun <mengensun88@...il.com>
To: ying.huang@...el.com
Cc: akpm@...ux-foundation.org,
alexjlzheng@...cent.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
mengensun88@...il.com,
mengensun@...cent.com
Subject: Re: [PATCH linux-mm v2] mm: make pcp_decay_high working better with NOHZ full
Thank you for your suggestion. I understand and am ready to make
some changes
>
> Have verified the issue with some test? If not, I suggest you to do
> that.
>
I have conducted tests:
Applying this patch or not does not have a significant impact on the test results.
perhaps my testing was not thorough enough. #^_^
But, the logic of the code is like following:
CPU0 CPUx
---- -----
T0: vmstat_work is pending
T1: vmstat_shepherd
check vmstat_work
and do nothing
T2: vmstat_work is in unpending state.
T3: alloc many pages
T4: free all the pages allocated at T3
T5: entry NOHZ, flushing all zonestats
and nodestats
T6: next vmstat_shepherd fired
In my opinion, there are indeed some issues. I'm not sure if there's
something I haven't understood?
By the way,
There are two other questions for me:
Q1:
Vmstat_work is a **deferreable work** So, It may be delayed for a long time
by NOHZ. As a result, "vmstat_update() may not be executed once every
second in the above scenario. Therefore, I'm not sure if using a deferrable
work to reduce pcp->high is appropriate. In my tests, if I don't use
deferrable work, it takes about a minute to reduce high to high_min, but
using deferrable work may take several minutes to reduce high to high_min.
Q2:
On a big machine, for example, with 1TB of memory, the default maximum
memory on PCP can be 1TB * 0.125.
This portion of memory is not accounted for in MemFree in /proc/meminfo.
Users can see this portion of memory from /proc/zoneinfo, but the memory
reported by the `free` command is reduced.
can we include the PCP memory in the MemFree statistic in /proc/meminfo?
> > While, This seems to be fine:
> > - if freeing and allocating memory occur later, it may the
> > high_max may be adjust automatically
> > - If memory is tight, the memory reclamation process will
> > release the pcp
>
> This could be a real issue for me.
Thanks, I will test more carefully for those issue
>
> > Whatever, we make vmstat_shepherd to checking whether we need
> > decay pcp high_max, and fire pcp_decay_high early if we need.
> >
> > Fixes: 51a755c56dc0 ("mm: tune PCP high automatically")
> > Reviewed-by: Jinliang Zheng <alexjlzheng@...cent.com>
> > Signed-off-by: MengEn Sun <mengensun@...cent.com>
> > ---
> > changelog:
> > v1: https://lore.kernel.org/lkml/20241012154328.015f57635566485ad60712f3@linux-foundation.org/T/#t
> > v2: Make the commit message clearer by adding some comments.
> > ---
> > mm/vmstat.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 1917c034c045..07b494b06872 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -2024,8 +2024,17 @@ static bool need_update(int cpu)
> >
> > for_each_populated_zone(zone) {
> > struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> > + struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> > struct per_cpu_nodestat *n;
> >
> > + /* per_cpu_nodestats and per_cpu_zonestats maybe flush when cpu
> > + * entering NOHZ full, see quiet_vmstat. so, we check pcp
> > + * high_{min,max} to determine whether it is necessary to run
> > + * decay_pcp_high on the corresponding CPU
> > + */
>
> Please follow the comments coding style.
>
> /*
> * comments line 1
> * comments line 2
> */
>
Thank you for your suggestion. I understand and am ready to make
some changes
> > + if (pcp->high_max > pcp->high_min)
> > + return true;
> > +
>
> We don't tune pcp->high_max/min in fact. Instead, we tune pcp->high.
> Your code may make need_update() return true in most cases.
You are right, using high_max is incorrect. May i use pcp->high > pcp->high_min?
>
> > /*
> > * The fast way of checking if there are any vmstat diffs.
> > */
>
> --
> Best Regards,
> Huang, Ying
Best Regards,
MengEn, Sun
Powered by blists - more mailing lists