Message-ID: <CACSyD1OAj6W7FkSPwE3iY7UsJmH=d3TtvwQN0mtezKbznnaLUQ@mail.gmail.com>
Date: Tue, 23 Jul 2024 13:07:31 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: peterz@...radead.org, mgorman@...e.de, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [External] Re: [PATCH] mm/numa_balancing: Fix the memory
 thrashing problem in the single-threaded process

On Tue, Jul 23, 2024 at 11:39 AM Huang, Ying <ying.huang@...el.com> wrote:
>
> Zhongkun He <hezhongkun.hzk@...edance.com> writes:
>
> > I found a problem in my test machine that the memory of a process is
> > repeatedly migrated between two nodes and does not stop.
> >
> > 1. Test steps and the machine.
> > ------------
> > VM machine: 4 numa nodes and 10GB per node.
> >
> > stress --vm 1 --vm-bytes 12g --vm-keep
> >
> > The info of numa stat:
> > while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> > anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> > anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> > anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> > anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> > anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> > anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> > anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> > anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> > anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> > anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> > anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> >
> > 2. Root cause:
> > Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> > PTEs which are on the right node"), the PTEs of local pages are no longer
> > changed in change_pte_range() for a single-threaded process, so no
> > page fault information is generated for them in do_numa_page(). If a
> > single-threaded process has memory on another node, it will
> > unconditionally migrate all of its local memory to that node,
> > even if the remote node holds only one page.
> >
> > So, let's fix it. The memory of single-threaded process should follow
> > the cpu, not the numa faults info in order to avoid memory thrashing.
>
> Show the test results (numa stats) of the fixed kernel?
>

With the fixed kernel, there is no memory thrashing at any point,
even after a long test run:

while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0

I will add these results to the commit message in the next version.
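
For anyone who wants to check such samples without reading them by eye,
here is a minimal Python sketch. It assumes the "anon N<id>=<bytes>" line
format shown in the output above; the helper names parse_anon() and
direction_changes() are hypothetical and not part of any kernel tooling.
It counts how often the anon memory on one node switches between growing
and shrinking. A thrashing trace flips direction, a stable one does not.

```python
def parse_anon(line):
    """Parse one 'anon N0=... N1=...' line into {node_id: bytes}."""
    fields = line.split()
    if not fields or fields[0] != "anon":
        raise ValueError("not an anon line: %r" % line)
    result = {}
    for field in fields[1:]:
        key, value = field.split("=")
        result[int(key[1:])] = int(value)  # strip the leading 'N'
    return result


def direction_changes(samples, node):
    """Count switches between growth and shrinkage of `node` across samples.

    `samples` is a list of dicts as returned by parse_anon(). Intervals
    where the value does not change are ignored.
    """
    changes = 0
    last_sign = 0
    for prev, cur in zip(samples, samples[1:]):
        delta = cur[node] - prev[node]
        sign = (delta > 0) - (delta < 0)
        if sign == 0:
            continue
        if last_sign and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes
```

Feeding it the lines collected by the while loop, a non-zero result for
a node suggests that memory is bouncing on and off that node. How many
flips count as thrashing is a judgment call that depends on the sampling
interval.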

> > Signed-off-by: Zhongkun He <hezhongkun.hzk@...edance.com>
> > ---
> >  kernel/sched/fair.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 24dda708b699..d7cbbda568fb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
> >               numa_group_count_active_nodes(ng);
> >               spin_unlock_irq(group_lock);
> >               max_nid = preferred_group_nid(p, max_nid);
> > +     } else if (atomic_read(&p->mm->mm_users) == 1) {
> > +             /*
> > +              * The memory of a single-threaded process should
> > +              * follow the CPU in order to avoid memory thrashing.
> > +              */
> > +             max_nid = numa_node_id();
> >       }
> >
> >       if (max_faults) {
>
> The change looks reasonable to me. Thanks!
>
> Acked-by: "Huang, Ying" <ying.huang@...el.com>
>

Thanks.
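
The effect of the new branch can be illustrated with a toy model. This is
a simplified Python sketch, not the kernel code: pick_preferred_node() and
its parameters are hypothetical names, the NUMA-group path is reduced to
keeping the fault-based choice, and `faults` stands in for the per-node
fault counts that task_numa_placement() sums up.

```python
def pick_preferred_node(faults, cpu_node, mm_users,
                        has_numa_group=False, patched=True):
    """Toy model of choosing max_nid in task_numa_placement().

    faults:   {node_id: fault_count}, assumed non-empty
    cpu_node: node of the CPU the task is running on
    mm_users: number of users of the mm (1 means single-threaded)
    """
    # Fault-based choice: the node with the most recorded faults.
    max_nid = max(faults, key=faults.get)

    if has_numa_group:
        # Simplification: the real code refines this via the group stats.
        return max_nid

    if patched and mm_users == 1:
        # The patched branch: memory follows the CPU.
        max_nid = cpu_node

    return max_nid
```

For a single-threaded task running on node 2 whose local PTEs are
skipped, only remote faults are counted, for example {2: 0, 3: 1}. The
unpatched model then picks node 3, while the patched model stays on
node 2.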

> --
> Best Regards,
> Huang, Ying
