Message-ID: <CACePvbWTALuB7-jH5ZxCDAy_Dxeh70Y4=eYE5Mixr2qW+Z9sVA@mail.gmail.com>
Date: Sun, 4 Aug 2024 12:11:06 -0700
From: Chris Li <chrisl@...nel.org>
To: Kairui Song <ryncsn@...il.com>
Cc: Ge Yang <yangge1116@....com>, Yu Zhao <yuzhao@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
Barry Song <21cnbao@...il.com>, David Hildenbrand <david@...hat.com>, baolin.wang@...ux.alibaba.com,
liuzixing@...on.cn, Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH V2] mm/gup: Clear the LRU flag of a page before adding to
LRU batch
On Sun, Aug 4, 2024 at 10:51 AM Chris Li <chrisl@...nel.org> wrote:
>
> On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@...il.com> wrote:
> >
> > > Hi Yu, I tested your patch on my system (96 cores and 256G RAM), and
> > > the OOM still occurs. The test memcg is limited to 512M and 32 threads ().
> > >
> > > I also found that the OOM seems unrelated to either your patch or Ge's
> > > patch (it may have changed the OOM chance slightly, though).
> > >
> > > After the very quick OOM (it failed to untar the linux source code),
> > > checking lru_gen_full:
> > > memcg 47 /build-kernel-tmpfs
> > >  node 0
> > >   442 1691 29405 0
> > >     0 0r 0e 0p 57r 617e 0p
> > >     1 0r 0e 0p 0r 4e 0p
> > >     2 0r 0e 0p 0r 0e 0p
> > >     3 0r 0e 0p 0r 0e 0p
> > >       0 0 0 0 0 0
> > >   443 1683 57748 832
> > >     0 0 0 0 0 0 0
> > >     1 0 0 0 0 0 0
> > >     2 0 0 0 0 0 0
> > >     3 0 0 0 0 0 0
> > >       0 0 0 0 0 0
> > >   444 1670 30207 133
> > >     0 0 0 0 0 0 0
> > >     1 0 0 0 0 0 0
> > >     2 0 0 0 0 0 0
> > >     3 0 0 0 0 0 0
> > >       0 0 0 0 0 0
> > >   445 1662 0 0
> > >     0 0R 34T 0 57R 238T 0
> > >     1 0R 0T 0 0R 0T 0
> > >     2 0R 0T 0 0R 0T 0
> > >     3 0R 0T 0 0R 81T 0
> > >       13807L 324O 867Y 2538N 63F 18A
> > >
> > > If I repeat the test many times, it may succeed by chance, but the
> > > untar process is very slow and generates about 7000 generations.
> > >
> > > But if I change the untar cmdline to:
> > > python -c "import sys; sys.stdout.buffer.write(open('$linux_src',
> > > mode='rb').read())" | tar zx
> > >
> > > Then the problem is gone, it can untar the file successfully and very fast.
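
For anyone reproducing this, the one-liner quoted above can be written out
as a small script. This is only a sketch of what that command line does: the
function name `untar_via_pipe` and the `dest_dir` parameter are my own
placeholders, not part of the original test setup, and it assumes a `tar`
binary is on the PATH. It reads the whole archive into memory first and then
feeds it to tar on stdin, just as the pipe does.

```python
import subprocess


def untar_via_pipe(archive_path, dest_dir="."):
    """Read a .tar.gz fully into memory, then pipe it to `tar` on stdin.

    Hypothetical helper mirroring the quoted one-liner:
    python reads the file in one go, and tar only ever sees a pipe.
    """
    # One large read of the whole archive, as open(...).read() does.
    with open(archive_path, "rb") as f:
        data = f.read()

    # "-f -" makes tar read the archive from stdin explicitly;
    # "-C" selects the extraction directory (an addition for convenience).
    subprocess.run(
        ["tar", "-xzf", "-", "-C", dest_dir],
        input=data,
        check=True,
    )
```

Calling `untar_via_pipe("/path/to/linux.tar.gz", "/path/to/dest")` inside the
memcg should then behave like the piped command. Whether that avoids the OOM
on other setups is not something I have verified.
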
> > >
> > > This might be a different issue from the one Chris reported; I'm not sure.
> >
> > After more testing, I think these are two separate problems (note I
> > changed the memcg limit to 600M later so the compile test can run smoothly).
> >
> > 1. OOM during the untar process (can be worked around with the untar
> > cmdline I mentioned above).
>
> There are two different issues here.
> My recent test script has moved the untar phase out of the memcg limit
> (mostly because I want to multithread the untar), so the bisect I did
> only catches the second one.
> The untar issue might not be a regression from this patch.
>
> > 2. OOM during the compile process (this should be the one Chris encountered).
> >
> > Both 1 and 2 only exist for MGLRU.
> > 1 can be worked around using the cmdline I mentioned above.
> > 2 is caused by Ge's patch, and 1 is not.
> >
> > I can confirm Yu's patch fixed 2 on my system, but 1 still seems to be a
> > problem. It's not related to this patch, so maybe it can be discussed
> > elsewhere.
>
> I will do a test run now with Yu's patch and report back.

I can confirm Yu's patch fixes the regression for me. It can now sustain
470M of pressure without causing an OOM kill.

Yu, please submit your patch. This regression has already been merged into
Linus' tree.

Feel free to add:

Tested-by: Chris Li <chrisl@...nel.org>

Chris