Message-ID: <CACePvbXJKskfo-bd5jr2GfagaFDoYz__dbQTKmq2=rqOpJzqYQ@mail.gmail.com>
Date: Sun, 4 Aug 2024 10:51:54 -0700
From: Chris Li <chrisl@...nel.org>
To: Kairui Song <ryncsn@...il.com>
Cc: Ge Yang <yangge1116@....com>, Yu Zhao <yuzhao@...gle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm <linux-mm@...ck.org>, 
	LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org, 
	Barry Song <21cnbao@...il.com>, David Hildenbrand <david@...hat.com>, baolin.wang@...ux.alibaba.com, 
	liuzixing@...on.cn, Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH V2] mm/gup: Clear the LRU flag of a page before adding to
 LRU batch

On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@...il.com> wrote:
>
> > Hi Yu, I tested your patch. On my system (96 cores, 256G RAM) the OOM
> > still occurs; the test memcg is limited to 512M and runs 32 threads.
> >
> > And I found the OOM seems unrelated to either your patch or Ge's
> > patch (though they may have changed the OOM chance slightly).
> >
> > After the very quick OOM (it failed to untar the linux source code),
> > checking lru_gen_full:
> > memcg    47 /build-kernel-tmpfs
> >  node     0
> >         442       1691      29405           0
> >                      0          0r          0e          0p         57r        617e          0p
> >                      1          0r          0e          0p          0r          4e          0p
> >                      2          0r          0e          0p          0r          0e          0p
> >                      3          0r          0e          0p          0r          0e          0p
> >                                 0           0           0           0           0           0
> >         443       1683      57748         832
> >                      0          0           0           0           0           0           0
> >                      1          0           0           0           0           0           0
> >                      2          0           0           0           0           0           0
> >                      3          0           0           0           0           0           0
> >                                 0           0           0           0           0           0
> >         444       1670      30207         133
> >                      0          0           0           0           0           0           0
> >                      1          0           0           0           0           0           0
> >                      2          0           0           0           0           0           0
> >                      3          0           0           0           0           0           0
> >                                 0           0           0           0           0           0
> >         445       1662          0           0
> >                      0          0R         34T          0          57R        238T          0
> >                      1          0R          0T          0           0R          0T          0
> >                      2          0R          0T          0           0R          0T          0
> >                      3          0R          0T          0           0R         81T          0
> >                             13807L        324O        867Y       2538N         63F         18A
> >
> > If I repeat the test many times, it may succeed by chance, but the
> > untar process is very slow and generates about 7000 generations.
> >
> > But if I change the untar cmdline to:
> > python -c "import sys; sys.stdout.buffer.write(open('$linux_src', mode='rb').read())" | tar zx
> >
> > Then the problem is gone: the file untars successfully and very fast.
> >
> > This might be a different issue from the one Chris reported; I'm not sure.
>
> After more testing, I think these are two separate problems (note that
> I later changed the memcg limit to 600m so the compile test can run
> smoothly).
>
> 1. OOM during the untar process (can be worked around with the untar
> cmdline I mentioned above).

There are two different issues here.
My recent test script moved the untar phase outside the memcg limit
(mostly because I wanted a multithreaded untar), so the bisect I did
only catches the second one.
The untar issue might not be a regression from this patch.
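
For anyone who wants to keep the untar phase inside the memcg and still
get past issue 1, the quoted one-liner can be written out as a small
helper. This is only a sketch of the same idea (read the whole tarball
into memory, then hand it to the consumer over a pipe); the function
name stream_file_to_command is made up for illustration and is not part
of any test script in this thread.

```python
import subprocess


def stream_file_to_command(path, command, cwd=None):
    """Read `path` fully into memory, then feed the bytes to `command` on stdin.

    This mirrors the quoted python one-liner: the consumer (e.g. tar) never
    opens the file itself, it only reads from a pipe.
    Returns the exit status of `command`.
    """
    with open(path, "rb") as f:
        data = f.read()
    # subprocess.run writes `data` to the child's stdin and waits for it to exit.
    result = subprocess.run(command, input=data, cwd=cwd, check=False)
    return result.returncode
```

Something like stream_file_to_command("linux.tar.gz", ["tar", "zx"])
would then stand in for the shell pipeline; the tarball name here is a
placeholder.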

> 2. OOM during the compile process (this should be the one Chris encountered).
>
> Both 1 and 2 only exist for MGLRU.
> 1 can be worked around using the cmdline I mentioned above.
> 2 is caused by Ge's patch, and 1 is not.
>
> I can confirm Yu's patch fixed 2 on my system, but 1 still seems to be
> a problem. It's not related to this patch, so maybe it can be discussed
> elsewhere.

I will do a test run now with Yu's patch and report back.
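
To compare runs, the per-generation summary rows of an lru_gen_full
dump can be pulled out with a few lines of Python. A minimal sketch,
assuming the layout described in the kernel's multi-gen LRU admin guide
(each generation row is "gen_id age_in_ms nr_anon_pages nr_file_pages")
and assuming the text is read unwrapped, e.g. straight from debugfs, so
that the tier statistics rows keep their six or seven columns. The names
Generation and parse_generations are made up for illustration.

```python
from collections import namedtuple

# One summary row per generation: sequence number, age, and page counts.
Generation = namedtuple("Generation", "seq age_ms anon_pages file_pages")


def parse_generations(text):
    """Return the generation summary rows found in lru_gen / lru_gen_full text."""
    generations = []
    for line in text.splitlines():
        # Tolerate email quote markers ("> > ") in case the dump was pasted from a mail.
        fields = line.lstrip("> \t").split()
        # Generation rows are exactly four plain integers; memcg/node headers
        # and the tier statistics rows have a different shape and are skipped.
        if len(fields) == 4 and all(field.isdigit() for field in fields):
            generations.append(Generation(*map(int, fields)))
    return generations
```

The difference between the last and first seq values across two
snapshots gives a rough count of how many generations a run created.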

Chris
