Message-ID: <00a27e2b-0fc2-4980-bc4e-b383f15d3ad9@126.com>
Date: Sat, 3 Aug 2024 16:25:27 +0800
From: Ge Yang <yangge1116@....com>
To: Chris Li <chrisl@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm <linux-mm@...ck.org>,
 LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
 Barry Song <21cnbao@...il.com>, David Hildenbrand <david@...hat.com>,
 baolin.wang@...ux.alibaba.com, liuzixing@...on.cn,
 Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH V2] mm/gup: Clear the LRU flag of a page before adding to
 LRU batch



On 2024/8/3 4:18, Chris Li wrote:
> On Thu, Aug 1, 2024 at 6:56 PM Ge Yang <yangge1116@....com> wrote:
>>
>>
>>
>>>> I can't reproduce this problem, using tmpfs to compile linux.
>>>> Seems you limit the memory size used to compile linux, which leads to
>>>> OOM. May I ask why the memory size is limited to 481280kB? Do I also
>>>> need to limit the memory size to 481280kB to test?
>>>
>>> Yes, you need to limit the cgroup memory size to force the swap
>>> action. I am using memory.max = 470M.
>>>
>>> I believe other values, e.g. 800M, can trigger it as well. The reason to
>>> limit the memory is to cause the swap action.
>>> The goal is to intentionally overwhelm the memory load and let the
>>> swap system do its job. The 470M is chosen to cause a lot of swap
>>> action but not too high to cause OOM kills in normal kernels.
>>> In other words, high enough swap pressure but not too high to bust
>>> into OOM kill. e.g. I verify that, with your patch reverted, the
>>> mm-stable kernel can sustain this level of swap pressure (470M)
>>> without OOM kill.
>>>
>>> I borrowed the 470M magic value from Hugh and verified it works with
>>> my test system. Hugh has a similar swap test setup which is more
>>> complicated than mine. It is the inspiration for my swap stress test
>>> setup.
>>>
>>> FYI, I am using "make -j32" on a machine with 12 cores (24
>>> hyperthreading). My typical swap usage is about 3-5G. I set my
>>> swapfile size to about 20G.
>>> I am using zram or ssd as the swap backend.  Hope that helps you
>>> reproduce the problem.
>>>
>> Hi Chris,
>>
>> I tried to set up the experiment according to your suggestions above.
> 
> Hi Ge,
> 
> Sorry to hear that you were not able to reproduce it.
> 
>> High swap pressure can be triggered, but OOM can't be reproduced. The
>> specific steps are as follows:
>> root@...ntu-server-2204:/home/yangge# cp workspace/linux/ /dev/shm/ -rf
> 
> I use a slightly different way to setup the tmpfs:
> 
> Here is section of my script:
> 
>          if ! [ -d $tmpdir ]; then
>                  sudo mkdir -p $tmpdir
>                  sudo mount -t tmpfs -o size=100% nodev $tmpdir
>          fi
> 
>          sudo mkdir -p $cgroup
>          sudo sh -c "echo $mem > $cgroup/memory.max" || echo setup memory.max error
>          sudo sh -c "echo 1 > $cgroup/memory.oom.group" || echo setup oom.group error
> 
> Per run:
> 
>         # $workdir is under $tmpdir
>          sudo rm -rf $workdir
>          mkdir -p $workdir
>          cd $workdir
>          echo "Extracting linux tree"
>          XZ_OPT='-T0 -9 --memory=75%' tar xJf $linux_src || die "xz extract failed"
> 
>          sudo sh -c "echo $BASHPID > $cgroup/cgroup.procs"
>          echo "Cleaning linux tree, setup defconfig"
>          cd $workdir/linux
>          make -j$NR_TASK clean
>          make defconfig > /dev/null
>          echo Kernel compile run $i
>          /usr/bin/time -a -o $log make --silent -j$NR_TASK  || die "make failed"
> >

Thanks.
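
The script fragment above does not show how $tmpdir, $cgroup and $mem are set. As a rough dry-run sketch only: the values below are assumptions ($tmpdir is hypothetical; the cgroup path and the 470M limit are taken from the steps in this thread), and the helper name plan_setup is made up. It prints the setup commands rather than running them, since the real ones need root.

```shell
#!/bin/sh
# Assumed values; the original script does not show these assignments.
tmpdir=/tmp/swap-stress                  # hypothetical tmpfs mount point
cgroup=/sys/fs/cgroup/kernel-build       # cgroup path used in the manual steps
mem=470M                                 # 470 MiB = 470 * 1024 = 481280 kB

# Print the one-time setup commands instead of executing them.
plan_setup() {
    echo "mkdir -p $tmpdir"
    echo "mount -t tmpfs -o size=100% nodev $tmpdir"
    echo "mkdir -p $cgroup"
    echo "echo $mem > $cgroup/memory.max"
    echo "echo 1 > $cgroup/memory.oom.group"
}

plan_setup
```

The comment on mem also accounts for the 481280kB figure asked about earlier: it is the same 470M limit expressed in kB.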

>> root@...ntu-server-2204:/home/yangge# sync
>> root@...ntu-server-2204:/home/yangge# echo 3 > /proc/sys/vm/drop_caches
>> root@...ntu-server-2204:/home/yangge# cd /sys/fs/cgroup/
>> root@...ntu-server-2204:/sys/fs/cgroup/# mkdir kernel-build
>> root@...ntu-server-2204:/sys/fs/cgroup/# cd kernel-build
>> root@...ntu-server-2204:/sys/fs/cgroup/kernel-build# echo 470M > memory.max
>> root@...ntu-server-2204:/sys/fs/cgroup/kernel-build# echo $$ > cgroup.procs
>> root@...ntu-server-2204:/sys/fs/cgroup/kernel-build# cd /dev/shm/linux/
>> root@...ntu-server-2204:/dev/shm/linux# make clean && make -j24
> 
> I am using make -j 32.
> 
> Your step should work.
> 
> Did you enable MGLRU in your .config file? Mine did. I attached my
> config file here.
> 

MGLRU was not enabled in the above test.

With MGLRU enabled, I can reproduce the OOM very quickly. I am still
analyzing what triggers it.
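
For anyone else trying to reproduce this, a minimal sketch for checking the MGLRU state at runtime. It assumes a kernel built with CONFIG_LRU_GEN, which exposes /sys/kernel/mm/lru_gen/enabled; the function name mglru_status is made up for illustration, and the path can be overridden for testing.

```shell
#!/bin/sh
# Report MGLRU state from the sysfs knob.
# $1: optional path to the knob (defaults to the real sysfs file).
mglru_status() {
    knob="${1:-/sys/kernel/mm/lru_gen/enabled}"
    if [ ! -r "$knob" ]; then
        # File missing: kernel likely built without CONFIG_LRU_GEN.
        echo "unavailable"
        return 0
    fi
    val=$(cat "$knob")
    case "$val" in
        0x0000) echo "disabled" ;;          # all MGLRU components off
        *)      echo "enabled ($val)" ;;    # e.g. 0x0007 when fully on
    esac
}

mglru_status
```

If the knob exists but reports disabled, writing y to the same file as root should turn MGLRU on without rebuilding the kernel.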

>>
>> Please help me see which step does not match your setup.
> 
> How many cores does your server have? I assume your RAM should be
> plenty on that server.
> 

My server has 64 cores (128 hyperthreading) and 160G of RAM.

> Chris

