lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 27 May 2020 11:42:50 -0400
From:   Qian Cai <cai@....pw>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Kees Cook <keescook@...omium.org>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Iurii Zaikin <yzaikin@...gle.com>, andi.kleen@...el.com,
        tim.c.chen@...el.com, dave.hansen@...el.com, ying.huang@...el.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] make vm_committed_as_batch aware of vm overcommit
 policy

On Wed, May 27, 2020 at 09:33:32PM +0800, Feng Tang wrote:
> Hi Qian,
> 
> On Wed, May 27, 2020 at 08:05:49AM -0400, Qian Cai wrote:
> > On Wed, May 27, 2020 at 06:46:06PM +0800, Feng Tang wrote:
> > > Hi Qian,
> > > 
> > > On Tue, May 26, 2020 at 10:25:39PM -0400, Qian Cai wrote:
> > > > > > > > [1] https://lkml.org/lkml/2020/3/5/57
> > > > > > > 
> > > > > > > Reverted this series fixed a warning under memory pressue.
> > > > > > 
> > > > > > Andrew, Stephen, can you drop this series?
> > > > > > 
> > > > > > > 
> > > > > > > [ 3319.257898] LTP: starting oom01
> > > > > > > [ 3319.284417] ------------[ cut here ]------------
> > > > > > > [ 3319.284439] memory commitment underflow
> > > > > 
> > > > > Thanks for the catch!
> > > > > 
> > > > > Could you share the info about the platform, like the CPU numbers
> > > > > and RAM size, and what's the mmap test size of your test program.
> > > > > It would be great if you can point me the link to the test program.
> > > > 
> > > > I have been reproduced this on both AMD and Intel. The test just
> > > > allocating memory and swapping.
> > > > 
> > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c
> > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c
> > > > 
> > > > It might be better to run the whole LTP mm tests if none of the above
> > > > triggers it for you which has quite a few memory pressurers.
> > > > 
> > > > /opt/ltp/runltp -f mm
> > > 
> > > Thanks for sharing. I tried to reproduce this on 2 server plaforms,
> > > but can't reproduce it, and they are still under testing.
> > > 
> > > Meanwhile, could you help to try the below patch, which is based on
> > > Andi's suggestion and have some debug info. The warning is a little
> > > strange, as the condition is
> > > 
> > > 	(percpu_counter_read(&vm_committed_as) <
> > >                        -(s64)vm_committed_as_batch * num_online_cpus())
> > > 
> > > while for your platform (48 CPU + 128 GB RAM), the
> > > '-(s64)vm_committed_as_batch * num_online_cpus()'
> > > is a s64 value: '-32G', which makes the condition hard to be true,
> > > and when it is,  it could be triggered by some magic for s32/s64
> > > operations around the percpu-counter. 
> > 
> > Here is the information on AMD and powerpc below affected by this. It
> > could need a bit patient to reproduce, but our usual daily CI would
> > trigger it eventually after a few tries.
> > 
> > # git clone https://github.com/cailca/linux-mm.git
> > # cd linux-mm
> > # ./compile.sh
> > # systemctl reboot
> > # ./test.sh
> 
> I just downloaded it, and it failed on my desktop machine as it failed
> in 'yum' and 'grub2' setup. The difficulty for me to reproduce is the
> test platforms are behind the 0day framework, and I can hardly  setup
> external test suits, though I have been trying for all day today :) 

I tried your debug patch and it did not even compile on linux-next
(where the issue was happened) and I am running out of time today. It
probably need to reproduce on large systems as it did not happen on one
of our small s390 system here.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ