linux-kernel - Re: [mm] 4e2c82a409: ltp.overcommit


Date:   Tue, 7 Jul 2020 09:04:36 -0400
From:   Qian Cai <cai@....pw>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Feng Tang <feng.tang@...el.com>,
        kernel test robot <rong.a.chen@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Kees Cook <keescook@...omium.org>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Iurii Zaikin <yzaikin@...gle.com>, andi.kleen@...el.com,
        tim.c.chen@...el.com, dave.hansen@...el.com, ying.huang@...el.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org, lkp@...ts.01.org
Subject: Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail

On Tue, Jul 07, 2020 at 02:06:19PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 07:43:48, Qian Cai wrote:
> > 
> > 
> > > On Jul 7, 2020, at 6:28 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > > 
> > > Would you have any examples? Because I find this highly unlikely.
> > > OVERCOMMIT_NEVER only works when virtual memory is not largely
> > > overcommitted with respect to real memory demand. And that tends to be more of
> > > an exception rather than a rule. "Modern" userspace (whatever that
> > > means) tends to be really hungry with virtual memory which is only used
> > > very sparsely.
> > > 
> > > I would argue that either somebody is running an "OVERCOMMIT_NEVER"
> > > friendly SW and this is a permanent setting or this is not used at all.
> > > At least this is my experience.
> > > 
> > > So I strongly suspect that LTP test failure is not something we should
> > > really lose sleep over. It would be nice to find a way to flush existing
> > > batches but I would rather see a real workload that would suffer from
> > > this imprecision.
> > 
> > I have heard you say many times that you really don’t care about those
> > use cases unless you hear that people are actually using them in your world.
> > 
> > For example, last time you said that the LTP oom tests are totally
> > artificial and how little you care whether they fail, while I could only
> > appreciate how efficiently they find many issues, like race conditions
> > and bad error accumulation handling, that your “real world use
> > cases” would take ages to flag, if they flag them at all.
> 
> Yes, they are effective at hitting corner cases and that is fine. I
> am not dismissing their usefulness. I have tried to explain that many
> times but let me try again. Seeing a corner case and thinking about a
> potential fix is one thing. On the other hand it is not really ideal to
> treat such a failure as a hard regression and consider otherwise useful

Well, terms like "corner cases" and "hard regression" are rather
subjective.

> functionality/improvement to be reverted without a proper cost benefit
> analysis. Sure having corner cases is not really nice but really, look
> at this example again. Overcommit setting is a global thing, it is hard
> to change it during runtime willy-nilly. Because that might have really
> detrimental side effects on all workloads running. So it is quite
> reasonable to expect that this is either early after the boot or when
> the system is in quiescent state when almost nothing but very core
> services are running and likelihood that the mode of operation changes.

Not really convinced that is the only way people will use those tunables.

> 
> > There are just too many valid use cases in this wild world. The
> > difference is that I admit that I don’t know, or am even aware of, all
> > the use cases, and I don’t believe you do either.
> 
> Me neither and I am not claiming that. All I am saying is that the real
> risk of a regression is low enough that I wouldn't lose sleep over
> it. It is perfectly fine to address this pro-actively if the fix is
> reasonably maintainable. I was mostly reacting to your pushing for a
> revert solely based on LTP results.
> 
> LTP is a very useful tool to raise awareness of potential problems but
> you shouldn't really follow those results just blindly.

You must think I am a newbie tester to give me this piece of advice
then.