Message-ID: <20200929145403.GE2277@dhcp22.suse.cz>
Date:   Tue, 29 Sep 2020 16:54:03 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Daniel Vetter <daniel@...ll.ch>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Ben Segall <bsegall@...gle.com>, Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>, linux-hexagon@...r.kernel.org,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...nel.org>,
        Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        linux-arch <linux-arch@...r.kernel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Brian Cain <bcain@...eaurora.org>,
        Richard Weinberger <richard@....at>,
        Russell King <linux@...linux.org.uk>,
        Ard Biesheuvel <ardb@...nel.org>,
        David Airlie <airlied@...ux.ie>,
        Ingo Molnar <mingo@...hat.com>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Mel Gorman <mgorman@...e.de>,
        intel-gfx <intel-gfx@...ts.freedesktop.org>,
        Matt Turner <mattst88@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        linux-xtensa@...ux-xtensa.org, Shuah Khan <shuah@...nel.org>,
        Jeff Dike <jdike@...toit.com>,
        linux-um <linux-um@...ts.infradead.org>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>, rcu@...r.kernel.org,
        linux-m68k <linux-m68k@...ts.linux-m68k.org>,
        Ivan Kokshaysky <ink@...assic.park.msu.ru>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Richard Henderson <rth@...ddle.net>,
        Chris Zankel <chris@...kel.net>,
        Max Filippov <jcmvbkbc@...il.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        alpha <linux-alpha@...r.kernel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [patch 00/13] preempt: Make preempt count unconditional

On Tue 29-09-20 11:00:03, Daniel Vetter wrote:
> On Tue, Sep 29, 2020 at 10:19:38AM +0200, Michal Hocko wrote:
> > On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> > > I can
> > > then figure out whether it's better to risk not spotting issues with
> > > call_rcu vs slapping a memalloc_noio_save/restore around all these
> > > critical section which force-degrades any allocation to GFP_ATOMIC at
> > 
> > did you mean memalloc_noreclaim_* here?
> 
> Yeah I picked the wrong one of that family of functions.
> 
> > > most, but has the risk that we run into code that assumes "GFP_KERNEL
> > > never fails for small stuff" and has a decidedly less tested fallback
> > > path than rcu code.
> > 
> > Even if the above then please note that memalloc_noreclaim_* or
> > PF_MEMALLOC should be used with extreme care. Essentially only for
> > internal memory reclaimers. It grants access to _all_ the available
> > memory so any abuse can be detrimental to the overall system operation.
> > Allocation failure in this mode means that we are out of memory and any
> > code relying on such an allocation has to carefully consider failure.
> > This is not a random allocation mode.
> 
> Agreed, that's why I don't like having these kind of automagic critical
> sections. It's a bit a shotgun approach. Paul said that the code would
> handle failures, but the problem is that it applies everywhere.

Ohh, in an ideal world we wouldn't need anything like that. But then
reality strikes:
* PF_MEMALLOC (resp. memalloc_noreclaim_* for that matter) is primarily used
  to make sure that allocations from inside memory reclaim - yeah that
  happens - will not recurse.
* PF_MEMALLOC_NO{FS,IO} (resp. memalloc_no{fs,io}*) are used to mark
  critical sections in which reclaim must not recurse into fs/io, because
  controlling that for each allocation inside an fs transaction (or other
  sensitive) or IO context turned out to be unmaintainable and people
  simply fell into using NOFS/NOIO unconditionally, which is causing
  reclaim imbalance problems.
* PF_MEMALLOC_NOCMA (resp. memalloc_nocma*) is used for long term pinning
  when CMA pages cannot be pinned because that would break the CMA
  guarantees. Communicating this to all potential allocations during
  pinning is simply infeasible.
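
To make the "critical section with side effects on all allocations"
pattern concrete, here is a minimal user-space sketch of how a NOFS scope
behaves. It is only a model, not kernel code: the MODEL_* constants, the
task_flags variable (standing in for current->flags) and the model_*
helpers are all hypothetical names with made-up values. They roughly
mirror what memalloc_nofs_save()/memalloc_nofs_restore() and the gfp
masking done on behalf of each allocation are understood to do.

```c
#include <assert.h>

/* Made-up bit values; only the relationships between them matter here. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   MODEL_GFP_IO

#define MODEL_PF_NOIO    0x1u
#define MODEL_PF_NOFS    0x2u

/* Hypothetical stand-in for the per-task current->flags. */
static unsigned int task_flags;

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static inline unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope, putting back whatever the matching save saw. */
static inline void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* The mask every allocation effectively gets while a scope is active. */
static inline unsigned int model_gfp_context(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* A caller deep inside the scope asks for GFP_KERNEL and gets NOFS. */
static inline int model_demo(void)
{
	unsigned int outer = model_nofs_save();

	assert(model_gfp_context(MODEL_GFP_KERNEL) == MODEL_GFP_NOFS);
	model_nofs_restore(outer);
	assert(model_gfp_context(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
	return 0;
}
```

The point of the sketch is the trade-off itself: the callee never has to
know it runs in a sensitive context, and for the same reason nobody
reading the callee can tell that its allocation mode has been changed.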

So you are absolutely right that these critical sections with side
effects on all allocations are far from ideal from an API point of view
but they are mostly mirroring a demand for functionality which is
_practically_ impossible to achieve with our current code base. Not that
we couldn't go back to the drawing board and come up with a saner thing
and rework the world...
-- 
Michal Hocko
SUSE Labs
