Date:   Wed, 9 May 2018 15:42:22 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Artem Bityutskiy <dedekind1@...il.com>,
        Richard Weinberger <richard@....at>,
        David Woodhouse <dwmw2@...radead.org>,
        Brian Norris <computersforpeace@...il.com>,
        Boris Brezillon <boris.brezillon@...e-electrons.com>,
        Marek Vasut <marek.vasut@...il.com>,
        Cyrille Pitchen <cyrille.pitchen@...ev4u.fr>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Steven Whitehouse <swhiteho@...hat.com>,
        Bob Peterson <rpeterso@...hat.com>,
        Trond Myklebust <trond.myklebust@...marydata.com>,
        Anna Schumaker <anna.schumaker@...app.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        Mikulas Patocka <mpatocka@...hat.com>,
        linux-mtd@...ts.infradead.org, linux-ext4@...r.kernel.org,
        cluster-devel@...hat.com, linux-nfs@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: vmalloc with GFP_NOFS

On Tue 24-04-18 13:25:42, Michal Hocko wrote:
[...]
> > As a suggestion, could you take
> > documentation about how to convert to the memalloc_nofs_{save,restore}
> > scope api (which I think you've written about e-mails at length
> > before), and put that into a file in Documentation/core-api?
> 
> I can.

Does something like the below sound reasonable/helpful?
---
=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@...nel.org>

Introduction
============

FS and IO submitting code paths have to be careful when allocating memory
to prevent potential recursion deadlocks caused by direct memory reclaim
calling back into the FS or IO path and blocking on already held resources
(e.g. locks). The traditional way to avoid this problem is to clear
__GFP_FS or __GFP_IO respectively (note that clearing the latter implies
clearing the former as well) in the gfp mask when calling an allocator.
GFP_NOFS and GFP_NOIO, respectively, can be used as shortcuts.
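To make the relationship between these masks concrete, here is a small
userspace model. It is a sketch under stated assumptions: the bit values
are made up for illustration (the kernel's actual values differ), and only
the composition of GFP_KERNEL, GFP_NOFS and GFP_NOIO from __GFP_RECLAIM,
__GFP_IO and __GFP_FS follows the kernel's definitions. The helpers
``reclaim_may_enter_fs`` and ``reclaim_may_do_io`` are hypothetical.

```c
#include <assert.h>

/*
 * Userspace model of the gfp bits discussed here. The bit values are
 * illustrative only; the kernel's real values are different.
 */
typedef unsigned int gfp_t;

#define __GFP_RECLAIM 0x1u /* allocation may enter direct reclaim */
#define __GFP_IO      0x2u /* reclaim may start physical IO */
#define __GFP_FS      0x4u /* reclaim may call back into the filesystem */

/* Composition of the common masks, as in the kernel. */
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_NOIO   (__GFP_RECLAIM)

/* GFP_NOFS clears only __GFP_FS; GFP_NOIO clears __GFP_IO and __GFP_FS. */
static_assert(GFP_NOFS == (GFP_KERNEL & ~__GFP_FS), "NOFS clears FS only");
static_assert(GFP_NOIO == (GFP_KERNEL & ~(__GFP_IO | __GFP_FS)),
	      "NOIO clears both IO and FS");

/* Hypothetical helpers a reclaim path could use to decide what it may do. */
static inline int reclaim_may_enter_fs(gfp_t gfp)
{
	return (gfp & __GFP_FS) != 0;
}

static inline int reclaim_may_do_io(gfp_t gfp)
{
	return (gfp & __GFP_IO) != 0;
}
```

With GFP_NOIO neither helper returns true, which matches the note above
that clearing __GFP_IO implies clearing __GFP_FS as well.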

This has been the traditional way to avoid deadlocks for ages. It turned
out, though, that this approach has led to abuse: the restricted gfp mask
is used "just in case" without deeper consideration, and excessive use of
GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory reclaim
issues.

New API
=======

Since 4.12 we have a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save`` and ``memalloc_nofs_restore`` for NOFS,
``memalloc_noio_save`` and ``memalloc_noio_restore`` for NOIO. It allows
marking a scope as a critical section from the point of view of memory
reclaim recursion into the FS/IO layers. Any allocation from within that
scope implicitly drops __GFP_FS (respectively both __GFP_IO and __GFP_FS)
from the given mask, so no memory allocation can recurse back into the
FS/IO.
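The following userspace model sketches how the scope API behaves. It is
modelled on the kernel's inline save/restore helpers and on
``current_gfp_context()``, under these assumptions: a thread-local
``task_flags`` stands in for ``current->flags``, and the PF_* and gfp bit
values are illustrative, not the kernel's.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the kernel's real values differ. */
#define __GFP_RECLAIM 0x1u
#define __GFP_IO      0x2u
#define __GFP_FS      0x4u
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOIO 0x1u
#define PF_MEMALLOC_NOFS 0x2u

/* Stand-in for current->flags: one set of task flags per thread. */
static _Thread_local unsigned int task_flags;

/* Enter a NOFS scope; return the previous flag state so scopes can nest. */
static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope, restoring whatever the enclosing scope had. */
static inline void memalloc_nofs_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOFS) == 0); /* must come from _save() */
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

static inline void memalloc_noio_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOIO) == 0); /* must come from _save() */
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/*
 * In this model every allocation mask is filtered through this, so code
 * inside a scope cannot re-enable __GFP_FS/__GFP_IO by passing GFP_KERNEL.
 * NOIO is the stronger restriction and takes precedence.
 */
static inline gfp_t current_gfp_context(gfp_t gfp)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

Returning the previous flag state, rather than unconditionally clearing the
flag on restore, is what makes nesting safe: an inner save/restore pair
leaves the outer scope in effect.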

FS/IO code then simply calls the appropriate save function at the layer
where a lock that is also taken from the reclaim context (e.g. by a
shrinker) is acquired, and the corresponding restore function when the
lock is released. Ideally, this is accompanied by a comment explaining
what the reclaim context is, for easier maintenance.
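A minimal sketch of that pairing, again as a userspace model: the
``struct fs_cache`` object, its lock helpers and ``fs_cache_alloc_mask``
are hypothetical, the lock is modelled as a plain flag to keep the example
single-threaded (real code would use a mutex or spinlock), and the bit
values are illustrative.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the kernel's real values differ. */
#define __GFP_RECLAIM    0x1u
#define __GFP_IO         0x2u
#define __GFP_FS         0x4u
#define GFP_KERNEL       (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOFS 0x1u

/* Stand-in for current->flags. */
static _Thread_local unsigned int task_flags;

static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static inline void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

static inline gfp_t current_gfp_context(gfp_t gfp)
{
	return (task_flags & PF_MEMALLOC_NOFS) ? (gfp & ~__GFP_FS) : gfp;
}

/* Hypothetical FS object whose lock is also taken by the FS shrinker. */
struct fs_cache {
	int locked;               /* models the lock */
	unsigned int nofs_cookie; /* value returned by memalloc_nofs_save() */
};

static inline void fs_cache_lock(struct fs_cache *c)
{
	assert(!c->locked);
	c->locked = 1;
	/*
	 * Reclaim context: the shrinker takes this lock, so reclaim must not
	 * recurse into the FS while it is held. Enter a NOFS scope for
	 * exactly the span the lock is held.
	 */
	c->nofs_cookie = memalloc_nofs_save();
}

static inline void fs_cache_unlock(struct fs_cache *c)
{
	assert(c->locked);
	memalloc_nofs_restore(c->nofs_cookie);
	c->locked = 0;
}

/* Hypothetical allocation site: it simply passes GFP_KERNEL. */
static inline gfp_t fs_cache_alloc_mask(void)
{
	return current_gfp_context(GFP_KERNEL);
}
```

Allocation sites inside the locked region keep passing GFP_KERNEL; the
restriction comes from the scope tied to the lock, not from each caller.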

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite
non-trivial to fix up. That means that calling ``__vmalloc`` with
GFP_NOFS/GFP_NOIO is almost always a bug. The good news is that the
NOFS/NOIO semantics can be achieved by the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without any
problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended way around that is to wrap
``vmalloc`` in the scope API with a comment explaining the problem.
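The userspace model below sketches why passing GFP_NOFS to ``__vmalloc``
is not enough and how the scope API fixes it. Everything here is
hypothetical: ``model_vmalloc`` stands in for ``__vmalloc`` and only
records the masks its allocations would use, one of which is a hardcoded
GFP_KERNEL allocation standing in for the internal allocations described
above; ``fs_vmalloc_nofs`` is an illustrative wrapper, not a kernel
function; bit values are illustrative.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the kernel's real values differ. */
#define __GFP_RECLAIM    0x1u
#define __GFP_IO         0x2u
#define __GFP_FS         0x4u
#define GFP_KERNEL       (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS         (__GFP_RECLAIM | __GFP_IO)
#define PF_MEMALLOC_NOFS 0x1u

static _Thread_local unsigned int task_flags; /* stand-in for current->flags */

static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static inline void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

static inline gfp_t current_gfp_context(gfp_t gfp)
{
	return (task_flags & PF_MEMALLOC_NOFS) ? (gfp & ~__GFP_FS) : gfp;
}

/* Masks a model_vmalloc() call would effectively have used. */
struct vmalloc_trace {
	gfp_t data_pages; /* allocation that honours the caller's mask */
	gfp_t internal;   /* hardcoded GFP_KERNEL allocation */
};

/* Hypothetical stand-in for __vmalloc(): records masks, allocates nothing. */
static inline struct vmalloc_trace model_vmalloc(gfp_t gfp)
{
	struct vmalloc_trace t;

	t.data_pages = current_gfp_context(gfp);
	/* The caller's mask never reaches this allocation. */
	t.internal = current_gfp_context(GFP_KERNEL);
	return t;
}

static inline struct vmalloc_trace fs_vmalloc_nofs(void)
{
	/*
	 * This path may be reached with a lock held that the FS shrinker
	 * also takes, and this layer cannot tell whether a caller already
	 * entered a NOFS scope, so enter one here around the vmalloc call.
	 */
	unsigned int cookie = memalloc_nofs_save();
	struct vmalloc_trace t = model_vmalloc(GFP_KERNEL);

	memalloc_nofs_restore(cookie);
	return t;
}
```

Passing GFP_NOFS only restricts the allocations that actually see the
caller's mask; the scope restricts every allocation the task makes while it
is active, which is why it also covers the hardcoded internal one.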
-- 
Michal Hocko
SUSE Labs
