Message-ID: <alpine.DEB.2.20.1710161253520.13473@nuc-kabylake>
Date: Mon, 16 Oct 2017 12:56:43 -0500 (CDT)
From: Christopher Lameter <cl@...ux.com>
To: Michal Hocko <mhocko@...nel.org>
cc: Guy Shattah <sguy@...lanox.com>,
Mike Kravetz <mike.kravetz@...cle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
Marek Szyprowski <m.szyprowski@...sung.com>,
Michal Nazarewicz <mina86@...a86.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
Laura Abbott <labbott@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support
On Mon, 16 Oct 2017, Michal Hocko wrote:
> > We already have that issue and have ways to control that by tracking
> > pinned and mlocked pages as well as limits on their allocations.
>
> Ohh, it is very different because the mlock limit is really small (64kB),
> which is not even close to what this is supposed to be about. Moreover,
> mlock doesn't prevent migration, so it doesn't prevent compaction from
> forming higher-order allocations.
The mlock limit is configurable. Pinned pages are tracked as well.
> Really, this is just too dangerous without a deep consideration of all
> the potential consequences. The more I am thinking about this the more I
> am convinced that this all should be driver specific mmap based thing.
> If it turns out to be too restrictive over time and there are more
> experiences about the usage we can consider thinking about a more
> generic API. But starting from the generic MAP_ flag is just asking for
> problems.
This issue is already present with the pinning of large amounts of memory
via the RDMA API, which is used for multi-gigabyte ranges. There is nothing
new here aside from the memory being contiguous.
> > There is not much new here in terms of problems. The hardware that
> > needs this seems to become more and more plentiful. That is why we need a
> > generic implementation.
>
> It would really help to name that HW and other potential usecases
> independent of the HW because I am rather skeptical about the
> _plentiful_ part. And so I really do not see any foundation to claim
> the generic part. Because, fundamentally, it is the HW which requires
> the specific memory placement/physically contiguous range etc. So the
> generic implementation doesn't really make sense in such a context.
RDMA hardware? Storage interfaces? Look at what the RDMA subsystem
and storage (NVME?) support.
This is not a hardware specific thing but a reflection of the general
limitations of the existing 4k page struct scheme, which limits performance
and causes severe pressure on I/O devices.