Date:   Mon, 16 Oct 2017 20:07:49 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Guy Shattah <sguy@...lanox.com>,
        Christopher Lameter <cl@...ux.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Michal Nazarewicz <mina86@...a86.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        Laura Abbott <labbott@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

On Mon 16-10-17 10:43:38, Mike Kravetz wrote:
> On 10/15/2017 12:50 AM, Guy Shattah wrote:
> > On 13/10/2017 19:17, Michal Hocko wrote:
[...]
> >> But a generic implementation would have to deal with many issues as
> >> already mentioned. If you make this driver specific you can have access
> >> control based on fd etc... I really fail to see how this is any
> >> different from remap_pfn_range.
> > Why have several driver specific implementation if you can generalize the idea and implement
> > an already existing POSIX standard?
> 
> Just to be clear, the posix standard talks about a typed memory object.
> The suggested implementation has one create a connection to the memory
> object to receive a fd, then use mmap as usual to get a mapping backed
> by contiguous pages/memory.  Of course, this type of implementation is
> not a requirement.

I am not sure that the POSIX standard for typed memory is easily
implementable in Linux. Does any OS actually implement this API?

> However, this type of implementation looks quite a
> bit like hugetlbfs today.
> - Both require opening a special file/device, and then calling mmap on
>   the returned fd.  You can technically use mmap(MAP_HUGETLB), but that
>   still ends up using hugetlbfs.  BTW, there was resistance to adding the
>   MAP_HUGETLB flag to mmap.

And I think we shouldn't really shape any API based on hugetlb.

> - Allocation of contiguous memory is much like 'on demand' allocation of
>   huge pages.  There are some (not many) users that use this model.  They
>   attempt to allocate huge pages on demand, and if not available fall back
>   to base pages.  This is how contiguous allocations would need to work.
>   Of course, most hugetlbfs users pre-allocate pages for their use, and
>   this 'might' be something useful for contiguous allocations as well.

But there is still admin configuration required to consume memory from
the pool or overcommit that pool.
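
For reference, the "try huge pages on demand, fall back to base pages"
model quoted above can be sketched from userspace roughly as follows.
This is only an illustration: map_huge_or_base() is a hypothetical
helper name, and the 2 MiB length in the usage note assumes the common
x86-64 default huge page size. Whether the first mmap() succeeds
depends on the admin-configured pool mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from this thread): try an anonymous
 * huge page mapping first and fall back to base pages if the
 * hugetlb pool cannot satisfy the request.
 * Returns NULL if both attempts fail.
 */
static void *map_huge_or_base(size_t len, bool *got_huge)
{
	void *p = MAP_FAILED;

	assert(len > 0 && got_huge != NULL);

#ifdef MAP_HUGETLB
	/* Fails (typically ENOMEM) when no huge pages can be reserved. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
	if (p != MAP_FAILED) {
		*got_huge = true;
		return p;
	}

	/* Fallback: ordinary base pages, no contiguity guarantee. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	*got_huge = false;
	return p == MAP_FAILED ? NULL : p;
}
```

A caller would do something like map_huge_or_base(2UL << 20, &huge)
and check the flag to learn which kind of backing it actually got.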

> I wonder if going down the path of a separate device/filesystem/etc for
> contiguous allocations might be a better option.  It would keep the
> implementation somewhat separate.  However, I would then be afraid that
> we end up with another 'separate/special vm' as in the case of hugetlbfs
> today.

That depends on who is actually going to use the contiguous memory. If
we are talking about drivers communicating with userspace, then a
driver-specific fd with its own mmap implementation is enough, and we
need neither a special fs nor a separate infrastructure. Well, except
for a library function to handle the MM side of things.
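
From the userspace side, the driver-specific fd model would look
roughly like the sketch below. Both helper names are hypothetical, and
no real device node is implied: the thread does not name one. Access
control then comes from the permissions on whatever file is opened, and
the driver's mmap handler decides how the range is backed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of an already opened fd.
 * With a driver fd, the driver's own mmap implementation would
 * provide the (possibly contiguous) backing memory.
 */
static void *map_from_fd(int fd, size_t len)
{
	void *p;

	assert(fd >= 0 && len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return NULL;
	}
	return p;
}

/*
 * Hypothetical helper: open a path (a device node in the driver
 * case) and map it. The open() is where access control applies.
 */
static void *map_from_path(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR);

	if (fd < 0) {
		perror("open");
		return NULL;
	}
	p = map_from_fd(fd, len);
	close(fd);	/* the mapping stays valid after close */
	return p;
}
```

Any regular file can stand in for the device node when trying this
out, since nothing here depends on a particular driver.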

If we really need a general purpose physically contiguous memory
allocator, then I would agree that using a MAP_ flag might be a way to
go, but that would require very careful consideration of who is allowed
to allocate and how many/how large blocks. I do not see a good way of
conveying that information to the kernel right now. Moreover, and most
importantly, I haven't heard any sound usecase for such a functionality
in the first place. There is some hand waving about performance but
there are no real numbers to back those claims AFAIK. Not to mention a
serious consideration of potential consequences for the whole MM.
-- 
Michal Hocko
SUSE Labs
