linux-kernel - Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP

Message-ID: <0e238c56-c59d-f648-95fc-c8cb56c3652e@mellanox.com>
Date:   Mon, 16 Oct 2017 12:11:04 +0300
From:   Guy Shattah <sguy@...lanox.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Christopher Lameter <cl@...ux.com>,
        Mike Kravetz <mike.kravetz@...cle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Michal Nazarewicz <mina86@...a86.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        Laura Abbott <labbott@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support



On 16/10/2017 11:24, Michal Hocko wrote:
> On Sun 15-10-17 10:50:29, Guy Shattah wrote:
>>
>> On 13/10/2017 19:17, Michal Hocko wrote:
>>> On Fri 13-10-17 10:56:13, Christopher Lameter wrote:
>>>> On Fri, 13 Oct 2017, Michal Hocko wrote:
>>>>
>>>>>> There is a generic posix interface that could be used for a variety of
>>>>>> specific hardware dependent use cases.
>>>>> Yes you wrote that already and my counter argument was that this generic
>>>>> posix interface shouldn't bypass virtual memory abstraction.
>>>> It does do that? In what way?
>>> availability of the virtual address space depends on the availability of
>>> the same sized contiguous physical memory range. That sounds like the
>>> abstraction is gone to large part to me.
>> In what way? userspace users will still be working with virtual memory.
> So you are saying that providing an API which fails randomly because of
> the physically fragmented memory is OK? Users shouldn't really care
> about the state of the physical memory. That is what we have the virtual
> memory for.

Users still see and work with virtual addresses, just as before.
Users of the suggested API are aware that it might fail, since it
depends on the current state of system memory. This won't be the first
system call, or the last, to fail for reasons beyond the user's control.
For example, any user application might fail because of the number of
open files, disk space, memory availability, or network availability,
all of which are beyond the user's control.
A smart user always has a way to handle such failures.
A typical user who fails to allocate contiguous memory may fall back to
allocating non-contiguous memory. And by the way, even if each vendor
implements its own method of allocating contiguous memory, that
vendor-specific API might fail too, for the same reasons.
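
The fallback described above could look roughly like the sketch below.
It is only an illustration under assumptions: the helper name
map_prefer_contig() is hypothetical, and the flag proposed in this RFC
is passed in as a parameter (contig_flag) rather than hard-coded,
because its value is not part of any released kernel header.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try an anonymous mapping with the caller-supplied
 * "contiguous" flag first, and fall back to an ordinary mapping if that
 * attempt fails.
 *
 * contig_flag stands in for the proposed MAP_CONTIG bit; pass 0 when the
 * running kernel/headers do not provide it. Note that a kernel which does
 * not know the bit may silently ignore it, so *used_contig == 1 only
 * means "the call with the flag succeeded", not proof of contiguity.
 */
static void *map_prefer_contig(size_t len, int contig_flag, int *used_contig)
{
	const int prot = PROT_READ | PROT_WRITE;
	const int base = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;

	assert(used_contig != NULL);
	*used_contig = 0;

	if (contig_flag != 0) {
		p = mmap(NULL, len, prot, base | contig_flag, -1, 0);
		if (p != MAP_FAILED) {
			*used_contig = 1;
			return p;
		}
		/* Contiguous attempt failed (e.g. fragmentation): fall back. */
	}

	/* Ordinary, possibly non-contiguous, anonymous mapping. */
	return mmap(NULL, len, prot, base, -1, 0);
}
```

The caller checks the result against MAP_FAILED as usual and can read
used_contig to decide whether to take the fast path in the device setup
or the slower scatter/gather path.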




>   
>>>>>> There are numerous RDMA devices that would all need the mmap
>>>>>> implementation. And this covers only the needs of one subsystem. There are
>>>>>> other use cases.
>>>>> That doesn't prevent providing a library function which could be reused
>>>>> by all those drivers. Nothing really too much different from
>>>>> remap_pfn_range.
>>>> And then in all the other use cases as well. It would be much easier if
>>>> mmap could give you the memory you need instead of having numerous drivers
>>>> improvise on their own. This is in particular also useful
>>>> for numerous embedded use cases where you need contiguous memory.
>>> But a generic implementation would have to deal with many issues as
>>> already mentioned. If you make this driver specific you can have access
>>> control based on fd etc... I really fail to see how this is any
>>> different from remap_pfn_range.
>> Why have several driver specific implementation if you can generalize the
>> idea and implement
>> an already existing POSIX standard?
> Because users shouldn't really care, really. We do have means to get
> large memory and having a guaranteed large memory is a PITA. Just look
> at hugetlb and all the issues it exposes. And that one is preallocated
> and it requires admin to do a conscious decision about the amount of the
> memory. You would like to establish something similar except without
> bounds to the size and no pre-allowed amount by an admin. This sounds
> just crazy to me.

Users do care about the performance they get from devices that benefit
from contiguous memory allocation.
Suppose a user requires 700 MB of contiguous memory. Why allocate a
giant (1 GB) page when you can allocate 700 MB out of the 1 GB and put
the remaining 300 MB back in the huge-pages/small-pages pool?


>
> On the other hand if you make this per-device mmap implementation you
> can have both admin defined policy on who is allowed this memory and
> moreover drivers can implement their fallback strategies which best suit
> their needs. I really fail to see how this is any different from using
> specialized mmap implementations.
We tried doing that in the past, but the maintainer gave us a very good
argument:
"If you want to support anonymous mmaps to allocate large contiguous
pages work with the MM folks on providing that in a generic fashion."

After discussing it with people who have the same requirements as we do,
I totally agree with him.

http://comments.gmane.org/gmane.linux.drivers.rdma/31467

> I might be really wrong but I consider such a general purpose flag quite
> dangerous and future maintenance burden. At least from the hugetlb/THP
> history I do not see why this should be any different.
Could you please elaborate on why it is dangerous and a future
maintenance burden?

Thanks.