Date:   Tue, 24 Oct 2017 09:41:46 +0200
From:   "C.Wehrmeyer" <c.wehrmeyer@....de>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>, linux-mm@...ck.org,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: PROBLEM: Remapping hugepages mappings causes kernel to return
 EINVAL

On 2017-10-23 20:02, Michal Hocko wrote:
> On Mon 23-10-17 19:52:27, C.Wehrmeyer wrote:
> [...]
>>> or you can mmap a larger block and
>>> munmap the initial unaligned part.
>>
>> And how is that supposed to be transparent? When I hear "transparent" I
>> think of a mechanism which I can put under a system so that it benefits from
>> it, while the system does not notice or at least does not need to be aware
>> of it. The system also does not need to be changed for it.
> 
> How do you expect to get a huge page when the mapping itself is not
> properly aligned?

There are four ways that I can think of off the top of my head, but 
only one of them would be actually transparent.

1. Provide a flag to mmap, which might be something different from 
MAP_HUGETLB. After all your question revolved merely around properly 
aligned pages - we don't want to *force* the kernel to reserve 
hugepages, we just want it to provide the proper alignment in this case. 
That wouldn't be very transparent, but it would be the easiest route to 
go (and mmap already kind-of supports such a thing).

2. Based on transparent_hugepage/enabled, always churn out properly 
aligned pages. In this case madvise(MADV_HUGEPAGE) becomes obsolete - 
after all it's mmap which decides what kind of addresses we get. First 
getting *some* mapping that isn't properly aligned for hugepages and 
*then* trying to mitigate the damage with another syscall not only defies 
the meaning of "transparent", but might also be hard to implement 
kernel-side. Let's say I map 8 MiB of memory, without mmap knowing that 
I'd prefer this to be backed by THPs. I could either go with your 
route, or I'd go with my third option. Your route means mapping 8 MiB 
and then some more, trimming at the beginning and the end, and then 
telling madvise that all of that is now going to be hugepages. That is 
something that could easily be done in the kernel, especially with its 
internal knowledge about what the actual page size is, and without all 
those context switches that one takes in by mapping, munmapping, 
munmapping *again* and then *madvising* the actual memory.
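
For reference, the userspace route described above (over-map, trim both 
ends, then madvise) could be sketched roughly as follows. This is only an 
illustration: the helper name mmap_thp_aligned and the HUGE_SZ constant 
are made up for this example, and the 2 MiB huge page size is an 
assumption that holds on x86-64 but not everywhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MADV_HUGEPAGE */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (x86-64); not queried from the kernel. */
#define HUGE_SZ ((size_t)2 << 20)

/*
 * Hypothetical helper: returns an anonymous mapping of len bytes whose
 * start is aligned to HUGE_SZ, or MAP_FAILED. len is expected to be a
 * multiple of HUGE_SZ.
 */
static void *mmap_thp_aligned(size_t len)
{
    /* Map one huge page more than needed so an aligned window must fit. */
    size_t span = len + HUGE_SZ;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return MAP_FAILED;

    /* Round the start address up to the next huge page boundary. */
    uintptr_t start = ((uintptr_t)raw + HUGE_SZ - 1) & ~(uintptr_t)(HUGE_SZ - 1);
    size_t head = start - (uintptr_t)raw;   /* unaligned part in front */
    size_t tail = span - head - len;        /* leftover behind the window */
    assert(head < HUGE_SZ);

    /* Trim both ends; each munmap is one more syscall. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)start + len, tail);

    /* Advisory only: this may fail if THP is disabled, so ignore errors. */
    (void)madvise((void *)start, len, MADV_HUGEPAGE);
    return (void *)start;
}
```

A call such as mmap_thp_aligned(8u << 20) takes up to four syscalls 
(mmap, two munmaps, madvise), which is exactly the overhead complained 
about above.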

3. I map 8 MiB, get some misaligned address from mmap, and then try to 
mitigate the damage by telling madvise that all that is now supposed to 
use hugepages. The dumb way of implementing this would be to split the 
mapping - one section at the beginning has 256 4-KiB pages, the next one 
utilises 3 2-MiB pages, and the last section has 256 4-KiB pages again 
(or some such), effectively equalling 8 MiB. I don't even know if Linux 
supports variable-page-size mappings, and of course we're still carrying 
512 4-KiB pages with us that would have easily been mapped into one 
2-MiB page, which is why I call it the dumb way.
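
The arithmetic behind that split can be checked with a small sketch. The 
names thp_split and thp_split_range are invented for this illustration, 
and the 4 KiB / 2 MiB page sizes are assumed (x86-64); this is not how 
the kernel computes anything, just the counting from the paragraph above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page sizes (x86-64); hypothetical constants for this example. */
#define SMALL_PAGE ((size_t)4 << 10)
#define HUGE_PAGE  ((size_t)2 << 20)

/* How many pages of each size a range would decompose into. */
struct thp_split {
    size_t head_small;  /* 4 KiB pages before the first aligned boundary */
    size_t huge;        /* 2 MiB pages in the aligned middle */
    size_t tail_small;  /* 4 KiB pages after the last aligned boundary */
};

/* addr and len are expected to be multiples of SMALL_PAGE. */
static struct thp_split thp_split_range(uintptr_t addr, size_t len)
{
    uintptr_t end = addr + len;
    /* First huge page boundary at or after addr. */
    uintptr_t first = (addr + HUGE_PAGE - 1) & ~(uintptr_t)(HUGE_PAGE - 1);
    /* Last huge page boundary at or before end. */
    uintptr_t last = end & ~(uintptr_t)(HUGE_PAGE - 1);
    struct thp_split s = {0, 0, 0};

    if (first >= last) {
        /* No whole huge page fits; everything stays on small pages. */
        s.head_small = len / SMALL_PAGE;
        return s;
    }
    s.head_small = (first - addr) / SMALL_PAGE;
    s.huge = (last - first) / HUGE_PAGE;
    s.tail_small = (end - last) / SMALL_PAGE;
    return s;
}
```

For an 8 MiB range that starts 1 MiB past a 2-MiB boundary, e.g. 
thp_split_range(0x100000, 8 << 20), this gives 256 small pages, 3 huge 
pages and 256 small pages again; for a range starting on a boundary it 
gives 4 huge pages and nothing else.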

4. Like three, but a wee bit smarter: introduce another system call that 
works like madvise(MADV_HUGEPAGE), but let it return the address of a 
properly aligned mapping, thus giving userspace 4 genuine 2-MiB pages. 
Just like 3) that wouldn't be transparent, but at least it's only 4 
context switches that don't give us half-baked hugepages. However, this 
approach would effectively only be 1), just more complicated and 
un-transparent.

tl; dr:

1. Provide mmap with some sort of flag (which would be redundant IMHO) 
in order to churn out properly aligned pages (not transparent, but the 
current MAP_HUGETLB flag isn't either).
2. Based on THP enabling status always churn out properly aligned pages, 
and just fall back to smaller pages if hugepages couldn't be allocated 
(truly transparent).
3. Map in memory, then tell madvise to make as many hugepages out of it 
as possible while still keeping the initial mapping (not transparent, 
and not sure Linux can actually do that).
4. Introduce a new system call (not transparent from the get-go) to give 
out properly aligned pages, or make them properly aligned while the 
mapping is transformed from not-properly-aligned to properly-aligned.
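
As a side note on the "THP enabling status" in 2), userspace can read it 
from /sys/kernel/mm/transparent_hugepage/enabled, where the active mode 
is the bracketed word (for example "always [madvise] never"). A minimal 
sketch of reading it, with the helper names thp_parse_mode and 
thp_read_mode being made up for this example:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies the bracketed word of a line such as
 * "always [madvise] never" into out. Returns 0 on success, -1 if no
 * bracketed word is found or out is too small.
 */
static int thp_parse_mode(const char *line, char *out, size_t outsz)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!close)
        return -1;

    size_t n = (size_t)(close - open - 1);  /* length between the brackets */
    if (n + 1 > outsz)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Reads the current mode from sysfs; -1 if the file is missing or odd. */
static int thp_read_mode(char *out, size_t outsz)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? thp_parse_mode(line, out, outsz) : -1;
}
```

With 2), an allocator would not need any of this; the point of the 
sketch is only to show what "enabling status" refers to.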

Your call.
