Open Source and information security mailing list archives
 
Message-ID: <ED52C51D9B87F54892CE544909A13C6C1FFB07DC@IRSMSX101.ger.corp.intel.com>
Date:   Fri, 10 Feb 2017 15:47:57 +0000
From:   "Andrejczuk, Grzegorz" <grzegorz.andrejczuk@...el.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhocko@...e.com" <mhocko@...e.com>,
        "n-horiguchi@...jp.nec.com" <n-horiguchi@...jp.nec.com>,
        "gerald.schaefer@...ibm.com" <gerald.schaefer@...ibm.com>,
        "aneesh.kumar@...ux.vnet.ibm.com" <aneesh.kumar@...ux.vnet.ibm.com>,
        "vaishali.thakkar@...cle.com" <vaishali.thakkar@...cle.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
CC:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC] mm/hugetlb: use mem policy when allocating surplus huge
 pages

On February 9, 2017 8:32 PM, Mike Kravetz wrote:
> I believe another way of stating the problem is as follows:
>
> At mmap(MAP_HUGETLB) time a reservation for the number of huge pages
> is made.  If surplus huge pages need to be (and can be) allocated to
> satisfy the reservation, they will be allocated at this time.  However,
> the memory policy of the task is not taken into account when these
> pages are allocated to satisfy the reservation.
>
> Later when the task actually faults on pages in the mapping, reserved
> huge pages should be instantiated in the mapping.  However, at fault time
> the task's memory policy is taken into account.  It is possible that the
> pages reserved at mmap() time, are located on nodes such that they can
> not satisfy the request with the task's memory policy.  In such a case,
> the allocation fails in the same way as if there was no reservation.
>
> Does that sound accurate?

Yes, thank you for taking the time to rephrase it.
It's much clearer now.
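The mismatch described above can be modelled in a few lines. This is a toy userspace sketch, not kernel code; the function names (reserve_surplus, fault) and the dict-based bookkeeping are illustrative assumptions, but the behaviour mirrors the problem statement: surplus pages are reserved from whichever nodes have them, ignoring the task's memory policy, so a fault constrained to one node can fail even though the reservation succeeded.

```python
def reserve_surplus(free_per_node, needed):
    """Reserve surplus pages from any node, ignoring memory policy
    (models the current mmap(MAP_HUGETLB)-time behaviour)."""
    reserved = {}
    for node, free in free_per_node.items():
        take = min(free, needed - sum(reserved.values()))
        if take:
            reserved[node] = take
        if sum(reserved.values()) == needed:
            return reserved
    return None  # not enough surplus pages anywhere


def fault(reserved, bind_node, pages):
    """Fault-time allocation honours the task's MPOL_BIND policy,
    so it only succeeds if enough pages were reserved on that node."""
    return reserved.get(bind_node, 0) >= pages
```

With node1 empty, a 2-page reservation lands entirely on node0; a task bound to node1 then faults and fails exactly as if no reservation existed.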

> Your problem statement (and solution) addresses the case where surplus huge
> pages need to be allocated at mmap() time to satisfy a reservation and
> later fault.  I 'think' there is a more general problem with huge page
> reservations and memory policy.

Yes, I fixed a very specific code path. This is probably one of many
problems at the intersection of memory policy and huge page reservations.

> - In both cases, there are enough free pages to satisfy the reservation
>   at mmap time.  However, at fault time it cannot get both the pages it
>   requires from the specified node.

There is a difference: interleaving of preallocated huge pages is well known
and expected, whereas with overcommit the pages may or may not be assigned
to the requested NUMA node. Also, after setting nr_hugepages it is possible
to check the number of huge pages reserved on each node with:
cat /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/nr_hugepages
With nr_overcommit_hugepages this is not possible.
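For reference, the per-node counts can be collected from userspace along these lines. This is a sketch, not part of the patch under discussion; per_node_hugepages is a hypothetical helper name, and the sysfs_root parameter exists only so the standard /sys layout can be pointed elsewhere for testing.

```python
import os


def per_node_hugepages(sysfs_root="/sys/devices/system/node"):
    """Read the per-node 2 MB huge page counts exposed for the
    preallocated (nr_hugepages) case.  The overcommit case has no
    per-node sysfs equivalent, which is the asymmetry noted above."""
    counts = {}
    for entry in sorted(os.listdir(sysfs_root)):
        if not (entry.startswith("node") and entry[4:].isdigit()):
            continue
        path = os.path.join(sysfs_root, entry,
                            "hugepages", "hugepages-2048kB", "nr_hugepages")
        if os.path.exists(path):
            with open(path) as f:
                counts[entry] = int(f.read().strip())
    return counts
```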

>  I'm thinking we may need to expand the reservation tracking to be
>  per-node like free_huge_pages_node and others.  Like the code below,
>  we need to take memory policy into account at reservation time.
>  
>  Thoughts?

Are the numbers of free, allocated and surplus huge pages tracked in the
sysfs files mentioned above?
My limited understanding of this problem is that obtaining all the memory
policies requires the struct vm_area_struct (for the bind and preferred
policies) and the address (for interleave). The first is lost in
hugetlb_reserve_pages, and the latter is lost by the time file->mmap is called.
So the reservation of huge pages would need to be done in the mmap_region
function, before calling file->mmap, and I think this requires some new
hugetlb API.
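To illustrate why the address matters for interleave: the node chosen for an interleaved mapping depends on the huge-page index of the faulting address. This is a simplified model of that selection (interleave_node here is an illustrative name, not the kernel function), so without the address the target node simply cannot be known at reservation time.

```python
def interleave_node(addr, huge_page_shift, nodes):
    """Pick the NUMA node for an interleaved huge page: the choice
    rotates through the allowed nodes by huge-page index, so it is a
    function of the faulting address."""
    return nodes[(addr >> huge_page_shift) % len(nodes)]
```

For 2 MB huge pages (shift 21) interleaved over two nodes, consecutive huge pages in the mapping alternate between node 0 and node 1.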

Best Regards,
Grzegorz
