lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 20 Sep 2016 10:37:04 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Gerald Schaefer <gerald.schaefer@...ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Michal Hocko <mhocko@...e.cz>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Rui Teng <rui.teng@...ux.vnet.ibm.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 0/1] memory offline issues with hugepage size > memory
 block size

On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a gigantic
> hugetlb page with a size > memory block size.
> 
> When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not use the head
> page of the compound hugetlb page which will result in a NULL hstate
> from page_hstate(). list_del() would also not work well on a tail page.
> 
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
> 
> However, this all assumes that it is the desired behaviour to remove
> a (gigantic) unused hugetlb page from the pool, just because a small
> (in relation to the  hugepage size) memory block is going offline. Not
> sure if this is the right thing, and it doesn't look very consistent
> given that in this scenario it is _not_ possible to migrate
> such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
> will return false in both cases, i.e. the memory block will be reported
> as removable, no matter if the hugepage that it is part of is unused or
> in use.
> 
> This patch is assuming that it would be OK to remove the hugepage,
> i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> 
> Any thoughts?

Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
this thread:
https://lkml.org/lkml/2016/9/13/146

Their approach (I believe) would be to fail the offline operation in
this case.  However, I could argue that failing the operation, or
dissolving the unused huge page containing the area to be offlined is
the right thing to do.

I never thought too much about the VM_BUG_ON(), but you are correct in
that it should be removed in either case.

The other thing that needs to be changed is the locking in
dissolve_free_huge_page().  I believe the lock only needs to be held if
we are removing the huge page from the pool.  It is not a correctness
but performance issue.

-- 
Mike Kravetz

> 
> 
> Gerald Schaefer (1):
>   mm/hugetlb: fix memory offline with hugepage size > memory block size
> 
>  mm/hugetlb.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ