Message-ID: <4d4851fd-f0fd-9bfe-d271-b53891fdab6f@oracle.com>
Date:   Thu, 11 Mar 2021 14:53:08 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Michal Hocko <mhocko@...e.com>,
        "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Muchun Song <songmuchun@...edance.com>, corbet@....net,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
        hpa@...or.com, dave.hansen@...ux.intel.com, luto@...nel.org,
        peterz@...radead.org, viro@...iv.linux.org.uk,
        akpm@...ux-foundation.org, mchehab+huawei@...nel.org,
        pawan.kumar.gupta@...ux.intel.com, rdunlap@...radead.org,
        oneukum@...e.com, anshuman.khandual@....com, jroedel@...e.de,
        almasrymina@...gle.com, rientjes@...gle.com, willy@...radead.org,
        osalvador@...e.de, song.bao.hua@...ilicon.com, david@...hat.com,
        naoya.horiguchi@....com, joao.m.martins@...cle.com,
        duanxiongchun@...edance.com, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, Chen Huang <chenhuang5@...wei.com>,
        Bodeddula Balasubramaniam <bodeddub@...zon.com>
Subject: Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated
 with each HugeTLB page

On 3/11/21 9:59 AM, Mike Kravetz wrote:
> On 3/11/21 4:17 AM, Michal Hocko wrote:
>>> Yeah per cpu preempt counting shouldn't be noticeable but I have to
>>> confess I haven't benchmarked it.
>>
>> But all this seems moot now http://lkml.kernel.org/r/YEoA08n60+jzsnAl@hirez.programming.kicks-ass.net
>>
> 
> The proper fix for free_huge_page independent of this series would
> involve:
> 
> - Make hugetlb_lock and subpool lock irq safe
> - Hand off freeing to a workqueue if the freeing could sleep
> 
> Today, the only time we can sleep in free_huge_page is for gigantic
> pages allocated via cma.  I 'think' the concern about undesirable
> user visible side effects in this case is minimal as freeing/allocating
> 1G pages is not something that is going to happen at a high frequency.
> My thinking could be wrong?
> 
> Of more concern is the introduction of this series.  If this feature
> is enabled, then ALL free_huge_page requests must be sent to a workqueue.
> Any ideas on how to address this?
> 

Thinking about this more ...

A call to free_huge_page has two distinct outcomes:
1) Page is freed back to the original allocator: buddy or cma
2) Page is put on hugetlb free list

We can only possibly sleep in case 1.  In addition, freeing a
page back to the original allocator involves these steps:
1) Removing page from hugetlb lists
2) Updating hugetlb counts: nr_hugepages, surplus
3) Updating page fields
4) Allocating vmemmap pages if needed as in this series
5) Calling free routine of original allocator

If hugetlb_lock is irq safe, we can perform the first 3 steps under that
lock without issue.  We would then use a workqueue to perform the last
two steps.  Since we are updating the user-visible hugetlb data under the
lock, those updates should not be delayed.  Of course, giving those pages
back to the original allocator could still be delayed, and a user may
notice that.  Not sure if that would be acceptable?  I think Muchun had a
similar setup just for vmemmap allocation in an early version of this
series.
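
To make the split concrete, here is a rough single-threaded user-space
model of it.  This is only a sketch, not kernel code: every name in it
(model_page, model_free_huge_page, model_free_worker, the flag standing
in for hugetlb_lock) is made up for illustration, and the "worker" is
just a function that is called later rather than a real workqueue.

```c
/* User-space model only; all names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct model_page {
	struct model_page *next;	/* link in the deferred-free list */
	int on_hugetlb_list;		/* 1 while tracked by the hugetlb model */
	int needs_vmemmap;		/* 1 if step 4 applies to this page */
	int freed;			/* 1 once handed to the "allocator" */
};

long nr_hugepages;			/* user-visible count (step 2) */
struct model_page *deferred_head;	/* pages waiting for steps 4-5 */
int model_lock_held;			/* stands in for hugetlb_lock */

void model_lock(void)   { assert(!model_lock_held); model_lock_held = 1; }
void model_unlock(void) { assert(model_lock_held);  model_lock_held = 0; }

/* Steps 1-3: no sleeping; counts are correct as soon as this returns. */
void model_free_huge_page(struct model_page *page)
{
	model_lock();
	page->on_hugetlb_list = 0;	/* step 1: remove from hugetlb lists */
	nr_hugepages--;			/* step 2: update counts */
	page->next = deferred_head;	/* step 3: update page fields, queue it */
	deferred_head = page;
	model_unlock();
}

/* Steps 4-5: runs later in a context that may sleep.  Returns pages freed. */
int model_free_worker(void)
{
	struct model_page *list;
	int n = 0;

	model_lock();
	list = deferred_head;		/* detach the whole list under the lock */
	deferred_head = NULL;
	model_unlock();

	while (list) {
		struct model_page *page = list;

		list = page->next;
		if (page->needs_vmemmap)
			page->needs_vmemmap = 0;	/* step 4: would allocate vmemmap */
		page->freed = 1;			/* step 5: allocator's free routine */
		n++;
	}
	return n;
}
```

In this model nr_hugepages drops as soon as model_free_huge_page()
returns, while the page itself is not marked freed until
model_free_worker() runs.  That gap is the delay a user might notice.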

This would also require changes to where accounting is done in
dissolve_free_huge_page and update_and_free_page as mentioned elsewhere.

P.S. We could further optimize to check for the possibility of sleeping
(cma or vmemmap) and only send to workqueue in those cases.
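
As a sketch of that check (again a user-space model with made-up names,
not an existing kernel interface), the decision could look something
like this, assuming the only two reasons to sleep are the ones named
above:

```c
/* User-space model only; page_info and both helpers are hypothetical. */
#include <stdbool.h>

enum free_path { FREE_INLINE, FREE_DEFERRED };

struct page_info {
	bool from_cma;		/* gigantic page allocated via cma */
	bool vmemmap_freed;	/* vmemmap pages must be allocated before freeing */
};

/* Freeing can sleep only for cma pages or when vmemmap must be allocated. */
bool free_may_sleep(const struct page_info *p)
{
	return p->from_cma || p->vmemmap_freed;
}

/* Send to the workqueue only when sleeping is possible. */
enum free_path choose_free_path(const struct page_info *p)
{
	return free_may_sleep(p) ? FREE_DEFERRED : FREE_INLINE;
}
```

A page that is neither from cma nor missing vmemmap pages would take
the FREE_INLINE path and never touch the workqueue.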
-- 
Mike Kravetz
