linux-kernel - Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4f8664fb-0d65-b7d6-39d6-2ce5fc86623a@redhat.com>
Date:   Tue, 16 Feb 2021 09:13:09 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...e.com>,
        Muchun Song <songmuchun@...edance.com>
Cc:     Jonathan Corbet <corbet@....net>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org,
        Peter Zijlstra <peterz@...radead.org>, viro@...iv.linux.org.uk,
        Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
        mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
        Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
        anshuman.khandual@....com, jroedel@...e.de,
        Mina Almasry <almasrymina@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Oscar Salvador <osalvador@...e.de>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>,
        Joao Martins <joao.m.martins@...cle.com>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        linux-doc@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap
 pages associated with each HugeTLB page

On 15.02.21 20:02, Michal Hocko wrote:
> On Tue 16-02-21 01:48:29, Muchun Song wrote:
>> On Tue, Feb 16, 2021 at 12:28 AM Michal Hocko <mhocko@...e.com> wrote:
>>>
>>> On Mon 15-02-21 23:36:49, Muchun Song wrote:
>>> [...]
>>>>> There shouldn't be any real reason why the memory allocation for
>>>>> vmemmaps, or handling vmemmap in general, has to be done from within the
>>>>> hugetlb lock and therefore requiring a non-sleeping semantic. All that
>>>>> can be deferred to a more relaxed context. If you want to make a
>>>>
>>>> Yeah, you are right. We can put the freeing hugetlb routine to a
>>>> workqueue. Just like I do in the previous version (before v13) patch.
>>>> I will pick up these patches.
>>>
>>> I haven't seen your v13 and I will unlikely have time to revisit that
>>> version. I just wanted to point out that the actual allocation doesn't
>>> have to happen from under the spinlock. There are multiple ways to go
>>> around that. Dropping the lock would be one of them. Preallocation
>>> before the spin lock is taken is another. WQ is certainly an option but
>>> I would take it as the last resort when other paths are not feasible.
>>>
>>
>> "Dropping the lock" and "Preallocation before the spin lock" can limit
>> the context of put_page to non-atomic context. I am not sure if there
>> is a page puted somewhere under an atomic context. e.g. compaction.
>> I am not an expert on this.
> 
> Then do a due research or ask for a help from the MM community. Do
> not just try to go around harder problems and somehow duct tape a
> solution. I am sorry for sounding harsh here but this is a repetitive
> pattern.
> 
> Now to the merit. put_page can indeed be called from all sorts of
> contexts. And it might be indeed impossible to guarantee that hugetlb
> pages are never freed up from an atomic context. Requiring that would be
> even hard to maintain longterm. There are ways around that, I believe,
> though.
> 
> The most simple one that I can think of right now would be using
> in_atomic() rather than in_task() check free_huge_page. IIRC recent
> changes would allow in_atomic to be reliable also on !PREEMPT kernels
> (via RCU tree, not sure where this stands right now). That would make
> __free_huge_page always run in a non-atomic context which sounds like an
> easy enough solution.
> Another way would be to keep a pool of ready pages to use in case of
> GFP_NOWAIT allocation fails and have means to keep that pool replenished
> when needed. Would it be feasible to reused parts of the freed page in
> the worst case?

As already discussed, this is only possible when the huge page does not 
reside on ZONE_MOVABLE/CMA.

In addition, we can no longer form a huge page at that memory location ever.

-- 
Thanks,

David / dhildenb