linux-kernel - Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment

Message-ID: <689b7c24-623d-c01e-6c0f-ad430f1fa3ae@redhat.com>
Date:   Wed, 29 Sep 2021 17:05:08 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Ping Fang <pifang@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
        Oscar Salvador <osalvador@...e.de>,
        Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment > 1

On 29.09.21 16:49, Uladzislau Rezki wrote:
> On Wed, Sep 29, 2021 at 4:40 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 29.09.21 16:30, Uladzislau Rezki wrote:
>>>>
>>>> So the idea is that once we run into a dead end because we took a left
>>>> subtree, we roll back to the next possible right subtree and try again.
>>>> If we run into another dead end, we repeat ... thus, this can now happen
>>>> more than once.
>>>>
>>>> I assume the only implication is that this can now be slower in some
>>>> corner cases with larger alignment, because it might take longer to find
>>>> something suitable. Fair enough.
>>>>
>>> Yep, your understanding of the tree traversal is correct. If no suitable
>>> block is found in the left sub-tree, we roll back and check the right one.
>>> So the scanning can happen more than once.
>>>
>>> I did some performance analysis using the vmalloc test suite to figure out
>>> the performance loss for allocations with a specific alignment. On that
>>> synthetic test I see approx. 30% degradation:
>>
>> How realistic is that test case? I assume most alignment we're dealing
>> with is:
>> * 1/PAGE_SIZE
>> * huge page size (for automatic huge page placing)
>>
> Well, that is a synthetic test. Most of the alignments are 1 or PAGE_SIZE.
> There are users of the internal API, where you can specify the alignment
> you want, but those are mainly KASAN, module alloc, etc.
> 
>>>
>>> 2.225 microseconds vs 1.496 microseconds. That time includes both the
>>> vmalloc() and vfree() calls. I do not consider it a big degradation, but
>>> on the other hand we can still adjust the search length for
>>> alignments > one page:
>>>
>>> # add it on top of the previous proposal and search for length instead of size
>>> length = align > PAGE_SIZE ? size + align : size;
>>
>> That will not allow placing huge pages in the case of KASAN. And I
>> consider that more important than optimizing a synthetic test :) My 2 cents.
>>
> Could you please be more specific? I mean, how is it connected with huge
> page mappings? Huge pages are those with order > 0. Or do you mean that
> a special alignment is needed for mapping huge pages?

Let me try to clarify:


KASAN does an exact allocation when onlining a memory block.
__vmalloc_node_range() will try placing huge pages first, increasing the
alignment to, e.g., "1 << PMD_SHIFT".

If we increase the search length in find_vmap_lowest_match(), that 
search will fail if the exact allocation is surrounded by other 
allocations. In that case, we won't place a huge page although we could 
-- because find_vmap_lowest_match() would be imprecise for alignments > 
PAGE_SIZE.


Memory blocks we online/offline on x86 are at least 128MB. The KASAN 
"overhead" we have to allocate is 1/8 of that -- 16 MB, so essentially 8 
huge pages.
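
To make the arithmetic above concrete, here is a minimal sketch. The helper
names are hypothetical and not kernel API; the 1/8 shadow ratio and the 2MB
huge page size are the values assumed in this mail.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Assumption from this mail: the KASAN overhead is 1/8 of the memory block. */
#define SHADOW_SCALE 8ULL
/* Assumption from this mail: a PMD-sized huge page on x86 is 2MB. */
#define HUGE_PAGE_BYTES (2ULL * MiB)

/* Hypothetical helper: bytes of KASAN shadow needed for one memory block. */
static uint64_t shadow_bytes(uint64_t block_bytes)
{
	return block_bytes / SHADOW_SCALE;
}

/* Hypothetical helper: huge pages needed to back 'bytes', rounded up. */
static uint64_t huge_pages_needed(uint64_t bytes)
{
	return (bytes + HUGE_PAGE_BYTES - 1) / HUGE_PAGE_BYTES;
}
```

For a 128MB block, shadow_bytes() gives 16MB and huge_pages_needed() gives 8.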

__vmalloc_node_range() will increase the alignment to 2MB to try placing
huge pages first. find_vmap_lowest_match() will search within the given
exact 16MB area for an 18MB area (size + align), which won't work. So
__vmalloc_node_range() will fall back to the original PAGE_SIZE alignment
and shift=PAGE_SHIFT.
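
As a rough illustration of why the padded search fails, here is a simplified
model. It is not the kernel code: struct free_area, fits_precise() and
passes_padded_filter() are hypothetical names, and the real lookup walks an
augmented tree rather than testing a single area.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical model of one free range [start, end). */
struct free_area {
	uint64_t start;
	uint64_t end;
};

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Precise check: does an aligned block of 'size' bytes fit in the area? */
static bool fits_precise(const struct free_area *fa, uint64_t size,
			 uint64_t align)
{
	uint64_t s = align_up(fa->start, align);

	/* s >= fa->start guards against wrap-around in align_up(). */
	return s >= fa->start && s <= fa->end && fa->end - s >= size;
}

/* Padded filter from the proposal: require size + align free bytes. */
static bool passes_padded_filter(const struct free_area *fa, uint64_t size,
				 uint64_t align)
{
	return fa->end - fa->start >= size + align;
}
```

With a 2MB-aligned free area of exactly 16MB, fits_precise() succeeds for
size=16MB and align=2MB, while passes_padded_filter() rejects the area
because it asks for 18MB.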

__vmalloc_area_node() will then effectively set the page order (via
set_vm_area_page_order()) to 0 -- small pages.

Does that make sense or am I missing something?

-- 
Thanks,

David / dhildenb