linux-kernel - Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <221e38c1-4b8a-8608-455a-6bde544adaf0@redhat.com>
Date:   Fri, 17 Sep 2021 10:47:54 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Ping Fang <pifang@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
        Oscar Salvador <osalvador@...e.de>,
        Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment >
 1

>>> This patch looks like a KASAN specific to me. So i would like to avoid
>>> such changes to
>>> the vmalloc internals in order to simplify further maintenance and
>>> keeping it generic
>>> instead.
>>
>> There is nothing KASAN specific in there :) It's specific to exact
>> applications -- and KASAN may be one of the only users for now.
>>
> Well, you can name it either way :) So it should not be specific by the
> design, otherwise as i mentioned the allocator would be like a peace of
> code that handles corner cases what is actually not acceptable.

Let's not overstress the situation of adding essentially 3 LOC in order 
to fix a sane use case of the vmalloc allocator that was not considered 
properly by internal changes due to 68ad4a330433 ("mm/vmalloc.c: keep 
track of free blocks for vmap allocation").

> 
>>>
>>> Currently the find_vmap_lowest_match() adjusts the search size
>>> criteria for any alignment
>>> values even for PAGE_SIZE alignment. That is not optimal. Because a
>>> PAGE_SIZE or one
>>> page is a minimum granularity we operate on vmalloc allocations thus
>>> all free blocks are
>>> page aligned.
>>>
>>> All that means that we should adjust the search length only if an
>>> alignment request is bigger than
>>> one page, in case of one page, that corresponds to PAGE_SIZE value,
>>> there is no reason in such
>>> extra adjustment because all free->va_start have a page boundary anyway.
>>>
>>> Could you please test below patch that is a generic improvement?
>>
>> I played with the exact approach below (almost exactly the same code in
>> find_vmap_lowest_match()), and it might make sense as a general improvement
>> -- if we can guarantee that start+end of ranges are always PAGE-aligned; I
>> was not able to come to that conclusion...
> All free blocks are PAGE aligned that is how it has to be. A vstart also
> should be aligned otherwise the __vunmap() will complain about freeing
> a bad address:
> 
> <snip>
>      if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
>              addr))
>          return;
> <snip>
> 
> BTW, we should add an extra check to the alloc_vmap_area(), something like
> below:
> 
> <snip>
>      if (!PAGE_ALIGNED(ALIGN(vstart, align))) {
>          WARN_ONCE(1, "vmalloc: vstart or align are not page aligned (0x%lx, 0x%lx)\n",
>              vstart, align);
>          return ERR_PTR(-EBUSY);
> 	}
> <snip>
> 
> to check that passed pair of vstart and align are correct.
> 

Yes we better should.

>>
>> vmap_init_free_space() shows me that our existing alignment code/checks
>> might be quite fragile.
>>
> It is not important to page align a first address. As i mentioned before
> vstart and align have to be correct and we should add such check.
> 
>>
>> But I mainly decided to go with my patch instead because it will also work
>> for exact allocations with align > PAGE_SIZE: most notably, when we try
>> population of hugepages instead of base pages in __vmalloc_node_range(), by
>> increasing the alignment. If the exact range allows for using huge pages,
>> placing huge pages will work just fine with my modifications, while it won't
>> with your approach.
>>
>> Long story short: my approach handles exact allocations also for bigger
>> alignment, Your optimization makes sense as a general improvement for any
>> vmalloc allocations.
>>
>> Thanks for having a look!
>>
> The problem is that there are no users(seems only KASAN) who set the range
> that corresponds to exact size. If you add an aligment overhead on top of

So there is one user and it was broken by 68ad4a330433 ("mm/vmalloc.c: 
keep track of free blocks for vmap allocation").

> it means that search size should include it because we may end up with exact
> free block and after applying aligment it will not fit. This is how allocators
> handle aligment.

This is an implementation detail of the vmalloc allocator ever since 
68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap 
allocation").

If callers pass an exact range, and the alignment they specify applies, 
why should we reject such an allocation? It's leaking an implementation 
detail fixed easily internally into callers.

> 
> Another approach is(you mentioned it in your commit message):
> 
> urezki@...38:~/data/raid0/coding/linux-next.git$ git diff mm/kasan/shadow.c
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 082ee5b6d9a1..01d3bd76c851 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -200,7 +200,7 @@ static int __meminit kasan_mem_notifier(struct notifier_block *nb,
>                  if (shadow_mapped(shadow_start))
>                          return NOTIFY_OK;
>   
> -               ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start,
> +               ret = __vmalloc_node_range(shadow_size, 1, shadow_start,
>                                          shadow_end, GFP_KERNEL,
>                                          PAGE_KERNEL, VM_NO_GUARD,
>                                          pfn_to_nid(mem_data->start_pfn),
> urezki@...38:~/data/raid0/coding/linux-next.git$
> 
> why not? Also you can increase the range in KASAN code.

No, that's leaking implementation details to the caller. And no, 
increasing the range and eventually allocating something bigger (e.g., 
placing a huge page where it might not have been possible) is not 
acceptable for KASAN.

If you're terribly unhappy with this patch, please suggest something 
reasonable to fix exact allocations:
a) Fixes the KASAN use case.
b) Allows for automatic placement of huge pages for exact allocations.
c) Doesn't leak implementation details into the caller.

Thanks

-- 
Thanks,

David / dhildenb