lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <348f209e-89bc-4289-aaf9-e57437e31b0d@yandex-team.ru>
Date: Tue, 23 Sep 2025 15:04:50 +0300
From: Daniil Tatianin <d-tatianin@...dex-team.ru>
To: Florian Westphal <fw@...len.de>
Cc: Eric Dumazet <edumazet@...gle.com>,
 Pablo Neira Ayuso <pablo@...filter.org>,
 Jozsef Kadlecsik <kadlec@...filter.org>, Phil Sutter <phil@....cc>,
 "David S. Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, netfilter-devel@...r.kernel.org,
 coreteam@...filter.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/3] netfilter/x_tables: go back to using vmalloc for
 xt_table_info

On 9/23/25 2:36 PM, Florian Westphal wrote:

> Daniil Tatianin <d-tatianin@...dex-team.ru> wrote:
>>> On Mon, Sep 22, 2025 at 12:48 PM Daniil Tatianin
>>> <d-tatianin@...dex-team.ru> wrote:
>>>> This code previously always used vmalloc for anything above
>>>> PAGE_ALLOC_COSTLY_ORDER, but this logic was changed in
>>>> commit eacd86ca3b036 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()").
>>>>
>>>> The commit that changed it did so because "xt_alloc_table_info()
>>>> basically opencodes kvmalloc()", which is not actually what it was
>>>> doing. kvmalloc() does not attempt to go directly to vmalloc if the
>>>> order the caller is trying to allocate is "expensive", instead it only
>>>> uses vmalloc as a fallback in case the buddy allocator is not able to
>>>> fullfill the request.
>>>>
>>>> The difference between the two is actually huge in case the system is
>>>> under memory pressure and has no free pages of a large order. Before the
>>>> change to kvmalloc we wouldn't even try going to the buddy allocator for
>>>> large orders, but now we would force it to try to find a page of the
>>>> required order by waking up kswapd/kcompactd and dropping reclaimable memory
>>>> for no reason at all to satisfy our huge order allocation that could easily
>>>> exist within vmalloc'ed memory instead.
>>> This would hint at an issue with kvmalloc(), why not fixing it, instead
>>> of trying to fix all its users ?
> I agree with Eric.  There is nothing special in xtables compared to
> kvmalloc usage elsewhere in the stack.  Why "fix" xtables and not e.g.
> rhashtable?
>
> Please work with mm hackers to improve the situation for your use case.
>
> Maybe its enough to raise __GFP_NORETRY in kmalloc_gfp_adjust() if size
> results in >= PAGE_ALLOC_COSTLY_ORDER allocation.

Thanks for your reply! Perhaps this is the way to go, although this 
might have
much broader implications since there are tons of other callers to take 
into account.

I'm not sure whether rhashtable's size also directly depends on user 
input, I was only
aware of x_table since this is the case we ran into specifically.

>
>> Thanks for the quick reply! From my understanding, there is a lot of
>> callers of kvmalloc
>> who do indeed benefit from the physical memory being contiguous, because
>> it is then
>> used for hardware DMA etc., so I'm not sure that would be feasible.
> How can that work?  kvmalloc won't make vmalloc backed memory
> physically contiguous.

The allocated physical memory won't be contiguous only for fallback 
cases (which should be rare),
I assume in that case the hardware operation may end up being more 
expensive with larger scatter-gather
lists etc. So most of the time such code can take optimized paths for 
fully contiguous memory. This is not
the case for x_tables etc.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ