Message-ID: <5f1ff52a-d2c2-40de-b00c-661b75c18dc7@yandex-team.ru>
Date: Tue, 23 Sep 2025 13:11:41 +0300
From: Daniil Tatianin <d-tatianin@...dex-team.ru>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Pablo Neira Ayuso <pablo@...filter.org>,
Jozsef Kadlecsik <kadlec@...filter.org>, Florian Westphal <fw@...len.de>,
Phil Sutter <phil@....cc>, "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netfilter-devel@...r.kernel.org, coreteam@...filter.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/3] netfilter/x_tables: go back to using vmalloc for
xt_table_info
On 9/23/25 12:12 AM, Eric Dumazet wrote:
> On Mon, Sep 22, 2025 at 12:48 PM Daniil Tatianin
> <d-tatianin@...dex-team.ru> wrote:
>> This code previously always used vmalloc for anything above
>> PAGE_ALLOC_COSTLY_ORDER, but this logic was changed in
>> commit eacd86ca3b036 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()").
>>
>> The commit that changed it did so because "xt_alloc_table_info()
>> basically opencodes kvmalloc()", which is not actually what it was
>> doing. kvmalloc() does not go directly to vmalloc when the order the
>> caller asks for is "expensive"; instead, it only uses vmalloc as a
>> fallback in case the buddy allocator is not able to fulfill the
>> request.
>>
>> The difference between the two is huge when the system is under memory
>> pressure and has no free pages of a large order. Before the switch to
>> kvmalloc we wouldn't even try the buddy allocator for large orders, but
>> now we force it to hunt for a page of the required order by waking up
>> kswapd/kcompactd and dropping reclaimable memory, for no reason at all,
>> just to satisfy a huge-order allocation that could easily live in
>> vmalloc'ed memory instead.
> This would hint at an issue with kvmalloc(); why not fix it instead
> of trying to fix all its users?
Thanks for the quick reply! From my understanding, there are a lot of
kvmalloc callers that do benefit from the memory being physically
contiguous, because the buffer is then used for hardware DMA etc., so
I'm not sure changing kvmalloc itself would be feasible.
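
To make the comparison concrete, here is a rough sketch of the two
strategies being discussed (not the in-tree code, and the helper name
is made up): kvmalloc() reaches vmalloc only after the attempt to get
physically contiguous memory has failed, while this table is only ever
touched by the CPU and can go to __vmalloc() directly:

#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Rough sketch only -- alloc_cpu_only_buf() is a made-up helper. */
static void *alloc_cpu_only_buf(size_t sz)
{
	/*
	 * kvmalloc(sz, GFP_KERNEL_ACCOUNT) would first try to build a
	 * physically contiguous buffer (possibly waking kswapd/kcompactd
	 * under memory pressure) and only then fall back to vmalloc.
	 *
	 * A CPU-only table is fine with virtually contiguous order-0
	 * pages, so we can skip that first step entirely.
	 */
	return __vmalloc(sz, GFP_KERNEL_ACCOUNT);
}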
>
> There was a time when PAGE_ALLOC_COSTLY_ORDER was used.
Out of curiosity, do you mean kvmalloc used to always fall back to
vmalloc for > COSTLY_ORDER? If so, do you happen to know which commit
changed that behavior? I tried grepping the logs and looking at the
git blame of slub.c, but I guess it was changed too long ago, so I
wasn't successful.
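
In case it helps to clarify what I'm asking, the cutoff shape I have in
mind is roughly the following (a sketch of the behavior described in
the commit message above, not the verbatim old code; the function name
is made up):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Only try the buddy/slab allocator for "cheap" orders; for anything
 * larger (or on failure) go straight to vmalloc.
 */
static void *xt_alloc_buf_legacy(size_t sz)
{
	void *p = NULL;

	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
		p = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
	if (!p)
		p = vmalloc(sz);
	return p;
}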
>
>
>
>> Revert the change and go back to always calling vmalloc, since this
>> code doesn't really benefit from contiguous physical memory, and the
>> size it allocates is directly dictated by the userspace-passed table,
>> which lets userspace torture the buddy allocator by carefully crafting
>> a huge table that fits right at the maximum available memory order on
>> the system.
>>
>> Signed-off-by: Daniil Tatianin <d-tatianin@...dex-team.ru>
>> ---
>> net/netfilter/x_tables.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>> index 90b7630421c4..c98f4b05d79d 100644
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -1190,7 +1190,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
>> if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
>> return NULL;
>>
>> - info = kvmalloc(sz, GFP_KERNEL_ACCOUNT);
>> + info = __vmalloc(sz, GFP_KERNEL_ACCOUNT);
>> if (!info)
>> return NULL;
>>
>> @@ -1210,7 +1210,7 @@ void xt_free_table_info(struct xt_table_info *info)
>> kvfree(info->jumpstack);
>> }
>>
>> - kvfree(info);
>> + vfree(info);
>> }
>> EXPORT_SYMBOL(xt_free_table_info);
>>
>> --
>> 2.34.1
>>