Message-ID: <CANn89iKDXXjf-OFu+oAYfKp9WOdq4v=HBWpFn=7HRNUCy_9RFg@mail.gmail.com>
Date: Tue, 23 Sep 2025 05:22:53 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Daniil Tatianin <d-tatianin@...dex-team.ru>
Cc: Florian Westphal <fw@...len.de>, Pablo Neira Ayuso <pablo@...filter.org>, 
	Jozsef Kadlecsik <kadlec@...filter.org>, Phil Sutter <phil@....cc>, 
	"David S. Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	netfilter-devel@...r.kernel.org, coreteam@...filter.org, 
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/3] netfilter/x_tables: go back to using vmalloc for xt_table_info

On Tue, Sep 23, 2025 at 5:04 AM Daniil Tatianin
<d-tatianin@...dex-team.ru> wrote:
>
> On 9/23/25 2:36 PM, Florian Westphal wrote:
>
> > Daniil Tatianin <d-tatianin@...dex-team.ru> wrote:
> >>> On Mon, Sep 22, 2025 at 12:48 PM Daniil Tatianin
> >>> <d-tatianin@...dex-team.ru> wrote:
> >>>> This code previously always used vmalloc for anything above
> >>>> PAGE_ALLOC_COSTLY_ORDER, but this logic was changed in
> >>>> commit eacd86ca3b036 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()").
> >>>>
> >>>> The commit that changed it did so because "xt_alloc_table_info()
> >>>> basically opencodes kvmalloc()", which is not actually what it was
> >>>> doing. kvmalloc() does not go directly to vmalloc when the order the
> >>>> caller is requesting is "expensive"; instead, it only uses vmalloc as
> >>>> a fallback when the buddy allocator is unable to fulfill the request.
> >>>>
> >>>> The difference between the two is huge when the system is under
> >>>> memory pressure and has no free pages of a large order. Before the
> >>>> change to kvmalloc we wouldn't even ask the buddy allocator for large
> >>>> orders, but now we force it to try to find a page of the required
> >>>> order, waking up kswapd/kcompactd and dropping reclaimable memory for
> >>>> no reason at all, just to satisfy a huge-order allocation that could
> >>>> easily live in vmalloc'ed memory instead.
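
To make the contrast concrete, here is a rough sketch of the two allocation
strategies being discussed (simplified, not the exact code in either tree;
the helper names are illustrative only):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch of the pre-eacd86ca3b036 behaviour: only ask the buddy allocator
 * for "cheap" orders and go straight to vmalloc for anything larger, so a
 * huge table never creates large-order pressure on the page allocator.
 */
static void *table_blob_alloc_old(size_t sz)
{
	void *mem = NULL;

	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
		mem = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
	if (!mem)
		mem = vmalloc(sz);
	return mem;
}

/*
 * Sketch of the post-eacd86ca3b036 behaviour: kvmalloc() always tries the
 * physically contiguous allocation first, whatever the order, and only
 * falls back to vmalloc if that attempt fails.
 */
static void *table_blob_alloc_new(size_t sz)
{
	return kvmalloc(sz, GFP_KERNEL);
}
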
> >>> This would hint at an issue with kvmalloc(); why not fix it instead
> >>> of trying to fix all its users?
> > I agree with Eric.  There is nothing special in xtables compared to
> > kvmalloc usage elsewhere in the stack.  Why "fix" xtables and not e.g.
> > rhashtable?
> >
> > Please work with mm hackers to improve the situation for your use case.
> >
> > Maybe it's enough to raise __GFP_NORETRY in kmalloc_gfp_adjust() if the
> > size results in a >= PAGE_ALLOC_COSTLY_ORDER allocation.
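
Something along those lines might look like this (a sketch only, not the
actual kmalloc_gfp_adjust() body in mm/, which may already differ;
__GFP_NOWARN is added here only as an assumption so expected failures
stay quiet):

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Sketch of the suggestion above: once the request would need a costly
 * order, make the physically contiguous attempt purely opportunistic and
 * rely on the vmalloc fallback instead of retrying/reclaiming hard.
 */
static gfp_t kmalloc_gfp_adjust_sketch(gfp_t flags, size_t size)
{
	if (get_order(size) >= PAGE_ALLOC_COSTLY_ORDER)
		flags |= __GFP_NORETRY | __GFP_NOWARN;
	return flags;
}
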
>
> Thanks for your reply! Perhaps this is the way to go, although it might
> have much broader implications, since there are tons of other callers to
> take into account.
>
> I'm not sure whether rhashtable's size also depends directly on user
> input; I was only aware of x_tables, since that is the case we ran into
> specifically.
>
> >
> >> Thanks for the quick reply! From my understanding, there are a lot of
> >> callers of kvmalloc that do benefit from the physical memory being
> >> contiguous, because it is then used for hardware DMA etc., so I'm not
> >> sure that would be feasible.
> > How can that work?  kvmalloc won't make vmalloc-backed memory
> > physically contiguous.
>
> The allocated physical memory will be non-contiguous only in the
> fallback case (which should be rare); I assume the hardware operation
> may then be more expensive, with larger scatter-gather lists etc. So
> most of the time such code can take optimized paths for fully
> contiguous memory. This is not the case for x_tables etc.

At least some years ago, we were seeing a performance difference.

x_tables data is often read sequentially. I do not know whether modern
CPUs' hardware prefetchers work on virtual addresses (using the TLB),
or whether they can cross a 4K page boundary even when the pages are
physically contiguous.
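
For what it's worth, a hypothetical illustration of the contiguity point
above: a caller that really wants physically contiguous memory for DMA can
detect the rare vmalloc fallback and take a slower scatter-gather path,
whereas x_tables has no such fast path to lose. struct my_dev and the two
map_* helpers below are made up for the example:

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Hypothetical device and mapping helpers, for illustration only. */
struct my_dev;
int map_single_segment(struct my_dev *dev, void *buf, size_t sz);
int map_scatter_gather(struct my_dev *dev, void *buf, size_t sz);

static int setup_hw_buffer(struct my_dev *dev, size_t sz)
{
	void *buf = kvmalloc(sz, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;

	/* Common case: physically contiguous, a single DMA segment works. */
	if (!is_vmalloc_addr(buf))
		return map_single_segment(dev, buf, sz);

	/* Rare fallback: vmalloc-backed, needs a scatter-gather list. */
	return map_scatter_gather(dev, buf, sz);
}
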


Some context:

commit 6c5ab6511f718c3fb19bcc3f78a90b0e0b601675
Author: Michal Hocko <mhocko@...e.com>
Date:   Mon May 8 15:57:15 2017 -0700

    mm: support __GFP_REPEAT in kvmalloc_node for >32kB

    vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
    vhost_vsock because it would really like to prefer kmalloc to the
    vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
    allocation to vmalloc") for more context.  Michael Tsirkin has also
    noted:

commit 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5
Author: Michael S. Tsirkin <mst@...hat.com>
Date:   Wed Jan 23 21:46:47 2013 +0100

    vhost-net: extend device allocation to vmalloc
