Message-ID: <CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com>
Date: Sat, 27 Sep 2025 12:19:27 +0100
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: Bastien Curutchet <bastien.curutchet@...tlin.com>, Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>, Jonathan Lemon <jonathan.lemon@...il.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>, Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
Alexis Lothore <alexis.lothore@...tlin.com>, Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v4 04/15] selftests/bpf: test_xsk: fix memory
leak in testapp_stats_rx_dropped()
On Fri, Sep 26, 2025 at 12:47 PM Maciej Fijalkowski
<maciej.fijalkowski@...el.com> wrote:
>
> On Fri, Sep 26, 2025 at 08:39:28AM +0200, Bastien Curutchet wrote:
> > Hi Maciej,
> >
> > On 9/25/25 3:32 PM, Maciej Fijalkowski wrote:
> > > On Wed, Sep 24, 2025 at 04:49:39PM +0200, Bastien Curutchet (eBPF Foundation) wrote:
> > > > testapp_stats_rx_dropped() generates pkt_stream twice. The last
> > > > generated is released by pkt_stream_restore_default() at the end of the
> > > > test but we lose the pointer of the first pkt_stream.
> > > >
> > > > Release the 'middle' pkt_stream when it's getting replaced to prevent
> > > > memory leaks.
> > > >
> > > > Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@...tlin.com>
> > > > ---
> > > > tools/testing/selftests/bpf/test_xsk.c | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/test_xsk.c b/tools/testing/selftests/bpf/test_xsk.c
> > > > index 8d7c38eb32ca3537cb019f120c3350ebd9f8c6bc..eb18288ea1e4aa1c9337d16333b7174ecaed0999 100644
> > > > --- a/tools/testing/selftests/bpf/test_xsk.c
> > > > +++ b/tools/testing/selftests/bpf/test_xsk.c
> > > > @@ -536,6 +536,13 @@ static void pkt_stream_receive_half(struct test_spec *test)
> > > > struct pkt_stream *pkt_stream = test->ifobj_tx->xsk->pkt_stream;
> > > > u32 i;
> > > > + if (test->ifobj_rx->xsk->pkt_stream != test->rx_pkt_stream_default)
> > > > + /* Packet stream has already been replaced so we have to release this one.
> > > > + * The newly created one will be freed by the restore_default() at the
> > > > + * end of the test
> > > > + */
> > > > + pkt_stream_delete(test->ifobj_rx->xsk->pkt_stream);
> > >
> > > I don't see why this one is not addressed within the test case
> > > (testapp_stats_rx_dropped()) while the other fix is (testapp_xdp_shared_umem()).
> > >
> >
> > pkt_stream_receive_half() can be used by other tests. I thought it would be
>
> So is pkt_stream_replace_half() and other routines that eventually call
> pkt_stream_generate() and overwrite the pkt_stream, right?
>
> It just feels odd to have special treatment in one function while the
> others are left as-is just because we currently don't have another abusive
> test case.
>
> Maybe it's enough of bike-shedding here, just wanted to clarify on my POV.
>
> In the end don't get me wrong here, this interface is a bit of a PITA for
> me, and thanks for the whole effort!
My reading of this discussion is that it doesn't block the series
and can be done in a follow-up if necessary.
So I was planning to apply it, but running it found real bugs:
./test_progs -t xsk
[ 18.066989] bpf_testmod: loading out-of-tree module taints kernel.
[ 32.204881] BUG: Bad page state in process test_progs pfn:11c98b
[   32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11c98b
[   32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
[   32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000 0000000000000000
[   32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
[ 32.220900] page dumped because: page_pool leak
[ 32.222636] Modules linked in: bpf_testmod(O) bpf_preload
[   32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G O 6.17.0-rc5-gfec474d29325 #6969 PREEMPT
[ 32.224638] Tainted: [O]=OOT_MODULE
[   32.224641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 32.224641] Call Trace:
[ 32.224644] <IRQ>
[ 32.224646] dump_stack_lvl+0x4b/0x70
[ 32.224653] bad_page.cold+0xbd/0xe0
[ 32.224657] __free_frozen_pages+0x838/0x10b0
[ 32.224660] ? skb_pp_cow_data+0x782/0xc30
[ 32.224665] bpf_xdp_shrink_data+0x221/0x530
[ 32.224668] ? skb_pp_cow_data+0x6d1/0xc30
[ 32.224671] bpf_xdp_adjust_tail+0x598/0x810
[ 32.224673] ? xsk_destruct_skb+0x321/0x800
[ 32.224678] bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
[ 32.224681] veth_xdp_rcv_skb+0x45d/0x15a0
[ 32.224684] ? get_stack_info_noinstr+0x16/0xe0
[ 32.224688] ? veth_set_channels+0x920/0x920
[ 32.224691] ? get_stack_info+0x2f/0x80
[ 32.224693] ? unwind_next_frame+0x3af/0x1df0
[ 32.224697] veth_xdp_rcv.constprop.0+0x38a/0xbe0
[ 32.224700] ? common_startup_64+0x13e/0x148
[ 32.224703] ? veth_xdp_rcv_one+0xcd0/0xcd0
[ 32.224706] ? stack_trace_save+0x84/0xa0
[ 32.224709] ? stack_depot_save_flags+0x28/0x820
[ 32.224713] ? __resched_curr.constprop.0+0x332/0x3b0
[ 32.224716] ? timerqueue_add+0x217/0x320
[ 32.224719] veth_poll+0x115/0x5e0
[ 32.224722] ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
[ 32.224726] ? update_load_avg+0x1cb/0x12d0
[ 32.224730] ? update_cfs_group+0x121/0x2c0
[ 32.224733] __napi_poll+0xa0/0x420
[ 32.224736] net_rx_action+0x901/0xe90
[ 32.224740] ? run_backlog_napi+0x50/0x50
[ 32.224743] ? clockevents_program_event+0x1cc/0x280
[ 32.224746] ? hrtimer_interrupt+0x31e/0x7c0
[ 32.224749] handle_softirqs+0x151/0x430
[ 32.224752] do_softirq+0x3f/0x60
[ 32.224755] </IRQ>
[ 32.224756] <TASK>
[ 32.224757] __local_bh_enable_ip+0x58/0x60
[ 32.224759] __dev_direct_xmit+0x295/0x540
[ 32.224762] __xsk_generic_xmit+0x180a/0x2df0
[ 32.224764] ? ___kmalloc_large_node+0xdf/0x130
[ 32.224767] ? __mutex_unlock_slowpath.isra.0+0x330/0x330
[ 32.224770] ? __rtnl_unlock+0x65/0xd0
[ 32.224773] ? xsk_create+0x700/0x700
[ 32.224774] ? netdev_run_todo+0xce/0xbe0
[ 32.224777] ? _raw_spin_lock_irqsave+0x7b/0xc0
[ 32.224780] xsk_sendmsg+0x365/0x770
[ 32.224782] ? xsk_poll+0x640/0x640
[ 32.224783] __sock_sendmsg+0xc1/0x150
[ 32.224787] __sys_sendto+0x1d0/0x260
[ 32.224790] ? __ia32_sys_getpeername+0xb0/0xb0
[ 32.224793] ? fput+0x29/0x80
[ 32.224796] ? __sys_bind+0x187/0x1c0
[ 32.224798] ? __sys_bind_socket+0x90/0x90
[ 32.224801] ? randomize_page+0x60/0x60
[ 32.224804] ? fget+0x18e/0x230
[ 32.224807] __x64_sys_sendto+0xe0/0x1b0
[ 32.224810] ? fpregs_assert_state_consistent+0x57/0xe0
[ 32.224812] do_syscall_64+0x46/0x180
[ 32.224815] entry_SYSCALL_64_after_hwframe+0x4b/0x53
and at the end:
# ERROR: [receive_pkts] Receive loop timed out
test_xsk:FAIL:Run test unexpected error: -1 (errno 12)
#251/32 ns_xsk_drv/XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF:FAIL
#251 ns_xsk_drv:FAIL
Summary: 1/67 PASSED, 0 SKIPPED, 1 FAILED
[   99.308243] page_pool_release_retry() stalled pool shutdown: id 185, 48 inflight 60 sec
[  159.724173] page_pool_release_retry() stalled pool shutdown: id 185, 48 inflight 120 sec
The test is great and the work to make it run as part of test_progs
paid off big time.
But we cannot enable it by default, since it will crash CI VMs.
Please reproduce the above issue.
You might need
CONFIG_DEBUG_VM=y
and other mm debug flags.
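[Editor's note: the specific "other mm debug flags" are not named above. A hedged guess at a kernel config fragment that makes such page-state bugs surface, assuming a typical debug build; exact options to enable are the reader's call:]

```
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_PAGE_POISONING=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
```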
If the fix can be done quickly let's land the fix first.
If not, please respin the series, but disable the test by default
until the bug is fixed.