Message-ID: <CAAVpQUBPxhBSPwW3jKVGNunmz7-MHwnRgL1RCjSA-WqvnJmGDg@mail.gmail.com>
Date: Thu, 18 Sep 2025 19:28:10 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>, Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Mina Almasry <almasrymina@...gle.com>, Kuniyuki Iwashima <kuni1840@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH v9 bpf-next/net 6/6] selftest: bpf: Add test for SK_MEMCG_EXCLUSIVE.
On Thu, Sep 18, 2025 at 6:14 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 9/17/25 6:17 PM, Kuniyuki Iwashima wrote:
> >>> +
> >>> +close:
> >>> + for (i = 0; i < ARRAY_SIZE(sk); i++)
> >>> + close(sk[i]);
> >>> +
> >>> + if (test_case->type == SOCK_DGRAM) {
> >>> + /* UDP recv queue is destroyed after RCU grace period.
> >>> + * With one kern_sync_rcu(), memory_allocated[0] of the
> >>> + * isolated case often matches with memory_allocated[1]
> >>> + * of the preceding non-exclusive case.
> >>> + */
> >> I don't think I understand the double kern_sync_rcu() below.
> > With one kern_sync_rcu(), when I added bpf_printk() for memory_allocated,
> > I sometimes saw two consecutive non-zero values, meaning memory_allocated[0]
> > still see the previous test case result (memory_allocated[1]).
> > ASSERT_LE() succeeds as expected, but somewhat unintentionally.
> >
> > bpf_trace_printk: memory_allocated: 0 <-- non exclusive case
> > bpf_trace_printk: memory_allocated: 4160
> > bpf_trace_printk: memory_allocated: 4160 <-- exclusive case's
> > memory_allocated[0]
> > bpf_trace_printk: memory_allocated: 0
> > bpf_trace_printk: memory_allocated: 0
> > bpf_trace_printk: memory_allocated: 0
> >
> > One kern_sync_rcu() is enough to kick call_rcu() + sk_destruct() but
> > does not guarantee that it completes, so if the queue length was too long,
> > the memory_allocated does not go down fast enough.
> >
> > But now I don't see the flakiness with NR_SEND 32, and one
> > kern_sync_rcu() might be enough unless the env is too slow...?
>
> Ah, got it. I put you in the wrong path. It needs rcu_barrier() instead.
>
> Is recv() enough? May be just recv(MSG_DONTWAIT) to consume it only for UDP
> socket? that will slow down the udp test... only read 1 byte and the remaining
> can be MSG_TRUNC?
recv() before close() seems to work with fewer sockets (with the same
number of send() bytes and without kern_sync_rcu()).
The test time was mostly the same as in the "no recv() + 1 kern_sync_rcu()"
case (2~2.5s on my qemu), so draining seems better.
#define NR_PAGES 128
#define NR_SOCKETS 2
#define BUF_TOTAL (NR_PAGES * 4096 / (NR_SOCKETS / 2))
#define BUF_SINGLE 1024
#define NR_SEND (BUF_TOTAL / BUF_SINGLE)
NR_SOCKETS==64
test_progs-1380 [008] ...11 2121.731176: bpf_trace_printk: memory_allocated: 0
test_progs-1380 [008] ...11 2121.737700: bpf_trace_printk: memory_allocated: 576
test_progs-1380 [008] ...11 2121.743436: bpf_trace_printk: memory_allocated: 64
test_progs-1380 [008] ...11 2121.749984: bpf_trace_printk: memory_allocated: 64
test_progs-1380 [008] ...11 2121.755763: bpf_trace_printk: memory_allocated: 64
test_progs-1380 [008] ...11 2121.762171: bpf_trace_printk: memory_allocated: 64
NR_SOCKETS==2
test_progs-1424 [009] ...11 2352.990520: bpf_trace_printk: memory_allocated: 0
test_progs-1424 [009] ...11 2352.997395: bpf_trace_printk: memory_allocated: 514
test_progs-1424 [009] ...11 2353.001910: bpf_trace_printk: memory_allocated: 2
test_progs-1424 [009] ...11 2353.008947: bpf_trace_printk: memory_allocated: 2
test_progs-1424 [009] ...11 2353.012988: bpf_trace_printk: memory_allocated: 2
test_progs-1424 [009] ...11 2353.019988: bpf_trace_printk: memory_allocated: 0
While NR_PAGES is set to 128, the actual number of allocated pages was
around 300 for TCP and 500 for UDP.  I reduced NR_PAGES to 64; then TCP
consumed 160 pages and UDP 250, but the test time didn't change.
>
> btw, does the test need 64 sockets? is it because of the per socket snd/rcvbuf
> limitation?
I chose 64 for UDP, which does not tune rcvbuf automatically, but I hit
an error when I increased the number of send() bytes and ended up setting
SO_RCVBUF anyway, so any number of sockets is fine now.
I'll use 2 sockets but keep the for-loops so that we can change
NR_SOCKETS easily if the test gets flaky.
>
> Another option is to trace SEC("fexit/__sk_destruct") to ensure all the cleanup
> is done but seems overkill if recv() can do.
Agreed, draining before close() would be enough.