Message-ID: <87cy30tlwq.fsf@cloudflare.com>
Date: Fri, 23 Jan 2026 15:59:33 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org, "John Fastabend" <john.fastabend@...il.com>,
"David S. Miller" <davem@...emloft.net>, "Eric Dumazet"
<edumazet@...gle.com>, "Jakub Kicinski" <kuba@...nel.org>, "Paolo Abeni"
<pabeni@...hat.com>, "Simon Horman" <horms@...nel.org>, "Neal Cardwell"
<ncardwell@...gle.com>, "Kuniyuki Iwashima" <kuniyu@...gle.com>, "David
Ahern" <dsahern@...nel.org>, "Andrii Nakryiko" <andrii@...nel.org>,
"Eduard Zingerman" <eddyz87@...il.com>, "Alexei Starovoitov"
<ast@...nel.org>, "Daniel Borkmann" <daniel@...earbox.net>, "Martin
KaFai Lau" <martin.lau@...ux.dev>, "Song Liu" <song@...nel.org>,
"Yonghong Song" <yonghong.song@...ux.dev>, "KP Singh"
<kpsingh@...nel.org>, "Stanislav Fomichev" <sdf@...ichev.me>, "Hao Luo"
<haoluo@...gle.com>, "Jiri Olsa" <jolsa@...nel.org>, "Shuah Khan"
<shuah@...nel.org>, "Michal Luczaj" <mhal@...x.co>, "Cong Wang"
<cong.wang@...edance.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap
On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@...ux.dev> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@...udflare.com> wrote:
>> > I've been thinking about this some more and came to the conclusion that
>> > this udp_bpf_ioctl implementation is actually what we want, while
>> > tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> > non-empty.
>> >
>> > Why? Because the verdict prog might redirect or drop the skbs from
>> > sk_receive_queue once it actually runs. The messages might never appear
>> > on the msg_ingress queue.
>> >
>> > What I think we should be doing, in the end, is kicking the
>> > sk_receive_queue processing on bpf_map_update_elem, if there's data
>> > ready.
>> >
>> > The API semantics I'm proposing is:
>> >
>> > 1. ioctl(FIONREAD) -> reports N bytes
>> > 2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> > 3. poll() for POLLIN -> wait for socket to be ready to read
>> > 4. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> > redirect or drop it
>> >
>> > We don't have to add the queue kick on map update in this series.
>> >
>> > If you decide to leave it for later, can I ask that you open an issue at
>> > our GH project [1]?
>> >
>> > I don't want it to fall through the cracks. And I sometimes have people
>> > asking what they could help with in sockmap.
>> >
>> > Thanks,
>> > -jkbs
>> >
>> > [1] https://github.com/sockmap-project/sockmap-project/issues
>> >
>> Hi Jakub,
>>
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>>
>> There's no rush to merge this patch.
>>
>> Since the queue kick on bpf_map_update_elem addresses a closely related issue,
>> I think it makes sense to include it in this patchset for easier tracking rather
>> than splitting it out.
>>
>> I'll spend more time looking into this and come back with an updated version.
>>
>> Thanks,
>> Jiayuan
>>
>
>
> Hi Jakub,
>
> I've been thinking about this more, and I realize the problem is not as simple as it seems.
>
> Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
> program may not be fully initialized at that point. For example, with a
> redirect program, the destination fd might not yet be inserted into the
> map. If we kick the data through the BPF program immediately, the
> redirect lookup would fail, leading to unexpected behavior (data being
> dropped or passed to the wrong socket).

I reckon there is not much we can do about it because we have no control
over when sockets get inserted into or removed from sockmap. It can
happen at any time.

Also, a newly received segment can trigger the sk_data_ready callback,
and that would also cause the skbs to get processed. We don't have
control over that either.

Does this change break any of our existing tests/benchmarks or some
other setup of yours?

> I also considered triggering the kick in poll/select via
> sk_msg_is_readable(). However, this approach doesn't work for TCP
> because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
> will return early when sk_receive_queue has data, before ever calling
> sk_is_readable().
>
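To make the short-circuit described above concrete, here is a toy
user-space model of that call chain. It is only a sketch, not kernel
code: struct toy_sock and all toy_*() helpers are hypothetical stand-ins
for tcp_epollin_ready(), sk_is_readable() and tcp_stream_is_readable(),
and the real readiness check has more conditions than a byte-count
comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state relevant to the discussion. */
struct toy_sock {
	int receive_queue_bytes; /* models data sitting in sk_receive_queue */
	int ingress_msg_bytes;   /* models data on the sockmap ingress queue */
	int is_readable_calls;   /* how often the sockmap hook was consulted */
};

/* Models tcp_epollin_ready(): enough unread bytes in the receive queue? */
static bool toy_epollin_ready(const struct toy_sock *sk, int target)
{
	return sk->receive_queue_bytes >= target;
}

/* Models sk_is_readable(): the hook where a queue kick could have lived. */
static bool toy_sk_is_readable(struct toy_sock *sk)
{
	sk->is_readable_calls++;
	return sk->ingress_msg_bytes > 0;
}

/* Models tcp_stream_is_readable(): the hook runs only as a fallback. */
static bool toy_stream_is_readable(struct toy_sock *sk, int target)
{
	assert(target > 0);
	if (toy_epollin_ready(sk, target))
		return true; /* early return, the hook below never runs */
	return toy_sk_is_readable(sk);
}
```

With data in the modelled receive queue, toy_stream_is_readable()
returns true while is_readable_calls stays at 0, which is why a kick
placed behind the hook would not fire in that case.
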
> In the next version, I'll address your other nits and remove the
> sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
> the GH project to track this problem so we can continue exploring
> better solutions.
Sounds like a plan. Thanks!
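
For reference, the user-visible sequence proposed earlier in the thread
(FIONREAD, map update, poll, FIONREAD) could be exercised from user space
roughly as below. This is a sketch under assumptions: it uses an AF_UNIX
socketpair as a stand-in for a TCP/UDP socket, and
insert_into_sockmap_stub() is a hypothetical placeholder for the
bpf_map_update_elem() call, which needs a sockmap with a verdict program
attached and the privileges to create one. The helper names are made up
for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the unread byte count FIONREAD reports for fd, or -1 on error. */
static int fionread(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}

/*
 * Hypothetical placeholder for step 2. A real test would insert fd into
 * a sockmap here; this stub does nothing, so no verdict program can
 * redirect or drop the queued data.
 */
static int insert_into_sockmap_stub(int fd)
{
	(void)fd;
	return 0;
}

/* Walk the four steps. Returns 0 if both FIONREAD calls report len. */
static int check_fionread_sequence(const char *buf, size_t len)
{
	struct pollfd pfd;
	int sv[2], before, after, ret = -1;

	assert(buf && len > 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], buf, len) != (ssize_t)len)
		goto out;

	before = fionread(sv[1]);              /* 1. reports N bytes */
	if (insert_into_sockmap_stub(sv[1]))   /* 2. map update (stubbed) */
		goto out;

	pfd.fd = sv[1];
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
		goto out;                      /* 3. wait until readable */

	after = fionread(sv[1]);               /* 4. still N, nothing dropped */
	if (before == (int)len && after == before)
		ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

With a real sockmap and a verdict program that redirects or drops the
data, step 4 would be expected to report fewer than N bytes, which is
the case the proposed semantics are meant to cover.
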