netdev - Re: [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87cy30tlwq.fsf@cloudflare.com>
Date: Fri, 23 Jan 2026 15:59:33 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org,  "John Fastabend" <john.fastabend@...il.com>,
  "David S. Miller" <davem@...emloft.net>,  "Eric Dumazet"
 <edumazet@...gle.com>,  "Jakub Kicinski" <kuba@...nel.org>,  "Paolo Abeni"
 <pabeni@...hat.com>,  "Simon Horman" <horms@...nel.org>,  "Neal Cardwell"
 <ncardwell@...gle.com>,  "Kuniyuki Iwashima" <kuniyu@...gle.com>,  "David
 Ahern" <dsahern@...nel.org>,  "Andrii Nakryiko" <andrii@...nel.org>,
  "Eduard Zingerman" <eddyz87@...il.com>,  "Alexei Starovoitov"
 <ast@...nel.org>,  "Daniel Borkmann" <daniel@...earbox.net>,  "Martin
 KaFai Lau" <martin.lau@...ux.dev>,  "Song Liu" <song@...nel.org>,
  "Yonghong Song" <yonghong.song@...ux.dev>,  "KP Singh"
 <kpsingh@...nel.org>,  "Stanislav Fomichev" <sdf@...ichev.me>,  "Hao Luo"
 <haoluo@...gle.com>,  "Jiri Olsa" <jolsa@...nel.org>,  "Shuah Khan"
 <shuah@...nel.org>,  "Michal Luczaj" <mhal@...x.co>,  "Cong Wang"
 <cong.wang@...edance.com>,  netdev@...r.kernel.org,
  linux-kernel@...r.kernel.org,  linux-kselftest@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 2/3] bpf, sockmap: Fix FIONREAD for sockmap

On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@...ux.dev
> mailto:jiayuan.chen@...ux.dev?to=%22Jiayuan%20Chen%22%20%3Cjiayuan.chen%40linux.dev%3E
>> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@...udflare.com
>> mailto:jakub@...udflare.com?to=%22Jakub%20Sitnicki%22%20%3Cjakub%40cloudflare.com%3E
>> >  I've been thinking about this some more and came to the conclusion that
>> >  this udp_bpf_ioctl implementation is actually what we want, while
>> >  tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> >  non-empty.
>> >  
>> >  Why? Because the verdict prog might redirect or drop the skbs from
>> >  sk_receive_queue once it actually runs. The messages might never appear
>> >  on the msg_ingress queue.
>> >  
>> >  What I think we should be doing, in the end, is kicking the
>> >  sk_receive_queue processing on bpf_map_update_elem, if there's data
>> >  ready.
>> >  
>> >  The API semantics I'm proposing is:
>> >  
>> >  1. ioctl(FIONREAD) -> reports N bytes
>> >  2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> >  3. poll() for POLLIN -> wait for socket to be ready to read
>> >  5. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> >  redirect or drop it
>> >  
>> >  We don't have to add the the queue kick on map update in this series.
>> >  
>> >  If you decide to leave it for later, can I ask that you open an issue at
>> >  our GH project [1]?
>> >  
>> >  I don't want it to fall through the cracks. And I sometimes have people
>> >  asking what they could help with in sockmap.
>> >  
>> >  Thanks,
>> >  -jkbs
>> >  
>> >  [1] https://github.com/sockmap-project/sockmap-project/issues
>> > 
>> Hi Jakub,
>> 
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>> 
>> There's no rush to merge this patch.
>> 
>> Since the kick queue on bpf_map_update_elem addresses a closely related issue,
>> I think it makes sense to include it in this patchset for easier tracking rather
>> than splitting it out.
>> 
>> I'll spend more time looking into this and come back with an updated version.
>> 
>> Thanks,
>> Jiayuan
>>
>
>
> Hi Jakub,
>
>   I've been thinking about this more, and I realize the problem is not as simple as it seems.
>
>   Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
>   program may not be fully initialized at that point. For example, with a
>   redirect program, the destination fd might not yet be inserted into the
>   map. If we kick the data through the BPF program immediately, the
>   redirect lookup would fail, leading to unexpected behavior (data being
>   dropped or passed to the wrong socket).

I reckon there is not much we can do about it because we have no control
over when inserts/removes sockets from sockmap. It can happen at any
time.

Also, a newly received segment can trigger sk_data_ready callback,
and that would also cause the skbs to get processed. We don't have
control of that either.

Does this change break any of our existing tests/benchmarks or some
other setup of yours?

>   I also considered triggering the kick in poll/select via
>   sk_msg_is_readable(). However, this approach doesn't work for TCP
>   because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
>   will return early when sk_receive_queue has data, before ever calling
>   sk_is_readable().
>
>   In the next version, I'll address your other nits and remove the
>   sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
>   the GH project to track this problem so we can continue exploring
>   better solutions.

Sounds like a plan. Thanks!