Message-ID: <YjExIgF2ib0ePvbh@pop-os.localdomain>
Date: Tue, 15 Mar 2022 17:36:50 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: wangyufen <wangyufen@...wei.com>, ast@...nel.org,
john.fastabend@...il.com, daniel@...earbox.net, lmb@...udflare.com,
davem@...emloft.net, kafai@...com, dsahern@...nel.org,
kuba@...nel.org, songliubraving@...com, yhs@...com,
kpsingh@...nel.org, netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed

On Tue, Mar 15, 2022 at 01:12:08PM +0100, Jakub Sitnicki wrote:
> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> > On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>> Consider a TCP socket in a sockmap. If the user invokes
> >>> bpf_map_delete_elem() to delete the sockmap element, the TCP socket
> >>> switches to the regular TCP protocol stack to send and receive packets.
> >>> The switch may cause problems: for example, if messages still in the
> >>> ingress queue are purged by sk_psock_drop(), those packets are lost and
> >>> the TCP data seen by the receiver is incomplete.
> >>>
> >>> Signed-off-by: Wang Yufen <wangyufen@...wei.com>
> >>> ---
> >> Can you please tell us a bit more about the life-cycle of the socket in
> >> your workload? Questions that come to mind:
> >>
> >> 1) What triggers the removal of the socket from sockmap in your case?
> > We use sk_msg to redirect with sock hash, like this:
> >
> > skA redirect skB
> > Tx <-----------> skB,Rx
> >
> > We then construct a scenario where the sending rate is high and the
> > receiving rate is slow, so packets pile up in the ingress queue on the
> > receiving side. In this case, running bpf_map_delete_elem() to delete
> > the sockmap entry triggers the following call chain:
> >
> > sock_hash_delete_elem()
> >   sock_map_unref()
> >     sk_psock_put()
> >       sk_psock_drop()
> >         sk_psock_stop()
> >           __sk_psock_zap_ingress()
> >             __sk_psock_purge_ingress_msg()
> >
> >> 2) Would it still be a problem if removal from sockmap did not cause any
> >> packets to get dropped?
> > Yes, it would still be a problem. Even if removal from the sockmap did
> > not cause any packets to be dropped, the receive path switches to the
> > TCP protocol stack, and the packets left in the psock ingress queue can
> > no longer be received by the user.
>
> Thanks for the context. So, if I understand correctly, you want to avoid
> breaking the network pipe by updating the sockmap from user-space.
>
> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?

Doesn't BPF_MAP_FREEZE only freeze write operations from syscalls?
For sockmap, receiving packets is not part of a map write operation.

The problem here is that an skmsg can only be consumed while the socket is
still in the map, as it uses a separate queue and a separate type of
message (skmsg vs. skb). So, essentially, this behavior is by design.
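
To make the separate-queue point concrete, here is a rough user-space toy model. It is not kernel code: every toy_* name is made up for illustration, and the real structures (sk_psock, sk_msg, sk_buff) are far more involved. It only assumes what is described above: redirected messages sit on a psock ingress queue that exists while the socket is in the map, and deleting the map element purges that queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_QUEUE_CAP 8

/* Hypothetical stand-in for a message queue; not a kernel structure. */
struct toy_queue {
	const char *msgs[TOY_QUEUE_CAP];
	int len;
};

/* Hypothetical socket: 'ingress' models the psock ingress queue (skmsg),
 * 'receive' models the regular TCP receive queue (skb). */
struct toy_sock {
	bool in_sockmap;
	struct toy_queue ingress;
	struct toy_queue receive;
};

static bool toy_enqueue(struct toy_queue *q, const char *msg)
{
	if (q->len == TOY_QUEUE_CAP)
		return false;
	q->msgs[q->len++] = msg;
	return true;
}

static const char *toy_dequeue(struct toy_queue *q)
{
	const char *msg;

	if (q->len == 0)
		return NULL;
	msg = q->msgs[0];
	memmove(&q->msgs[0], &q->msgs[1], (q->len - 1) * sizeof(q->msgs[0]));
	q->len--;
	return msg;
}

/* Redirected data can only land on the ingress queue while the socket
 * is still in the map. */
static bool toy_redirect(struct toy_sock *sk, const char *msg)
{
	if (!sk->in_sockmap)
		return false;
	return toy_enqueue(&sk->ingress, msg);
}

/* Models the effect of deleting the map element: the ingress queue is
 * purged and the socket leaves the map. Returns the number of messages
 * that were dropped. */
static int toy_map_delete(struct toy_sock *sk)
{
	int dropped = sk->ingress.len;

	sk->ingress.len = 0;
	sk->in_sockmap = false;
	return dropped;
}

/* Simplified receive path: the ingress queue is only visible while the
 * socket is in the map; otherwise only the regular queue is read. */
static const char *toy_recv(struct toy_sock *sk)
{
	if (sk->in_sockmap && sk->ingress.len > 0)
		return toy_dequeue(&sk->ingress);
	return toy_dequeue(&sk->receive);
}

/* Fast sender, slow receiver, then a delete from "user space". */
static int toy_demo(void)
{
	struct toy_sock skB = { .in_sockmap = true };
	int dropped;

	toy_redirect(&skB, "msg1");
	toy_redirect(&skB, "msg2");
	toy_redirect(&skB, "msg3");

	assert(toy_recv(&skB) != NULL);   /* receiver got only msg1 so far */
	dropped = toy_map_delete(&skB);   /* msg2 and msg3 are purged */
	assert(toy_recv(&skB) == NULL);   /* nothing left for the reader */

	printf("dropped %d queued message(s)\n", dropped);
	return dropped;
}
```

Calling toy_demo() from a main() shows the receiver getting one message and losing the two that were still queued when the element was deleted. The model leaves out locking, backlog work and the TCP stack itself.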
Thanks.