Message-ID: <6231746e8e561_ad0208bf@john.notmuch>
Date: Tue, 15 Mar 2022 22:23:58 -0700
From: John Fastabend <john.fastabend@...il.com>
To: wangyufen <wangyufen@...wei.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>
Cc: ast@...nel.org, john.fastabend@...il.com, lmb@...udflare.com,
davem@...emloft.net, kafai@...com, dsahern@...nel.org,
kuba@...nel.org, songliubraving@...com, yhs@...com,
kpsingh@...nel.org, netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap
elements in user mode is not allowed
wangyufen wrote:
>
> On 2022/3/16 0:25, Daniel Borkmann wrote:
> > On 3/15/22 1:12 PM, Jakub Sitnicki wrote:
> >> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> >>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>>>> Consider a TCP socket in a sockmap. If the user invokes
> >>>>> bpf_map_delete_elem() to delete the sockmap element, the TCP
> >>>>> socket switches to the TCP protocol stack to send and receive
> >>>>> packets. The switch may cause problems: for example, if some
> >>>>> msgs are still in the ingress queue and are cleared by
> >>>>> sk_psock_drop(), those packets are lost and the TCP data
> >>>>> becomes abnormal.
> >>>>>
> >>>>> Signed-off-by: Wang Yufen <wangyufen@...wei.com>
> >>>>> ---
> >>>> Can you please tell us a bit more about the life-cycle of the
> >>>> socket in your workload? Questions that come to mind:
> >>>>
> >>>> 1) What triggers the removal of the socket from sockmap in your case?
> >>> We use sk_msg to redirect with sock hash, like this:
> >>>
> >>> skA redirect skB
> >>> Tx <-----------> skB,Rx
> >>>
> >>> We then construct a scenario where packets are sent quickly but
> >>> received slowly, so the packets pile up in the ingress queue on
> >>> the receiving side. In this case, running bpf_map_delete_elem()
> >>> to delete the sockmap entry triggers the following call chain:
> >>>
> >>> sock_hash_delete_elem()
> >>> sock_map_unref()
> >>> sk_psock_put()
> >>> sk_psock_drop()
> >>> sk_psock_stop()
> >>> __sk_psock_zap_ingress()
> >>> __sk_psock_purge_ingress_msg()
> >>>
> >>>> 2) Would it still be a problem if removal from sockmap did not
> >>>> cause any packets to get dropped?
> >>> Yes, it would still be a problem. Even if removal from sockmap did
> >>> not cause any packets to get dropped, the receive path would
> >>> switch to the TCP protocol stack, and the packets left in the
> >>> psock ingress queue could not be received by the user.
> >>
> >> Thanks for the context. So, if I understand correctly, you want to avoid
> >> breaking the network pipe by updating the sockmap from user-space.
> >>
> >> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?
> >
> > +1
> >
> > Aside from that, the patch as-is also fails BPF CI in a lot of
> > places, please make sure to check selftests:
> >
> > https://github.com/kernel-patches/bpf/runs/5537367301?check_suite_focus=true
> >
> >
> > [...]
> > #145/73 sockmap_listen/sockmap IPv6 test_udp_redir:OK
> > #145/74 sockmap_listen/sockmap IPv6 test_udp_unix_redir:OK
> > #145/75 sockmap_listen/sockmap Unix test_unix_redir:OK
> > #145/76 sockmap_listen/sockmap Unix test_unix_redir:OK
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/77 sockmap_listen/sockhash IPv4 TCP test_insert_invalid:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/78 sockmap_listen/sockhash IPv4 TCP test_insert_opened:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > #145/79 sockmap_listen/sockhash IPv4 TCP test_insert_bound:FAIL
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> > test_ops_cleanup:FAIL:1424
> > [...]
> >
> > Thanks,
> > Daniel
> > .
>
> I'm not sure about this patch. Its main purpose is to point out the
> problems that can occur when a socket is deleted from the map. I'm
> sorry for the trouble.
>
> Thanks.
If you want to delete a socket you should flush it first. To do this,
stop redirecting traffic to it and then read all the data out. At
the moment it's a bit tricky to know when the receiving socket is
empty, though. Adding a flag on delete to only delete when the
ingress qlen == 0 might be a possibility if you need delete to
work and are trying to work out how to safely delete sockets.
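
As a rough illustration of the "read all the data out" step, here is a
minimal user-space sketch. It assumes redirection to the socket has
already been stopped, and that recv() on the socket returns the data
queued for it. drain_socket() is a hypothetical helper, not a kernel or
libbpf API. An EAGAIN result only means nothing is queued right now, so
this is best effort and does not solve the "is it really empty" problem
mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read everything currently queued on fd without
 * blocking. Returns the number of bytes drained, or -1 on a real error.
 * Assumption: the caller has already stopped redirecting traffic to fd.
 */
static ssize_t drain_socket(int fd)
{
	char buf[4096];
	ssize_t total = 0;

	assert(fd >= 0);
	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n > 0) {
			total += n;	/* a real program would consume buf here */
			continue;
		}
		if (n == 0)
			return total;	/* peer closed the connection */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;	/* nothing queued at this moment */
		return -1;		/* real error, errno is set */
	}
}
```

Once drain_socket() returns without error, the entry could be removed
with libbpf's bpf_map_delete_elem(map_fd, &key), where map_fd and key
are placeholders for the sockmap fd and the socket's key.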