netdev - Re: [Patch bpf-next v3 3/9] udp: implement ->sendmsg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAM_iQpWbcrBCguHXh0NhyOrCfP3N2x7LzM=pYqKHT6=NCN_JAw@mail.gmail.com>
Date:   Sat, 6 Mar 2021 10:34:14 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, duanxiongchun@...edance.com,
        Dongdong Wang <wangdongdong.6@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Cong Wang <cong.wang@...edance.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf-next v3 3/9] udp: implement ->sendmsg_locked()

On Fri, Mar 5, 2021 at 5:21 PM John Fastabend <john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@...edance.com>
> >
> > UDP already has udp_sendmsg() which takes lock_sock() inside.
> > We have to build ->sendmsg_locked() on top of it, by adding
> > a new parameter for whether the sock has been locked.
> >
> > Cc: John Fastabend <john.fastabend@...il.com>
> > Cc: Daniel Borkmann <daniel@...earbox.net>
> > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > Cc: Lorenz Bauer <lmb@...udflare.com>
> > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > ---
> >  include/net/udp.h  |  1 +
> >  net/ipv4/af_inet.c |  1 +
> >  net/ipv4/udp.c     | 30 +++++++++++++++++++++++-------
> >  3 files changed, 25 insertions(+), 7 deletions(-)
>
> [...]
>
> > -int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> > +static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
> >  {
>
> The lock_sock is also taken by BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK() in
> udp_sendmsg(),
>
>  if (cgroup_bpf_enabled(BPF_CGROUP_UDP4_SENDMSG) && !connected) {
>     err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
>                                     (struct sockaddr *)usin, &ipc.addr);
>
> so that will also need to be handled.

Indeed, good catch!

>
> It also looks like sk_dst_set() wants the sock lock to be held, but I'm not
> seeing how its covered in the current code,
>
>  static inline void
>  __sk_dst_set(struct sock *sk, struct dst_entry *dst)
>  {
>         struct dst_entry *old_dst;
>
>         sk_tx_queue_clear(sk);
>         sk->sk_dst_pending_confirm = 0;
>         old_dst = rcu_dereference_protected(sk->sk_dst_cache,
>                                             lockdep_sock_is_held(sk));
>         rcu_assign_pointer(sk->sk_dst_cache, dst);
>         dst_release(old_dst);
>  }

I do not see how __sk_dst_set() is called in udp_sendmsg().

>
> I guess this could trip lockdep now, I'll dig a bit more Monday and see
> if its actually the case.
>
> In general I don't really like code that wraps locks in 'if' branches
> like this. It seem fragile to me. I didn't walk every path in the code

I do not like it either, actually I spent quite some time trying to
get rid of this lock_sock, it is definitely not easy. The comment in
sk_psock_backlog() is clearly wrong, we do not lock_sock to keep
sk_socket, we lock it to protect other structures like
ingress_{skb,msg}.

> to see if a lock is taken in any of the called functions but it looks
> like ip_send_skb() can call into netfilter code and may try to take
> the sock lock.

Are you saying skb_send_sock_locked() is buggy? If so, clearly not
my fault.

>
> Do we need this locked send at all? We use it in sk_psock_backlog
> but that routine needs an optimization rewrite for TCP anyways.
> Its dropping a lot of performance on the floor for no good reason.

At least for ingress_msg. It is not as easy as adding a queue lock here,
because we probably want to retrieve atomically with the receive queue
together.

Thanks.