Message-ID: <CAM_iQpXtSYUKy3JRtFG3uuL9jwBiQzjoZt2ab-VOvEaygZh-VA@mail.gmail.com>
Date: Mon, 27 Sep 2021 12:29:01 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
Yucong Sun <sunyucong@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf 2/3] net: poll psock queues too for sockmap sockets
On Mon, Sep 27, 2021 at 11:07 AM John Fastabend
<john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@...edance.com>
> >
> > Yucong noticed we can't poll() sockets in sockmap even
> > when they are the destination sockets of redirections.
> > This is because we never poll any psock queues in ->poll().
> > We cannot override ->poll() as it is in struct proto_ops,
> > not in struct proto.
> >
> > So introduce sk_msg_poll() to poll the psock ingress_msg
> > queue and let sockets which support sockmap invoke it directly.
> >
> > Reported-by: Yucong Sun <sunyucong@...il.com>
> > Cc: John Fastabend <john.fastabend@...il.com>
> > Cc: Daniel Borkmann <daniel@...earbox.net>
> > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > Cc: Lorenz Bauer <lmb@...udflare.com>
> > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > ---
> > include/linux/skmsg.h | 6 ++++++
> > net/core/skmsg.c | 15 +++++++++++++++
> > net/ipv4/tcp.c | 2 ++
> > net/ipv4/udp.c | 2 ++
> > net/unix/af_unix.c | 5 +++++
> > 5 files changed, 30 insertions(+)
> >
>
> [...]
> struct sk_buff *skb)
> > {
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index e8b48df73c85..2eb1a87ba056 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -280,6 +280,7 @@
> > #include <linux/uaccess.h>
> > #include <asm/ioctls.h>
> > #include <net/busy_poll.h>
> > +#include <linux/skmsg.h>
> >
> > /* Track pending CMSGs. */
> > enum {
> > @@ -563,6 +564,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
> >
> > if (tcp_stream_is_readable(sk, target))
> > mask |= EPOLLIN | EPOLLRDNORM;
> > + mask |= sk_msg_poll(sk);
> >
> > if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
> > if (__sk_stream_is_writeable(sk, 1)) {
>
>
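The sk_msg_poll() body itself is elided above, but conceptually it only
needs to report EPOLLIN when the psock ingress queue has data. A minimal
sketch of such a helper (not the actual patch hunk; the _sketch name is
mine, and it assumes the queue can be tested with a plain list_empty() on
psock->ingress_msg):

#include <linux/poll.h>
#include <linux/skmsg.h>

/* Report readable if a sockmap redirect has queued sk_msg data on this
 * socket's psock ingress_msg list.
 */
static __poll_t sk_msg_poll_sketch(struct sock *sk)
{
        struct sk_psock *psock;
        __poll_t mask = 0;

        rcu_read_lock();
        psock = sk_psock(sk);   /* psock lives in RCU-protected sk_user_data */
        if (psock && !list_empty(&psock->ingress_msg))
                mask |= EPOLLIN | EPOLLRDNORM;
        rcu_read_unlock();

        return mask;
}

tcp_poll() above then ORs the returned mask into the one it already
computes, and presumably the UDP and AF_UNIX hunks do the same in their
poll handlers.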
> For TCP we already implement the stream_memory_read() hook, via
> tcp_bpf_stream_read() in tcp_bpf.c. That just checks the psock->ingress_msg
> list, which should cover any redirect from skmsg into the ingress side
> of another socket.
>
> And the tcp_poll() logic uses tcp_stream_is_readable(), which checks for
> sk->sk_prot->stream_memory_read() and calls it when set.
Ah, I missed it. It is better to have such a hook in struct proto, since
we can just override it with the bpf hooks. Let me rename it to something
non-TCP-specific and implement it for UDP and AF_UNIX too.
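For reference, the existing TCP-side check is roughly the following
(simplified from memory, see net/ipv4/tcp_bpf.c for the exact code; it
assumes the same <linux/skmsg.h> helpers as the sketch above):

static bool tcp_bpf_stream_read(const struct sock *sk)
{
        struct sk_psock *psock;
        bool empty = true;

        rcu_read_lock();
        psock = sk_psock(sk);
        if (psock)      /* any redirected sk_msg queued on the psock? */
                empty = list_empty(&psock->ingress_msg);
        rcu_read_unlock();

        return !empty;
}

Generalizing it is mostly a matter of giving it a protocol-neutral name
and wiring the same check into the UDP and AF_UNIX poll paths.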
>
> The straight receive path, i.e. data not redirected from a sender, should
> be covered by the normal tcp_epollin_ready() checks, because that path
> runs after TCP does the normal updates to rcv_nxt, copied_seq, etc.
Yes.
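For completeness, tcp_epollin_ready() (in include/net/tcp.h, if I remember
right) boils down to "has rcv_nxt advanced at least target bytes past
copied_seq". A simplified sketch, with the memory-pressure and
small-receive-window special cases of the real helper left out (the
_sketch name is mine):

static bool tcp_epollin_ready_sketch(const struct sock *sk, int target)
{
        const struct tcp_sock *tp = tcp_sk(sk);
        int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);

        if (avail <= 0)
                return false;
        return avail >= target;
}

Data redirected into psock->ingress_msg never advances rcv_nxt/copied_seq,
which is exactly why the redirect case needs the separate psock check.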
>
> So the above is not needed in the TCP case, by my reading. Did I miss a
> case? We have also done tests with Envoy, which I thought were polling,
> so I'll check on that as well.
Right, all of the selftests in patch 3/3 are non-TCP.
Thanks.