[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67651781a0ec7_1f295208c8@john.notmuch>
Date: Thu, 19 Dec 2024 23:06:41 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Jiayuan Chen <mrpre@....com>,
Jakub Sitnicki <jakub@...udflare.com>
Cc: bpf@...r.kernel.org,
martin.lau@...ux.dev,
ast@...nel.org,
edumazet@...gle.com,
davem@...emloft.net,
dsahern@...nel.org,
kuba@...nel.org,
pabeni@...hat.com,
linux-kernel@...r.kernel.org,
song@...nel.org,
john.fastabend@...il.com,
andrii@...nel.org,
mhal@...x.co,
yonghong.song@...ux.dev,
daniel@...earbox.net,
xiyou.wangcong@...il.com,
horms@...nel.org
Subject: Re: [PATCH bpf v3 1/2] bpf: fix wrong copied_seq calculation
Jiayuan Chen wrote:
> On Wed, Dec 18, 2024 at 04:35:53PM +0800, Jakub Sitnicki wrote:
> [...]
> > On Wed, Dec 18, 2024 at 01:34 PM +08, Jiayuan Chen wrote:
> > > + if (tcp_flags & TCPHDR_FIN)
> > > + break;
> > > + }
> > > +
> > > + WRITE_ONCE(psock->strp_offset, offset);
> > > + return copied;
> > > +}
> > > +
> > > enum {
> > > TCP_BPF_IPV4,
> > > TCP_BPF_IPV6,
> >
> > [...]
> >
> > To reiterate my earlier question / suggestion [1] - it would be great if
> > we can avoid duplicating what tcp_read_skb / tcp_read_sock already do.
> >
> > Keeping extra state in sk_psock / strparser seems to be the key. I think
> > you should be able to switch strp_data_ready / str_read_sock to
> > ->read_skb and make an adapter around strp_recv.
> >
> > Rough code below is what I have in mind. Not tested, compiled
> > only. Don't expect it to work. And I haven't even looked how to address
> > the kTLS path. But you get the idea.
> >
> > [1] https://msgid.link/87o71bx1l4.fsf@cloudflare.com
> >
> > ---8<---
> >
> > diff --git a/include/net/strparser.h b/include/net/strparser.h
> > index 41e2ce9e9e10..0dd48c1bc23b 100644
> > --- a/include/net/strparser.h
> > +++ b/include/net/strparser.h
> > @@ -95,9 +95,14 @@ struct strparser {
> > u32 interrupted : 1;
> > u32 unrecov_intr : 1;
> >
> > + unsigned int need_bytes;
> > +
> > struct sk_buff **skb_nextp;
> > struct sk_buff *skb_head;
> > - unsigned int need_bytes;
> > +
> > + int rcv_err;
> > + unsigned int rcv_off;
> > +
> > struct delayed_work msg_timer_work;
> > struct work_struct work;
> > struct strp_stats stats;
> > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > index 8299ceb3e373..8a08996429d3 100644
> > --- a/net/strparser/strparser.c
> > +++ b/net/strparser/strparser.c
> > @@ -18,6 +18,7 @@
> > #include <linux/poll.h>
> > #include <linux/rculist.h>
> > #include <linux/skbuff.h>
> > +#include <linux/skmsg.h>
> > #include <linux/socket.h>
> > #include <linux/uaccess.h>
> > #include <linux/workqueue.h>
> > @@ -327,13 +328,39 @@ int strp_process(struct strparser *strp, struct sk_buff *orig_skb,
> > }
> > EXPORT_SYMBOL_GPL(strp_process);
> >
> > -static int strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
> > - unsigned int orig_offset, size_t orig_len)
> > +static int strp_read_skb(struct sock *sk, struct sk_buff *skb)
> > {
> > - struct strparser *strp = (struct strparser *)desc->arg.data;
> > -
> > - return __strp_recv(desc, orig_skb, orig_offset, orig_len,
> > - strp->sk->sk_rcvbuf, strp->sk->sk_rcvtimeo);
> > + struct sk_psock *psock = sk_psock_get(sk);
> > + struct strparser *strp = &psock->strp;
> > + read_descriptor_t desc = {
> > + .arg.data = strp,
> > + .count = 1,
> > + .error = 0,
> > + };
> > + unsigned int off;
> > + size_t len;
> > + int used;
> > +
> > + off = strp->rcv_off;
> > + len = skb->len - off;
> > + used = __strp_recv(&desc, skb, off, len,
> > + sk->sk_rcvbuf, sk->sk_rcvtimeo);
I guess the main complication here is read_skb has already unlinked
the skb so we would lose the skb entirely in some cases here? Easy
example would be ENOMEM on __strp_recv clone.
OTOH you could likely optimize __strp_recv a fair amount for the
good case (if it happens to be true in your case) where all data
is in the skb normally and skip the clone or something. Although
not clear to me how common case that is.
> > + /* skb not consumed */
> > + if (used <= 0) {
> > + strp->rcv_err = used;
> > + return used;
> > + }
> > + /* skb partially consumed */
> > + if (used < len) {
> > + strp->rcv_err = 0;
> > + strp->rcv_off += used;
> > + return -EPIPE; /* stop reading */
> > + }
> > + /* skb fully consumed */
> > + strp->rcv_err = 0;
> > + strp->rcv_off = 0;
> > + tcp_eat_recv_skb(sk, skb);
> > + return used;
> > }
> >
> > static int default_read_sock_done(struct strparser *strp, int err)
> > @@ -345,21 +372,14 @@ static int default_read_sock_done(struct strparser *strp, int err)
> > static int strp_read_sock(struct strparser *strp)
> > {
> > struct socket *sock = strp->sk->sk_socket;
> > - read_descriptor_t desc;
> >
> > - if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
> > + if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
> > return -EBUSY;
> >
> > - desc.arg.data = strp;
> > - desc.error = 0;
> > - desc.count = 1; /* give more than one skb per call */
> > -
> > /* sk should be locked here, so okay to do read_sock */
> > - sock->ops->read_sock(strp->sk, &desc, strp_recv);
> > -
> > - desc.error = strp->cb.read_sock_done(strp, desc.error);
> > + sock->ops->read_skb(strp->sk, strp_read_skb);
> >
> > - return desc.error;
> > + return strp->cb.read_sock_done(strp, strp->rcv_err);
> > }
> >
> > /* Lower sock lock held */
>
> Thanks Jakub Sitnicki.
>
> I understand your point about using tcp_read_skb to replace
> tcp_read_sock, avoiding code duplication and reducing the number of
> interfaces.
>
> Currently, not all modules using strparser have issues with
> copied_seq miscalculation. The issue exists mainly with
> bpf::sockmap + strparser because bpf::sockmap implements a
> proprietary read interface for user-land: tcp_bpf_recvmsg_parser().
>
> Both this and strp_recv->tcp_read_sock update copied_seq, leading
> to errors.
>
> This is why I rewrote the tcp_read_sock() interface specifically for
> bpf::sockmap.
>
> So far, I found two other modules that use the standard strparser module:
>
> 1.kcmsock.c
> 2.espintcp.c (ESP over TCP implementation)
> (Interesting, these two don't have self-tests)
>
> Take kcm as an example: its custom read interface kcm_recvmsg()
> does not conflict with copied_seq updates in tcp_read_sock().
>
> Therefore, for kcmsock, updating copied_seq in tcp_read_sock is
> necessary and aligns with the read semantics. espintcp is similar.
>
> In summary, different modules using strp_recv have different needs
> for copied_seq. I still insist on implementing tcp_bpf_read_sock()
> specifically for bpf::sockmap without affecting others.
>
> Otherwise, we may need tcp_read_skb() to determine whether
> to update copied_seq according to the different needs of each module.
>
>
> Additionally,
> I've found that KTLS has its own read_sock() and
> a strparser-like implementation (in tls_strp.c), separate from the
> standard strparser module. Therefore, even with your proposed
> solution, KTLS may be not affected.
>
> regards
>
>
Powered by blists - more mailing lists