[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpXVYcrhZ083uFdNqMSuAqq-qPQgp+Hx1KUYaquZmSz1Zw@mail.gmail.com>
Date: Mon, 29 Mar 2021 23:36:31 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, duanxiongchun@...edance.com,
Dongdong Wang <wangdongdong.6@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Cong Wang <cong.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf-next v7 09/13] udp: implement ->read_sock() for sockmap
On Mon, Mar 29, 2021 at 11:23 PM John Fastabend
<john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Mon, Mar 29, 2021 at 1:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > From: Cong Wang <cong.wang@...edance.com>
> > > >
> > > > This is similar to tcp_read_sock(), except we do not need
> > > > to worry about connections, we just need to retrieve skb
> > > > from UDP receive queue.
> > > >
> > > > Note, the return value of ->read_sock() is unused in
> > > > sk_psock_verdict_data_ready().
> > > >
> > > > Cc: John Fastabend <john.fastabend@...il.com>
> > > > Cc: Daniel Borkmann <daniel@...earbox.net>
> > > > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > > > Cc: Lorenz Bauer <lmb@...udflare.com>
> > > > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > > > ---
>
> [...]
>
> > > > }
> > > > EXPORT_SYMBOL(__skb_recv_udp);
> > > >
> > > > +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > > > + sk_read_actor_t recv_actor)
> > > > +{
> > > > + int copied = 0;
> > > > +
> > > > + while (1) {
> > > > + int offset = 0, err;
> > >
> > > Should this be
> > >
> > > int offset = sk_peek_offset()?
> >
> > What are you really suggesting? sk_peek_offset() is just 0 unless
> > we have MSG_PEEK here and we don't, because we really want to
> > dequeue the skb rather than peeking it.
> >
> > Are you suggesting we should do peeking? I am afraid we can't.
> > Please be specific, guessing your mind is not an effective way to
> > address your reviews.
>
> I was only asking for further details because the offset addition
> below struck me as odd.
>
> >
> > >
> > > MSG_PEEK should work from recv side, at least it does on TCP side. If
> > > its handled in some following patch a comment would be nice. I was
> > > just reading udp_recvmsg() so maybe its not needed.
> >
> > Please explain why do we need peeking in sockmap? At very least
> > it has nothing to do with my patchset.
>
> We need MSG_PEEK to work from application side. From sockmap
> side I agree its not needed.
How does the application reach udp_read_sock()? UDP does not support
splice() as I already mentioned, as ->splice_read() is still missing.
>
> >
> > I do not know why you want to use TCP as a "standard" here, TCP
> > also supports splice(), UDP still doesn't even with ->read_sock().
> > Of course they are very different.
>
> Not claiming any "standard" here only that user application needs
> to work correctly if it passes MSG_PEEK.
I do not see how an application could pass any msg flag to
udp_read_sock().
>
> >
> > >
> > > > + struct sk_buff *skb;
> > > > +
> > > > + skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> > > > + if (!skb)
> > > > + return err;
> > > > + if (offset < skb->len) {
> > > > + size_t len;
> > > > + int used;
> > > > +
> > > > + len = skb->len - offset;
> > > > + used = recv_actor(desc, skb, offset, len);
> > > > + if (used <= 0) {
> > > > + if (!copied)
> > > > + copied = used;
> > > > + break;
> > > > + } else if (used <= len) {
> > > > + copied += used;
> > > > + offset += used;
> > >
> > > The while loop is going to zero this? What are we trying to do
> > > here with offset?
> >
> > offset only matters for MSG_PEEK and we do not support peeking
> > in sockmap case, hence it is unnecessary here. I "use" it here just
> > to make the code as complete as possible.
>
> huh? If its not used the addition is just confusing. Can we drop it?
If you mean dropping this single line of code, yes. If you mean
dropping 'offset' completely, no, as both __skb_recv_udp() and
recv_actor() still need it. If you mean I should re-write
__skb_recv_udp() and recv_actor() just to drop 'offset', I am afraid
that is too much with too little gain.
>
> >
> > To further answer your question, it is set to 0 when we return a
> > valid skb on line 201 inside __skb_try_recv_from_queue(), as
> > "_off" is set to 0 and won't change unless we have MSG_PEEK.
> >
> > 173 bool peek_at_off = false;
> > 174 struct sk_buff *skb;
> > 175 int _off = 0;
> > 176
> > 177 if (unlikely(flags & MSG_PEEK && *off >= 0)) {
> > 178 peek_at_off = true;
> > 179 _off = *off;
> > 180 }
> > 181
> > 182 *last = queue->prev;
> > 183 skb_queue_walk(queue, skb) {
> > 184 if (flags & MSG_PEEK) {
> > 185 if (peek_at_off && _off >= skb->len &&
> > 186 (_off || skb->peeked)) {
> > 187 _off -= skb->len;
> > 188 continue;
> > 189 }
> > 190 if (!skb->len) {
> > 191 skb = skb_set_peeked(skb);
> > 192 if (IS_ERR(skb)) {
> > 193 *err = PTR_ERR(skb);
> > 194 return NULL;
> > 195 }
> > 196 }
> > 197 refcount_inc(&skb->users);
> > 198 } else {
> > 199 __skb_unlink(skb, queue);
> > 200 }
> > 201 *off = _off;
> > 202 return skb;
> >
> > Of course, when we return NULL, we return immediately without
> > using offset:
> >
> > 1794 skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> > 1795 if (!skb)
> > 1796 return err;
> >
> > This should not be hard to figure out. Hope it is clear now.
> >
>
> Yes, but tracking offset only to clear it a couple lines later
> is confusing.
Yeah, but that's __skb_recv_udp()'s fault, not mine. We can refactor
__skb_recv_udp() a bit for !MSG_PEEK case, but I do not see
much gain here.
Thanks.
Powered by blists - more mailing lists