lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpXVYcrhZ083uFdNqMSuAqq-qPQgp+Hx1KUYaquZmSz1Zw@mail.gmail.com>
Date:   Mon, 29 Mar 2021 23:36:31 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, duanxiongchun@...edance.com,
        Dongdong Wang <wangdongdong.6@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Cong Wang <cong.wang@...edance.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf-next v7 09/13] udp: implement ->read_sock() for sockmap

On Mon, Mar 29, 2021 at 11:23 PM John Fastabend
<john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Mon, Mar 29, 2021 at 1:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > From: Cong Wang <cong.wang@...edance.com>
> > > >
> > > > This is similar to tcp_read_sock(), except we do not need
> > > > to worry about connections, we just need to retrieve skb
> > > > from UDP receive queue.
> > > >
> > > > Note, the return value of ->read_sock() is unused in
> > > > sk_psock_verdict_data_ready().
> > > >
> > > > Cc: John Fastabend <john.fastabend@...il.com>
> > > > Cc: Daniel Borkmann <daniel@...earbox.net>
> > > > Cc: Jakub Sitnicki <jakub@...udflare.com>
> > > > Cc: Lorenz Bauer <lmb@...udflare.com>
> > > > Signed-off-by: Cong Wang <cong.wang@...edance.com>
> > > > ---
>
> [...]
>
> > > >  }
> > > >  EXPORT_SYMBOL(__skb_recv_udp);
> > > >
> > > > +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > > > +               sk_read_actor_t recv_actor)
> > > > +{
> > > > +     int copied = 0;
> > > > +
> > > > +     while (1) {
> > > > +             int offset = 0, err;
> > >
> > > Should this be
> > >
> > >  int offset = sk_peek_offset()?
> >
> > What are you really suggesting? sk_peek_offset() is just 0 unless
> > we have MSG_PEEK here and we don't, because we really want to
> > dequeue the skb rather than peeking it.
> >
> > Are you suggesting we should do peeking? I am afraid we can't.
> > Please be specific, guessing your mind is not an effective way to
> > address your reviews.
>
> I was only asking for further details because the offset addition
> below struck me as odd.
>
> >
> > >
> > > MSG_PEEK should work from recv side, at least it does on TCP side. If
> > > its handled in some following patch a comment would be nice. I was
> > > just reading udp_recvmsg() so maybe its not needed.
> >
> > Please explain why do we need peeking in sockmap? At very least
> > it has nothing to do with my patchset.
>
> We need MSG_PEEK to work from application side. From sockmap
> side I agree its not needed.

How does the application reach udp_read_sock()? UDP does not support
splice() as I already mentioned, as ->splice_read() is still missing.

>
> >
> > I do not know why you want to use TCP as a "standard" here, TCP
> > also supports splice(), UDP still doesn't even with ->read_sock().
> > Of course they are very different.
>
> Not claiming any "standard" here only that user application needs
> to work correctly if it passes MSG_PEEK.

I do not see how an application could pass any msg flag to
udp_read_sock().

>
> >
> > >
> > > > +             struct sk_buff *skb;
> > > > +
> > > > +             skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> > > > +             if (!skb)
> > > > +                     return err;
> > > > +             if (offset < skb->len) {
> > > > +                     size_t len;
> > > > +                     int used;
> > > > +
> > > > +                     len = skb->len - offset;
> > > > +                     used = recv_actor(desc, skb, offset, len);
> > > > +                     if (used <= 0) {
> > > > +                             if (!copied)
> > > > +                                     copied = used;
> > > > +                             break;
> > > > +                     } else if (used <= len) {
> > > > +                             copied += used;
> > > > +                             offset += used;
> > >
> > > The while loop is going to zero this? What are we trying to do
> > > here with offset?
> >
> > offset only matters for MSG_PEEK and we do not support peeking
> > in sockmap case, hence it is unnecessary here. I "use" it here just
> > to make the code as complete as possible.
>
> huh? If its not used the addition is just confusing. Can we drop it?

If you mean dropping this single line of code, yes. If you mean
dropping 'offset' completely, no, as both __skb_recv_udp() and
recv_actor() still need it. If you mean I should re-write
__skb_recv_udp() and recv_actor() just to drop 'offset', I am afraid
that is too much with too little gain.

>
> >
> > To further answer your question, it is set to 0 when we return a
> > valid skb on line 201 inside __skb_try_recv_from_queue(), as
> > "_off" is set to 0 and won't change unless we have MSG_PEEK.
> >
> > 173         bool peek_at_off = false;
> > 174         struct sk_buff *skb;
> > 175         int _off = 0;
> > 176
> > 177         if (unlikely(flags & MSG_PEEK && *off >= 0)) {
> > 178                 peek_at_off = true;
> > 179                 _off = *off;
> > 180         }
> > 181
> > 182         *last = queue->prev;
> > 183         skb_queue_walk(queue, skb) {
> > 184                 if (flags & MSG_PEEK) {
> > 185                         if (peek_at_off && _off >= skb->len &&
> > 186                             (_off || skb->peeked)) {
> > 187                                 _off -= skb->len;
> > 188                                 continue;
> > 189                         }
> > 190                         if (!skb->len) {
> > 191                                 skb = skb_set_peeked(skb);
> > 192                                 if (IS_ERR(skb)) {
> > 193                                         *err = PTR_ERR(skb);
> > 194                                         return NULL;
> > 195                                 }
> > 196                         }
> > 197                         refcount_inc(&skb->users);
> > 198                 } else {
> > 199                         __skb_unlink(skb, queue);
> > 200                 }
> > 201                 *off = _off;
> > 202                 return skb;
> >
> > Of course, when we return NULL, we return immediately without
> > using offset:
> >
> > 1794                 skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> > 1795                 if (!skb)
> > 1796                         return err;
> >
> > This should not be hard to figure out. Hope it is clear now.
> >
>
> Yes, but tracking offset only to clear it a couple lines later
> is confusing.

Yeah, but that's __skb_recv_udp()'s fault, not mine. We can refactor
__skb_recv_udp() a bit for !MSG_PEEK case, but I do not see
much gain here.

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ