Message-ID: <CA+FuTSeDRPh2XEa6QnKYX-ROdBEhaQ0W-ak9z3npZKn7mQuHyA@mail.gmail.com>
Date:   Thu, 7 May 2020 17:40:24 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Kelly Littlepage <kelly@...chronos.com>
Cc:     David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        Iris Liu <iris@...chronos.com>,
        Mike Maloney <maloney@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH] net: tcp: fix rx timestamp behavior for tcp_recvmsg

On Tue, May 5, 2020 at 4:23 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Mon, May 4, 2020 at 12:30 PM Kelly Littlepage <kelly@...chronos.com> wrote:
> >
> > Timestamping cmsgs are not returned when the user buffer supplied to
> > recvmsg is too small to copy at least one skbuff in entirety.
>
> In general a tcp reader should not make any assumptions on
> packetization of the bytestream, including the number of skbs that
> might have made up the bytestream.
>
> > Support
> > for TCP rx timestamps came from commit 98aaa913b4ed ("tcp: Extend
> > SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg") which noted that the cmsg
> > should "return the timestamp corresponding to the highest sequence
> > number data returned." The commit further notes that when coalescing
> > skbs code should "maintain the invariant of returning the timestamp of
> > the last byte in the recvmsg buffer."
>
> This states that if a byte range spans multiple timestamps, only the
> last one is returned.
>
> > This is consistent with Section 1.4 of timestamping.txt, a document that
> > discusses expected behavior when timestamping streaming protocols. It's
> > worth noting that Section 1.4 alludes to a "buffer" in a way that might
> > have resulted in the current behavior:
> >
> > > The SO_TIMESTAMPING interface supports timestamping of bytes in a
> > bytestream. Each request is interpreted as a request for when the entire
> > contents of the buffer has passed a timestamping point....In practice,
> > timestamps can be correlated with segments of a bytestream consistently,
> > if both semantics of the timestamp and the timing of measurement are
> > chosen correctly....For bytestreams, we chose that a timestamp is
> > generated only when all bytes have passed a point.
> >
> > An interpretation of skbs as delineators for timestamping points makes
> > sense for tx timestamps but poses implementation challenges on the rx
> > side. Under the current API unless tcp_recvmsg happens to return bytes
> > copied from precisely one skb there's no useful mapping from bytes to
> > timestamps. Some sequences of reads will result in timestamps getting
> > lost
>
> That's a known caveat, see above. This patch does not change that.
>
> > and others will result in the user receiving a timestamp from the
> > second to last skb that tcp_recvmsg copied from instead of the last.
>
> On Tx, the idea was to associate a timestamp with the last byte in the
> send buffer, so that a timestamp for this seqno informs us of the
> upper bound on latency of all bytes in the send buffer.
>
> On Rx, we currently return the timestamp of the last skb of which the
> last byte is read, which is associated with a byte in the recv buffer,
> but it is not necessarily the last one. Nor the first. As such it is
> not clear what it defines.
>
> Your patch addresses this by instead always returning the timestamp
> associated with the last byte in the recv buffer. The same timestamp
> could then be returned again for a subsequent recv call, if the entire
> recv buffer is filled from the same skb. Which is fine.
>
> That sounds correct to me.

Due to my earlier comments the patch is no longer on patchwork. Can
you please resubmit it?

But to be clear, the code looks good to me. Please add

Fixes: 98aaa913b4ed ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg")

The commit message can perhaps be a bit shorter. The key points are:

1. the stated intent of the original commit is to "return the
timestamp corresponding to the highest sequence number data returned."
2. the current implementation returns the timestamp for the last byte
of the last fully read skb, which is not necessarily the last byte in
the recv buffer.
3. that this patch converts behavior to the original definition.

Previous draft versions of the patch recorded the timestamp before
label skip_copy, which also matches this behavior.
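
To make the difference between points 2 and 3 concrete, here is a
minimal userspace model of the timestamp selection. It is a sketch
only, not kernel code: struct model_skb, NO_TSTAMP and
model_recv_tstamp() are hypothetical names, and timestamps are plain
integers. The `fixed` argument selects between the pre-patch order
(record a timestamp only for skbs whose last byte was copied) and the
patched order (record it for every skb that contributed bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: payload length and rx timestamp. */
struct model_skb {
	size_t len;
	long tstamp;	/* arbitrary units, for illustration only */
};

#define NO_TSTAMP (-1L)	/* hypothetical marker: no timestamp recorded */

/*
 * Model one recvmsg() call that copies up to buf_len bytes, starting at
 * byte `offset` of skbs[0].
 *
 * fixed == 0: the partial-skb check runs before the timestamp update, so
 *             only fully consumed skbs contribute (pre-patch order).
 * fixed != 0: the timestamp update runs first, so the last skb copied
 *             from always contributes (patched order).
 */
long model_recv_tstamp(const struct model_skb *skbs, size_t n,
		       size_t offset, size_t buf_len, int fixed)
{
	long tstamp = NO_TSTAMP;
	size_t i;

	assert(n == 0 || offset < skbs[0].len);

	for (i = 0; i < n && buf_len > 0; i++) {
		size_t avail = skbs[i].len - offset;
		size_t used = avail < buf_len ? avail : buf_len;
		int partial = used + offset < skbs[i].len;

		buf_len -= used;

		if (fixed || !partial)
			tstamp = skbs[i].tstamp;

		offset = 0;	/* later skbs are read from their start */
	}
	return tstamp;
}
```

With two 100-byte skbs carrying timestamps 10 and 20, a 150-byte read
yields 10 in the pre-patch order (the second to last skb) and 20 in
the patched order. A 50-byte read yields no timestamp at all in the
pre-patch order and 10 in the patched order.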

I took a quick look at the selftests under
tools/testing/selftests/net, but they don't test for this specific
behavior. Given that test code should make no assumptions on
packetization, it is also not that straightforward to test in a robust
manner.


>
> > The
> > proposed change addresses both problems while remaining consistent with
> > 1.4 and the wording of commit 98aaa913b4ed ("tcp: Extend
> > SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg").
> >
> > Co-developed-by: Iris Liu <iris@...chronos.com>
> > Signed-off-by: Iris Liu <iris@...chronos.com>
> > Signed-off-by: Kelly Littlepage <kelly@...chronos.com>
> > ---
> >  net/ipv4/tcp.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 6d87de434377..e72bd651d21a 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -2154,13 +2154,15 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
> >                         tp->urg_data = 0;
> >                         tcp_fast_path_check(sk);
> >                 }
> > -               if (used + offset < skb->len)
> > -                       continue;
> >
> >                 if (TCP_SKB_CB(skb)->has_rxtstamp) {
> >                         tcp_update_recv_tstamps(skb, &tss);
> >                         cmsg_flags |= 2;
> >                 }
> > +
> > +               if (used + offset < skb->len)
> > +                       continue;
> > +
> >                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> >                         goto found_fin_ok;
> >                 if (!(flags & MSG_PEEK))
> > --
> > 2.26.2
