Message-ID: <CAF=yD-Jxq2nZ2X0C3cRLUnszddwdGMS+nhCPs2uWjGqc=Amd7g@mail.gmail.com>
Date: Fri, 14 Jun 2024 14:02:53 +0200
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Philo Lu <lulie@...ux.alibaba.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, 
	Mike Maloney <maloney@...gle.com>, Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, 
	rostedt@...dmis.org, mhiramat@...nel.org, mathieu.desnoyers@...icios.com, 
	davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org, 
	xuanzhuo@...ux.alibaba.com, dust.li@...ux.alibaba.com, 
	Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH net-next] tcp: Add tracepoint for rxtstamp coalescing

> >> On Tue, 2024-06-11 at 12:58 +0800, Philo Lu wrote:
> >>> During tcp coalescence, rx timestamps of the former skb ("to" in
> >>> tcp_try_coalesce), will be lost. This may lead to inaccurate
> >>> timestamping results if skbs come out of order.
> >>>
> >>> Here is an example.
> >>> Assume a message consists of 3 skbs, namely A, B, and C. And these skbs
> >>> are processed by tcp in the following order:
> >>> A -(1us)-> C -(1ms)-> B
> >>
> >> IMHO the above order makes the changelog confusing
> >>
> >>> If C is coalesced to B, the final rx timestamps of the message will be
> >>> those of C. That is, the timestamps show that we received the message
> >>> when C came (including hardware and software). However, we actually
> >>> received it 1ms later (when B came).
> >>>
> >>> With the added tracepoint, we can recognize such cases and report them
> >>> if we want.
> >>
> >> We really need very good reasons to add new tracepoints to TCP. I'm
> >> unsure if the above example match such requirement. The reported
> >> timestamp actually matches the first byte in the aggregate segment,
> >> inferring anything more is IMHO stretching too far the API semantic.
> >>
> >
> > Note the current behavior was a conscious choice, see
> > commit 98aaa913b4ed2503244 ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE
> > to TCP recvmsg")
> > for the rationale.
> >
>
> IIUC, the behavior of returning the timestamp of the skb with highest
> sequence number works well without disorder. But once disorder occurs,
> tcp coalescence can cause this issue.
>
> > Perhaps another application would need to add a new timestamp to report
> > both the oldest and newest timestamps.
>
> I prefer this way. We do need both the oldest and newest timestamps of a
> message to find out whether any packet was unexpectedly delayed after sending.
> But given there can be both hardware and software timestamps, we may
> need more fields in sk_buff to carry these new timestamps.

Unfortunately returning multiple timestamps in tcp_recv_timestamp
requires a new extended struct scm_timestamping, and likely an extra
field to store both after coalescing.
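
As a rough illustration of the "store both after coalescing" idea, here
is a minimal userspace sketch in C. The struct rx_tstamp_range and the
helper rx_tstamp_range_merge are hypothetical names made up for this
example; they are not kernel structures, and a real change would also
have to cover hardware timestamps and the sk_buff layout.

```c
#include <stdint.h>

/* Hypothetical, for illustration only: one receive-time range per
 * (possibly coalesced) buffer, in nanoseconds. */
struct rx_tstamp_range {
	int64_t oldest_ns;	/* earliest arrival among merged segments */
	int64_t newest_ns;	/* latest arrival among merged segments */
};

/* Merge the range of "from" into "to", as a coalescing step might do
 * if it kept both ends instead of a single timestamp. */
struct rx_tstamp_range rx_tstamp_range_merge(struct rx_tstamp_range to,
					     struct rx_tstamp_range from)
{
	struct rx_tstamp_range out;

	out.oldest_ns = to.oldest_ns < from.oldest_ns ? to.oldest_ns
						       : from.oldest_ns;
	out.newest_ns = to.newest_ns > from.newest_ns ? to.newest_ns
						       : from.newest_ns;
	return out;
}
```

With the A, C, B example from the changelog, merging C (arrived at
about 1us) into B (arrived about 1ms later) would keep C's time as the
oldest and B's time as the newest.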

FWIW, I maintain a patch that also changes semantics, by returning not
the timestamp associated with the last byte in the message (which is
the currently defined behavior), but that of the first byte that makes
the socket readable. That is usually just the first byte, unless
SO_RCVLOWAT is set.

It is definitely easier to define a flag like SOF_TIMESTAMPING_POLLIN
that changes the behavior of the one timestamp returned than to return
two timestamps.


> >
> > Or add a socket flag to prevent coalescing for applications needing
> > precise timestamps.
> >
> > Willem might know better about this.
> >
> > I agree the tracepoint seems not needed. What about solving the issue instead ?
> Thanks.

A tracepoint is also not needed, as a bpftrace program with kfunc on
tcp_try_coalesce should be able to access this information already
without kernel modifications. Or, if it has to be at this exact line, a
program with a kprobe at an offset would work, but that requires
reading registers manually.
