Message-ID: <CAL+tcoCY=03zt3RBJP0G8+8hC1oC+o2h67MdqrmEkjM3-y-gZQ@mail.gmail.com>
Date: Thu, 6 Feb 2025 10:14:50 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org,
willemb@...gle.com, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, horms@...nel.org, bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v8 12/12] selftests/bpf: add simple bpf tests in
the tx path for timestamping feature
On Thu, Feb 6, 2025 at 9:28 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 2/5/25 8:08 AM, Jason Xing wrote:
> >>> + switch (skops->op) {
> >>> + case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> >>> + delay = val->sched_delay = timestamp - val->sendmsg_ns;
> >>> + break;
> >>
> >> For a test this is fine. But just a reminder that in general a packet
> >> may pass through multiple qdiscs. For instance with bonding or tunnel
> >> virtual devices in the egress path.
> >
> > Right, I've seen this in production (two times qdisc timestamps
> > because of bonding).
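For the record, a rough sketch of how the map value could cope with
multiple qdisc passes (the struct and field names below are mine for
illustration, not the ones used in this series):

struct delay_info {
	__u64 sendmsg_ns;
	__u64 first_sched_ns;	/* first qdisc (e.g. the bond dev) */
	__u64 last_sched_ns;	/* last qdisc (e.g. the slave dev) */
	__u32 sched_cnt;	/* how many qdiscs the skb traversed */
};

/* inside the sockops switch (skops->op); val points into the map: */
case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
	/* This callback can fire once per qdisc the skb passes
	 * through (bonding, tunnel devices), so record first and
	 * last instead of assuming a single pass.
	 */
	if (val->sched_cnt++ == 0)
		val->first_sched_ns = timestamp;
	val->last_sched_ns = timestamp;
	break;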
> >
> >>
> >>> + case BPF_SOCK_OPS_TS_SW_OPT_CB:
> >>> + prior_ts = val->sched_delay + val->sendmsg_ns;
> >>> + delay = val->sw_snd_delay = timestamp - prior_ts;
> >>> + break;
> >>> + case BPF_SOCK_OPS_TS_ACK_OPT_CB:
> >>> + prior_ts = val->sw_snd_delay + val->sched_delay + val->sendmsg_ns;
> >>> + delay = val->ack_delay = timestamp - prior_ts;
> >>> + break;
> >>
> >> Similar to the above: fine for a test, but in practice be aware that
> >> packets may be resent, in which case an ACK might precede a repeat
> >> SCHED and SND. And erroneous or malicious peers may also just never
> >> send an ACK. So this can never be relied on in production settings,
> >> e.g., as the only signal to clear an entry from a map (as done in the
> >> branch below).
>
> All good points. I think all these notes should be added as comments to the test.
Got it, I will add them to the commit message.
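Something like the following comment in the test itself could capture
Willem's caveats (the wording is mine):

/* Notes for readers of this test:
 *
 * - A packet may pass through multiple qdiscs (e.g. bonding or
 *   tunnel virtual devices in the egress path), so the SCHED
 *   callback can fire more than once per skb.
 * - Packets may be retransmitted, in which case an ACK might
 *   precede a repeat SCHED and SND.
 * - Erroneous or malicious peers may never send an ACK, so the
 *   ACK callback must never be the only signal used to clear an
 *   entry from the map.
 */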
> I think, as a test, this will be a good start and can use some followup to
> address these cases.
Good idea.
>
> >
> > Agreed. In production, what we actually do is print all the timestamps
> > and let an agent parse them.
>
> The BPF program that runs in the kernel can provide its own user interface that
> best fits its environment. If a raw printing interface is sufficient, that works
> well and is simple on the BPF program side. If the production system cannot
> afford the raw printing cost, the bpf prog can perform some aggregation first.
>
> The BPF program should be able to detect when an outgoing skb is re-transmitted
> and act accordingly. A BPF timer can be used to retire entries for which no ACK
> has been received.
Oh, this is the first time I've heard of the BPF timer.
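If I understand the idea, something along these lines could retire
stale entries (a minimal sketch; the map layout, the key type and the
5s timeout are my assumptions, not part of this series):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#ifndef CLOCK_MONOTONIC
#define CLOCK_MONOTONIC 1
#endif

struct elem {
	struct bpf_timer timer;
	__u64 sendmsg_ns;	/* ... other per-skb timestamps ... */
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u64);	/* e.g. the tskey */
	__type(value, struct elem);
} ts_map SEC(".maps");

static int retire_cb(void *map, __u64 *key, struct elem *val)
{
	/* No ACK within the timeout: drop the entry so a broken or
	 * malicious peer cannot make the map grow without bound.
	 */
	bpf_map_delete_elem(map, key);
	return 0;
}

/* Called from the first timestamping callback seen for a key. */
static void arm_retire_timer(__u64 key)
{
	struct elem zero = {}, *val;

	bpf_map_update_elem(&ts_map, &key, &zero, BPF_NOEXIST);
	val = bpf_map_lookup_elem(&ts_map, &key);
	if (!val)
		return;
	/* bpf_timer_init() fails with -EBUSY if already armed. */
	if (bpf_timer_init(&val->timer, &ts_map, CLOCK_MONOTONIC) == 0) {
		bpf_timer_set_callback(&val->timer, retire_cb);
		bpf_timer_start(&val->timer,
				5ULL * 1000 * 1000 * 1000, 0); /* 5s */
	}
}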
>
> Potentially, this data can be aggregated into the individual bpf_sk_storage or
> into a BPF map keyed by a particular IP address prefix.
>
> I just want to highlight this here for people referencing this thread in the
> future to look for implementation ideas.
Thanks, I think they are useful! I will add more of this description to
the commit message.
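As a concrete example of the per-socket aggregation idea, something
like this might work in the sockops program (again a sketch only; the
struct and field names are mine):

struct agg {
	__u64 cnt;
	__u64 total_sched_delay;
	__u64 max_sched_delay;
};

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct agg);
} sk_agg SEC(".maps");

/* in the sockops program, after computing 'delay': */
	struct agg *a;

	a = bpf_sk_storage_get(&sk_agg, skops->sk, NULL,
			       BPF_SK_STORAGE_GET_F_CREATE);
	if (a) {
		/* Aggregate in kernel instead of printing raw events,
		 * for systems that cannot afford the printing cost.
		 */
		a->cnt++;
		a->total_sched_delay += delay;
		if (delay > a->max_sched_delay)
			a->max_sched_delay = delay;
	}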
Thanks,
Jason