netdev - Re: [PATCH net-next v2 06/12] net-timestamp: introduce TS_SCHED_OPT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBU1FSVvhiTmapX=Byhv8W0T1f8oFR4sAZ1g4xONgSPrg@mail.gmail.com>
Date: Wed, 16 Oct 2024 14:08:22 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, dsahern@...nel.org, willemdebruijn.kernel@...il.com, 
	willemb@...gle.com, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, 
	eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev, 
	john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me, 
	haoluo@...gle.com, jolsa@...nel.org, bpf@...r.kernel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 06/12] net-timestamp: introduce
 TS_SCHED_OPT_CB to generate dev xmit timestamp

On Wed, Oct 16, 2024 at 1:35 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 10/15/24 6:24 PM, Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 9:01 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>
> >> On 10/11/24 9:06 PM, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@...cent.com>
> >>>
> >>> Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to
> >>> print timestamps when the skb just passes the dev layer.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@...cent.com>
> >>> ---
> >>>    include/uapi/linux/bpf.h       |  5 +++++
> >>>    net/core/skbuff.c              | 17 +++++++++++++++--
> >>>    tools/include/uapi/linux/bpf.h |  5 +++++
> >>>    3 files changed, 25 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 157e139ed6fc..3cf3c9c896c7 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -7019,6 +7019,11 @@ enum {
> >>>                                         * by the kernel or the
> >>>                                         * earlier bpf-progs.
> >>>                                         */
> >>> +     BPF_SOCK_OPS_TS_SCHED_OPT_CB,   /* Called when skb is passing through
> >>> +                                      * dev layer when SO_TIMESTAMPING
> >>> +                                      * feature is on. It indicates the
> >>> +                                      * recorded timestamp.
> >>> +                                      */
> >>>    };
> >>>
> >>>    /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>> index 3a4110d0f983..16e7bdc1eacb 100644
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -5632,8 +5632,21 @@ static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
> >>>                return;
> >>>
> >>>        tp = tcp_sk(sk);
> >>> -     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> >>> -             return;
> >>> +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
> >>> +             struct timespec64 tstamp;
> >>> +             u32 cb_flag;
> >>> +
> >>> +             switch (tstype) {
> >>> +             case SCM_TSTAMP_SCHED:
> >>> +                     cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> >>> +                     break;
> >>> +             default:
> >>> +                     return;
> >>> +             }
> >>> +
> >>> +             tstamp = ktime_to_timespec64(ktime_get_real());
> >>> +             tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
> >>
> >> There is bpf_ktime_get_*() helper. The bpf prog can directly call the
> >> bpf_ktime_get_* helper and use whatever clock it sees fit instead of enforcing
> >> real clock here and doing an extra ktime_to_timespec64. Right now the
> >> bpf_ktime_get_*() does not have real clock which I think it can be added.
> >
> > In this way, there is no need to add tcp_call_bpf_*arg() to pass
> > timestamp to userspace, right? Let the bpf program implement it.
> >
> > Now I wonder what information I should pass? Sorry for the lack of BPF
> > related knowledge :(
>
> Just pass the cb_flag op in this case.

I see. I saw one example just passing a NULL parameter:
tcp_call_bpf(sk, BPF_SOCK_OPS_BASE_RTT, 0, NULL);.

>
> A bpf selftest is missing in this series to show how it is going to be used.

Sorry, I didn't implement a standard selftest, but I wrote a full BPF
program in patch[0/12]. I planned to write a selftests after every
expert agrees the current approach.

> Yes, there are existing socket API tests on time stamping but I believe this
> discussion has already shown some subtle differences that warrant a closer to
> real world bpf prog example first.
>
> >
> >>
> >> I think overall the tstamp reporting interface does not necessarily have to
> >> follow the socket API. The bpf prog is running in the kernel. It could pass
> >> other information to the bpf prog if it sees fit. e.g. the bpf prog could also
> >> get the original transmitted tcp skb if it is useful.
> >
> > Good to know that! But how the BPF program parses the skb by using
> > tcp_call_bpf_2arg() which only passes u32 parameters.
>
> "struct skbuff *skb" has already been added to "struct bpf_sock_ops_kern". It is
> only assigned during the "BPF_SOCK_OPS_PARSE_*HDR_CB". It is not exposed
> directly to bpf prog but it could be. However, it may need to change some
> convert_ctx code in filter.c which I am not excited about. We haven't added
> convert_ctx changes for a while since it is the old way.
>
> Together with the "u32  bpf_sock_ops_cb_flags;" change in patch 9 which is only
> for tcp_sock and other _CB flags are also tcp specific only. For now, I am not

Right, the first move I made is to make TCP work.

> sure carrying this sockops to the future UDP support is desired.

I hope so. But it's not an urgent thing that needs to be done recently.

>
> Take a look at tcp_call_bpf(). It needs to initialize the whole "struct
> bpf_sock_ops_kern" regardless of what the bpf prog is needed before calling the
> bpf prog. The "u32 args[4]" is one of them. The is the older way of using bpf to
> extend kernel.

I see.

>
> bpf has struct_ops support now which can pass only what is needed and without
> the need of doing the convert_ctx in filter.c. The "struct tcp_congestion_ops"
> can already be implemented in bpf. Take a look at
> selftests/bpf/progs/bpf_cubic.c. All the BPF_SOCK_OPS_*_CB (e.g.
> BPF_SOCK_OPS_TS_SCHED_OPT_CB here) could just a "ops" in the struct_ops.

Interesting, but it seems this way is much more complex than the
current approach.

>
> That said, I think the first thing needs to figure out is how to enable bpf time
> stamping without having side effect on the user space.

In the next version, I will avoid affecting the cmsg case, so no more
side effects I think.

> Continue the sockops approach first

I'm a little hesitant to do so because it looks like we will introduce
more codes. Please let me investigate more :)

> and use it to create a selftest bpf prog example. Then we can decide.

I copy the BPF program from patch [0/12], please take a look and help
me review this:
---
Here is the test output:
1) receive path
iperf3-987305  [008] ...11 179955.200990: bpf_trace_printk: rx: port:
5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0
2) xmit path
iperf3-19765   [013] ...11  2021.329602: bpf_trace_printk: tx: port:
47528:5201, key: 1036, timestamp: 1728357067,436678584
iperf3-19765   [013] b..11  2021.329611: bpf_trace_printk: tx: port:
47528:5201, key: 1036, timestamp: 1728357067,436689976
iperf3-19765   [013] ...11  2021.329622: bpf_trace_printk: tx: port:
47528:5201, key: 1036, timestamp: 1728357067,436700739

Here is the full bpf program:
#include <linux/bpf.h>

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <uapi/linux/net_tstamp.h>

int _version SEC("version") = 1;
char _license[] SEC("license") = "GPL";

# define SO_TIMESTAMPING         37

__section("sockops")
int set_initial_rto(struct bpf_sock_ops *skops)
{
        int op = (int) skops->op;
        u32 sport = 0, dport = 0;
        int flags;

        switch (op) {
        //case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        case BPF_SOCK_OPS_TCP_CONNECT_CB:
        case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
                flags = SOF_TIMESTAMPING_RX_SOFTWARE |
                        SOF_TIMESTAMPING_TX_SCHED |
SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_TX_SOFTWARE |
                        SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP;
                bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING,
&flags, sizeof(flags));
                bpf_sock_ops_cb_flags_set(skops,
BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG|BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG);
                break;
        case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
        case BPF_SOCK_OPS_TS_SW_OPT_CB:
        case BPF_SOCK_OPS_TS_ACK_OPT_CB:
                dport = bpf_ntohl(skops->remote_port);
                sport = skops->local_port;
                bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n",
                            sport, dport, skops->args[0],
skops->args[1], skops->args[2]);
                break;
        case BPF_SOCK_OPS_TS_RX_OPT_CB:
                dport = bpf_ntohl(skops->remote_port);
                sport = skops->local_port;
                bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u,
hwtimestamp: %u,%u\n",
                           sport, dport, skops->args[0],
skops->args[1], skops->args[2], skops->args[3]);
                break;
        }
        return 1;
}
---

What is your opinion on the above?

Thanks,
Jason