netdev - Re: [PATCH net-next v2 06/12] net-timestamp: introduce TS_SCHED_OPT

Message-ID: <7c7b2366-074e-48c1-a918-daf0a94c4b55@linux.dev>
Date: Tue, 15 Oct 2024 22:35:14 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, dsahern@...nel.org, willemdebruijn.kernel@...il.com,
 willemb@...gle.com, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
 eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev,
 john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
 haoluo@...gle.com, jolsa@...nel.org, bpf@...r.kernel.org,
 netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 06/12] net-timestamp: introduce
 TS_SCHED_OPT_CB to generate dev xmit timestamp

On 10/15/24 6:24 PM, Jason Xing wrote:
> On Wed, Oct 16, 2024 at 9:01 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>>
>> On 10/11/24 9:06 PM, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@...cent.com>
>>>
>>> Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to
>>> print timestamps when the skb just passes the dev layer.
>>>
>>> Signed-off-by: Jason Xing <kernelxing@...cent.com>
>>> ---
>>>    include/uapi/linux/bpf.h       |  5 +++++
>>>    net/core/skbuff.c              | 17 +++++++++++++++--
>>>    tools/include/uapi/linux/bpf.h |  5 +++++
>>>    3 files changed, 25 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 157e139ed6fc..3cf3c9c896c7 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -7019,6 +7019,11 @@ enum {
>>>                                         * by the kernel or the
>>>                                         * earlier bpf-progs.
>>>                                         */
>>> +     BPF_SOCK_OPS_TS_SCHED_OPT_CB,   /* Called when skb is passing through
>>> +                                      * dev layer when SO_TIMESTAMPING
>>> +                                      * feature is on. It indicates the
>>> +                                      * recorded timestamp.
>>> +                                      */
>>>    };
>>>
>>>    /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 3a4110d0f983..16e7bdc1eacb 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -5632,8 +5632,21 @@ static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
>>>                return;
>>>
>>>        tp = tcp_sk(sk);
>>> -     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
>>> -             return;
>>> +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
>>> +             struct timespec64 tstamp;
>>> +             u32 cb_flag;
>>> +
>>> +             switch (tstype) {
>>> +             case SCM_TSTAMP_SCHED:
>>> +                     cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
>>> +                     break;
>>> +             default:
>>> +                     return;
>>> +             }
>>> +
>>> +             tstamp = ktime_to_timespec64(ktime_get_real());
>>> +             tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
>>
>> There are bpf_ktime_get_*() helpers. The bpf prog can directly call a
>> bpf_ktime_get_*() helper and use whatever clock it sees fit, instead of enforcing
>> the real clock here and doing an extra ktime_to_timespec64(). Right now,
>> bpf_ktime_get_*() has no real clock variant, but I think one can be added.
> 
> In this way, there is no need to add tcp_call_bpf_*arg() to pass
> timestamp to userspace, right? Let the bpf program implement it.
> 
> Now I wonder what information I should pass. Sorry for my lack of
> BPF-related knowledge :(

Just pass the cb_flag op in this case.
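To make "just pass the op" concrete, here is a rough userspace model, not kernel or bpf code. It assumes a trimmed-down context carrying only the op, and uses CLOCK_MONOTONIC via clock_gettime() as a stand-in for what a real bpf prog would get from bpf_ktime_get_ns(). All model_* names and the per-socket storage struct are hypothetical; in a real prog the storage would typically be a map value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical op value standing in for BPF_SOCK_OPS_TS_SCHED_OPT_CB. */
enum { MODEL_TS_SCHED_OPT_CB = 1 };

/* Hypothetical, trimmed-down context: the kernel fills in only the op. */
struct model_sock_ops {
	uint32_t op;
};

/* Per-socket state the "prog" keeps for itself (a map value in real bpf). */
struct model_sk_storage {
	uint64_t sched_ns;	/* stays 0 until the SCHED callback fires */
};

/* The "prog" picks its own clock; CLOCK_MONOTONIC mimics bpf_ktime_get_ns(). */
static uint64_t prog_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Model of the bpf prog: dispatch on op and read the clock itself. */
static int model_prog(const struct model_sock_ops *ctx,
		      struct model_sk_storage *stg)
{
	switch (ctx->op) {
	case MODEL_TS_SCHED_OPT_CB:
		stg->sched_ns = prog_now_ns();
		return 1;
	default:
		return 0;	/* ops this prog does not care about */
	}
}

/* Model of the kernel side: only the op is passed, no timestamp args. */
static int model_call_bpf(uint32_t op, struct model_sk_storage *stg)
{
	struct model_sock_ops ctx = { .op = op };

	assert(stg != NULL);
	return model_prog(&ctx, stg);
}
```

The design point is that the clock choice (monotonic, boot, TAI, or a future real-clock helper) lives in the prog, so the kernel call site no longer needs ktime_to_timespec64() or two u32 args.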

A bpf selftest is missing from this series to show how it is going to be used.
Yes, there are existing socket API tests on timestamping, but I believe this
discussion has already shown some subtle differences that warrant a
closer-to-real-world bpf prog example first.

> 
>>
>> I think overall the tstamp reporting interface does not necessarily have to
>> follow the socket API. The bpf prog is running in the kernel, so the kernel could pass
>> other information to the bpf prog if it sees fit, e.g. the bpf prog could also
>> get the original transmitted tcp skb if that is useful.
> 
> Good to know that! But how does the BPF program parse the skb by using
> tcp_call_bpf_2arg(), which only passes u32 parameters?

"struct sk_buff *skb" has already been added to "struct bpf_sock_ops_kern". It is
only assigned during "BPF_SOCK_OPS_PARSE_*HDR_CB". It is not exposed
directly to the bpf prog, but it could be. However, that may require changing some
convert_ctx code in filter.c, which I am not excited about. We haven't added
convert_ctx changes for a while since it is the old way.

Add to that the "u32	bpf_sock_ops_cb_flags;" change in patch 9, which is only
for tcp_sock, and the fact that the other _CB flags are TCP-specific too. For now, I am not
sure carrying this sockops approach over to the future UDP support is desirable.

Take a look at tcp_call_bpf(). Before calling the bpf prog, it needs to initialize
the whole "struct bpf_sock_ops_kern", regardless of what the bpf prog actually
needs. The "u32 args[4]" is one of those fields. This is the older way of using
bpf to extend the kernel.

bpf now has struct_ops support, which can pass only what is needed, without
having to do the convert_ctx work in filter.c. "struct tcp_congestion_ops"
can already be implemented in bpf. Take a look at
selftests/bpf/progs/bpf_cubic.c. All the BPF_SOCK_OPS_*_CB (e.g.
BPF_SOCK_OPS_TS_SCHED_OPT_CB here) could just be an "ops" in the struct_ops.
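The contrast with the args[4] style can be sketched with a plain C function-pointer table, which is roughly the shape struct_ops maps bpf progs onto (as bpf_cubic.c does for struct tcp_congestion_ops). This is a userspace analogy only: the ops struct, the sched/snd hook names, and their signatures are hypothetical and not an existing kernel interface. Each hook gets its own typed arguments, so the SND hook can receive the skb directly while the SCHED hook receives only the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down socket and skb types. */
struct model_skb {
	uint32_t len;
};

struct model_sock {
	uint32_t sched_cnt;	/* number of SCHED reports seen */
	uint64_t snd_bytes;	/* bytes reported at SND time */
};

/* Hypothetical ops table in the spirit of struct_ops: each hook has its own
 * typed signature, so a call site passes exactly what that hook needs. */
struct model_tstamp_ops {
	void (*sched)(struct model_sock *sk);
	void (*snd)(struct model_sock *sk, const struct model_skb *skb);
};

/* "Prog" side: implementations a user would supply. */
static void my_sched(struct model_sock *sk)
{
	sk->sched_cnt++;
}

static void my_snd(struct model_sock *sk, const struct model_skb *skb)
{
	sk->snd_bytes += skb->len;
}

static const struct model_tstamp_ops my_ops = {
	.sched = my_sched,
	.snd = my_snd,
};

/* Kernel-side call sites: invoke a hook only if the registered ops set it. */
static void model_on_dev_xmit(const struct model_tstamp_ops *ops,
			      struct model_sock *sk)
{
	assert(sk != NULL);
	if (ops && ops->sched)
		ops->sched(sk);
}

static void model_on_snd(const struct model_tstamp_ops *ops,
			 struct model_sock *sk, const struct model_skb *skb)
{
	assert(sk != NULL && skb != NULL);
	if (ops && ops->snd)
		ops->snd(sk, skb);
}
```

Nothing here needs a shared context to be zeroed per call, and an unset hook is simply skipped.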

That said, I think the first thing to figure out is how to enable bpf
timestamping without side effects on user space. Continue with the sockops
approach first and use it to create a selftest bpf prog example. Then we can decide.