lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e398770a-1ab5-478b-820d-16c6060e0008@linux.alibaba.com>
Date: Fri, 25 Oct 2024 19:05:08 +0800
From: "D. Wythe" <alibuda@...ux.alibaba.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: kgraul@...ux.ibm.com, wenjia@...ux.ibm.com, jaka@...ux.ibm.com,
 ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, pabeni@...hat.com,
 song@...nel.org, sdf@...gle.com, haoluo@...gle.com, yhs@...com,
 edumazet@...gle.com, john.fastabend@...il.com, kpsingh@...nel.org,
 jolsa@...nel.org, guwen@...ux.alibaba.com, kuba@...nel.org,
 davem@...emloft.net, netdev@...r.kernel.org, linux-s390@...r.kernel.org,
 linux-rdma@...r.kernel.org, bpf@...r.kernel.org, dtcccc@...ux.alibaba.com
Subject: Re: [PATCH net-next 3/4] net/smc: Introduce smc_bpf_ops



On 10/25/24 8:26 AM, Martin KaFai Lau wrote:
> On 10/23/24 7:42 PM, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@...ux.alibaba.com>
>>
>> The introduction of IPPROTO_SMC enables eBPF programs to determine
>> whether to use SMC based on the context of socket creation, such as
>> network namespaces, PID and comm name, etc.
>>
>> As a subsequent enhancement, this patch introduces a new hook for eBPF
>> programs that allows decisions on whether to use SMC or not at runtime,
>> including but not limited to local/remote IP address or ports. In
>> simpler words, this feature allows modifications to syn_smc through eBPF
>> programs before the TCP three-way handshake got established.
>>
>> Signed-off-by: D. Wythe <alibuda@...ux.alibaba.com>
>> ---
>>   include/linux/tcp.h   |   2 +-
>>   include/net/smc.h     |  47 +++++++++++
>>   include/net/tcp.h     |   6 ++
>>   net/ipv4/tcp_input.c  |   3 +-
>>   net/ipv4/tcp_output.c |  14 +++-
>>   net/smc/Kconfig       |  12 +++
>>   net/smc/Makefile      |   1 +
>>   net/smc/af_smc.c      |  38 ++++++---
>>   net/smc/smc.h         |   4 +
>>   net/smc/smc_bpf.c     | 212 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/smc/smc_bpf.h     |  34 ++++++++
>>   11 files changed, 357 insertions(+), 16 deletions(-)
>>   create mode 100644 net/smc/smc_bpf.c
>>   create mode 100644 net/smc/smc_bpf.h
>>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index 6a5e08b..4ef160a 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -478,7 +478,7 @@ struct tcp_sock {
>>   #endif
>>   #if IS_ENABLED(CONFIG_SMC)
>>       bool    syn_smc;    /* SYN includes SMC */
>> -    bool    (*smc_hs_congested)(const struct sock *sk);
>> +    struct tcpsmc_ctx *smc;
>>   #endif
>>   #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
>> diff --git a/include/net/smc.h b/include/net/smc.h
>> index db84e4e..34ab2c6 100644
>> --- a/include/net/smc.h
>> +++ b/include/net/smc.h
>> @@ -18,6 +18,8 @@
>>   #include "linux/ism.h"
>>   struct sock;
>> +struct tcp_sock;
>> +struct inet_request_sock;
>>   #define SMC_MAX_PNETID_LEN    16    /* Max. length of PNET id */
>> @@ -97,4 +99,49 @@ struct smcd_dev {
>>       u8 going_away : 1;
>>   };
>> +/*
>> + * This structure is used to store the parameters passed to the member of struct_ops.
>> + * Due to the BPF verifier cannot restrict the writing of bit fields, such as limiting
>> + * it to only write ireq->smc_ok. Using kfunc can solve this issue, but we don't want
>> + * to introduce a kfunc with such a narrow function.
> 
> imo, adding kfunc is fine.
> 
>> + *
>> + * Moreover, using this structure for unified parameters also addresses another
>> + * potential issue. Currently, kfunc cannot recognize the calling context
>> + * through BPF's existing structure. In the future, we can solve this problem
>> + * by passing this ctx to kfunc.
> 
> This part I don't understand. How is it different from the "tcp_cubic_kfunc_set" allowed in 
> tcp_congestion_ops?

Hi Martin,

Yes, creating an independent kfunc for each callback and filtering via expected_attach_type can 
indeed solve the problem.

Our main concern is to avoid introducing kfuncs as much as possible. For our subsystem, we might 
need to maintain it in a way that maintains a uapi, as we certainly have user applications depending 
on it.

This is also why we need to create a separate ctx, as there’s no way to restrict bit writes, so we 
created a ctx->smc_ok that is allowed to write.

This is also why we had to create a separate structure, tcpsmc_ctx ...

However, I now realize that compromising to avoid introducing kfuncs has gone too far, affecting the 
readability of the code. I will try to use kfuncs in the next version to solve those issues.


> 
>> + */
>> +struct smc_bpf_ops_ctx {
>> +    struct {
>> +        struct tcp_sock *tp;
>> +    } set_option;
>> +    struct {
>> +        const struct tcp_sock *tp;
>> +        struct inet_request_sock *ireq;
>> +        int smc_ok;
>> +    } set_option_cond;
>> +};
> 
> There is no need to create one single ctx for struct_ops prog. struct_ops prog can take >1 args and 
> different ops can take different args.
> 

Same reason with concern on kfunc. I'll change it in next version.


>> +
>> +struct smc_bpf_ops {
>> +    /* priavte */
>> +
>> +    struct list_head    list;
>> +
>> +    /* public */
>> +
>> +    /* Invoked before computing SMC option for SYN packets.
>> +     * We can control whether to set SMC options by modifying
>> +     * ctx->set_option->tp->syn_smc.
>> +     * This's also the only member that can be modified now.
>> +     * Only member in ctx->set_option is valid for this callback.
>> +     */
>> +    void (*set_option)(struct smc_bpf_ops_ctx *ctx);
>> +
>> +    /* Invoked before Set up SMC options for SYN-ACK packets
>> +     * We can control whether to respond SMC options by modifying
>> +     * ctx->set_option_cond.smc_ok.
>> +     * Only member in ctx->set_option_cond is valid for this callback.
>> +     */
>> +    void (*set_option_cond)(struct smc_bpf_ops_ctx *ctx);
> 
> The struct smc_bpf_ops already has set_option and set_option_cnd, but...
> 
>> +};
>> +
>>   #endif    /* _SMC_H */
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index 739a9fb..c322443 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -2730,6 +2730,12 @@ static inline void tcp_bpf_rtt(struct sock *sk, long mrtt, u32 srtt)
>>   #if IS_ENABLED(CONFIG_SMC)
>>   extern struct static_key_false tcp_have_smc;
>> +struct tcpsmc_ctx {
>> +    /* Invoked before computing SMC option for SYN packets. */
>> +    void (*set_option)(struct tcp_sock *tp);
>> +    /* Invoked before Set up SMC options for SYN-ACK packets */
>> +    void (*set_option_cond)(const struct tcp_sock *tp, struct inet_request_sock *ireq);
>> +};
> 
> another new struct tcpsmc_ctx has exactly the same functions (at least the same name) but different 
> arguments. I don't understand why this duplicate, is it because the need to prepare the "struct 
> smc_bpf_ops_ctx"?

Yes, same reason with concern on kfunc. I'll change it in next version.

> 
> The "struct tcpsmc_ctx" should be the "struct smc_bpf_ops" itself.
> 
> [ ... ]
> 
>> +static int smc_bpf_ops_btf_struct_access(struct bpf_verifier_log *log,
>> +                     const struct bpf_reg_state *reg,
>> +                     const struct bpf_prog *prog,
>> +                     int off, int size)
>> +{
>> +    const struct btf_member *member;
>> +    const char *mname;
>> +    int member_idx;
>> +
>> +    member_idx = prog->expected_attach_type;
>> +    if (member_idx >= btf_type_vlen(smc_bpf_ops_type))
>> +        goto out_err;
>> +
>> +    member = &btf_type_member(smc_bpf_ops_type)[member_idx];
>> +    mname = btf_str_by_offset(saved_btf, member->name_off);
>> +
>> +    if (!strcmp(mname, "set_option")) {
> 
> btf_member_bit_offset can be used instead of strcmp. Take a look at bpf_tcp_ca.c and kernel/sched/ext.c
> 

Got it, thanks for that.

Besides, it seems that we don't need the export btf_str_by_offset anymore in that way.
I'll remove it in the next version.


>> +        /* only support to modify tcp_sock->syn_smc */
>> +        if (reg->btf_id == tcp_sock_id &&
>> +            off == offsetof(struct tcp_sock, syn_smc) &&
>> +            off + size == offsetofend(struct tcp_sock, syn_smc))
>> +            return 0;
>> +    } else if (!strcmp(mname, "set_option_cond")) {
>> +        /* only support to modify smc_bpf_ops_ctx->smc_ok */
>> +        if (reg->btf_id == smc_bpf_ops_ctx_id &&
>> +            off == offsetof(struct smc_bpf_ops_ctx, set_option_cond.smc_ok) &&
>> +            off + size == offsetofend(struct smc_bpf_ops_ctx, set_option_cond.smc_ok))
>> +            return 0;
>> +    }
>> +
>> +out_err:
>> +    return -EACCES;
>> +}
>> +
>> +static const struct bpf_verifier_ops smc_bpf_verifier_ops = {
>> +    .get_func_proto = bpf_base_func_proto,
>> +    .is_valid_access = bpf_tracing_btf_ctx_access,
>> +    .btf_struct_access = smc_bpf_ops_btf_struct_access,
>> +};
>> +
>> +static struct bpf_struct_ops bpf_smc_bpf_ops = {
>> +    .init = smc_bpf_ops_init,
>> +    .name = "smc_bpf_ops",
>> +    .reg = smc_bpf_ops_reg,
>> +    .unreg = smc_bpf_ops_unreg,
>> +    .cfi_stubs = &__bpf_smc_bpf_ops,
>> +    .verifier_ops = &smc_bpf_verifier_ops,
>> +    .init_member = smc_bpf_ops_init_member,
>> +    .check_member = smc_bpf_ops_check_member,
>> +    .owner = THIS_MODULE,
>> +};
>> +
>> +int smc_bpf_struct_ops_init(void)
>> +{
>> +    return register_bpf_struct_ops(&bpf_smc_bpf_ops, smc_bpf_ops);
>> +}
>> +
>> +void bpf_smc_set_tcp_option(struct tcp_sock *tp)
>> +{
>> +    struct smc_bpf_ops_ctx ops_ctx = {};
>> +    struct smc_bpf_ops *ops;
>> +
>> +    ops_ctx.set_option.tp = tp;
> 
> All this initialization should be unnecessary. Directly pass tp instead.
> 

Same reason with kfunc concern. I'll change it in next version.

>> +
>> +    rcu_read_lock();
>> +    list_for_each_entry_rcu(ops, &smc_bpf_ops_list, list) {
> 
> Does it need to have a list (meaning >1) of smc_bpf_ops to act on a sock? The ordering expectation 
> is hard to manage.
> 

Considering that the SMC modules also has its own ops that needs to be registered on it (the logic 
of smc_limit_fs), and need to be all executed, perhaps a list is a more suitable choice.


>> +        ops->set_option(&ops_ctx);
> 
> A dumb question. This will only affect AF_SMC (or AF_INET[6]/IPPROTO_SMC) socket but not the 
> AF_INET[6]/IPPROTO_{TCP,UDP} socket?
> 

Yes, it only affects AF_SMC, AF_SMC6, or IPPROTO_SMC sockets. Due to only SMC sockets will set 
tp->syn_smc, and we will check it before calling the very ops.

Best wishes,
D.

> pw-bot: cr
> 
>> +    }
>> +    rcu_read_unlock();
>> +}




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ