lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cba4cb83-4a6e-00b5-82ed-facb783f4707@linux.alibaba.com>
Date: Wed, 17 May 2023 17:16:22 +0800
From: "D. Wythe" <alibuda@...ux.alibaba.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: kuba@...nel.org, davem@...emloft.net, netdev@...r.kernel.org,
 linux-s390@...r.kernel.org, linux-rdma@...r.kernel.org, bpf@...r.kernel.org,
 kgraul@...ux.ibm.com, wenjia@...ux.ibm.com, jaka@...ux.ibm.com,
 ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, pabeni@...hat.com,
 song@...nel.org, sdf@...gle.com, haoluo@...gle.com, yhs@...com,
 edumazet@...gle.com, john.fastabend@...il.com, kpsingh@...nel.org,
 jolsa@...nel.org, guwen@...ux.alibaba.com
Subject: Re: [PATCH bpf-next v1 2/5] net/smc: allow smc to negotiate protocols
 on policies



On 5/17/23 4:14 PM, Martin KaFai Lau wrote:
> On 5/17/23 12:08 AM, D. Wythe wrote:
>>
>>
>> On 5/16/23 6:52 AM, Martin KaFai Lau wrote:
>>> On 5/11/23 11:24 PM, D. Wythe wrote:
>>>> From: "D. Wythe" <alibuda@...ux.alibaba.com>
>>>>
>>>> As we all know, the SMC protocol is not suitable for all scenarios,
>>>> especially for short-lived. However, for most applications, they 
>>>> cannot
>>>> guarantee that there are no such scenarios at all. Therefore, apps
>>>> may need some specific strategies to decide shall we need to use SMC
>>>> or not.
>>>>
>>>> Just like the congestion control implementation in TCP, this patch
>>>> provides a generic negotiator implementation. If necessary,
>>>> we can provide different protocol negotiation strategies for
>>>> apps based on this implementation.
>>>>
>>>> But most importantly, this patch provides the possibility of
>>>> eBPF injection, allowing users to implement their own protocol
>>>> negotiation policy in userspace.
>>>>
>>>> Signed-off-by: D. Wythe <alibuda@...ux.alibaba.com>
>>>> ---
>>>>   include/net/smc.h        |  32 +++++++++++
>>>>   net/Makefile             |   1 +
>>>>   net/smc/Kconfig          |  11 ++++
>>>>   net/smc/af_smc.c         | 134 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>   net/smc/smc_negotiator.c | 119 
>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>   net/smc/smc_negotiator.h | 116 
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>   6 files changed, 412 insertions(+), 1 deletion(-)
>>>>   create mode 100644 net/smc/smc_negotiator.c
>>>>   create mode 100644 net/smc/smc_negotiator.h
>>>>
>>>> diff --git a/include/net/smc.h b/include/net/smc.h
>>>> index 6d076f5..191061c 100644
>>>> --- a/include/net/smc.h
>>>> +++ b/include/net/smc.h
>>>> @@ -296,6 +296,8 @@ struct smc_sock {                /* smc sock 
>>>> container */
>>>>       atomic_t                queued_smc_hs;  /* queued smc 
>>>> handshakes */
>>>>       struct inet_connection_sock_af_ops        af_ops;
>>>>       const struct inet_connection_sock_af_ops *ori_af_ops;
>>>> +    /* protocol negotiator ops */
>>>> +    const struct smc_sock_negotiator_ops *negotiator_ops;
>>>>                           /* original af ops */
>>>>       int            sockopt_defer_accept;
>>>>                           /* sockopt TCP_DEFER_ACCEPT
>>>> @@ -316,4 +318,34 @@ struct smc_sock {                /* smc sock 
>>>> container */
>>>>                            */
>>>>   };
>>>>   +#ifdef CONFIG_SMC_BPF
>>>> +/* BPF struct ops for smc protocol negotiator */
>>>> +struct smc_sock_negotiator_ops {
>>>> +
>>>> +    struct list_head    list;
>>>> +
>>>> +    /* ops name */
>>>> +    char        name[16];
>>>> +    /* key for name */
>>>> +    u32            key;
>>>> +
>>>> +    /* init with sk */
>>>> +    void (*init)(struct sock *sk);
>>>> +
>>>> +    /* release with sk */
>>>> +    void (*release)(struct sock *sk);
>>>> +
>>>> +    /* advice for negotiate */
>>>> +    int (*negotiate)(struct sock *sk);
>>>> +
>>>> +    /* info gathering timing */
>>>> +    void (*collect_info)(struct sock *sk, int timing);
>>>> +
>>>> +    /* module owner */
>>>> +    struct module *owner;
>>>> +};
>>>> +#else
>>>> +struct smc_sock_negotiator_ops {};
>>>> +#endif
>>>> +
>>>>   #endif    /* _SMC_H */
>>>> diff --git a/net/Makefile b/net/Makefile
>>>> index 4c4dc53..222916a 100644
>>>> --- a/net/Makefile
>>>> +++ b/net/Makefile
>>>> @@ -52,6 +52,7 @@ obj-$(CONFIG_TIPC)        += tipc/
>>>>   obj-$(CONFIG_NETLABEL)        += netlabel/
>>>>   obj-$(CONFIG_IUCV)        += iucv/
>>>>   obj-$(CONFIG_SMC)        += smc/
>>>> +obj-$(CONFIG_SMC_BPF)        += smc/smc_negotiator.o > 
>>>> obj-$(CONFIG_RFKILL)        += rfkill/
>>>>   obj-$(CONFIG_NET_9P)        += 9p/
>>>>   obj-$(CONFIG_CAIF)        += caif/
>>>> diff --git a/net/smc/Kconfig b/net/smc/Kconfig
>>>> index 1ab3c5a..bdcc9f1 100644
>>>> --- a/net/smc/Kconfig
>>>> +++ b/net/smc/Kconfig
>>>> @@ -19,3 +19,14 @@ config SMC_DIAG
>>>>         smcss.
>>>>           if unsure, say Y.
>>>> +
>>>> +config SMC_BPF
>>>> +    bool "SMC: support eBPF" if SMC
>>>
>>>
>>> so smc_negotiator will always be in the kernel image even af_smc is 
>>> compiled as a module? If the SMC_BPF needs to support af_smc as a 
>>> module, proper implementation needs to be added to bpf_struct_ops to 
>>> support module first. It is work-in-progress.
>>>
>>
>> smc_negotiator will not no in the kernel image when af_smc is 
>> compiled as a module,
>> it's requires config SMC_BPF also sets to be Y,  while it's default 
>> to be N. That's is,
>> even if af_smc is compiled as a module but with no SMC_BPF set, 
>> smc_negotiator
>> doesn't exist anywhere.
>
> CONFIG_SMC_BPF could be "y" while CONFIG_SMC is "m", no?
>
> Anyway, there is a build error when CONFIG_SMC is "m" :(
>

I am curious if users who proactively set CONFIG_SMC_BPF to Y would care 
about the issue you mentioned, while
CONFIG_SMC_BPF defaults to N ?

And I'm really sorry about this compilation error. Last time, I had got 
some comments about symbol export, so I tried to remove some symbol exports,
unfortunately, there are compilation issues when BPF_JIT is set 
(bpf_struct_ops_get is no exported), sorry for my incomplete
testing. I will fix this issue in the new version.

Anyway, if bpf_struct_ops can support module, that would be better, and 
can greatly reduce the trade-offs I make between modules and built-in.
Is there any details can shared on your progress ?

>>>> +    depends on BPF_SYSCALL
>>>> +    default n
>>>> +    help
>>>> +      Supports eBPF to allows user mode participation in SMC's 
>>>> protocol process
>>>> +      via ebpf programs. Alternatively, obtain information about 
>>>> the SMC socks
>>>> +      through the ebpf program.
>>>> +
>>>> +      If unsure, say N.
>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>> index 50c38b6..7406fd4 100644
>>>> --- a/net/smc/af_smc.c
>>>> +++ b/net/smc/af_smc.c
>>>> @@ -52,6 +52,7 @@
>>>>   #include "smc_close.h"
>>>>   #include "smc_stats.h"
>>>>   #include "smc_tracepoint.h"
>>>> +#include "smc_negotiator.h"
>>>>   #include "smc_sysctl.h"
>>>>     static DEFINE_MUTEX(smc_server_lgr_pending);    /* serialize 
>>>> link group
>>>> @@ -68,6 +69,119 @@
>>>>   static void smc_tcp_listen_work(struct work_struct *);
>>>>   static void smc_connect_work(struct work_struct *);
>>>>   +#ifdef CONFIG_SMC_BPF
>>>> +
>>>> +/* Check if sock should use smc */
>>>> +int smc_sock_should_select_smc(const struct smc_sock *smc)
>>>> +{
>>>> +    const struct smc_sock_negotiator_ops *ops;
>>>> +    int ret;
>>>> +
>>>> +    rcu_read_lock();
>>>> +    ops = READ_ONCE(smc->negotiator_ops);
>>>> +
>>>> +    /* No negotiator_ops supply or no negotiate func set,
>>>> +     * always pass it.
>>>> +     */
>>>> +    if (!ops || !ops->negotiate) {
>>>
>>> A smc_sock_negotiator_ops without ->negotiate? Is it useful at all 
>>> to allow the register in the first place?
>>>
>>
>> You are right, this can be avoid before registration. I'll fix it.
>>
>>>> +        rcu_read_unlock();
>>>> +        return SK_PASS;
>>>> +    }
>>>> +
>>>> +    ret = ops->negotiate((struct sock *)&smc->sk);
>>>> +    rcu_read_unlock();
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +void smc_sock_perform_collecting_info(const struct smc_sock *smc, 
>>>> int timing)
>>>> +{
>>>> +    const struct smc_sock_negotiator_ops *ops;
>>>> +
>>>> +    rcu_read_lock();
>>>> +    ops = READ_ONCE(smc->negotiator_ops);
>>>> +
>>>> +    if (!ops || !ops->collect_info) {
>>>> +        rcu_read_unlock();
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    ops->collect_info((struct sock *)&smc->sk, timing);
>>>> +    rcu_read_unlock();
>>>> +}
>>>> +
>>>> +int smc_sock_assign_negotiator_ops(struct smc_sock *smc, const 
>>>> char *name)
>>>> +{
>>>> +    struct smc_sock_negotiator_ops *ops;
>>>> +    int ret = -EINVAL;
>>>> +
>>>> +    /* already set */
>>>> +    if (READ_ONCE(smc->negotiator_ops))
>>>> +        smc_sock_cleanup_negotiator_ops(smc, /* might be still 
>>>> referenced */ false);
>>>> +
>>>> +    /* Just for clear negotiator_ops */
>>>> +    if (!name || !strlen(name))
>>>> +        return 0;
>>>> +
>>>> +    rcu_read_lock();
>>>> +    ops = smc_negotiator_ops_get_by_name(name);
>>>> +    if (likely(ops)) {
>>>> +        if (unlikely(!bpf_try_module_get(ops, ops->owner))) {
>>>> +            ret = -EACCES;
>>>> +        } else {
>>>> +            WRITE_ONCE(smc->negotiator_ops, ops);
>>>> +            /* make sure ops can be seen */
>>>> +            smp_wmb();
>>>
>>> This rcu_read_lock(), WRITE_ONCE, and smp_wmb() combo looks very 
>>> suspicious. smc->negotiator_ops is protected by rcu (+refcnt) or 
>>> lock_sock()?
>>>
>>
>> All access to ops is protected by RCU, and there are no lock_sock. 
>> WRITE_ONCE() and smp_wmb() do
>> not participate in any guarantee of the availability of ops, The 
>> purpose to using them is just wish the latest values
>> can be read as soon as possible , In fact, even if old value is read, 
>> there will be no problem in logic because all updates
>> will do synchronize_rcu() and all access to ops is under in 
>> rcu_read_lock().
>
> The explanation is not encouraging. No clear benefit while having this 
> kind of complexity here. Switching tcp congestion ops also does not 
> require this. Some of the new codes is in af_smc but bpf is the 
> primary user. It is not something that I would like to maintain and 
> then need to reason about this unusual pattern a year later. Beside, 
> this negotiator_ops assignment must be done under a lock_sock(). The 
> same probably is true for calling ops->negotiate() where the bpf prog 
> may be looking at the sk and calling bpf_setsockopt.

I got you point, If you feel that those code are complexity and 
unnecessary, I can remove them of course.

Additionally, smc_sock_assign_negotiator_ops is indeed executed under 
sock lock,  __smc_setsockopt will lock sock
for it. I misunderstood your meaning before.

As for ops ->negotiate(), thanks for this point, but considering 
performance,
I might prohibit calling setsockopt in negotiate().

>>
>>> I am going to stop reviewing here.
>>>
>>
>> Hoping my explanation can answer your questions and still looking 
>> forward to
>> your more feedback 😁.
>
> Sorry, based on the review so far (there was some RFC before), it is 
> not something that I want to continue to review and maintain a bpf 
> hook for it. You have to solicit other known community members for 
> review and sponsor this set from now on.

Okay, thank you very much for pointing out some issues and suggestions.

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ