netdev - Re: [RFC] Socket termination for policy enforcement and load-balancing

Message-ID: <077d56ef-30cb-2d19-6f57-a92fd886b5f2@linux.dev>
Date:   Wed, 7 Sep 2022 19:26:34 -0700
From:   Martin KaFai Lau <martin.lau@...ux.dev>
To:     Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Aditi Ghag <aditivghag@...il.com>
Cc:     Martin KaFai Lau <kafai@...com>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, Daniel Borkmann <daniel@...earbox.net>,
        Yonghong Song <yhs@...com>,
        Kuniyuki Iwashima <kuniyu@...zon.com>
Subject: Re: [RFC] Socket termination for policy enforcement and
 load-balancing

On 9/4/22 2:24 PM, Kumar Kartikeya Dwivedi wrote:
> On Sun, 4 Sept 2022 at 20:55, Aditi Ghag <aditivghag@...il.com> wrote:
>>
>> On Wed, Aug 31, 2022 at 4:02 PM Martin KaFai Lau <kafai@...com> wrote:
>>>
>>> On Wed, Aug 31, 2022 at 09:37:41AM -0700, Aditi Ghag wrote:
>>>> - Use BPF (sockets) iterator to identify sockets connected to a
>>>> deleted backend. The BPF (sockets) iterator is network namespace aware
>>>> so we'll either need to enter every possible container network
>>>> namespace to identify the affected connections, or adapt the iterator
>>>> to be without netns checks [3]. This was discussed with my colleague
>>>> Daniel Borkmann based on the feedback he shared from the LSFMMBPF
>>>> conference discussions.
>>> Being able to iterate all sockets across different netns will
>>> be useful.
>>>
>>> It should be doable to ignore the netns check.  For udp, a quick
>>> thought is to have another iter target. eg. "udp_all_netns".
>>> From the sk, the bpf prog should be able to learn the netns and
>>> the bpf prog can filter the netns by itself.
>>>
>>> The TCP side is going to have an 'optional' per netns ehash table [0] soon,
>>> not lhash2 (listening hash) though.  Ideally, the same bpf
>>> all-netns iter interface should work similarly for both udp and
>>> tcp case.  Thus, both should be considered and work at the same time.
>>>
>>> For udp, something more useful than plain udp_abort() could potentially
>>> be done.  eg. directly connect to another backend (by bpf kfunc?).
>>> There may be some details in socket locking...etc but should
>>> be doable and the bpf-iter program could be sleepable also.
>>
>> This won't be effective for connected udp though, will it? Interesting thought
>> around using bpf kfunc
hmm... why won't the bpf-prog doing the udp re-connect() be effective?
I suspect we are talking about different things.

Regardless, for tcp, I think user space needs to handle the tcp
aborted-error by redoing the connect().  Thus, let's stay with
{tcp,udp}_abort() for now.  Try to expose {tcp,udp}_abort() as a kfunc
instead of a new bpf_helper.
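
To illustrate the user space side of that, here is a minimal sketch of a
client redoing the connect() after its socket was aborted.  It assumes the
abort surfaces as ECONNABORTED (the value the existing SOCK_DESTROY path
passes to tcp_abort()); ECONNRESET and EPIPE are included as a guess at what
a client may also want to treat the same way.  should_reconnect(),
connect_backend() and send_with_reconnect() are hypothetical names, not from
any existing library.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* True when the error suggests the connection was torn down underneath
 * us and a fresh connect() is worth trying.  The exact set of errno
 * values is an assumption of this sketch. */
bool should_reconnect(int err)
{
	return err == ECONNABORTED || err == ECONNRESET || err == EPIPE;
}

/* Open a new TCP connection to addr.  Returns the fd, or -1 with errno set. */
int connect_backend(const struct sockaddr_in *addr)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Send buf on *fd.  If the socket was aborted, replace *fd with a new
 * connection to addr and retry, at most max_retries times. */
ssize_t send_with_reconnect(int *fd, const struct sockaddr_in *addr,
			    const void *buf, size_t len, int max_retries)
{
	assert(fd != NULL && addr != NULL && buf != NULL);

	for (int attempt = 0; ; attempt++) {
		/* MSG_NOSIGNAL: report EPIPE instead of raising SIGPIPE */
		ssize_t n = send(*fd, buf, len, MSG_NOSIGNAL);

		if (n >= 0 || !should_reconnect(errno) || attempt >= max_retries)
			return n;

		close(*fd);
		*fd = connect_backend(addr);
		if (*fd < 0)
			return -1;
	}
}
```

In a load-balancing setup the new connect() would presumably be steered to a
live backend by the cgroup connect hook, so the client does not need to know
which backend went away.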

>>
>>> fwiw, we are iterating the tcp socket to retire some older
>>> bpf-tcp-cc (congestion control) on the long-lived connections
>>> by bpf_setsockopt(TCP_CONGESTION).
>>>
>>> Also, potentially, instead of iterating all,
>>> a more selective case can be done by
>>> bpf_prog_test_run()+bpf_sk_lookup_*()+udp_abort().
>>
>> Can you elaborate more on the more selective iterator approach?
If the 4-tuple (src/dst ip/port) is known, bpf_sk_lookup_*() can look up
a sk from the tcp_hashinfo or udp_table.  bpf_sk_lookup_*() also takes a
netns_id argument.  However, yeah, it will still go back to the need to
get all netns, so it may not work well in the RFC case here.

>>
>> On a similar note, are there better ways as alternatives to the
>> sockets iterator approach.
>> Since we have BPF programs executed on cgroup BPF hooks (e.g.,
>> connect), we already know what client
>> sockets are connected to a backend. Can we somehow store these socket
>> pointers in a regular BPF map, and
>> when a backend is deleted, use a regular map iterator to invoke
>> sock_destroy() for these sockets? Does anyone have
>> experience using the "typed pointer support in BPF maps" APIs [0]?
> 
> I am not very familiar with how socket lifetime is managed, it may not
> be possible in case lifetime is managed by RCU only,
> or due to other limitations.
> Martin will probably be able to comment more on that.
sk is the usual refcnt+rcu_reader pattern.  afaik, the use case here is
that the sk should be removed from the map when there is a tcp_close() or
udp_lib_close().  There is sock_map and sock_hash to store sk as the
map-value.  iirc the sk will be automatically removed from the map
during tcp_close() and udp_lib_close().  The sock_map and sock_hash also
have a bpf iterator.  Meaning a bpf-iter-prog can iterate the sock_map and
sock_hash and then do abort on each sk, so it looks like most of the
pieces are in place.
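
To make the bookkeeping concrete, below is a toy user-space model of that
flow, not kernel or BPF code.  The fixed-size table stands in for a
sock_map/sock_hash, track_connect() for the cgroup connect hook storing the
sk, untrack_close() for the automatic removal on close, and abort_backend()
for the bpf-iter-prog walking the map.  All names and the table layout are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64

/* One tracked client socket and the backend it is connected to. */
struct tracked_sock {
	bool in_use;
	int sock_id;            /* stand-in for the sk pointer */
	uint32_t backend_ip;
	uint16_t backend_port;
};

struct sock_table {
	struct tracked_sock slots[MAX_TRACKED];
};

/* Connect hook: remember which backend this socket talks to. */
bool track_connect(struct sock_table *t, int sock_id,
		   uint32_t ip, uint16_t port)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!t->slots[i].in_use) {
			t->slots[i] = (struct tracked_sock){
				.in_use = true,
				.sock_id = sock_id,
				.backend_ip = ip,
				.backend_port = port,
			};
			return true;
		}
	}
	return false; /* table full */
}

/* Close path: drop the entry so a stale socket is never aborted later. */
bool untrack_close(struct sock_table *t, int sock_id)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (t->slots[i].in_use && t->slots[i].sock_id == sock_id) {
			t->slots[i].in_use = false;
			return true;
		}
	}
	return false;
}

/* Iterator: abort every socket still connected to the deleted backend.
 * abort_fn may be NULL; returns the number of sockets matched. */
size_t abort_backend(struct sock_table *t, uint32_t ip, uint16_t port,
		     void (*abort_fn)(int sock_id))
{
	size_t aborted = 0;

	assert(t != NULL);
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		struct tracked_sock *s = &t->slots[i];

		if (!s->in_use || s->backend_ip != ip || s->backend_port != port)
			continue;
		if (abort_fn)
			abort_fn(s->sock_id);
		s->in_use = false; /* an aborted socket leaves the table */
		aborted++;
	}
	return aborted;
}
```

The point of the model is only the ordering: entries go away on close, so
the iterator sees just the live sockets for the deleted backend and no netns
walk is needed.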