Message-ID: <062f5ac8-0533-bf29-d51c-b9390ad06d78@huawei.com>
Date: Tue, 5 Jul 2022 11:26:53 +0800
From: "Ziyang Xuan (William)" <william.xuanziyang@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>,
Julien Salleyron <julien.salleyron@...il.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
Marc Vertes <mvertes@...e.fr>
Subject: Re: [PATCH] net: tls: fix tls with sk_redirect using a BPF verdict.
> On Tue, 28 Jun 2022 17:25:05 +0200 Julien Salleyron wrote:
>> This patch allows using kTLS on a socket where sk_redirect is applied by a BPF
>> verdict program.
>>
>> Without this patch, we see that the data received after the redirection is
>> decrypted but with an incorrect offset and length. The offset and length appear
>> correct in the stream-parser data but are never applied to the skb. We have
>> simply applied those values to the skb.
>>
>> In the case of regular sockets, we saw a big performance improvement from
>> applying redirect. This is not the case with kTLS, which may be related to the
>> following point.
>
> It's because kTLS does a very expensive reallocation and copy for the
> non-zerocopy case (which currently means all of TLS 1.3). I have
> code almost ready to fix that (just needs to be reshuffled into
> upstreamable patches). Brings us up from 5.9 Gbps to 8.4 Gbps per CPU
> on my test box with 16k records. Probably much more than that with
> smaller records.
>
>> It is still necessary to perform a read operation (never triggered) from user
>> space despite the redirection. It makes no sense, since this read operation is
>> not necessary on regular sockets without KTLS.
>>
>> We do not see how to fix this problem without a change of architecture, for
>> example by performing TLS decrypt directly inside the BPF verdict program.
>>
>> An example program can be found at
>> https://github.com/juliens/ktls-bpf_redirect-example/
>>
>> Co-authored-by: Marc Vertes <mvertes@...e.fr>
>> ---
>>  net/tls/tls_sw.c                           | 6 ++++++
>>  tools/testing/selftests/bpf/test_sockmap.c | 8 +++-----
>>  2 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 0513f82b8537..a409f8a251db 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1839,8 +1839,14 @@ int tls_sw_recvmsg(struct sock *sk,
>>  		if (bpf_strp_enabled) {
>>  			/* BPF may try to queue the skb */
>>  			__skb_unlink(skb, &ctx->rx_list);
>> +
>>  			err = sk_psock_tls_strp_read(psock, skb);
>> +
>>  			if (err != __SK_PASS) {
>> +				if (err == __SK_REDIRECT) {
>> +					skb->data += rxm->offset;
>> +					skb->len = rxm->full_len;
>> +				}
>
Reading the code against the following kTLS + eBPF redirect model,
we may find the problem with the above modification:
sk_psock_tls_strp_read()
  -> sk_psock_tls_verdict_apply()
    -> sk_psock_skb_redirect()
      -> skb_queue_tail()   // add skb to the ingress_skb queue
      -> schedule_work()    // schedule sk_psock_backlog; sk_psock_tls_strp_read() returns
sk_psock_backlog() then walks the sk_psock->ingress_skb queue and, after
sk_psock_handle_skb(), calls kfree_skb() if the skb is not destined for ingress.
So I think the above modification has a race condition on the skb.
I backported the above modification to linux-5.10 as follows:
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1882,6 +1882,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		if (bpf_strp_enabled) {
 			err = sk_psock_tls_strp_read(psock, skb);
 			if (err != __SK_PASS) {
+				if (err == __SK_REDIRECT) {
+					skb->data += rxm->offset;
+					skb->len = rxm->full_len;
+				}
 				rxm->offset = rxm->offset + rxm->full_len;
Running the following kTLS + eBPF redirect model triggers a system panic:
      ktls                           ktls
sender   proxy_recv       proxy_send   recv
  |          |                 |         |
  |       verdict -------------+         |
  |          | redirect_egress |         |
  +----------+                 +---------+
The call trace is as follows. It may not be the skb race described above, but
either way the modification is not correct for the kTLS + eBPF redirect model.
[ 816.687852] BUG: unable to handle page fault for address: ffff8f057ffdf000
[ 816.687998] Oops: 0000 [#1] SMP PTI
[ 816.688025] CPU: 2 PID: 619 Comm: kworker/2:3 Kdump: loaded Not tainted 5.10.0-ktls+ #4
[ 816.688122] Workqueue: events sk_psock_backlog
[ 816.688152] RIP: 0010:memcpy_erms+0x6/0x10
[ 816.688263] RSP: 0018:ffff9c60c01cfc58 EFLAGS: 00010286
[ 816.688292] RAX: ffff8f054f880000 RBX: 0000000000008000 RCX: 00000000000044bb
[ 816.688330] RDX: 0000000000008000 RSI: ffff8f057ffdf000 RDI: ffff8f054f883b45
[ 816.688376] RBP: 0000000000008000 R08: 0000000000000001 R09: 0000000000000000
[ 816.688412] R10: 0000000000000001 R11: ffff9c60c01cfdd0 R12: ffff9c60c01cfdd0
[ 816.688447] R13: ffff8f054f880000 R14: ffff9c60c01cfdb0 R15: 0000000000008000
[ 816.688480] FS: 0000000000000000(0000) GS:ffff8f0775d00000(0000) knlGS:0000000000000000
[ 816.688524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 816.688564] CR2: ffff8f057ffdf000 CR3: 0000000120b16005 CR4: 0000000000370ee0
[ 816.688600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 816.688631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 816.688661] Call Trace:
[ 816.688681] _copy_from_iter_full+0x1ee/0x270
[ 816.688725] tcp_sendmsg_locked+0x5aa/0xdc0
[ 816.688749] tcp_sendmsg+0x28/0x40
[ 816.688769] sock_sendmsg+0x57/0x60
[ 816.688790] ? sendmsg_unlocked+0x20/0x20
[ 816.688811] __skb_send_sock+0x29b/0x2f0
[ 816.688847] ? linear_to_page+0xf0/0xf0
[ 816.688868] sk_psock_backlog+0x85/0x1d0
[ 816.688890] process_one_work+0x1b0/0x350
[ 816.688911] worker_thread+0x49/0x300
In addition, I'm trying to explain the performance degradation of the
kTLS + eBPF redirect model compared to the following general proxy model,
both using TLS 1.3 (so both non-zerocopy), even though the kTLS + eBPF
redirect model saves two user/kernel data copies.
So far I have found a memcpy_erms hot spot in _copy_from_iter_full(),
called from tcp_sendmsg_locked() in the redirect_egress path. I suspect
that copying from a kvec is inefficient, but this is just my guess and I
still need to verify it.
      ktls                           ktls
sender   proxy_recv       proxy_send   recv
  |          |                 |         |
  +----------+                 +---------+
If you have any ideas, I would appreciate your advice.
Thank you!
> IDK what this is trying to do but it certainly depends on the fact
> we run skb_cow_data() and is not "generally correct" :S
> .
>