lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 Mar 2023 22:26:55 +0200
From:   Tariq Toukan <ttoukan.linux@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
        pabeni@...hat.com, borisp@...dia.com, john.fastabend@...il.com,
        maximmi@...dia.com, tariqt@...dia.com, vfedorenko@...ek.ru,
        Gal Pressman <gal@...dia.com>,
        Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH net-next v3 7/7] tls: rx: do not use the standard
 strparser



On 13/03/2023 20:22, Jakub Kicinski wrote:
> On Sun, 12 Mar 2023 19:59:57 +0200 Tariq Toukan wrote:
>> On 09/03/2023 19:54, Jakub Kicinski wrote:
>>> On Thu, 9 Mar 2023 17:15:26 +0200 Tariq Toukan wrote:
>>>> A few fixes were introduced for this patch, but it seems to still cause
>>>> issues.
>>>>
>>>> I'm running simple client/server test with wrk and nginx and TLS RX
>>>> device offload on.
>>>> It fails with TlsDecryptError on the client side for the large file
>>>> (256000b), while succeeding for the small one (10000b). See repro
>>>> details below.
>>>>
>>>> I narrowed the issue down to this offending patch, by applying a few
>>>> reverts (had to solve trivial conflicts):
>>>
>>> What's the sequence of records in terms of being offloaded vs fall back?
>>> Could you whip up a simple ring buffer to see if previous records were
>>> offloaded and what the skb geometries where?
>>
>> Interesting. All records go through the sw fallback.
>>
>> Command:
>> $ wrk_openssl_3_0_0 -b2.2.2.2 -t1 -c1 -d2 --timeout 5s
>> https://2.2.2.3:20443/256000b.img
> 
> Is wrk_openssl_3_0_0 a nginx command? Any CX6 DX card can do this?
> 

It's the standard wrk client, nothing special here.
We simply rename it to indicate that it was compiled against a specific 
openssl version.

>> Debug code:
>> @@ -1712,8 +1723,13 @@ static int tls_rx_one_record(struct sock *sk,
>> struct msghdr *msg,
>>           int err;
>>
>>           err = tls_decrypt_device(sk, msg, tls_ctx, darg);
>> -       if (!err)
>> +       if (!err) {
>>                   err = tls_decrypt_sw(sk, tls_ctx, msg, darg);
>> +               printk("sk: %p, tls_decrypt_sw, err = %d\n", sk, err);
>> +       } else {
>> +               printk("sk: %p, tls_decrypt_device, err = %d\n", sk, err);
>> +       }
>> +       skb_dump(KERN_ERR, darg->skb, false);
>>           if (err < 0)
>>                   return err;
>>
>> dmesg output including skb geometries is attached.
> 
> Hm, could you add to the debug the addresses of the fragments
> (and decrypted status) of the data queued to TCP by the driver?
> And then the frag addresses in skb_dump() ?
> 

I used the debug code below. Find dmesg output attached.
I disabled GRO to reduce complexity.

# cat /proc/net/tls_stat
TlsCurrTxSw                             0
TlsCurrRxSw                             0
TlsCurrTxDevice                         1
TlsCurrRxDevice                         1
TlsTxSw                                 0
TlsRxSw                                 0
TlsTxDevice                             2
TlsRxDevice                             2
TlsDecryptError                         2
TlsRxDeviceResync                       2
TlsDecryptRetry                         0
TlsRxNoPadViolation                     0

Insights:

In this repro, we always go through tls_decrypt_sw, never to 
tls_decrypt_device.

The corrupted SKB is the one with mixed offloaded and non-offloaded 
decryption. You can clearly see this by its len (2330624 bytes!), 
followed by a kernel panic.

Calling skb_dump on the corrupted skb also hits the kernel panic.

Code:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3f7b63d6616b..864288b5d412 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2262,6 +2262,7 @@ static void mlx5e_handle_rx_cqe_mpwrq(struct 
mlx5e_rq *rq, struct mlx5_cqe64 *cq
                         goto mpwrq_cqe_out;
                 }

+       printk("driver calling napi_gro_receive for skb %p, is decrypt = 
%d\n", skb, skb->decrypted);
         napi_gro_receive(rq->cq.napi, skb);

  mpwrq_cqe_out:
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 782d3701b86f..3858299196d3 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1710,10 +1710,19 @@ static int tls_rx_one_record(struct sock *sk, 
struct msghdr *msg,
         struct tls_prot_info *prot = &tls_ctx->prot_info;
         struct strp_msg *rxm;
         int err;
+       int i;

         err = tls_decrypt_device(sk, msg, tls_ctx, darg);
-       if (!err)
+       if (!err) {
                 err = tls_decrypt_sw(sk, tls_ctx, msg, darg);
+               printk("sk: %p, tls_decrypt_sw, err = %d\n", sk, err);
+       } else {
+               printk("sk: %p, tls_decrypt_device, err = %d\n", sk, err);
+       }
+       printk("skb: %p, len = %d, nfrags = %d\n", darg->skb, 
darg->skb->len, skb_shinfo(darg->skb)->nr_frags);
+       for (i = 0; i < skb_shinfo(darg->skb)->nr_frags; i++)
+               printk("frag#%d: %p, len %d\n", i, 
&skb_shinfo(darg->skb)->frags[i],
+ 
skb_frag_size(&skb_shinfo(darg->skb)->frags[i]));
         if (err < 0)
                 return err;

> tls_decrypt_sw() will also get used in partially decrypted records,
> right?

Right.

View attachment "dmesg_tls_rx_decrypt_err_2.txt" of type "text/plain" (41837 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ