netdev - Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS

Date: Mon, 3 Jul 2023 15:57:28 +0200
From: Hannes Reinecke <hare@...e.de>
To: Sagi Grimberg <sagi@...mberg.me>, David Howells <dhowells@...hat.com>
Cc: Keith Busch <kbusch@...nel.org>, Christoph Hellwig <hch@....de>,
 linux-nvme@...ts.infradead.org, Jakub Kicinski <kuba@...nel.org>,
 Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 netdev@...r.kernel.org
Subject: Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS

On 7/3/23 15:42, Sagi Grimberg wrote:
> 
>>>> Hannes Reinecke <hare@...e.de> wrote:
>>>>
>>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>>
>>>> Can you be more specific about the crash?
>>>
>>> Hannes,
>>>
>>> See:
>>> [PATCH net] nvme-tcp: Fix comma-related oops
>>
>> Ah, right. That solves _that_ issue.
>>
>> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to 
>> your suggestion). Investigating.
> 
> Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
> does not interpret this as a transient status and simply returns from
> io_work?
> 
Unfortunately, yes.

static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
                                  bool nonblock)
{
         long timeo;

         timeo = sock_rcvtimeo(sk, nonblock);

         while (unlikely(ctx->reader_present)) {
                 DEFINE_WAIT_FUNC(wait, woken_wake_function);

                 ctx->reader_contended = 1;

                 add_wait_queue(&ctx->wq, &wait);
                 sk_wait_event(sk, &timeo,
                               !READ_ONCE(ctx->reader_present), &wait);

and sk_wait_event() does:
#define sk_wait_event(__sk, __timeo, __condition, __wait)              \
         ({      int __rc;                                              \
                 __sk->sk_wait_pending++;                               \
                 release_sock(__sk);                                    \
                 __rc = __condition;                                    \
                 if (!__rc) {                                           \
                         *(__timeo) = wait_woken(__wait,                \
                                                 TASK_INTERRUPTIBLE,    \
                                                 *(__timeo));           \
                 }                                                      \
                 sched_annotate_sleep();                                \
                 lock_sock(__sk);                                       \
                 __sk->sk_wait_pending--;                               \
                 __rc = __condition;                                    \
                 __rc;                                                  \
         })

so not calling 'lock_sock()' in tls_rx_reader_acquire() helps only _so_
much; sk_wait_event() re-takes the socket lock itself, and we're still
deadlocking.
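
To make the ordering concrete, here is a minimal userspace sketch of the
wait path quoted above. It is only a model under stated assumptions: every
model_* name is hypothetical, the "sleep" is replaced by a stub that
pretends the other reader has finished, and nothing here is kernel API.
It shows just that the lock is taken inside the wait path even though the
acquire function never locks directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_sock {
	bool locked;		/* models ownership of the socket lock */
	int lock_calls;		/* how often the lock was taken */
	int release_calls;	/* how often the lock was dropped */
	int wait_pending;	/* models sk_wait_pending */
};

struct model_rx_ctx {
	bool reader_present;
	bool reader_contended;
};

static void model_lock_sock(struct model_sock *sk)
{
	sk->locked = true;
	sk->lock_calls++;
}

/* The model does not check whether the caller actually held the lock. */
static void model_release_sock(struct model_sock *sk)
{
	sk->locked = false;
	sk->release_calls++;
}

/* Stands in for wait_woken(): assume the other reader finished meanwhile. */
static void model_sleep_until_woken(struct model_rx_ctx *ctx)
{
	ctx->reader_present = false;
}

/* Same order of operations as the sk_wait_event() macro quoted above. */
static bool model_sk_wait_event(struct model_sock *sk, struct model_rx_ctx *ctx)
{
	bool rc;

	sk->wait_pending++;
	model_release_sock(sk);
	rc = !ctx->reader_present;
	if (!rc)
		model_sleep_until_woken(ctx);
	model_lock_sock(sk);		/* the lock is re-taken here */
	sk->wait_pending--;
	rc = !ctx->reader_present;
	return rc;
}

/* Models only the quoted loop; it never calls model_lock_sock() itself. */
static void model_reader_acquire(struct model_sock *sk, struct model_rx_ctx *ctx)
{
	while (ctx->reader_present) {
		ctx->reader_contended = true;
		model_sk_wait_event(sk, ctx);
	}
}

/* Returns how many times the lock was taken during one contended acquire. */
int model_demo(void)
{
	struct model_sock sk = { 0 };
	struct model_rx_ctx ctx = { .reader_present = true };

	model_reader_acquire(&sk, &ctx);
	assert(sk.locked);		/* we come back holding the lock */
	assert(sk.wait_pending == 0);
	return sk.lock_calls;
}
```

Calling model_demo() from a small main() returns 1: one lock acquisition
on the contended path, all of it coming from the wait helper.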

Cheers,

Hannes