Message-ID: <17d0b30a-758c-4a84-9879-0a070656f15e@suse.de>
Date: Mon, 3 Feb 2025 08:20:08 +0100
From: Hannes Reinecke <hare@...e.de>
To: Caleb Sander <csander@...estorage.com>, Sagi Grimberg <sagi@...mberg.me>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@....de>, Maurizio Lombardi <mlombard@...hat.com>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] nvme-tcp: fix connect failure on receiving partial
ICResp PDU
On 2/3/25 01:19, Caleb Sander wrote:
> On Fri, Jan 31, 2025 at 12:17 AM Sagi Grimberg <sagi@...mberg.me> wrote:
> ...
>> Caleb, can you please make sure to test this patch with TLS?
>
> Can you point me to some documentation for that? I tried setting up an
> nvmet_tcp port to use TLS and connecting to it as described in the
> cover letter for the patch series (https://lwn.net/Articles/941139/).
> But the TLS Server Hello seems to fail with EACCES. Any idea what I'm
> doing wrong?
> $ modprobe nvmet_tcp
> $ modprobe null_blk nr_devices=1
> $ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet
> $ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/attr_allow_any_host
> $ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1
> $ echo /dev/nullb0 >
> /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/device_path
> $ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/enable
> $ mkdir /sys/kernel/config/nvmet/ports/1
> $ echo tcp > /sys/kernel/config/nvmet/ports/1/addr_trtype
> $ echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
> $ echo 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
> $ echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
> $ echo required > /sys/kernel/config/nvmet/ports/1/addr_treq
> $ echo tls1.3 > /sys/kernel/config/nvmet/ports/1/addr_tsas
> $ ln -s /sys/kernel/config/nvmet/subsystems/nqn.nvmet
> /sys/kernel/config/nvmet/ports/1/subsystems
> $ nvme gen-tls-key --subsysnqn nqn.nvmet --insert
> Inserted TLS key 005e8a74
> $ nvme gen-tls-key --subsysnqn nqn.2014-08.org.nvmexpress.discovery --insert
> Inserted TLS key 22d676b8
> $ tlshd &
> $ nvme discover --transport tcp --traddr 127.0.0.1 --trsvcid 4420 --tls
> Failed to write to /dev/nvme-fabrics: Input/output error
> failed to add controller, error failed to write to nvme-fabrics device
>
> With debug logs enabled, I see the following:
> $ dmesg | tail -6
> [ 440.405298] nvme nvme0: connecting queue 0
> [ 440.405403] nvmet_tcp: queue 0: TLS ServerHello
> [ 440.405433] nvme nvme0: queue 0: start TLS with key 11b456f9
> [ 440.407836] nvmet_tcp: queue 0: TLS handshake done, key 0, status -13
> [ 440.422881] nvme nvme0: queue 0: TLS handshake done, key 0, status -13
> [ 440.422932] nvme nvme0: queue 0: TLS handshake complete, error 13
>
> A tcpdump shows the host sending a TLS Client Hello packet and the
> target immediately closing the connection.
>
The PSK identity has to contain the host NQN _and_ the target NQN.
So you need to call gen-tls-key with both in order to generate a PSK
whose identity can actually be matched on the connection attempt.
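Something along these lines should work (untested here; assuming your
host NQN lives in /etc/nvme/hostnqn):

$ nvme gen-tls-key --hostnqn "$(cat /etc/nvme/hostnqn)" --subsysnqn nqn.nvmet --insert
$ nvme gen-tls-key --hostnqn "$(cat /etc/nvme/hostnqn)" --subsysnqn nqn.2014-08.org.nvmexpress.discovery --insert

The resulting key descriptions in the .nvme keyring should then carry
both NQNs, which is what the lookup during the connection attempt
matches against.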
>> Do you have a reliable way to reproduce this?
>
> Sure, here's a fake Python NVMe/TCP controller that immediately closes
> each connection after receiving the ICReq PDU:
> ```
> import socket
>
> def recv_all(socket, length):
>     result = b''
>     while len(result) < length:
>         result += socket.recv(length - len(result))
>     return result
>
> listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> listen_sock.bind(('', 4420))
> listen_sock.listen()
> while True:
>     client_sock, _ = listen_sock.accept()
>     recv_all(client_sock, 128) # ICReq
>     client_sock.close()
> ```
> Attempting to connect to it reports an error about the ICResp PDU type
> field even though no ICResp was sent:
> $ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
> Failed to write to /dev/nvme-fabrics: Invalid argument
> could not add new controller: invalid arguments/configuration
> $ dmesg | tail -1
> [1351639.614853] nvme_tcp: queue 0: bad type returned 0
>
> Here's a valid scenario where the controller sends the ICResp Common
> Header and PDU Specific Header separately (using TCP_NODELAY to ensure
> the sends are not coalesced):
> ```
> import socket
>
> def recv_all(socket, length):
>     result = b''
>     while len(result) < length:
>         result += socket.recv(length - len(result))
>     return result
>
> def send_all(socket, data):
>     while data:
>         data = data[socket.send(data):]
>
> listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> listen_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
> listen_sock.bind(('', 4420))
> listen_sock.listen()
> while True:
>     client_sock, _ = listen_sock.accept()
>     recv_all(client_sock, 128) # ICReq
>     common_header = bytes([
>         0x01, # PDU-type
>         0, # FLAGS
>         128, # HLEN
>         0, # PDO
>         128, 0, 0, 0, # PLEN
>     ])
>     send_all(client_sock, common_header)
>     ic_resp = bytes([
>         0, 0, # PFV
>         0, # CPDA
>         0, # DGST
>         0xFC, 0xFF, 0xFF, 0xFF, # MAXH2CDATA
>     ] + [0] * 112)
>     send_all(client_sock, ic_resp)
>     client_sock.close()
> ```
> The host refuses to connect, complaining that the MAXH2CDATA field in
> the ICResp is invalid. But that is because it only received the Common
> Header.
> $ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
> Failed to write to /dev/nvme-fabrics: Invalid argument
> could not add new controller: invalid arguments/configuration
> $ dmesg | tail -1
> [1351960.082011] nvme_tcp: queue 0: invalid maxh2cdata returned 0
>
> With the patch applied, the controller closing the connection without
> sending an ICResp PDU correctly results in a "Connection reset by peer"
> error:
> $ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
> Failed to write to /dev/nvme-fabrics: Connection reset by peer
> could not add new controller: failed to write to nvme-fabrics device
> $ dmesg | tail -1
> [ 450.050463] nvme_tcp: queue 0: failed to receive icresp, error 0
>
> And when the controller sends the Common Header separately from the
> PDU Specific Header, nvme_tcp_init_connection() now succeeds. The
> connection attempt instead times out waiting for a response to the
> Fabrics Connect command, since the fake NVMe/TCP controller doesn't
> implement that:
> $ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
> Failed to write to /dev/nvme-fabrics: Input/output error
> could not add new controller: failed to write to nvme-fabrics device
> $ dmesg | tail -3
> [ 644.728894] nvme nvme0: I/O tag 0 (0000) type 4 opcode 0x7f
> (Connect) QID 0 timeout
> [ 644.728974] nvme nvme0: Connect command failed, error wo/DNR bit: 881
> [ 644.728999] nvme nvme0: failed to connect queue: 0 ret=881
>
Thanks.
I'll see about giving it a spin.
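
If you want to stress the partial-receive path a bit harder, a variation
of your second script that drips the whole ICResp out one byte at a time
might also be interesting (quick sketch, untested):

```
import socket
import time

def recv_all(sock, length):
    # Read exactly `length` bytes, looping over short reads.
    result = b''
    while len(result) < length:
        chunk = sock.recv(length - len(result))
        if not chunk:
            raise ConnectionError('peer closed before full read')
        result += chunk
    return result

listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_sock.bind(('', 4420))
listen_sock.listen()
while True:
    client_sock, _ = listen_sock.accept()
    # Disable Nagle on the accepted socket so the 1-byte sends go out
    # as separate segments instead of being coalesced.
    client_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    recv_all(client_sock, 128)  # ICReq
    # Same 128-byte ICResp as in your second script, but sent one byte
    # at a time so the host can see a short read at every offset.
    ic_resp_pdu = bytes([
        0x01,                    # PDU-type: ICResp
        0,                       # FLAGS
        128,                     # HLEN
        0,                       # PDO
        128, 0, 0, 0,            # PLEN
        0, 0,                    # PFV
        0,                       # CPDA
        0,                       # DGST
        0xFC, 0xFF, 0xFF, 0xFF,  # MAXH2CDATA
    ] + [0] * 112)
    for i in range(len(ic_resp_pdu)):
        client_sock.sendall(ic_resp_pdu[i:i + 1])
        time.sleep(0.001)  # give the receiver a chance to wake up in between
    client_sock.close()
```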
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich