Message-ID: <CADUfDZq3ZqpP9hZhgG1gzyyggG6SCQiHqHdgbm95TU2oZ5QFUQ@mail.gmail.com>
Date: Sun, 2 Feb 2025 16:19:01 -0800
From: Caleb Sander <csander@...estorage.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Hannes Reinecke <hare@...e.de>, Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>, 
	Christoph Hellwig <hch@....de>, Maurizio Lombardi <mlombard@...hat.com>, linux-nvme@...ts.infradead.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] nvme-tcp: fix connect failure on receiving partial
 ICResp PDU

On Fri, Jan 31, 2025 at 12:17 AM Sagi Grimberg <sagi@...mberg.me> wrote:
...
> Caleb, can you please make sure to test this patch with TLS?

Can you point me to some documentation for that? I tried setting up an
nvmet_tcp port to use TLS and connecting to it as described in the
cover letter for the patch series (https://lwn.net/Articles/941139/),
but the TLS ServerHello seems to fail with EACCES. Any idea what I'm
doing wrong?
$ modprobe nvmet_tcp
$ modprobe null_blk nr_devices=1
$ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet
$ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/attr_allow_any_host
$ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1
$ echo /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/device_path
$ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/enable
$ mkdir /sys/kernel/config/nvmet/ports/1
$ echo tcp > /sys/kernel/config/nvmet/ports/1/addr_trtype
$ echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
$ echo 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
$ echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
$ echo required > /sys/kernel/config/nvmet/ports/1/addr_treq
$ echo tls1.3 > /sys/kernel/config/nvmet/ports/1/addr_tsas
$ ln -s /sys/kernel/config/nvmet/subsystems/nqn.nvmet /sys/kernel/config/nvmet/ports/1/subsystems
$ nvme gen-tls-key --subsysnqn nqn.nvmet --insert
Inserted TLS key 005e8a74
$ nvme gen-tls-key --subsysnqn nqn.2014-08.org.nvmexpress.discovery --insert
Inserted TLS key 22d676b8
$ tlshd &
$ nvme discover --transport tcp --traddr 127.0.0.1 --trsvcid 4420 --tls
Failed to write to /dev/nvme-fabrics: Input/output error
failed to add controller, error failed to write to nvme-fabrics device

With debug logs enabled, I see the following:
$ dmesg | tail -6
[  440.405298] nvme nvme0: connecting queue 0
[  440.405403] nvmet_tcp: queue 0: TLS ServerHello
[  440.405433] nvme nvme0: queue 0: start TLS with key 11b456f9
[  440.407836] nvmet_tcp: queue 0: TLS handshake done, key 0, status -13
[  440.422881] nvme nvme0: queue 0: TLS handshake done, key 0, status -13
[  440.422932] nvme nvme0: queue 0: TLS handshake complete, error 13

A tcpdump shows the host sending a TLS Client Hello packet and the
target immediately closing the connection.
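
For reference, here is the nvmet side of that setup as a small Python
sketch, in case it's easier to tweak programmatically. It only mirrors
the configfs writes from the transcript above; it assumes configfs is
mounted at /sys/kernel/config, that null_blk and nvmet_tcp are already
loaded, and it leaves the TLS key and tlshd steps to the commands shown
earlier:
```
import os

NVMET = '/sys/kernel/config/nvmet'
SUBSYS = NVMET + '/subsystems/nqn.nvmet'
PORT = NVMET + '/ports/1'

def write_attr(path, value):
    # configfs attributes are written as plain text
    with open(path, 'w') as f:
        f.write(value)

os.mkdir(SUBSYS)
write_attr(SUBSYS + '/attr_allow_any_host', '1')
os.mkdir(SUBSYS + '/namespaces/1')
write_attr(SUBSYS + '/namespaces/1/device_path', '/dev/nullb0')
write_attr(SUBSYS + '/namespaces/1/enable', '1')

os.mkdir(PORT)
write_attr(PORT + '/addr_trtype', 'tcp')
write_attr(PORT + '/addr_adrfam', 'ipv4')
write_attr(PORT + '/addr_traddr', '127.0.0.1')
write_attr(PORT + '/addr_trsvcid', '4420')
write_attr(PORT + '/addr_treq', 'required')
write_attr(PORT + '/addr_tsas', 'tls1.3')
os.symlink(SUBSYS, PORT + '/subsystems/nqn.nvmet')
```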

> Do you have a reliable way to reproduce this?

Sure, here's a fake Python NVMe/TCP controller that immediately closes
each connection after receiving the ICReq PDU:
```
import socket

def recv_all(sock, length):
    # Read exactly `length` bytes from the connection.
    result = b''
    while len(result) < length:
        chunk = sock.recv(length - len(result))
        if not chunk:
            raise ConnectionError('peer closed the connection early')
        result += chunk
    return result

listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_sock.bind(('', 4420))
listen_sock.listen()
while True:
    client_sock, _ = listen_sock.accept()
    recv_all(client_sock, 128) # read the 128-byte ICReq PDU
    client_sock.close()        # close without ever sending an ICResp
```
Attempting to connect to it reports an error about the ICResp PDU type
field even though no ICResp was sent:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Invalid argument
could not add new controller: invalid arguments/configuration
$ dmesg | tail -1
[1351639.614853] nvme_tcp: queue 0: bad type returned 0
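
That matches nothing having been received: the host apparently goes on
to validate its still-empty 128-byte ICResp buffer, and the first byte
of a zero-filled buffer decodes as PDU type 0. A minimal sketch of that
decoding (offsets taken from the Common Header layout in the second
script below, not from the kernel source):
```
import struct

# 128-byte ICResp buffer that was never filled because the controller
# closed the connection without sending anything.
buf = bytes(128)

# Common Header: PDU-type (1), FLAGS (1), HLEN (1), PDO (1),
# PLEN (4, little-endian)
pdu_type, flags, hlen, pdo, plen = struct.unpack_from('<BBBBI', buf, 0)
print(hex(pdu_type))  # 0x0 -- reported as "bad type returned 0"
```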

Here's a spec-compliant scenario where the controller sends a valid
ICResp, but transmits the Common Header and PDU Specific Header in
separate sends (with TCP_NODELAY set to ensure they are not coalesced):
```
import socket

def recv_all(sock, length):
    # Read exactly `length` bytes from the connection.
    result = b''
    while len(result) < length:
        chunk = sock.recv(length - len(result))
        if not chunk:
            raise ConnectionError('peer closed the connection early')
        result += chunk
    return result

def send_all(sock, data):
    # Write all of `data`, looping on partial sends.
    while data:
        data = data[sock.send(data):]

listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Disable Nagle so the two sends below go out as separate segments.
listen_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
listen_sock.bind(('', 4420))
listen_sock.listen()
while True:
    client_sock, _ = listen_sock.accept()
    recv_all(client_sock, 128) # read the 128-byte ICReq PDU
    # First send: only the 8-byte Common Header of the ICResp.
    common_header = bytes([
        0x01, # PDU-type (ICResp)
        0, # FLAGS
        128, # HLEN
        0, # PDO
        128, 0, 0, 0, # PLEN
    ])
    send_all(client_sock, common_header)
    # Second send: the remaining 120 bytes (PDU Specific Header).
    ic_resp = bytes([
        0, 0, # PFV
        0, # CPDA
        0, # DGST
        0xFC, 0xFF, 0xFF, 0xFF, # MAXH2CDATA
    ] + [0] * 112)
    send_all(client_sock, ic_resp)
    client_sock.close()
```
The host refuses to connect, complaining that the MAXH2CDATA field in
the ICResp is invalid. But that is only because it has received just
the Common Header at that point:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Invalid argument
could not add new controller: invalid arguments/configuration
$ dmesg | tail -1
[1351960.082011] nvme_tcp: queue 0: invalid maxh2cdata returned 0
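
For comparison, the ICResp itself is fine once both sends are seen
together; concatenating the two buffers from the script above and
parsing MAXH2CDATA out of the full 128-byte PDU gives the expected
value (a quick check reusing that script's byte layout):
```
import struct

# The two sends from the script above, as the host eventually sees them.
common_header = bytes([0x01, 0, 128, 0, 128, 0, 0, 0])
ic_resp = bytes([0, 0, 0, 0, 0xFC, 0xFF, 0xFF, 0xFF] + [0] * 112)
pdu = common_header + ic_resp

# MAXH2CDATA sits at offset 12: 8-byte Common Header + PFV (2) +
# CPDA (1) + DGST (1).
maxh2cdata, = struct.unpack_from('<I', pdu, 12)
print(hex(maxh2cdata))  # 0xfffffffc -- a valid value; the pre-patch host
                        # just checked the field before it had arrived
```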

With the patch applied, the controller closing the connection without
sending an ICResp PDU correctly results in a "Connection reset by peer"
error:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Connection reset by peer
could not add new controller: failed to write to nvme-fabrics device
$ dmesg | tail -1
[  450.050463] nvme_tcp: queue 0: failed to receive icresp, error 0

And when the controller sends the Common Header separately from the
PDU Specific Header, nvme_tcp_init_connection() now succeeds. The
connection attempt instead times out waiting for a response to the
Fabrics Connect command, since the fake NVMe/TCP controller doesn't
implement that:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Input/output error
could not add new controller: failed to write to nvme-fabrics device
$ dmesg | tail -3
[  644.728894] nvme nvme0: I/O tag 0 (0000) type 4 opcode 0x7f (Connect) QID 0 timeout
[  644.728974] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[  644.728999] nvme nvme0: failed to connect queue: 0 ret=881
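
If it helps, the second script above can also confirm that the host now
gets past the ICResp exchange: defining the helper below in that script
and calling peek_next_pdu(client_sock) right after
send_all(client_sock, ic_resp) should print PDU type 0x4, i.e.
CapsuleCmd in NVMe/TCP, carrying the Fabrics Connect command that the
fake controller never answers (it reuses recv_all from that script):
```
def peek_next_pdu(sock):
    # Read and decode the Common Header of the next PDU from the host.
    # With the partial-read fix applied, this is expected to be the
    # Connect command capsule (CapsuleCmd, PDU type 0x04).
    header = recv_all(sock, 8)
    pdu_type = header[0]
    plen = int.from_bytes(header[4:8], 'little')
    print('next PDU: type=%#x plen=%d' % (pdu_type, plen))
```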

Thanks,
Caleb
