Message-Id: <D7QKRU1EXDXJ.K6ZXC4V4ZD68@bsdbackstore.eu>
Date: Wed, 12 Feb 2025 16:33:41 +0100
From: "Maurizio Lombardi" <mlombard@...backstore.eu>
To: "zhang.guanghui@...tc.cn" <zhang.guanghui@...tc.cn>, "sagi"
<sagi@...mberg.me>, "mgurtovoy" <mgurtovoy@...dia.com>, "kbusch"
<kbusch@...nel.org>, "sashal" <sashal@...nel.org>, "chunguang.xu"
<chunguang.xu@...pee.com>
Cc: "linux-kernel" <linux-kernel@...r.kernel.org>, "linux-nvme"
<linux-nvme@...ts.infradead.org>, "linux-block"
<linux-block@...r.kernel.org>
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@...tc.cn wrote:
> Hello
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a NULL pointer dereference, causing the host to crash several times.
> By analyzing the vmcore, we found that the direct cause is a use-after-free of request->mq_hctx.
>
> CPU1                            CPU2
>
> nvme_tcp_poll                   nvme_tcp_try_send       -- failed to send request 13
> nvme_tcp_try_recv               nvme_tcp_fail_request
> nvme_tcp_recv_skb               nvme_tcp_end_request
> nvme_tcp_recv_pdu               nvme_complete_rq
> nvme_tcp_handle_comp            nvme_retry_req          -- request->mq_hctx has been freed and is NULL
> nvme_tcp_process_nvme_cqe
> nvme_complete_rq
> nvme_end_req
> blk_mq_end_request
Taking a step back. Let's take a different approach and try to avoid the
double completion.
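
Just to illustrate the "complete exactly once" idea, here is a minimal
userspace sketch (not nvme-tcp code; all names such as fake_request,
complete_request, error_path and response_path are made up): two threads
race to complete the same request, and an atomic compare-and-swap makes
sure only the first one actually completes it while the loser backs off.
In the driver the equivalent guard would of course have to live in the
nvme-tcp request state handling.

/*
 * Sketch only: two completion paths race on one request; an atomic
 * flag guarantees a single completion, so the losing path never
 * touches the request again.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct fake_request {
	atomic_bool completed;	/* set by whoever completes first */
	int tag;
};

/* Complete the request only if nobody has completed it before us. */
static void complete_request(struct fake_request *rq, const char *who)
{
	bool expected = false;

	if (atomic_compare_exchange_strong(&rq->completed, &expected, true))
		printf("%s: completed request %d\n", who, rq->tag);
	else
		printf("%s: request %d already completed, backing off\n",
		       who, rq->tag);
}

/* Stands in for the send-failure path from the report. */
static void *error_path(void *arg)
{
	complete_request(arg, "error path");
	return NULL;
}

/* Stands in for the rsp-capsule (CQE) path from the report. */
static void *response_path(void *arg)
{
	complete_request(arg, "response path");
	return NULL;
}

int main(void)
{
	struct fake_request rq = { .completed = false, .tag = 13 };
	pthread_t t1, t2;

	pthread_create(&t1, NULL, error_path, &rq);
	pthread_create(&t2, NULL, response_path, &rq);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
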
The problem here is that we apparently received an nvme_tcp_rsp capsule
from the target, meaning that the command has been processed (I guess
the capsule has an error status?).
So maybe only part of the command has been sent?
Why do we receive the rsp capsule at all? Shouldn't this be treated as a fatal
error by the controller?
Maurizio