Message-Id: <D7QKRU1EXDXJ.K6ZXC4V4ZD68@bsdbackstore.eu>
Date: Wed, 12 Feb 2025 16:33:41 +0100
From: "Maurizio Lombardi" <mlombard@...backstore.eu>
To: "zhang.guanghui@...tc.cn" <zhang.guanghui@...tc.cn>, "sagi"
<sagi@...mberg.me>, "mgurtovoy" <mgurtovoy@...dia.com>, "kbusch"
<kbusch@...nel.org>, "sashal" <sashal@...nel.org>, "chunguang.xu"
<chunguang.xu@...pee.com>
Cc: "linux-kernel" <linux-kernel@...r.kernel.org>, "linux-nvme"
<linux-nvme@...ts.infradead.org>, "linux-block"
<linux-block@...r.kernel.org>
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request
On Mon Feb 10, 2025 at 8:41 AM CET, zhang.guanghui@...tc.cn wrote:
> Hello
>
> When using the nvme-tcp driver in a storage cluster, the driver may trigger a NULL pointer dereference, causing the host to crash several times.
> By analyzing the vmcore, we found that the direct cause is a use-after-free of request->mq_hctx.
>
> CPU1                            CPU2
>
> nvme_tcp_poll                   nvme_tcp_try_send       -- failed to send request 13
> nvme_tcp_try_recv               nvme_tcp_fail_request
> nvme_tcp_recv_skb               nvme_tcp_end_request
> nvme_tcp_recv_pdu               nvme_complete_rq
> nvme_tcp_handle_comp            nvme_retry_req          -- request->mq_hctx has been freed and is NULL
> nvme_tcp_process_nvme_cqe
> nvme_complete_rq
> nvme_end_req
> blk_mq_end_request
Taking a step back. Let's take a different approach and try to avoid the
double completion.
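
Just to illustrate the "complete exactly once" idea, here is a minimal
userspace sketch (not nvme-tcp code; all names such as fake_request,
complete_request, error_path and response_path are made up): two threads
race to complete the same request, and an atomic compare-and-swap makes
sure only the first one actually completes it while the loser backs off.
In the driver the equivalent guard would of course have to live in the
nvme-tcp request state handling.

/*
 * Sketch only: two completion paths race on one request; an atomic
 * flag guarantees a single completion, so the losing path never
 * touches the request again.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct fake_request {
	atomic_bool completed;	/* set by whoever completes first */
	int tag;
};

/* Complete the request only if nobody has completed it before us. */
static void complete_request(struct fake_request *rq, const char *who)
{
	bool expected = false;

	if (atomic_compare_exchange_strong(&rq->completed, &expected, true))
		printf("%s: completed request %d\n", who, rq->tag);
	else
		printf("%s: request %d already completed, backing off\n",
		       who, rq->tag);
}

/* Stands in for the send-failure path from the report. */
static void *error_path(void *arg)
{
	complete_request(arg, "error path");
	return NULL;
}

/* Stands in for the rsp-capsule (CQE) path from the report. */
static void *response_path(void *arg)
{
	complete_request(arg, "response path");
	return NULL;
}

int main(void)
{
	struct fake_request rq = { .completed = false, .tag = 13 };
	pthread_t t1, t2;

	pthread_create(&t1, NULL, error_path, &rq);
	pthread_create(&t2, NULL, response_path, &rq);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
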
The problem here is that we apparently received an nvme_tcp_rsp capsule
from the target, meaning that the command has been processed (I guess
the capsule has an error status?).
So maybe only part of the command has been sent?
Why do we receive the rsp capsule at all? Shouldn't this be treated as a fatal
error by the controller?
Maurizio