[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2064070-b511-ba6d-bd64-0b3abc208356@grimberg.me>
Date: Mon, 15 Feb 2021 13:29:45 -0800
From: Sagi Grimberg <sagi@...mberg.me>
To: Daniel Wagner <dwagner@...e.de>, Hannes Reinecke <hare@...e.de>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme-tcp: Check if request has started before processing
it
>>>>>> blk_mq_tag_to_rq() will always return a request if the command_id is
>>>>>> in the valid range. Check if the request has been started. If we
>>>>>> blindly process the request we might double complete a request which
>>>>>> can be fatal.
>>>>>
>>>>> How did you get to this one? did the controller send a completion for
>>>>> a completed/bogus request?
>>>>
>>>> If that is the case, then that must mean it's possible the driver could
>>>> have started the command id just before the bogus completion check. Data
>>>> iorruption, right?
>
> 'during TCP LIF toggles and aggr relocates' testing the host
> crashes. TBH, I do not really know what is happening or what the test
> does. Still trying to figure out what's going on.
Well, I think we should probably figure out why that is happening first.
> I was just very
> surprised how much the code trusts the other side to behave correctly./
What does pci/rdma/fc does differently? What does scsi do here?
>>> Yes, which is why I don't think this check is very useful..
>>
>> I actually view that as a valid protection against spoofed frames.
>> Without it it's easy to crash the machine by injecting fake completions with
>> random command ids.
>
> In this test scenario it's not even a spoofed frame; maybe just a
> confused controller.
Maybe... I am still not sure how this patch helps here though...
Powered by blists - more mailing lists