linux-kernel - Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

Message-ID: <1989b8fe-7ef2-2145-75c5-5e938f74014c@grimberg.me>
Date:   Tue, 11 May 2021 11:16:10 -0700
From:   Sagi Grimberg <sagi@...mberg.me>
To:     Hannes Reinecke <hare@...e.de>, Keith Busch <kbusch@...nel.org>
Cc:     "Ewan D. Milne" <emilne@...hat.com>,
        Daniel Wagner <dwagner@...e.de>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Jens Axboe <axboe@...com>, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2] nvme-tcp: Check if request has started before
 processing it



On 5/9/21 4:30 AM, Hannes Reinecke wrote:
> On 5/8/21 1:22 AM, Sagi Grimberg wrote:
>>
>>>>> Well, that would require a modification to the CQE specification, no?
>>>>> fmds was not amused when I proposed that :-(
>>>>
>>>> Why would that require a modification to the CQE? It's just using, say,
>>>> the 4 msbits of the command_id as a running sequence...
>>>
>>> I think Hannes was under the impression that the counter proposal wasn't
>>> part of the "command_id". The host can encode whatever it wants in that
>>> value, and the controller just has to return the same value.
>>
>> Yea, maybe something like this?
>> -- 
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e6612971f4eb..7af48827ea56 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1006,7 +1006,7 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>>                 return BLK_STS_IOERR;
>>         }
>>
>> -       cmd->common.command_id = req->tag;
>> +       cmd->common.command_id = nvme_cid(req);
>>         trace_nvme_setup_cmd(req, cmd);
>>         return ret;
>> }
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 05f31a2c64bb..96abfb0e2ddd 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -158,6 +158,7 @@ enum nvme_quirks {
>> struct nvme_request {
>>         struct nvme_command     *cmd;
>>         union nvme_result       result;
>> +       u8                      genctr;
>>         u8                      retries;
>>         u8                      flags;
>>         u16                     status;
>> @@ -497,6 +498,48 @@ struct nvme_ctrl_ops {
>>         int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
>> };
>>
>> +/*
>> + * nvme command_id is constructed as such:
>> + * | xxxx | xxxxxxxxxxxx |
>> + *   gen    request tag
>> + */
>> +#define nvme_cid_install_genctr(gen)           ((gen & 0xf) << 12)
>> +#define nvme_genctr_from_cid(cid)              ((cid & 0xf000) >> 12)
>> +#define nvme_tag_from_cid(cid)                 (cid & 0xfff)
>> +
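To make the layout in the quoted patch concrete, here is a minimal userspace sketch, not the kernel code. The helper names (cid_pack(), cid_genctr(), cid_tag(), sketch_start(), sketch_cid_is_current()) and struct sketch_request are hypothetical. The quoted patch is trimmed and does not show nvme_cid(), so bumping the counter on each start and rejecting a completion whose generation does not match are assumptions about how the running sequence could be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CID_TAG_BITS 12u     /* low 12 bits: request tag */
#define CID_TAG_MASK 0x0fffu
#define CID_GEN_MASK 0x000fu /* 4-bit generation counter, stored in the msbits */

/* The two fields together must fill the 16-bit command_id exactly. */
static_assert(((CID_GEN_MASK << CID_TAG_BITS) | CID_TAG_MASK) == 0xffffu,
	      "gen + tag must cover 16 bits");

/* Pack a generation counter and a tag into one command_id. */
static inline uint16_t cid_pack(uint8_t genctr, uint16_t tag)
{
	return (uint16_t)(((genctr & CID_GEN_MASK) << CID_TAG_BITS) |
			  (tag & CID_TAG_MASK));
}

static inline uint8_t cid_genctr(uint16_t cid)
{
	return (uint8_t)((cid >> CID_TAG_BITS) & CID_GEN_MASK);
}

static inline uint16_t cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

/* Hypothetical per-request state; the patch keeps genctr in struct nvme_request. */
struct sketch_request {
	uint16_t tag;
	uint8_t genctr;
};

/* Assumption: advance the counter (mod 16) every time the request is started. */
static inline uint16_t sketch_start(struct sketch_request *rq)
{
	rq->genctr = (uint8_t)((rq->genctr + 1) & CID_GEN_MASK);
	return cid_pack(rq->genctr, rq->tag);
}

/* A completion is current only if both the tag and the generation match. */
static inline bool sketch_cid_is_current(const struct sketch_request *rq,
					 uint16_t cid)
{
	return cid_tag(cid) == rq->tag && cid_genctr(cid) == rq->genctr;
}
```

With only 4 bits the counter wraps after 16 reuses of the same tag, so this narrows the window for a stale or bogus completion rather than closing it completely.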
> 
> That is a good idea, but we should make sure to limit the number of 
> commands a controller can request, too.

We already take the minimum of what the host asks for and what the
controller supports anyway.
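Roughly, the clamping could look like the sketch below. This is illustrative userspace C, not the driver code: sketch_queue_depth(), min_u32() and the SKETCH_* names are hypothetical, and the extra cap is simply the number of tags that fit in the 12-bit tag field of the command_id layout quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TAG_BITS       12u
#define SKETCH_MAX_QUEUE_SIZE (1u << SKETCH_TAG_BITS) /* tags 0..4095 */

static_assert(SKETCH_MAX_QUEUE_SIZE == 4096u, "12 tag bits give 4096 tags");

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Effective depth: the smaller of what the host asked for and what the
 * controller advertises, further capped so every tag fits in 12 bits.
 */
static inline uint32_t sketch_queue_depth(uint32_t host_requested,
					  uint32_t ctrl_supported)
{
	uint32_t depth = min_u32(host_requested, ctrl_supported);

	return min_u32(depth, SKETCH_MAX_QUEUE_SIZE);
}
```

For example, a host asking for 8192 entries against a controller that supports 65536 would end up with 4096 here, while a 1024-entry request is left alone.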

> As per spec each controller can support a full 32 bit worth of requests, 
> and if we limit that arbitrarily from the stack we'll need to cap the 
> number of requests a controller or fabrics driver can request.

NVMF_MAX_QUEUE_SIZE is already 1024, but you are right that we also need:
--
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 92e03f15c9f6..66a4a7f7c504 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -60,6 +60,7 @@ MODULE_PARM_DESC(sgl_threshold,
                 "Use SGLs when average request segment size is larger or equal to "
                 "this size. Use 0 to disable SGLs.");

+#define NVME_PCI_MAX_QUEUE_SIZE 4096
  static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
  static const struct kernel_param_ops io_queue_depth_ops = {
         .set = io_queue_depth_set,
@@ -68,7 +69,7 @@ static const struct kernel_param_ops io_queue_depth_ops = {

  static unsigned int io_queue_depth = 1024;
  module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
-MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2 and <= 4096");

  static int io_queue_count_set(const char *val, const struct kernel_param *kp)
  {
@@ -164,6 +165,9 @@ static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
         if (ret != 0 || n < 2)
                 return -EINVAL;

+       if (n > NVME_PCI_MAX_QUEUE_SIZE)
+               return -EINVAL;
+
         return param_set_uint(val, kp);
  }

--
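For anyone who wants to poke at the range check outside the kernel, here is a small userspace approximation of the setter. It is only a sketch: sketch_io_queue_depth_set() and SKETCH_PCI_MAX_QUEUE_SIZE are hypothetical names, and strtoul() stands in for the kernel's own string parsing, so corner cases may differ from what the real parameter handler accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PCI_MAX_QUEUE_SIZE 4096u

/* The cap matches the number of tags a 12-bit tag field can express. */
static_assert(SKETCH_PCI_MAX_QUEUE_SIZE == (1u << 12),
	      "queue size must fit the 12-bit tag");

/*
 * Parse val as an unsigned integer and accept it only if it lies in
 * [2, SKETCH_PCI_MAX_QUEUE_SIZE]. On success store it in *out and return 0;
 * otherwise return -EINVAL and leave *out untouched.
 */
static inline int sketch_io_queue_depth_set(const char *val, unsigned int *out)
{
	char *end;
	unsigned long n;

	if (!val || *val == '\0')
		return -EINVAL;

	errno = 0;
	n = strtoul(val, &end, 0);
	if (errno != 0 || end == val)
		return -EINVAL;

	/* Allow one trailing newline, as a write from a shell would add. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (n < 2 || n > SKETCH_PCI_MAX_QUEUE_SIZE)
		return -EINVAL;

	*out = (unsigned int)n;
	return 0;
}
```

The lower bound of 2 is the check already present in the driver; only the upper bound is new in the diff above.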