lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <253wmxvuq2z.fsf@nvidia.com> Date: Wed, 16 Aug 2023 15:30:44 +0300 From: Aurelien Aptel <aaptel@...dia.com> To: Sagi Grimberg <sagi@...mberg.me>, linux-nvme@...ts.infradead.org, netdev@...r.kernel.org, hch@....de, kbusch@...nel.org, axboe@...com, chaitanyak@...dia.com, davem@...emloft.net, kuba@...nel.org Cc: Boris Pismenny <borisp@...dia.com>, aurelien.aptel@...il.com, smalin@...dia.com, malin1024@...il.com, ogerlitz@...dia.com, yorayz@...dia.com, galshalom@...dia.com, mgurtovoy@...dia.com Subject: Re: [PATCH v12 07/26] nvme-tcp: Add DDP offload control path Sagi Grimberg <sagi@...mberg.me> writes: >>>> + if (!netdev || !queue) >>>> + return false; >>> >>> Is it reasonable to be called here with !netdev or !queue ? >> >> The check is needed only for the IO queue case but we can move it >> earlier in nvme_tcp_start_queue(). > > I still don't understand even on io queues how this can happen. In case where the netdev does not support DDP, when the admin queue is started, netdev->offloading_netdev will not be set and therefore will be NULL. Later, when the IO queue is started: netdev = queue->ctrl->offloading_netdev; <== NULL if (is_netdev_ulp_offload_active(netdev, queue)) { <== we pass NULL We can move the NULL check higher-up if you prefer, like so: if (queue->ctrl->offloading_netdev && is_netdev_ulp_offload_active(queue->ctrl->offloading_netdev, queue)) { >>>> + >>>> + /* If we cannot query the netdev limitations, do not offload */ >>>> + if (!nvme_tcp_ddp_query_limits(netdev, queue)) >>>> + return false; >>>> + >>>> + /* If netdev supports nvme-tcp ddp offload, we can offload */ >>>> + if (test_bit(ULP_DDP_C_NVME_TCP_BIT, netdev->ulp_ddp_caps.active)) >>>> + return true; >>> >>> This should be coming from the API itself, have the limits query >>> api fail if this is off. >> >> We can move the function to the ULP DDP layer. >> >>> btw, what is the active thing? is this driven from ethtool enable? >>> what happens if the user disables it while there is a ulp using it? >> >> The active bits are indeed driven by ethtool according to the design >> Jakub suggested. >> The nvme-tcp connection will have to be reconnected to see the effect of >> changing the bit. > > It should move inside the api as well. Don't want to care about it in > nvme. Ok, we will move it there. >>>> + >>>> + return false; >>> >>> This can be folded to the above function. >> >> We won't be able to check for TLS in a common wrapper. We think this >> should be kept. > > Why? any tcp ddp need to be able to support tls. Nothing specific to > nvme here. True, we will move it to the ULP wrapper. >>>> + /* >>>> + * The atomic operation guarantees that we don't miss any NIC driver >>>> + * resync requests submitted after the above checks. >>>> + */ >>>> + if (atomic64_cmpxchg(&queue->resync_req, pdu_val, >>>> + pdu_val & ~ULP_DDP_RESYNC_PENDING) != >>>> + atomic64_read(&queue->resync_req)) >>>> + netdev->netdev_ops->ulp_ddp_ops->resync(netdev, >>>> + queue->sock->sk, >>>> + pdu_seq); >>> >>> Who else is doing an atomic on this value? and what happens >>> if the cmpxchg fails? >> >> The driver thread can set queue->resync_req concurrently in patch >> "net/mlx5e: NVMEoTCP, data-path for DDP+DDGST offload" in function >> nvmeotcp_update_resync(). >> >> If the cmpxchg fails it means a new resync request was triggered by the >> HW, the old request will be dropped and the new one will be processed by >> a later PDU. > > So resync_req is actually the current tcp sequence number or something? > The name resync_req is very confusing. queue->resync_req is the TCP sequence for which the HW requested a resync operation. We can rename it with queue->resync_tcp_seq. >>>> + ret = nvme_tcp_offload_socket(queue); >>>> + if (ret) { >>>> + dev_info(nctrl->device, >>>> + "failed to setup offload on queue %d ret=%d\n", >>>> + idx, ret); >>>> + } >>>> + } >>>> + } else { >>>> ret = nvmf_connect_admin_queue(nctrl); >>>> + if (ret) >>>> + goto err; >>>> >>>> - if (!ret) { >>>> - set_bit(NVME_TCP_Q_LIVE, &queue->flags); >>>> - } else { >>>> - if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags)) >>>> - __nvme_tcp_stop_queue(queue); >>>> - dev_err(nctrl->device, >>>> - "failed to connect queue: %d ret=%d\n", idx, ret); >>>> + netdev = get_netdev_for_sock(queue->sock->sk); >>> >>> Is there any chance that this is a different netdev than what is >>> already recorded? doesn't make sense to me. >> >> The idea is that we are first starting the admin queue, which looks up >> the netdev associated with the socket and stored in the queue. Later, >> when the IO queues are started, we use the recorded netdev. >> >> In cases of bonding or vlan, a netdev can have lower device links, which >> get_netdev_for_sock() will look up. > > I think the code should in high level do: > if (idx) { > ret = nvmf_connect_io_queue(nctrl, idx); > if (ret) > goto err; > if (nvme_tcp_ddp_query_limits(queue)) > nvme_tcp_offload_socket(queue); > > } else { > ret = nvmf_connect_admin_queue(nctrl); > if (ret) > goto err; > ctrl->ddp_netdev = get_netdev_for_sock(queue->sock->sk); > if (nvme_tcp_ddp_query_limits(queue)) > nvme_tcp_offload_apply_limits(queue); > } Ok, we will follow this design. > ctrl->ddp_netdev should be cleared and put when the admin queue > is stopped/freed, similar to how async_req is handled. Thanks, we will clear ddp_netdev on queue stop/free. This will also prevent reusing a potentially wrong netdev after a reconnection. >>>> + /* >>>> + * release the device as no offload context is >>>> + * established yet. >>>> + */ >>>> + dev_put(netdev); >>> >>> the put is unclear, what does it pair with? the get_netdev_for_sock? >> >> Yes, get_netdev_for_sock() takes a reference, which we don't need at >> that point so we put it. > > Well, you store a pointer to it, what happens if it goes away while > the controller is being set up? It's a problem. We will remove the dev_put() to keep the first reference, and only release it when it is cleared from the admin queue.
Powered by blists - more mailing lists