Message-ID: <aTFqqNMR9i89puxB@infradead.org>
Date: Thu, 4 Dec 2025 03:04:08 -0800
From: Christoph Hellwig <hch@...radead.org>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: linux-block@...r.kernel.org, io-uring@...r.kernel.org,
Vishal Verma <vishal1.verma@...el.com>, tushar.gohad@...el.com,
Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-fsdevel@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org
Subject: Re: [RFC v2 07/11] nvme-pci: implement dma_token backed requests

> +static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
> +		enum dma_data_direction dir)
> +{
> +	struct blk_mq_dma_map *map = req->dma_map;
> +	int length = blk_rq_payload_bytes(req);
> +	bool for_cpu = dir == DMA_FROM_DEVICE;
> +	struct device *dev = nvme_dev->dev;
> +	dma_addr_t *dma_list = map->private;
> +	struct bio *bio = req->bio;
> +	int offset, map_idx;
> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	length += offset & (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	while (length > 0) {
> +		u64 dma_addr = dma_list[map_idx++];
> +
> +		if (for_cpu)
> +			__dma_sync_single_for_cpu(dev, dma_addr,
> +					NVME_CTRL_PAGE_SIZE, dir);
> +		else
> +			__dma_sync_single_for_device(dev, dma_addr,
> +					NVME_CTRL_PAGE_SIZE, dir);
> +		length -= NVME_CTRL_PAGE_SIZE;
> +	}

This looks really inefficient.  Usually the ranges in the dma-buf should
be much larger than a controller page, so this ends up issuing one sync
call per NVME_CTRL_PAGE_SIZE chunk where one call per range would do.
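
Something along these lines would avoid that (a completely untested
sketch: it assumes the token kept an array of { addr, len } ranges
instead of one dma_addr_t per controller page, so struct nvme_dma_range
is invented here, and it glosses over the bi_bvec_done offset handling):

struct nvme_dma_range {			/* invented for this sketch */
	dma_addr_t	addr;
	u32		len;
};

static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
		enum dma_data_direction dir)
{
	struct blk_mq_dma_map *map = req->dma_map;
	struct nvme_dma_range *range = map->private;
	int length = blk_rq_payload_bytes(req);
	struct device *dev = nvme_dev->dev;

	/* one sync call per (large) range instead of one per controller page */
	while (length > 0) {
		u32 len = min_t(u32, range->len, length);

		if (dir == DMA_FROM_DEVICE)
			__dma_sync_single_for_cpu(dev, range->addr, len, dir);
		else
			__dma_sync_single_for_device(dev, range->addr, len,
					dir);
		length -= len;
		range++;
	}
}

That keeps the number of sync calls proportional to the number of
dma-buf ranges instead of the I/O size.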

> +static void nvme_unmap_premapped_data(struct nvme_dev *dev,
> +				      struct request *req)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	if (rq_data_dir(req) == READ)
> +		nvme_sync_dma(dev, req, DMA_FROM_DEVICE);
> +	if (!(iod->flags & IOD_SINGLE_SEGMENT))
> +		nvme_free_descriptors(req);
> +}

This doesn't really unmap anything :)

Also, the DMA ownership rules say that you always need to call the
sync_for_device helpers before I/O and the sync_for_cpu helpers after
I/O, no matter whether it is a read or a write.  The implementation then
makes them no-ops where possible.
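
I.e. the premapped path should do something like this on both sides,
regardless of the data direction (the wrapper names below are made up
and only the single-range case is shown; the sync helpers already turn
into no-ops for coherent devices):

/* at submission time, for reads and writes alike: */
static void nvme_premapped_sync_for_device(struct nvme_dev *nvme_dev,
		struct request *req, dma_addr_t addr, size_t len)
{
	/* hand buffer ownership to the device before issuing the command */
	dma_sync_single_for_device(nvme_dev->dev, addr, len, rq_dma_dir(req));
}

/* at completion time, again for reads and writes alike: */
static void nvme_premapped_sync_for_cpu(struct nvme_dev *nvme_dev,
		struct request *req, dma_addr_t addr, size_t len)
{
	/* hand ownership back to the CPU before anyone touches the data */
	dma_sync_single_for_cpu(nvme_dev->dev, addr, len, rq_dma_dir(req));
}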

> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	offset &= (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	prp1_dma = dma_list[map_idx++] + offset;
> +
> +	length -= (NVME_CTRL_PAGE_SIZE - offset);
> +	if (length <= 0) {
> +		prp2_dma = 0;

Urgg, why is this building PRPs instead of SGLs?  Yes, SGLs are an
optional feature, but for devices where you want to micro-optimize like
this I think we should simply require them.  That should cut down on
both the memory use and the amount of special-case mapping code.
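
With SGLs every premapped range collapses into a single data descriptor,
roughly like this (untested; it reuses the invented { addr, len } ranges
from above, nvme_pci_sgl_set_range() is made up as the premapped twin of
nvme_pci_sgl_set_data(), and the fragment below is only schematic):

static void nvme_pci_sgl_set_range(struct nvme_sgl_desc *sge,
		dma_addr_t addr, u32 len)
{
	sge->addr = cpu_to_le64(addr);
	sge->length = cpu_to_le32(len);
	sge->type = NVME_SGL_FMT_DATA_DESC << 4;
}

	/* in the setup path: */
	cmnd->flags = NVME_CMD_SGL_METABUF;
	if (nr_ranges == 1) {
		/* the whole I/O fits into the inline descriptor in the SQE */
		nvme_pci_sgl_set_range(&cmnd->dptr.sgl,
				range->addr + offset, length);
		return BLK_STS_OK;
	}
	/*
	 * Otherwise one data descriptor per range in a separate segment,
	 * pointed to by the inline descriptor, just like
	 * nvme_pci_setup_sgls() does for the scatterlist path.
	 */

A single 16-byte data descriptor can describe a range of up to 4G, while
PRPs need an 8-byte entry for every NVME_CTRL_PAGE_SIZE chunk, which is
where most of the memory saving comes from.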