Message-ID: <aTFqqNMR9i89puxB@infradead.org>
Date: Thu, 4 Dec 2025 03:04:08 -0800
From: Christoph Hellwig <hch@...radead.org>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: linux-block@...r.kernel.org, io-uring@...r.kernel.org,
Vishal Verma <vishal1.verma@...el.com>, tushar.gohad@...el.com,
Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-fsdevel@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org
Subject: Re: [RFC v2 07/11] nvme-pci: implement dma_token backed requests

> +static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
> +		enum dma_data_direction dir)
> +{
> +	struct blk_mq_dma_map *map = req->dma_map;
> +	int length = blk_rq_payload_bytes(req);
> +	bool for_cpu = dir == DMA_FROM_DEVICE;
> +	struct device *dev = nvme_dev->dev;
> +	dma_addr_t *dma_list = map->private;
> +	struct bio *bio = req->bio;
> +	int offset, map_idx;
> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	length += offset & (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	while (length > 0) {
> +		u64 dma_addr = dma_list[map_idx++];
> +
> +		if (for_cpu)
> +			__dma_sync_single_for_cpu(dev, dma_addr,
> +					NVME_CTRL_PAGE_SIZE, dir);
> +		else
> +			__dma_sync_single_for_device(dev, dma_addr,
> +					NVME_CTRL_PAGE_SIZE, dir);
> +		length -= NVME_CTRL_PAGE_SIZE;
> +	}

This looks really inefficient.  Usually the ranges in the dma-buf should
be much larger than a controller page, so this ends up issuing one sync
call per NVME_CTRL_PAGE_SIZE chunk where one call per range would do.
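
Something along these lines would avoid that (a completely untested
sketch: it assumes the token kept an array of { addr, len } ranges
instead of one dma_addr_t per controller page, so struct nvme_dma_range
is invented here, and it glosses over the bi_bvec_done offset handling):

struct nvme_dma_range {			/* invented for this sketch */
	dma_addr_t	addr;
	u32		len;
};

static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
		enum dma_data_direction dir)
{
	struct blk_mq_dma_map *map = req->dma_map;
	struct nvme_dma_range *range = map->private;
	int length = blk_rq_payload_bytes(req);
	struct device *dev = nvme_dev->dev;

	/* one sync call per (large) range instead of one per controller page */
	while (length > 0) {
		u32 len = min_t(u32, range->len, length);

		if (dir == DMA_FROM_DEVICE)
			__dma_sync_single_for_cpu(dev, range->addr, len, dir);
		else
			__dma_sync_single_for_device(dev, range->addr, len,
					dir);
		length -= len;
		range++;
	}
}

That keeps the number of sync calls proportional to the number of
dma-buf ranges instead of the I/O size.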

> +static void nvme_unmap_premapped_data(struct nvme_dev *dev,
> +				      struct request *req)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	if (rq_data_dir(req) == READ)
> +		nvme_sync_dma(dev, req, DMA_FROM_DEVICE);
> +	if (!(iod->flags & IOD_SINGLE_SEGMENT))
> +		nvme_free_descriptors(req);
> +}

This doesn't really unmap anything :)

Also, the DMA ownership rules say that you always need to call the
sync_for_device helpers before I/O and the sync_for_cpu helpers after
I/O, no matter whether it is a read or a write.  The implementation then
makes them no-ops where possible.
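
I.e. the premapped path should do something like this on both sides,
regardless of the data direction (the wrapper names below are made up
and only the single-range case is shown; the sync helpers already turn
into no-ops for coherent devices):

/* at submission time, for reads and writes alike: */
static void nvme_premapped_sync_for_device(struct nvme_dev *nvme_dev,
		struct request *req, dma_addr_t addr, size_t len)
{
	/* hand buffer ownership to the device before issuing the command */
	dma_sync_single_for_device(nvme_dev->dev, addr, len, rq_dma_dir(req));
}

/* at completion time, again for reads and writes alike: */
static void nvme_premapped_sync_for_cpu(struct nvme_dev *nvme_dev,
		struct request *req, dma_addr_t addr, size_t len)
{
	/* hand ownership back to the CPU before anyone touches the data */
	dma_sync_single_for_cpu(nvme_dev->dev, addr, len, rq_dma_dir(req));
}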

> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	offset &= (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	prp1_dma = dma_list[map_idx++] + offset;
> +
> +	length -= (NVME_CTRL_PAGE_SIZE - offset);
> +	if (length <= 0) {
> +		prp2_dma = 0;

Urgg, why is this building PRPs instead of SGLs?  Yes, SGLs are an
optional feature, but for devices where you want to micro-optimize like
this I think we should simply require them.  That should cut down on
both the memory use and the amount of special-case mapping code.
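
With SGLs every premapped range collapses into a single data descriptor,
roughly like this (untested; it reuses the invented { addr, len } ranges
from above, nvme_pci_sgl_set_range() is made up as the premapped twin of
nvme_pci_sgl_set_data(), and the fragment below is only schematic):

static void nvme_pci_sgl_set_range(struct nvme_sgl_desc *sge,
		dma_addr_t addr, u32 len)
{
	sge->addr = cpu_to_le64(addr);
	sge->length = cpu_to_le32(len);
	sge->type = NVME_SGL_FMT_DATA_DESC << 4;
}

	/* in the setup path: */
	cmnd->flags = NVME_CMD_SGL_METABUF;
	if (nr_ranges == 1) {
		/* the whole I/O fits into the inline descriptor in the SQE */
		nvme_pci_sgl_set_range(&cmnd->dptr.sgl,
				range->addr + offset, length);
		return BLK_STS_OK;
	}
	/*
	 * Otherwise one data descriptor per range in a separate segment,
	 * pointed to by the inline descriptor, just like
	 * nvme_pci_setup_sgls() does for the scatterlist path.
	 */

A single 16-byte data descriptor can describe a range of up to 4G, while
PRPs need an 8-byte entry for every NVME_CTRL_PAGE_SIZE chunk, which is
where most of the memory saving comes from.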