linux-kernel - Re: [PATCH v8 10/10] nvme: Atomic write support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2ddb92d2-97e8-4eb3-9c76-8c5438bb2a44@oracle.com>
Date: Mon, 17 Jun 2024 19:04:23 +0100
From: John Garry <john.g.garry@...cle.com>
To: Kanchan Joshi <joshi.k@...sung.com>, axboe@...nel.dk, kbusch@...nel.org,
        hch@....de, sagi@...mberg.me, jejb@...ux.ibm.com,
        martin.petersen@...cle.com, viro@...iv.linux.org.uk,
        brauner@...nel.org, dchinner@...hat.com, jack@...e.cz
Cc: djwong@...nel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
        linux-scsi@...r.kernel.org, ojaswin@...ux.ibm.com, linux-aio@...ck.org,
        linux-btrfs@...r.kernel.org, io-uring@...r.kernel.org,
        nilay@...ux.ibm.com, ritesh.list@...il.com, willy@...radead.org,
        agk@...hat.com, snitzer@...nel.org, mpatocka@...hat.com,
        dm-devel@...ts.linux.dev, hare@...e.de,
        Alan Adamson <alan.adamson@...cle.com>
Subject: Re: [PATCH v8 10/10] nvme: Atomic write support

On 17/06/2024 18:24, Kanchan Joshi wrote:
> On 6/10/2024 4:13 PM, John Garry wrote:
>> +static bool nvme_valid_atomic_write(struct request *req)
>> +{
>> +	struct request_queue *q = req->q;
>> +	u32 boundary_bytes = queue_atomic_write_boundary_bytes(q);
>> +
>> +	if (blk_rq_bytes(req) > queue_atomic_write_unit_max_bytes(q))
>> +		return false;
>> +
>> +	if (boundary_bytes) {
>> +		u64 mask = boundary_bytes - 1, imask = ~mask;
>> +		u64 start = blk_rq_pos(req) << SECTOR_SHIFT;
>> +		u64 end = start + blk_rq_bytes(req) - 1;
>> +
>> +		/* If greater then must be crossing a boundary */
>> +		if (blk_rq_bytes(req) > boundary_bytes)
>> +			return false;
> 
> Nit: I'd cache blk_rq_bytes(req), since that is repeating and this
> function is called for each atomic IO.

blk_rq_bytes() is just a wrapper for rq->__data_len. I suppose that we 
could cache that value to stop re-reading that memory, but I would 
hope/expect that memory to be in the CPU cache anyway.

> 
>> +
>> +		if ((start & imask) != (end & imask))
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>    static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>>    		struct request *req, struct nvme_command *cmnd,
>>    		enum nvme_opcode op)
>> @@ -941,6 +965,12 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>>    
>>    	if (req->cmd_flags & REQ_RAHEAD)
>>    		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
>> +	/*
>> +	 * Ensure that nothing has been sent which cannot be executed
>> +	 * atomically.
>> +	 */
>> +	if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
>> +		return BLK_STS_INVAL;
>>    
> 
> Is this validity check specific to NVMe or should this be moved up to
> block layer as it also knows the limits?

Only NVMe supports an LBA space boundary, so that part is specific to NVMe.

Regardless, the block layer already should ensure that the atomic write 
length and boundary is respected. nvme_valid_atomic_write() is just an 
insurance policy against the block layer or some other component not 
doing its job.

For SCSI, the device would error - for example - if the atomic write 
length was larger than the device supported. NVMe silently just does not 
execute the write atomically in that scenario, which we must avoid.

Thanks,
John