lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4e1600a7-24d4-454a-87d6-6528b1c4c88d@pengutronix.de>
Date: Wed, 7 May 2025 18:24:34 +0200
From: Ahmad Fatoum <a.fatoum@...gutronix.de>
To: "Darrick J. Wong" <djwong@...nel.org>, Christoph Hellwig <hch@....de>,
 Jens Axboe <axboe@...nel.dk>, Ulf Hansson <ulf.hansson@...aro.org>
Cc: Jan Lübbe <jlu@...gutronix.de>,
 "kernel@...gutronix.de" <kernel@...gutronix.de>,
 linux-block@...r.kernel.org, linux-mmc@...r.kernel.org,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: [RFC] block: Make MMC respect REQ_NOUNMAP?

Hi,

We would like to tell storage (mostly eMMC, but not exclusively though) to
discard a block range, so it reads back zeroes and is preferably unmapped
to give the storage device the most flexibility. For eMMCs, this is possible
right now with blkdiscard -z (BLKZEROOUT), but digging through the code with Jan,
I am starting to question whether eMMC is correctly implementing REQ_OP_WRITE_ZEROES
(granted, the expected semantics seem to be spelt out nowhere).

Read along for what MMC does, what semantics BLKZEROOUT seems to should have,
and what I think might need to be done to address this.


eMMC supports a number of different commands for erasing/discarding data[1].
Relevant to my question are two commands:

  DISCARD: Host no longer needs the data and doesn't care about value
           read from it. Card may remove or unmap portions or all of it.

  TRIM:    implies DISCARD and additionally guarantees to
           return all-zero or all-ones on read.

These are made available to the block layer as follows:

  REQ_OP_DISCARD      -> DISCARD
  REQ_OP_WRITE_ZEROES -> TRIM (if erased_byte == 0, otherwise
  if all-ones, blk_queue_max_write_zeroes_sectors() is not called)

Looking at it from the ioctl side:

blkdev_common_ioctl(..., BLKZEROOUT, ...)
    blk_ioctl_zeroout
        blkdev_issue_zeroout(..., BLKDEV_ZERO_NOUNMAP)
            __blkdev_issue_write_zeroes(..., BLKDEV_ZERO_NOUNMAP)
	    submit_bio_wait(bio->bi_opf = REQ_NOUNMAP)

__REQ_NOUNMAP has comment saying 'do not free blocks when zeroing',
but as shown above, TRIM allows the card to unmap the indicated region.

REQ_NOUNMAP has no other documentation, but virtio inverts it and translates
it to VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP, which is documented as follows[2]:

"For write zeroes commands, if the unmap is set, the device MAY deallocate
the specified range of sectors in the device backend storage, as if the
discard command had been sent.".

I.e. REQ_NOUNMAP -> device MAY NOT deallocate

This is at odds with the MMC implementation, which ignores REQ_NOUNMAP
completely it seems. I don't believe there is a MMC command for write zeroes
without discard short of actually writing zeroes, so it sounds like the
correct implementation for MMC would be:

	if (req->cmd_flags & REQ_NOUNMAP)
		// or w/e causes fallback to __blkdev_issue_zero_pages
		return -EOPNOTSUPP;
	// do TRIM as before

Of course, this will change user visible behavior: blkdiscard -z will start
taking much longer for most users. These users will have to migrate to
using fallocate instead:

blkdev_fallocate(mode = FALLOC_FL_ZERO_RANGE):
	blkdev_issue_zeroout(..., BLKDEV_ZERO_NOUNMAP)

blkdev_fallocate(mode = FALLOC_FL_PUNCH_HOLE)
	blkdev_issue_zeroout(..., BLKDEV_ZERO_NOFALLBACK)

So, it's not a drop-in replacement. I guess user code can punch hole
with fallback to BLKZEROOUT if it fails in order to get back the old
behavior.

I must admit I don't even know why one would write zeroes and care about
them remaining mapped on the storage device, but that seems to be
what's expected with BLKZEROOUT.


Thoughts? What did I miss?

Thanks,
Ahmad

[1]: For a short and incomplete summary, see:
     https://github.com/barebox/barebox/commit/91a11c7d50df91
[1]: https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html
     5.2.6.2 Device Requirements: Device Operation


-- 
Pengutronix e.K.                  |                             |
Steuerwalder Str. 21              | http://www.pengutronix.de/  |
31137 Hildesheim, Germany         | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686  | Fax:   +49-5121-206917-5555 |


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ