Date: Tue, 16 Jan 2024 11:35:47 +0000
From: John Garry <john.g.garry@...cle.com>
To: Christoph Hellwig <hch@....de>
Cc: "Darrick J. Wong" <djwong@...nel.org>, axboe@...nel.dk, kbusch@...nel.org,
        sagi@...mberg.me, jejb@...ux.ibm.com, martin.petersen@...cle.com,
        viro@...iv.linux.org.uk, brauner@...nel.org, dchinner@...hat.com,
        jack@...e.cz, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        tytso@....edu, jbongio@...gle.com, linux-scsi@...r.kernel.org,
        ming.lei@...hat.com, jaswin@...ux.ibm.com, bvanassche@....org
Subject: Re: [PATCH v2 00/16] block atomic writes

On 21/12/2023 13:22, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 01:18:33PM +0000, John Garry wrote:
>>> For SGL-capable devices that would be
>>> BIO_MAX_VECS, otherwise 1.
>> ok, but we would need to advertise that or whatever segment limit. A statx
>> field just for that seems a bit inefficient in terms of space.
> I'd rather not hard code BIO_MAX_VECS in the ABI, which suggest we
> want to export is as a field.  Network file systems also might have
> their own limits for one reason or another.

Hi Christoph,

I have been looking at this issue again and I am not sure if telling the
user the max number of segments allowed is the best option. I'm worried
that the resultant atomic write unit max will be too small.

The background again is that we want to tell the user what the maximum 
atomic write unit size is, such that we can always guarantee to fit the 
write in a single bio. And there would be no iovec length or alignment 
rules.

The max segments value advertised would be min(queue max segments, 
BIO_MAX_VECS), so it would be 256 when the request queue is not limiting.

The worst-case (most inefficient) iovec layout which the user could
provide would be like .iov_base = 0x...0E00 and .iov_len = 0x400, which
would mean that we would need 2x pages and 2x DMA sg elems for each
1024B-length iovec. I am assuming that we will still use the direct IO
rule of LBS length and alignment.
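
As a quick sanity check of the page-crossing claim, here is a small
Python sketch. It assumes 4K pages, and pages_spanned() is a made-up
helper for illustration only, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4K pages


def pages_spanned(iov_base: int, iov_len: int, page_size: int = PAGE_SIZE) -> int:
    """Count the distinct pages touched by [iov_base, iov_base + iov_len)."""
    first_page = iov_base // page_size
    last_page = (iov_base + iov_len - 1) // page_size
    return last_page - first_page + 1


# The layout from the text: the buffer starts 0x200 bytes before a page
# boundary and is 0x400 bytes long, so it straddles two pages.
print(pages_spanned(0x0E00, 0x400))  # 2

# The same length starting 0x400 bytes before the boundary fits in one page.
print(pages_spanned(0x0C00, 0x400))  # 1
```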

As such, we then need to set atomic write unit max = min(queue max
segments, BIO_MAX_VECS) * LBS. That would mean atomic write unit max =
256 * 512 = 128K (for 512B LBS). For a DMA controller with max segments
of 64, for example, we would have 32K. These seem too low.
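
The arithmetic above can be reproduced with a short Python sketch.
multi_iovec_unit_max() is a hypothetical name used only here, and
BIO_MAX_VECS is taken as 256 as stated earlier in this mail.

```python
BIO_MAX_VECS = 256  # as stated in the text


def multi_iovec_unit_max(queue_max_segments: int, lbs: int) -> int:
    """Worst-case atomic write unit max in bytes when multiple iovecs are allowed.

    Formula from the text: min(queue max segments, BIO_MAX_VECS) * LBS.
    """
    return min(queue_max_segments, BIO_MAX_VECS) * lbs


# Request queue not limiting (anything >= BIO_MAX_VECS), 512B LBS.
print(multi_iovec_unit_max(1024, 512) // 1024, "KiB")  # 128 KiB

# DMA controller limited to 64 segments, 512B LBS.
print(multi_iovec_unit_max(64, 512) // 1024, "KiB")  # 32 KiB
```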

Alternatively, I'm thinking that we should just limit to 1x iovec always,
and then atomic write unit max = (min(queue max segments, BIO_MAX_VECS)
- 1) * PAGE_SIZE [ignoring first/last iovec contents]. It also makes
support for non-enterprise NVMe drives more straightforward. If someone
wants, they can introduce support for multi-iovec later, but it would
probably require some more iovec length/alignment rules.
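
For comparison, a rough Python sketch of what the single-iovec formula
would give. It assumes 4K pages and BIO_MAX_VECS = 256, and
single_iovec_unit_max() is a made-up name for illustration. It only
evaluates the formula above and does not model any further rounding
that an implementation might apply.

```python
BIO_MAX_VECS = 256  # as stated in the text
PAGE_SIZE = 4096    # assumption: 4K pages


def single_iovec_unit_max(queue_max_segments: int, page_size: int = PAGE_SIZE) -> int:
    """Atomic write unit max in bytes when only one iovec is allowed.

    Formula from the text: (min(queue max segments, BIO_MAX_VECS) - 1) * PAGE_SIZE.
    One segment is given up because an unaligned buffer can straddle an
    extra page at its start/end.
    """
    return (min(queue_max_segments, BIO_MAX_VECS) - 1) * page_size


# Request queue not limiting: 255 pages.
print(single_iovec_unit_max(1024) // 1024, "KiB")  # 1020 KiB

# DMA controller limited to 64 segments: 63 pages.
print(single_iovec_unit_max(64) // 1024, "KiB")  # 252 KiB
```

Under these assumptions the single-iovec limit is several times larger
than the 128K and 32K figures from the multi-iovec worst case.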

Please let me know your thoughts.

Thanks,
John

