Message-ID: <Zcnx1pP_iZBf6Y-t@casper.infradead.org>
Date: Mon, 12 Feb 2024 10:24:22 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Ritesh Harjani <ritesh.list@...il.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>,
	Zhang Yi <yi.zhang@...weicloud.com>, linux-ext4@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, tytso@....edu,
	adilger.kernel@...ger.ca, jack@...e.cz, hch@...radead.org,
	zokeefe@...gle.com, yi.zhang@...wei.com, chengzhihao1@...wei.com,
	yukuai3@...wei.com, wangkefeng.wang@...wei.com
Subject: Re: [RFC PATCH v3 00/26] ext4: use iomap for regular file's buffered
 IO path and enable large folio

On Mon, Feb 12, 2024 at 02:46:10PM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" <djwong@...nel.org> writes:
> > though iirc willy never got the performance to match because iomap
> 
> Ohh, can you help me provide details on what performance benchmark was
> run? I can try and run them when I rebase.

I didn't run a benchmark, we just knew what would happen (on rotating
storage anyway).

> > didn't have a mechanism for the caller to tell it "run the IO now even
> > though you don't have a complete page, because the indirect block is the
> > next block after the 11th block".
> 
> Do you mean this for a large folio? I still didn't get the problem you
> are referring here. Can you please help me explain why could that be a
> problem?

A classic ext2 filesystem lays out a 16KiB file like this (with 512-byte
blocks):

file offset	disk block
0-6KiB		1000-1011
6KiB-16KiB	1013-1032

What's in block 1012?  The indirect block!  The block which tells ext2
that blocks 12-31 of the file are in disk blocks 1013-1032.  So we can't
issue the read for them until we've finished the read for block 1012.
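To make the layout concrete, here is a minimal sketch of the mapping for
this one example file.  The function name and the hard-coded disk block
numbers are purely illustrative (they come from the table above, not from
ext2 code); the only real ext2 fact it relies on is that the first 12
file blocks are direct.

```c
#include <assert.h>

#define TOY_NDIR_BLOCKS   12    /* direct blocks in a classic ext2 inode */
#define TOY_FILE_BLOCKS   32    /* 16KiB file / 512-byte blocks */
#define TOY_FIRST_DATA    1000  /* example disk block of file block 0 */
#define TOY_INDIRECT      1012  /* example disk block of the indirect block */

/* Hypothetical helper: file block -> disk block for the example file. */
static unsigned toy_disk_block(unsigned file_block)
{
	assert(file_block < TOY_FILE_BLOCKS);

	/* File blocks 0-11 are mapped directly from the inode. */
	if (file_block < TOY_NDIR_BLOCKS)
		return TOY_FIRST_DATA + file_block;

	/*
	 * File blocks 12-31 can only be mapped after reading the
	 * indirect block, which sits between the two data extents.
	 */
	return TOY_INDIRECT + 1 + (file_block - TOY_NDIR_BLOCKS);
}
```

The gap at 1012 is the whole problem: file block 11 lands on 1011 and
file block 12 lands on 1013.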

Buffer heads have a solution for this, BH_Boundary.  ext2 sets it for
block 11 which prompts mpage.c to submit the read immediately (see
the various calls to buffer_boundary()).  Then ext2 will submit the read
for block 1012 and the two reads will be coalesced by the IO scheduler.
So we still end up doing two reads instead of one, but that's
unavoidable because fragmentation might have meant that 6KiB-16KiB were
not stored at 1013-1032.
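A toy model of the ordering may help.  This is not mpage.c; every name
below is made up, and the "no boundary" behaviour is my assumption about
what a simple reader that only submits on discontiguity would do.  It
just records the order in which reads would reach the disk for the
example file.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NDIR     12
#define TOY_NBLOCKS  32
#define TOY_MAX_IO   8

struct toy_io_log {
	unsigned start[TOY_MAX_IO];	/* first disk block of each read */
	unsigned len[TOY_MAX_IO];	/* length of each read in blocks */
	size_t n;
};

static void toy_log_io(struct toy_io_log *log, unsigned start, unsigned len)
{
	assert(log->n < TOY_MAX_IO);
	log->start[log->n] = start;
	log->len[log->n] = len;
	log->n++;
}

/* Map a file block; the first indirect lookup costs a read of 1012. */
static unsigned toy_map(unsigned fb, int *have_indirect, struct toy_io_log *log)
{
	if (fb < TOY_NDIR)
		return 1000 + fb;
	if (!*have_indirect) {
		toy_log_io(log, 1012, 1);
		*have_indirect = 1;
	}
	return 1013 + (fb - TOY_NDIR);
}

/* Build reads block by block; optionally submit early at the boundary. */
static void toy_read_file(int honour_boundary, struct toy_io_log *log)
{
	unsigned pend_start = 0, pend_len = 0;
	int have_indirect = 0;

	log->n = 0;
	for (unsigned fb = 0; fb < TOY_NBLOCKS; fb++) {
		unsigned disk = toy_map(fb, &have_indirect, log);

		/* Discontiguous with the pending read: submit it. */
		if (pend_len && disk != pend_start + pend_len) {
			toy_log_io(log, pend_start, pend_len);
			pend_len = 0;
		}
		if (!pend_len)
			pend_start = disk;
		pend_len++;

		/* Boundary block (file block 11): submit immediately. */
		if (honour_boundary && fb == TOY_NDIR - 1) {
			toy_log_io(log, pend_start, pend_len);
			pend_len = 0;
		}
	}
	if (pend_len)
		toy_log_io(log, pend_start, pend_len);
}
```

With the boundary honoured the model issues 1000-1011, then 1012, then
1013-1032, i.e. in ascending disk order.  Without it, the read of 1012
goes out before the pending read of 1000-1011 has been submitted.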

There's no equivalent iomap solution.  What needs to happen is:

 - iomap_folio_state->read_bytes_pending needs to be initialised to
   folio_size(), not 0.
 - Remove "ifs->read_bytes_pending += plen" from iomap_readpage_iter()
 - Subtract plen in the iomap_block_needs_zeroing() case
 - Submit a bio at the end of each iomap_readpage_iter() call

Now iomap will behave the same way as mpage, only without needing a
flag to do it (instead it will assume that the filesystem coalesces
adjacent ranges, which it should do anyway for good performance).
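As a rough sketch of the accounting I have in mind (a userspace toy, not
the iomap code; the struct and function names here are invented, and
locking is ignored entirely): the counter starts at the folio size, and
every range that is either zeroed or completed by a bio subtracts its
length.  Because the count starts full, a bio that completes early cannot
drive it to zero while later ranges are still unsubmitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the per-folio read state. */
struct toy_folio_state {
	size_t read_bytes_pending;	/* bytes not yet read or zeroed */
	int read_done;			/* set once the whole folio is accounted for */
};

/* Start with the whole folio pending, rather than 0. */
static void toy_folio_read_start(struct toy_folio_state *ifs, size_t folio_size)
{
	ifs->read_bytes_pending = folio_size;
	ifs->read_done = 0;
}

/*
 * Account for plen bytes, whether they were zeroed without IO or
 * filled by a completed bio.  The folio is finished exactly when
 * nothing remains pending.
 */
static void toy_folio_range_done(struct toy_folio_state *ifs, size_t plen)
{
	assert(plen <= ifs->read_bytes_pending);
	ifs->read_bytes_pending -= plen;
	if (ifs->read_bytes_pending == 0)
		ifs->read_done = 1;
}
```

For a 16KiB folio, completing a 6KiB bio straight away leaves 10KiB
pending, so the folio is not finished until the remaining ranges have
been zeroed or read.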
