linux-kernel - Re: Support for I/O to a bitbucket

Message-ID: <20200907005642.GN12096@dread.disaster.area>
Date:   Mon, 7 Sep 2020 10:56:42 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org
Subject: Re: Support for I/O to a bitbucket

On Tue, Aug 18, 2020 at 06:22:31PM +0100, Matthew Wilcox wrote:
> One of the annoying things in the iomap code is how we handle
> block-misaligned I/Os.  Consider a write to a file on a 4KiB block size
> filesystem (on a 4KiB page size kernel) which starts at byte offset 5000
> and is 4133 bytes long.
> 
> Today, we allocate page 1 and read bytes 4096-8191 of the file into
> it, synchronously.  Then we allocate page 2 and read bytes 8192-12287
> into it, again, synchronously.  Then we copy the user's data into the
> pagecache and mark it dirty.  This is a fairly significant delay: the
> user, who normally sees only the latency of a memcpy(), now has to wait
> for two non-overlapping reads to complete.
> 
> What I'd love to be able to do is allocate pages 1 & 2, copy the user
> data into it and submit one read which targets:
> 
> 0-903: page 1, offset 0, length 904
> 904-5036: bitbucket, length 4133
> 5037-8191: page 2, offset 941, length 3155
> 
> That way, we don't even need to wait for the read to complete.
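
The split described above is plain arithmetic, so a small userspace sketch
may help. It assumes the block size equals the page size (4 KiB); the
struct and function names are hypothetical and exist in no kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u  /* assumed: block size == page size == 4 KiB */

/* Layout of the single block-aligned read; all names are hypothetical. */
struct read_split {
	uint64_t read_start;   /* file offset where the aligned read begins */
	uint64_t read_len;     /* total length of the read, in bytes */
	uint64_t head_len;     /* bytes before the write, into the first page */
	uint64_t discard_len;  /* bytes covered by the write, to the bitbucket */
	uint64_t tail_off;     /* offset in the last page where the tail begins */
	uint64_t tail_len;     /* bytes after the write, into the last page */
};

static struct read_split split_unaligned_write(uint64_t pos, uint64_t len)
{
	struct read_split s;
	uint64_t end = pos + len;  /* first byte past the write */
	uint64_t read_end;

	assert(len > 0);

	/* Round the start down and the end up to block boundaries. */
	s.read_start = pos - pos % BLOCK_SIZE;
	read_end = (end + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;

	s.read_len = read_end - s.read_start;
	s.head_len = pos - s.read_start;
	s.discard_len = len;
	s.tail_off = end % BLOCK_SIZE;   /* 0 when the write ends block-aligned */
	s.tail_len = read_end - end;     /* 0 when the write ends block-aligned */
	return s;
}
```

For the write in the example (offset 5000, length 4133) this gives one
8192-byte read starting at file offset 4096, with a 904-byte head, 4133
discarded bytes and a 3155-byte tail.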

I'm not sure that offloading the page cache's job of isolating
unaligned IO from the block layer to the block layer is the right
way to do this.

Essentially you are moving the RMW down in the block layer where it
will have to allocate memory to do IO on sector based boundaries so
it doesn't trash the data you've already copied into the pages in
the bio.

Either way, you need a secondary buffer to do this - one for the
read IO to DMA into with sector alignment, the other to contain the
user data that is single byte aligned.

This seems to me like it could be done entirely at the iomap level
just by linking the async read IO buffer back to the page cache page
and holding the "data to copy in" state in a struct attached to the
async IO buffer's page->private. It adds a little complexity to the
read IO completion (i.e. iomap_read_finish()), but it's no worse
than anything we do with write IO completions...
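
As a rough userspace model of that completion step (one possible reading
of the idea, not actual iomap code): the read lands in a separate buffer,
and a small state struct, standing in for what would hang off
page->private, says which bytes of the page cache page the user already
wrote. All type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u  /* assumed page size for this model */

/* Hypothetical stand-in for the state attached via page->private. */
struct pending_fill {
	unsigned char *cache_page;  /* page cache page already holding user data */
	size_t user_off;            /* start of the user data within the page */
	size_t user_len;            /* length of the user data within the page */
};

/* Hypothetical stand-in for the buffer the async read DMAs into. */
struct read_buf {
	unsigned char data[PAGE_SZ];
	struct pending_fill *private_state;  /* models page->private */
};

/*
 * Models the extra work at read completion: copy only the bytes the
 * user did NOT write from the read buffer into the page cache page.
 */
static void read_complete(struct read_buf *rb)
{
	struct pending_fill *pf = rb->private_state;
	size_t user_end = pf->user_off + pf->user_len;

	assert(user_end <= PAGE_SZ);

	/* Head: bytes in front of the user data. */
	memcpy(pf->cache_page, rb->data, pf->user_off);
	/* Tail: bytes behind the user data. */
	memcpy(pf->cache_page + user_end, rb->data + user_end,
	       PAGE_SZ - user_end);
}
```

The user's data is never touched by the completion, so the write itself
does not have to wait for the read.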

And if the two pages are adjacent like the above, it could be done
with a single async read, or even two separate async reads that
get merged into one IO at the block layer via plugging...
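
The merge condition can be illustrated with a toy model. This is not the
block layer's real merge logic, only the contiguity check at its core,
and the struct and function names are hypothetical. Sectors are assumed
to be 512 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified read request: a run of 512-byte sectors. */
struct read_req {
	uint64_t sector;      /* first sector of the read */
	uint32_t nr_sectors;  /* number of sectors to read */
};

/*
 * Back-merge b into a if b starts exactly where a ends.
 * Returns true and extends a on success; leaves a untouched otherwise.
 */
static bool try_merge_back(struct read_req *a, const struct read_req *b)
{
	assert(a->nr_sectors > 0 && b->nr_sectors > 0);

	if (a->sector + a->nr_sectors != b->sector)
		return false;

	a->nr_sectors += b->nr_sectors;
	return true;
}
```

With the two pages from the example, the reads at byte offsets 4096 and
8192 are sectors 8-15 and 16-23, so they are contiguous and collapse into
one 16-sector read.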

> Anyway, I don't have time to take on this work, but I thought I'd throw
> it out in case anyone's looking for a project.  Or if it's a stupid idea,
> someone can point out why.

I think it's pretty straightforward to do it in the iomap layer...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com