Date:   Wed, 8 Jul 2020 14:54:37 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     Christoph Hellwig <hch@....de>,
        Goldwyn Rodrigues <rgoldwyn@...e.de>,
        linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org,
        fdmanana@...il.com, dsterba@...e.cz, darrick.wong@...cle.com,
        cluster-devel@...hat.com, linux-ext4@...r.kernel.org,
        linux-xfs@...r.kernel.org
Subject: Re: always fall back to buffered I/O after invalidation failures,
 was: Re: [PATCH 2/6] iomap: IOMAP_DIO_RWF_NO_STALE_PAGECACHE return if page
 invalidation fails

On Wed, Jul 08, 2020 at 04:51:27PM +1000, Dave Chinner wrote:
> On Tue, Jul 07, 2020 at 03:00:30PM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 07, 2020 at 01:57:05PM +0100, Matthew Wilcox wrote:
> > > Indeed, I'm in favour of not invalidating
> > > the page cache at all for direct I/O.  For reads, I think the page cache
> > > should be used to satisfy any portion of the read which is currently
> > > cached.  For writes, I think we should write into the page cache pages
> > > which currently exist, and then force those pages to be written back,
> > > but left in cache.
> > 
> > Something like that, yes.
> 
> So are we really willing to take the performance regression that
> occurs from copying out of the page cache consuming lots more CPU
> than an actual direct IO read? Or that direct IO writes suddenly
> serialise because there are page cache pages and now we have to do
> buffered IO?
> 
> Direct IO should be a deterministic, zero-copy IO path to/from
> storage. Using the CPU to copy data during direct IO is the complete
> opposite of the intended functionality, not to mention the behaviour
> that many applications have been carefully designed and tuned for.

Direct I/O isn't deterministic though.  If the file isn't shared, then
it works great, but as soon as you get mixed buffered and direct I/O,
everything is already terrible.  Direct I/Os already perform pagecache
lookups, but instead of using the data we find in the cache, we write it
back if it's dirty, wait for the write to complete, remove the page from
the pagecache and then perform another I/O to get the data we just wrote
out!  And then the app that's using buffered I/O has to read it back in
again.
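
For reference, that dance maps onto existing helpers roughly like this
(a simplified sketch of the setup done before the direct I/O is issued,
not the literal code in fs/iomap/direct-io.c):

	/*
	 * Sketch: flush any dirty pagecache for the range and wait for
	 * the writeback to finish, then kick the (now clean) pages out
	 * of the cache, and only after that submit the direct I/O that
	 * reads the same data straight back from storage.
	 */
	ret = filemap_write_and_wait_range(mapping, pos, end);
	if (ret)
		return ret;

	ret = invalidate_inode_pages2_range(mapping,
			pos >> PAGE_SHIFT, end >> PAGE_SHIFT);

	/* ... now submit the actual direct I/O ... */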

Nobody's proposing changing Direct I/O to exclusively work through the
pagecache.  The proposal is to behave less weirdly when there's already
data in the pagecache.
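
To make that concrete, here's a purely hypothetical sketch of the read
side (this is not anyone's actual patch; submit_direct_read() is a
made-up placeholder, the rest are existing helpers):

	/*
	 * Hypothetical: if the page is already cached and uptodate,
	 * satisfy the direct read from it instead of invalidating it
	 * and re-reading the same data from storage.
	 */
	page = find_get_page(mapping, pos >> PAGE_SHIFT);
	if (page && PageUptodate(page)) {
		copied = copy_page_to_iter(page, offset_in_page(pos),
					   len, iter);
		put_page(page);
	} else {
		if (page)
			put_page(page);
		ret = submit_direct_read(iocb, iter);	/* placeholder */
	}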

I have had an objection raised off-list.  In a scenario with a block
device shared between two systems and an application which does direct
I/O, everything is normally fine.  If one of the systems uses tar to
back up the contents of the block device then the application on that
system will no longer see the writes from the other system because
there's nothing to invalidate the pagecache on the first system.

Unfortunately, addressing that objection is in direct conflict with
fixing the performance problem caused by some little arsewipe deciding
to do:

$ while true; do dd if=/lib/x86_64-linux-gnu/libc-2.30.so iflag=direct of=/dev/null; done

That loop doesn't hurt me because my root filesystem is on ext4, which
doesn't purge the cache.  But anything using iomap gets all the pages
for libc kicked out of the cache, and that's a lot of fun.
