Message-ID: <1186271.1628174281@warthog.procyon.org.uk>
Date:   Thu, 05 Aug 2021 15:38:01 +0100
From:   David Howells <dhowells@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     dhowells@...hat.com, linux-fsdevel@...r.kernel.org,
        jlayton@...nel.org, Christoph Hellwig <hch@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        dchinner@...hat.com, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Could it be made possible to offer "supplementary" data to a DIO write ?

Matthew Wilcox <willy@...radead.org> wrote:

> You can already get 400Gbit ethernet.

Sorry, but that's not likely to become relevant any time soon.  Besides, my
laptop's wifi doesn't really do that yet.

> Saving 500 bytes by sending just the 12 bytes that changed is optimising the
> wrong thing.

In one sense, at least, you're correct.  The cost of setting up an RPC to do
the write and setting up the crypto is high compared with the difference
between transmitting 3 bytes and 4K.

> If you have two clients accessing the same file at byte granularity, you've
> already lost.

Doesn't stop people doing it, though.  People have sqlite, dbm, mail stores,
whatever in their homedirs from their desktop environments.  Granted, most of
the time people don't log in twice with the same homedir from two different
machines (and that doesn't - or at least didn't - work with Gnome or KDE).

> Extent based filesystems create huge extents anyway:

Okay, so it's not feasible.  That's fine.

> This has already happened when you initially wrote to the file backing
> the cache.  Updates are just going to write to the already-allocated
> blocks, unless you've done something utterly inappropriate to the
> situation like reflinked the files.

Or the file is being read random-access and we now have a block, which we
didn't have before, that is contiguous with another block we already have.

> If you want to take leases at byte granularity, and then not writeback
> parts of a page that are outside that lease, feel free.  It shouldn't
> affect how you track dirtiness or how you writethrough the page cache
> to the disk cache.

Indeed.  Handling writes to the local disk cache is different from handling
writes to the server(s).  The cache has a larger block size, but I don't have
to worry about third-party conflicts on it; the server, by contrast, can be
taken as having no minimum block size, but my write can clash with someone
else's.

Generally, I prefer to write back the minimum I can get away with (as does the
Linux NFS client AFAICT).
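
Roughly, the split looks like this (a sketch only - the struct and helpers
are hypothetical, not the actual netfs/fscache interfaces - and cache_bsize
is assumed to be a power of two):

        /* The dirty bytes, as tracked for a write. */
        struct byte_range {
                unsigned long long start;
                unsigned long long end;         /* exclusive */
        };

        /* The cache works in whole blocks: round the range outwards. */
        static struct byte_range round_for_cache(struct byte_range dirty,
                                                 unsigned long long cache_bsize)
        {
                struct byte_range r;

                r.start = dirty.start & ~(cache_bsize - 1);
                r.end   = (dirty.end + cache_bsize - 1) & ~(cache_bsize - 1);
                return r;
        }

        /* The server has no minimum block size: write back the minimum. */
        static struct byte_range range_for_server(struct byte_range dirty)
        {
                return dirty;
        }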

However, if everyone agrees that we should only ever write back a multiple of
a certain block size, even to network filesystems, what block size should that
be?  Note that PAGE_SIZE varies across arches and folios are going to
exacerbate this.  What I don't want to happen is this: you read from a file,
which creates, say, a 4M (or larger) folio; you change three bytes and are
then forced to write back the entire 4M folio.
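
To put rough numbers on that worry (the sizes here are assumed purely for
illustration):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long folio_size  = 4ULL << 20;    /* 4M folio */
                unsigned long long block_size  = 4096;          /* one 4K block */
                unsigned long long dirty_bytes = 3;             /* the actual change */

                printf("whole folio : %llu bytes written for %llu dirty (x%llu)\n",
                       folio_size, dirty_bytes, folio_size / dirty_bytes);
                printf("single block: %llu bytes written for %llu dirty (x%llu)\n",
                       block_size, dirty_bytes, block_size / dirty_bytes);
                return 0;
        }

That's roughly a factor of 1.4 million of amplification for the whole-folio
case versus ~1400 for a single 4K block.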

Note that when content encryption or compression is employed, write-back
would have to be done in some multiple of the encrypted/compressed block
size.
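
Something like the following, assuming purely for illustration a 4K
encryption block size (round_to_crypto_blocks is made up for the example):

        #define CRYPTO_BSIZE 4096ULL    /* assumed block size for the example */

        /* Widen the dirty range so that whole blocks can be re-encrypted
         * (or recompressed) and written out.
         */
        static void round_to_crypto_blocks(unsigned long long *start,
                                           unsigned long long *end)
        {
                *start &= ~(CRYPTO_BSIZE - 1);
                *end    = (*end + CRYPTO_BSIZE - 1) & ~(CRYPTO_BSIZE - 1);
        }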

David
