Date:   Thu, 05 Aug 2021 17:35:33 +0100
From:   David Howells <dhowells@...hat.com>
To:     Anna Schumaker <anna.schumaker@...app.com>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Jeff Layton <jlayton@...hat.com>,
        Steve French <sfrench@...ba.org>,
        Dominique Martinet <asmadeus@...ewreck.org>,
        Mike Marshall <hubcap@...ibond.com>,
        Miklos Szeredi <miklos@...redi.hu>
Cc:     dhowells@...hat.com,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Shyam Prasad N <nspmangalore@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-cachefs@...hat.com, linux-afs@...ts.infradead.org,
        linux-nfs@...r.kernel.org, linux-cifs@...r.kernel.org,
        ceph-devel@...r.kernel.org, v9fs-developer@...ts.sourceforge.net,
        devel@...ts.orangefs.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Canvassing for network filesystem write size vs page size

With Willy's upcoming folio changes, from a filesystem point of view, we're
going to be looking at folios instead of pages, where:

 - a folio is a contiguous collection of pages;

 - each page in the folio might be a standard PAGE_SIZE page (4K or 64K, say) or
   a huge page (2M, say);

 - a folio has one dirty flag and one writeback flag that applies to all
   constituent pages;

 - a complete folio is currently limited to PMD_SIZE or order 8, but could
   theoretically go up to about 2GiB before various integer fields have to be
   modified (not to mention the memory allocator).
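
To put numbers on those limits, here is a rough sketch of the size
arithmetic. It assumes a folio of order N covers 2**N base pages, and the 2MiB
figure used for PMD_SIZE is an assumption that holds for x86-64 with 4K pages;
the helper names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def folio_size(page_size: int, order: int) -> int:
    """Bytes covered by a folio of 2**order contiguous base pages."""
    return page_size << order

def max_order_for(page_size: int, limit_bytes: int) -> int:
    """Largest order whose folio still fits within limit_bytes."""
    order = 0
    while folio_size(page_size, order + 1) <= limit_bytes:
        order += 1
    return order

size_4k = folio_size(4 * KIB, 8)     # order 8 with 4K base pages
size_64k = folio_size(64 * KIB, 8)   # order 8 with 64K base pages
pmd_order = max_order_for(4 * KIB, 2 * MIB)  # assumed PMD_SIZE of 2MiB

print(size_4k // MIB, size_64k // MIB, pmd_order)
```

So an order-8 folio is 1MiB with 4K pages and 16MiB with 64K pages, which is
already at or beyond some of the RPC payload limits listed further down.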

Willy is arguing that network filesystems should, except in certain very
special situations (eg. O_SYNC), only write whole folios (limited to EOF).

Some network filesystems, however, currently keep track of which byte ranges
are modified within a dirty page (AFS does; NFS seems to also) and only write
out the modified data.
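
The difference between the two approaches can be sketched as follows. This is
only an illustration and not how any of the filesystems below actually stores
its state: the DirtyRange class and its methods are hypothetical, and it
tracks a single span per folio, so a clean gap between two writes gets written
back along with them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DirtyRange:
    """Hypothetical per-folio record of one modified byte span."""
    folio_size: int
    start: Optional[int] = None  # inclusive offset within the folio
    end: Optional[int] = None    # exclusive offset within the folio

    def mark_dirty(self, offset: int, length: int) -> None:
        """Record a write of `length` bytes at `offset` within the folio."""
        if length <= 0 or offset < 0 or offset + length > self.folio_size:
            raise ValueError("write outside folio")
        if self.start is None:
            self.start, self.end = offset, offset + length
        else:
            # Widen the span; any clean gap between writes is swallowed.
            self.start = min(self.start, offset)
            self.end = max(self.end, offset + length)

    def writeback_span(self, whole_folio: bool, eof_in_folio: int) -> Tuple[int, int]:
        """Return (offset, length) to write, clamped to EOF within the folio."""
        if self.start is None:
            return (0, 0)
        if whole_folio:
            return (0, min(self.folio_size, eof_in_folio))
        end = min(self.end, eof_in_folio)
        return (self.start, max(0, end - self.start))

d = DirtyRange(folio_size=2 * 1024 * 1024)
d.mark_dirty(100, 50)
d.mark_dirty(4096, 10)
print(d.writeback_span(whole_folio=False, eof_in_folio=2 * 1024 * 1024))
print(d.writeback_span(whole_folio=True, eof_in_folio=5000))
```

With byte tracking, the two small writes above cost a writeback of about 4K;
with whole-folio writeback, the same folio costs everything up to EOF.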

Also, there are limits to the maximum RPC payload sizes, so writing back large
pages may necessitate multiple writes, possibly to multiple servers.
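
As a rough sketch of what that splitting might look like: the function below
cuts one folio's writeback into pieces no larger than the RPC payload limit,
stops at EOF, and optionally also breaks at object boundaries for filesystems
that scatter a file over several servers. The function name and parameters are
invented for illustration and don't correspond to any existing kernel
interface.

```python
from typing import List, Optional, Tuple

def split_writeback(pos: int, length: int, eof: int, max_rpc: int,
                    object_size: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (file_offset, length) pieces covering [pos, pos + length) up to eof."""
    if max_rpc <= 0 or (object_size is not None and object_size <= 0):
        raise ValueError("sizes must be positive")
    end = min(pos + length, eof)  # whole folio, but limited to EOF
    pieces = []
    while pos < end:
        n = min(max_rpc, end - pos)
        if object_size is not None:
            # Don't let one RPC cross into the next object (maybe another server).
            n = min(n, object_size - pos % object_size)
        pieces.append((pos, n))
        pos += n
    return pieces

MIB = 1024 * 1024
# A 2MiB folio at file offset 0 with a 1MiB payload limit needs two RPCs.
print(split_writeback(0, 2 * MIB, 10 * MIB, 1 * MIB))
# The same folio, but with EOF at 1.5MiB, ends with a short write.
print(split_writeback(0, 2 * MIB, 3 * MIB // 2, 1 * MIB))
```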

What I'm trying to do is collate each network filesystem's properties (I'm
including FUSE in that).

So we have the following filesystems:

 Plan9
 - Doesn't track bytes
 - Only writes single pages

 AFS
 - Max RPC payload theoretically ~5.5 TiB (OpenAFS), ~16EiB (Auristor/kAFS)
 - kAFS (Linux kernel)
   - Tracks bytes, only writes back what changed
   - Writes from up to 65535 contiguous pages.
 - OpenAFS/Auristor (UNIX/Linux)
   - Deal with cache-sized blocks (configurable, but something from 8K to 2M),
     reads and writes in these blocks
 - OpenAFS/Auristor (Windows)
   - Track bytes, write back only what changed

 Ceph
 - File divided into objects (typically 2MiB in size), which may be scattered
   over multiple servers.
 - Max RPC size is therefore object size.
 - Doesn't track bytes.

 CIFS/SMB
 - Writes back just changed bytes immediately under some circumstances
 - Doesn't track bytes and writes back whole pages otherwise.
 - SMB3 has a max RPC size of 16MiB, with a default of 4MiB

 FUSE
 - Doesn't track bytes.
 - Max 'RPC' size of 256 pages (I think).

 NFS
 - Tracks modified bytes within a page.
 - Max RPC size of 1MiB.
 - Files may be constructed of objects scattered over different servers.

 OrangeFS
 - Doesn't track bytes.
 - Multipage writes possible.
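
To illustrate why the payload limits matter, here is a back-of-the-envelope
count of how many write RPCs one whole folio would take, using only the limits
listed above. The 2MiB folio and the 4K page size are assumptions, the folio
is assumed to be aligned to the Ceph object size, and the FUSE figure inherits
the "(I think)" from the list.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096   # assumption: 4K base pages
FOLIO = 2 * MIB    # assumption: one PMD-sized folio on x86-64

# Maximum payload per write, taken from the list above.
max_rpc = {
    "Ceph": 2 * MIB,          # typical object size
    "CIFS/SMB": 4 * MIB,      # SMB3 default
    "FUSE": 256 * PAGE_SIZE,  # 256 pages, unconfirmed
    "NFS": 1 * MIB,
}

# Ceiling division: a partial final chunk still needs its own RPC.
rpcs_per_folio = {name: -(-FOLIO // limit) for name, limit in max_rpc.items()}

for name, count in rpcs_per_folio.items():
    print(f"{name}: {count}")
```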

If you could help me fill in the gaps, that would be great.

Thanks,
David
