lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 17 Sep 2021 16:06:55 -0500
From:   Eric Blake <eblake@...hat.com>
To:     tytso@....edu, linux-ext4@...r.kernel.org
Cc:     libguestfs@...hat.com
Subject: e2fsprogs concurrency questions

TL;DR summary: is there documented best practices for parallel access
to the same inode of an ext2 filesystem from multiple threads?

First, a meta-question: is there a publicly archived mailing list for
questions on e2fsprogs?  The README merely mentions Ted's email
address, and http://e2fsprogs.sourceforge.net/ is silent on contact
information, although with some googling, I found at least
https://patchwork.ozlabs.org/project/linux-ext4/patch/20201205045856.895342-6-tytso@mit.edu/
which suggests linux-ext4@...r.kernel.org as a worthwhile list to
mention on the web page.

Now, on to my real reason for writing.  The nbdkit project is using
the ext2fs library to provide an ext2/3/4 filter on top of any data
being served over NBD (Network Block Device protocol) in userspace:
https://libguestfs.org/nbdkit-ext2-filter.1.html

Searching for the word 'thread' or 'concurrent' in libext2fs.info came
up with no hits, so I'm going off of minimal documentation, and mostly
what I can ascertain from existing examples (of which I'm not seeing
very many).

Right now, the nbdkit filter forces some rather strict serialization
in order to be conservatively safe: for every client that wants to
connect, the nbdkit filter calls ext2fs_open(), then eventually
ext2fs_file_open2(), then exposes the contents of that one extracted
file over NBD one operation at a time, then closes everything back
down before accepting a second client.  But we'd LOVE to add some
parallelization; the NBD protocol allows multiple clients, as well as
out-of-order processing of requests from a single client.

Right away, I already know that calling ext2fs_open() more than once
on the same file system is a recipe for disaster (it is equivalent to
mounting the same block device at once through more than one bare
metal OS, and won't work).  So I've got a proposal for how to rework
the nbdkit code to open the file system exactly once and share that
handle among multiple NBD clients:
https://listman.redhat.com/archives/libguestfs/2021-May/msg00028.html

However, in my testing, I quickly found that while it would let me
visit two independent inodes at once through two separate clients, I
was seeing inconsistencies when trying to visit the SAME inode through
two independent clients.  That is, with (abbreviated code):

ext2fs_open(..., &fs);
ext2fs_namei(fs, ... "/foo", &ino);
ext2fs_file_open2(fs, ino, NULL, flags, &f1); // hand f1 to thread 1
ext2fs_file_open2(fs, ino, NULL, flags, &f2); // hand f2 to thread 2
// thread 1
ext2fs_file_read(f1, buf...);
// thread 2
ext2fs_file_write(f2, buf...);
ext2fs_file_flush(f2);
// thread 1
ext2fs_file_flush(f1);
ext2fs_file_read(f1, buf...);

the two open file handles carried independent buffering state - even
though thread 2 (tried to) flush everything, the handle f1 STILL
reports the data read prior to thread 2 doing any modification.

Is it okay to have two concurrent handles open to the same inode, or
do I need to implement a hash map on my end so that two NBD clients
requesting access to the same file within the ext2 filesystem share a
single inode?  If concurrent handles are supported, what mechanism can
I use to ensure that a flush performed on one handle will be visible
for reading from the other handle, as ext2fs_file_flush does not seem
to be strong enough?

Next, when using a single open ext2_ino_t, are there any concurrency
restrictions that I must observe when using that handle from more than
one thread at a time?  For example, is it safe to have two threads
both in the middle of a call to ext2_file_read() on that same handle,
or must I add my own mutex locking to ensure that a second thread
doesn't read data until the first thread is complete with its call?
Or put another way, are the ext2fs_* calls re-entrant?

Next, the nbdkit ext2 filter is using a custom io handler for
converting client requests as filtered through ext2fs back into raw
read/write/flush calls to pass to the real underlying NBD storage.
Among others, I implemented the io_flush(io_channel) callback, but in
debugging it, I see it only gets invoked during ext2fs_close(), and
not during ext2fs_file_flush().  Is this a symptom of me not calling
ext2fs_flush2() at points where I want to be sure actions on a single
file within the filesystem are flushed to persistant storage?

Finally, I see with
https://patchwork.ozlabs.org/project/linux-ext4/patch/20201205045856.895342-6-tytso@mit.edu/
that you recently added EXT2_FLAG_THREADS, as well as
CHANNEL_FLAGS_THREADS.  I think it should be fairly straightforward to
tweak my nbdkit custom IO manager to advertise CHANNEL_FLAGS_THREADS
(as the NBD protocol really DOES support parallel outstanding IO
requests), and then add EXT2_FLAG_THREADS into the flags I pss to
ext2fs_file_open2(), to try and get ext2fs to take advantage of
parallel access to the underlying storage (regardless of whether the
clients are parallel coming into ext2fs).  Are there any concurrency
issues I should be aware of on that front when updating my code?

Obviously, when the kernel accesses an ext2/3/4 file system, it DOES
support full concurrency (separate user space processes can open
independent handles to the same file, and the processes must
coordinate with well-timed fsync() or similar any time there is an
expectation of a happens-before relation where actions from one
process must be observed from another).  But nbdkit is all about
accessing the data of an ext2 filesystem from userspace, without any
kernel bio involvement, and is thus reliant on whatever concurrency
guarantees the ext2progs library has (or lacks).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Powered by blists - more mailing lists