Message-ID: <alpine.LRH.2.02.2101061245100.30542@file01.intranet.prod.int.rdu2.redhat.com>
Date: Thu, 7 Jan 2021 08:15:41 -0500 (EST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Ira Weiny <ira.weiny@...el.com>,
Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
Steven Whitehouse <swhiteho@...hat.com>,
Eric Sandeen <esandeen@...hat.com>,
Dave Chinner <dchinner@...hat.com>,
"Theodore Ts'o" <tytso@....edu>,
Wang Jianchao <jianchao.wan9@...il.com>,
"Kani, Toshi" <toshi.kani@....com>,
"Norton, Scott J" <scott.norton@....com>,
"Tadakamadla, Rajesh" <rajesh.tadakamadla@....com>
cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-nvdimm@...ts.01.org
Subject: [RFC v2] nvfs: a filesystem for persistent memory
Hi
I am announcing a new version of NVFS, a filesystem for persistent memory.
http://people.redhat.com/~mpatocka/nvfs/
git://leontynka.twibright.com/nvfs.git
Changes since the last release:
* I added a microjournal to the filesystem; it can hold up to 16 entries.
Each CPU has its own journal, so there is no lock contention. The
journal is used to provide atomicity of rename() and extended attribute
replace.
(Note that file creation and deletion don't use the journal, because
these operations can be deterministically cleaned up by fsck.)
* I created a framework that can be used to verify the filesystem driver.
It logs all writes and memory barriers to a file. The entries in the
file are then randomly reordered (to simulate reordering in the CPU
write-combining buffers), the sequence is cut at a random point (to
simulate a system crash), and the result is replayed on a filesystem
image.
With this framework we can check, for example, that if a crash happens
during rename(), either the old file or the new file will be present in
the directory.
This framework helped to find a few bugs in the sequencing of writes.
* If we map an executable image, we turn off the DAX flag on the inode
(because executables run 4% slower from persistent memory). There is
also a switch that can turn DAX always off or always on.
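The crash-simulation idea from the second item above can be sketched in user space roughly as follows. This is only an illustration, not the NVFS framework itself: the log_entry layout and all function names are hypothetical, writes are modelled as single bytes, and I assume that reordering only happens between two memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum log_kind { LOG_WRITE, LOG_BARRIER };

/* Hypothetical log record: one byte written at an offset, or a barrier. */
struct log_entry {
	enum log_kind kind;
	size_t offset;		/* byte offset in the image (LOG_WRITE only) */
	uint8_t value;		/* byte written (LOG_WRITE only) */
};

/* Fisher-Yates shuffle of the entries in [first, last). */
static void shuffle_range(struct log_entry *e, size_t first, size_t last)
{
	for (size_t i = last; i > first + 1; i--) {
		size_t j = first + (size_t)rand() % (i - first);
		struct log_entry tmp = e[i - 1];

		e[i - 1] = e[j];
		e[j] = tmp;
	}
}

/* Reorder writes inside each barrier-delimited segment; barriers stay put. */
void reorder_between_barriers(struct log_entry *log, size_t n)
{
	size_t start = 0;

	for (size_t i = 0; i <= n; i++) {
		if (i == n || log[i].kind == LOG_BARRIER) {
			shuffle_range(log, start, i);
			start = i + 1;
		}
	}
}

/* Apply the first `cut` log entries to the image, skipping barriers. */
void replay_prefix(const struct log_entry *log, size_t cut,
		   uint8_t *image, size_t image_size)
{
	for (size_t i = 0; i < cut; i++) {
		if (log[i].kind == LOG_WRITE && log[i].offset < image_size)
			image[log[i].offset] = log[i].value;
	}
}

/* Reorder, cut at a random point (the "crash"), replay; returns the cut. */
size_t simulate_crash(struct log_entry *log, size_t n,
		      uint8_t *image, size_t image_size)
{
	size_t cut;

	assert(log != NULL || n == 0);
	reorder_between_barriers(log, n);
	cut = (size_t)rand() % (n + 1);
	replay_prefix(log, cut, image, image_size);
	return cut;
}
```

After simulate_crash() returns, a checker would inspect the image for the invariant of interest, for example that a commit flag written after a barrier is never visible without the data written before that barrier.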
I'd like to ask about this piece of code in __kernel_read:
	if (unlikely(!file->f_op->read_iter || file->f_op->read))
		return warn_unsupported...
and __kernel_write:
	if (unlikely(!file->f_op->write_iter || file->f_op->write))
		return warn_unsupported...
That is, it exits with an error if both read_iter and read, or both
write_iter and write, are present.
I found out that on NVFS, reading a file with the read method performs 10%
better than reading it with the read_iter method. The benchmark just reads
the same 4k page over and over again, and the cost of creating and parsing
the kiocb and iov_iter structures is that high.
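A read loop of that kind could look roughly like the sketch below. It is not the benchmark I actually ran; the function name, the use of pread() and the CLOCK_MONOTONIC timing are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define PAGE_BYTES 4096

/*
 * Read the first 4k page of `path` `iterations` times and return the
 * elapsed time in nanoseconds, or -1 if the file cannot be opened or read.
 */
int64_t time_repeated_page_reads(const char *path, long iterations)
{
	char buf[PAGE_BYTES];
	struct timespec t0, t1;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iterations; i++) {
		/* Always offset 0, so the same page is read every time. */
		if (pread(fd, buf, sizeof buf, 0) < 0) {
			close(fd);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Dividing the returned value by the iteration count gives the per-read cost, which is the number to compare between a driver build that provides the read method and one that provides only read_iter.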
So, I'd like to have both read and read_iter methods. Could the above
conditions be changed, so that they don't fail with an error if the "read"
or "write" method is present?
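To make the request concrete, here is a minimal user-space model of the check. All model_* names are hypothetical stand-ins and not kernel code, -EINVAL is only a placeholder for whatever warn_unsupported returns, and model_check_relaxed is just one possible relaxation, not an actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for file_operations: only the two read methods. */
struct model_file_operations {
	long (*read)(void *buf, size_t len);
	long (*read_iter)(void *buf, size_t len);
};

struct model_file {
	const struct model_file_operations *f_op;
};

/* Models the quoted check: fail if read_iter is missing or read is set. */
int model_check_current(const struct model_file *file)
{
	assert(file != NULL && file->f_op != NULL);
	if (!file->f_op->read_iter || file->f_op->read)
		return -EINVAL;	/* placeholder error code */
	return 0;
}

/* Assumed relaxation: only require read_iter, tolerate an extra read. */
int model_check_relaxed(const struct model_file *file)
{
	assert(file != NULL && file->f_op != NULL);
	if (!file->f_op->read_iter)
		return -EINVAL;	/* placeholder error code */
	return 0;
}

/* Dummy method so the function pointers have something to point at. */
long model_dummy_read(void *buf, size_t len)
{
	(void)buf;
	return (long)len;
}
```

With both methods set, model_check_current() fails while model_check_relaxed() succeeds; with only read_iter set, both succeed. The write side would be symmetric.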
Mikulas