Message-ID: <913f15a9f74615d6243391452206db53@natalenko.name>
Date: Wed, 04 Nov 2020 20:53:03 +0100
From: Oleksandr Natalenko <oleksandr@...alenko.name>
To: Kent Overstreet <kent.overstreet@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: bcachefs-for-review
Hi.
On 27.10.2020 21:04, Kent Overstreet wrote:
> Here's where bcachefs is at and what I'd like to get merged:
>
> https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-for-review
Please excuse my ignorance if I missed this in other discussions, but
if this is what is expected to be reviewed, why is the submission not
split into reviewable patches?
>
> Non bcachefs prep patches:
>
> Compiler Attributes: add __flatten
> locking: SIX locks (shared/intent/exclusive)
> mm: export find_get_pages_range()
> mm: Add a mechanism to disable faults for a specific mapping
> mm: Bring back vmalloc_exec
> fs: insert_inode_locked2()
> fs: factor out d_mark_tmpfile()
> block: Add some exports for bcachefs
> block: Add blk_status_to_str()
> bcache: move closures to lib/
> closures: closure_wait_event()
>
> block/bio.c | 2 +
> block/blk-core.c | 13 +-
> drivers/md/bcache/Kconfig | 10 +-
> drivers/md/bcache/Makefile | 4 +-
> drivers/md/bcache/bcache.h | 2 +-
> drivers/md/bcache/super.c | 1 -
> drivers/md/bcache/util.h | 3 +-
> fs/dcache.c | 10 +-
> fs/inode.c | 40 ++
> include/linux/blkdev.h | 1 +
> {drivers/md/bcache => include/linux}/closure.h | 39 +-
> include/linux/compiler_attributes.h | 5 +
> include/linux/dcache.h | 1 +
> include/linux/fs.h | 1 +
> include/linux/sched.h | 1 +
> include/linux/six.h | 197 +++++++++
> include/linux/vmalloc.h | 1 +
> init/init_task.c | 1 +
> kernel/Kconfig.locks | 3 +
> kernel/locking/Makefile | 1 +
> kernel/locking/six.c | 553 +++++++++++++++++++++++++
> kernel/module.c | 4 +-
> lib/Kconfig | 3 +
> lib/Kconfig.debug | 9 +
> lib/Makefile | 2 +
> {drivers/md/bcache => lib}/closure.c | 35 +-
> mm/filemap.c | 1 +
> mm/gup.c | 7 +
> mm/nommu.c | 18 +
> mm/vmalloc.c | 21 +
> 30 files changed, 937 insertions(+), 52 deletions(-)
> rename {drivers/md/bcache => include/linux}/closure.h (94%)
> create mode 100644 include/linux/six.h
> create mode 100644 kernel/locking/six.c
> rename {drivers/md/bcache => lib}/closure.c (89%)
>
> New since last posting that's relevant to the rest of the kernel:
> - Re: the DIO cache coherency issue, we finally have a solution that hopefully
>   everyone will find palatable. We no longer try to do any fancy recursive
>   locking stuff: if userspace issues a DIO read/write where the buffer points
>   to the same address space as the file being read/written to, we just return
>   an error.
>
>   This requires a small change to gup.c, to add the check after the VMA lookup.
>   My patch passes the mapping to check against via a new task_struct member,
>   which is ugly because plumbing a new argument all the way to __get_user_pages
>   is also going to be ugly and if I have to do that I'm likely to go on a
>   refactoring binge, which gup.c looks like it needs anyways.
>
> - vmalloc_exec() is needed because bcachefs dynamically generates x86 machine
>   code - per btree node unpack functions.
>
> Bcachefs changes since last posting:
> - lots
> - reflink is done
> - erasure coding (reed solomon raid5/6) is maturing; I have declared it ready
>   for beta testers and gotten some _very_ positive feedback on its
>   performance.
> - btree key cache code is done and merged, big improvements to multithreaded
>   write workloads
> - inline data extents
> - major improvements to how the btree code handles extents (still todo:
>   re-implement extent merging)
> - huge improvements to mount/unmount times on huge filesystems
> - many, many bugfixes; bug reports are slowing and the bugs that are being
>   reported look less and less concerning. In particular repair code is getting
>   better and more polished.
>
> TODO:
> - scrub, repair of replicated data when one of the replicas fails the checksum
>   check
> - erasure coding needs repair code (it'll do reconstruct reads, but we don't
>   have code to rewrite bad blocks in a stripe yet. this is going to be a hassle
>   until we get backpointers)
> - fsck isn't checking refcounts of reflinked extents yet
> - bcachefs tests in ktest need to be moved to xfstests
> - user docs are still very minimal
>
> So that's roughly where things are at. I think erasure coding is going to be
> bcachefs's killer feature (or at least one of them), and I'm pretty excited
> about it: it's a completely new approach unlike ZFS and btrfs, no write hole (we
> don't update existing stripes in place) and we don't have to fragment writes
> either like ZFS does. Add to that the caching that we already do and it's
> turning into a pretty amazing tool for managing a whole bunch of mixed storage.
--
Oleksandr Natalenko (post-factum)