Message-ID: <CAPcyv4gh=QaDB61_9_QTgtt-pZuTFdR6td0orE0VMH6=6SA2vw@mail.gmail.com>
Date:   Tue, 15 Sep 2020 08:16:11 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Mikulas Patocka <mpatocka@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Dave Jiang <dave.jiang@...el.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        Eric Sandeen <esandeen@...hat.com>,
        Dave Chinner <dchinner@...hat.com>,
        "Kani, Toshi" <toshi.kani@....com>,
        "Norton, Scott J" <scott.norton@....com>,
        "Tadakamadla, Rajesh (DCIG/CDI/HPS Perf)" 
        <rajesh.tadakamadla@....com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>
Subject: Re: [RFC] nvfs: a filesystem for persistent memory

On Tue, Sep 15, 2020 at 5:35 AM Mikulas Patocka <mpatocka@...hat.com> wrote:
>
> Hi
>
> I am developing a new filesystem suitable for persistent memory - nvfs.

Nice!

> The goal is to have a small and fast filesystem that can be used on
> DAX-based devices. Nvfs maps the whole device into linear address space
> and it completely bypasses the overhead of the block layer and buffer
> cache.

So does device-dax, but device-dax lacks read(2)/write(2).

> In the past, there was nova filesystem for pmem, but it was abandoned a
> year ago (the last version is for the kernel 5.1 -
> https://github.com/NVSL/linux-nova ). Nvfs is smaller and performs better.
>
> The design of nvfs is similar to ext2/ext4, so that it fits into the VFS
> layer naturally, without too much glue code.
>
> I'd like to ask you to review it.
>
>
> tarballs:
>         http://people.redhat.com/~mpatocka/nvfs/
> git:
>         git://leontynka.twibright.com/nvfs.git
> the description of filesystem internals:
>         http://people.redhat.com/~mpatocka/nvfs/INTERNALS
> benchmarks:
>         http://people.redhat.com/~mpatocka/nvfs/BENCHMARKS
>
>
> TODO:
>
> - programs run approximately 4% slower when running from Optane-based
> persistent memory. Therefore, programs and libraries should use page cache
> and not DAX mapping.

This decision needs to be based on platform firmware data (ACPI HMAT)
describing the relative performance of a PMEM range vs DRAM. For example,
this tradeoff should not exist with battery-backed DRAM or virtio-pmem.
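
As a rough illustration of that point, the sketch below makes the page-cache
vs DAX choice depend on reported latencies rather than on a fixed "4%
slower" figure. It is a user-space toy, not kernel code:
prefer_page_cache() and its margin_pct parameter are hypothetical, and the
two latency values are assumed to come from HMAT-derived data (for example,
per-node read latency in nanoseconds).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy helper (an assumption, not an existing kernel API).
 * Returns true when the PMEM range is slower than DRAM by more than
 * margin_pct percent, i.e. when running programs from the page cache is
 * likely to be worthwhile.
 */
static bool prefer_page_cache(uint64_t dram_read_latency_ns,
                              uint64_t pmem_read_latency_ns,
                              unsigned int margin_pct)
{
    /* Integer comparison of pmem/dram against (100 + margin_pct) / 100. */
    return pmem_read_latency_ns * 100 >
           dram_read_latency_ns * (100 + (uint64_t)margin_pct);
}
```

With equal latencies, as one would expect for battery-backed DRAM, the
helper returns false and the DAX mapping is kept.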

>
> - when the fsck.nvfs tool mmaps the device /dev/pmem0, the kernel uses
> buffer cache for the mapping. The buffer cache slows down fsck by a factor
> of 5 to 10. Could it be possible to change the kernel so that it maps DAX
> based block devices directly?

We've been down this path before.

5a023cdba50c block: enable dax for raw block devices
9f4736fe7ca8 block: revert runtime dax control of the raw block device
acc93d30d7d4 Revert "block: enable dax for raw block devices"

EXT2/4 metadata buffer management depends on the page cache and we
eliminated a class of bugs by removing that support. The problems are
likely tractable, but there was not a straightforward fix visible at
the time.

> - __copy_from_user_inatomic_nocache doesn't flush cache for leading and
> trailing bytes.

You want copy_user_flushcache(). See how fs/dax.c arranges for
dax_copy_from_iter() to route to pmem_copy_from_iter().
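
To make the leading/trailing-bytes concern concrete, here is a small
user-space sketch in C (not the kernel implementation). It assumes, purely
for illustration, that non-temporal stores cover only the 8-byte-aligned
interior of the destination, and that the unaligned head and tail are
written with ordinary cached stores, so those are the bytes a flushing copy
would still have to write back explicitly. struct copy_split, split_copy()
and NT_ALIGN are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NT_ALIGN 8u /* assumed granularity of the non-temporal stores */

struct copy_split {
    size_t head; /* leading bytes written with cached stores */
    size_t body; /* aligned bytes written with non-temporal stores */
    size_t tail; /* trailing bytes written with cached stores */
};

/* Split a copy of len bytes to address dst into head, body and tail. */
static struct copy_split split_copy(uintptr_t dst, size_t len)
{
    struct copy_split s = { 0, 0, 0 };
    size_t misalign = (size_t)(dst % NT_ALIGN);
    size_t rest;

    if (misalign) {
        /* Bytes needed to reach the next aligned address, capped at len. */
        s.head = NT_ALIGN - misalign;
        if (s.head > len)
            s.head = len;
    }

    rest = len - s.head;
    s.body = rest - rest % NT_ALIGN; /* whole aligned units */
    s.tail = rest - s.body;          /* leftover bytes */

    /* The three parts must account for every byte of the copy. */
    assert(s.head + s.body + s.tail == len);
    return s;
}
```

For example, split_copy(0x1003, 30) yields head = 5, body = 24 and tail = 1:
under the stated assumption, six of the thirty bytes may remain in the CPU
cache unless the cache lines holding the head and the tail are flushed.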
