Message-Id: <20200304165845.3081-1-vgoyal@redhat.com>
Date: Wed, 4 Mar 2020 11:58:25 -0500
From: Vivek Goyal <vgoyal@...hat.com>
To: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nvdimm@...ts.01.org, virtio-fs@...hat.com, miklos@...redi.hu
Cc: vgoyal@...hat.com, stefanha@...hat.com, dgilbert@...hat.com,
mst@...hat.com
Subject: [PATCH 00/20] virtiofs: Add DAX support
Hi,
This patch series adds DAX support to the virtiofs filesystem. This allows
bypassing the guest page cache and mapping the host page cache directly
into the guest address space.
When a page of a file is needed, the guest sends a request to map that page
(in the host page cache) into the qemu address space. Inside the guest this is
a physical memory range controlled by the virtiofs device. The guest
directly maps this physical address range using DAX and hence gets
access to file data on the host.
This can speed things up considerably in many situations. It can also
result in substantial memory savings, as file data does not have
to be copied into the guest and is accessed directly from the host page
cache.
Most of the changes are limited to fuse/virtiofs. There are a couple
of changes needed in the generic dax infrastructure and a couple of changes
in virtio to be able to access the shared memory region.
These patches apply on top of 5.6-rc4 and are also available here.
https://github.com/rhvgoyal/linux/commits/vivek-04-march-2020
Any review or feedback is welcome.
Performance
===========
I ran a set of fio jobs to get a sense of the speed of
various operations. I wrote a simple wrapper script that runs each fio job
3 times and reports the average. These scripts and fio
jobs are available here.
https://github.com/rhvgoyal/virtiofs-tests
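The averaging step of such a wrapper can be sketched as follows. This is a
minimal sketch, not the script from the repository above: the function names
(fio_bandwidth_kib, parse_bandwidth_kib, average_bandwidth) are hypothetical,
and it assumes fio is run with --output-format=json, whose per-job "read" and
"write" sections carry a "bw" field in KiB/s.

```python
import json
import subprocess
from statistics import mean


def parse_bandwidth_kib(fio_json, direction="read"):
    """Sum the bandwidth (KiB/s) of all jobs in one fio JSON report."""
    report = json.loads(fio_json)
    return sum(job[direction]["bw"] for job in report["jobs"])


def fio_bandwidth_kib(job_file, direction="read"):
    """Run one fio job file and return its total bandwidth in KiB/s."""
    out = subprocess.run(
        ["fio", "--output-format=json", job_file],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_bandwidth_kib(out, direction)


def average_bandwidth(run_once, repeats=3):
    """Call run_once() `repeats` times and return the mean of the results."""
    return mean(run_once() for _ in range(repeats))
```

A call could look like
average_bandwidth(lambda: fio_bandwidth_kib("seqread-psync.fio")), where the
job file name is only a placeholder.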
I set up a directory on ramfs on the host, exported that directory into the
guest using virtio-fs, and ran the tests inside the guest. Tests were run with
cache=none, both with dax enabled and disabled. The cache=none option
enforces that no caching happens in the guest for either data or metadata.
Test Setup
-----------
- A Fedora 29 host with 376Gi RAM, 2 sockets (20 cores per socket, 2
threads per core)
- Using ramfs on host as backing store. 4 fio files of 8G each.
- Created a VM with 64 VCPUS and 64GB memory. A 64GB cache window (for dax
mmap).
Test Results
------------
- Results in two configurations have been reported.
virtio-fs (cache=none) and virtio-fs (cache=none + dax).
There are other caching modes as well but to me cache=none seemed most
interesting for now because it does not cache anything in guest
and provides strong coherence. Other modes which provide less strong
coherence and hence are faster are yet to be benchmarked.
- Three fio ioengines (psync, libaio and mmap) have been used.
- I/O workloads of randread, randwrite, seqread and seqwrite have been run.
- Each file size is 8G. Block size 4K. iodepth=16
- "multi" means the same operation was done with 4 jobs, each job
operating on a file of size 8G.
- Some results are "0 (KiB/s)". That means that particular operation is
not supported in that configuration.
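For illustration, the parameters listed above map onto fio job options roughly
as in the sketch below. The real job files live in the virtiofs-tests
repository and may differ; the helper name make_fio_job and the mount point
are assumptions, and only options named in the list (plus fio's standard
directory, rw and numjobs options) are emitted. Note that fio spells
sequential read and write as rw=read and rw=write.

```python
def make_fio_job(name, rw, ioengine, directory, numjobs=1):
    """Render a fio job file (as text) from the parameters described above."""
    lines = [
        f"[{name}]",
        f"directory={directory}",  # assumed: virtiofs mount point in the guest
        f"ioengine={ioengine}",    # psync, libaio or mmap
        f"rw={rw}",                # read, write, randread or randwrite
        "bs=4k",                   # block size 4K
        "size=8G",                 # each file is 8G
        "iodepth=16",              # only affects asynchronous engines
        f"numjobs={numjobs}",      # 4 for the "multi" variants
    ]
    return "\n".join(lines) + "\n"


# Example: the "seqread-psync-multi" workload with 4 jobs.
print(make_fio_job("seqread-psync-multi", "read", "psync",
                   "/mnt/virtiofs", numjobs=4))
```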
NAME                     I/O Operation            BW(Read/Write)

virtiofs-cache-none      seqread-psync            35(MiB/s)
virtiofs-cache-none-dax  seqread-psync            643(MiB/s)
virtiofs-cache-none      seqread-psync-multi      219(MiB/s)
virtiofs-cache-none-dax  seqread-psync-multi      2132(MiB/s)
virtiofs-cache-none      seqread-mmap             0(KiB/s)
virtiofs-cache-none-dax  seqread-mmap             741(MiB/s)
virtiofs-cache-none      seqread-mmap-multi       0(KiB/s)
virtiofs-cache-none-dax  seqread-mmap-multi       2530(MiB/s)
virtiofs-cache-none      seqread-libaio           293(MiB/s)
virtiofs-cache-none-dax  seqread-libaio           425(MiB/s)
virtiofs-cache-none      seqread-libaio-multi     207(MiB/s)
virtiofs-cache-none-dax  seqread-libaio-multi     1543(MiB/s)

virtiofs-cache-none      randread-psync           36(MiB/s)
virtiofs-cache-none-dax  randread-psync           572(MiB/s)
virtiofs-cache-none      randread-psync-multi     211(MiB/s)
virtiofs-cache-none-dax  randread-psync-multi     1764(MiB/s)
virtiofs-cache-none      randread-mmap            0(KiB/s)
virtiofs-cache-none-dax  randread-mmap            719(MiB/s)
virtiofs-cache-none      randread-mmap-multi      0(KiB/s)
virtiofs-cache-none-dax  randread-mmap-multi      2005(MiB/s)
virtiofs-cache-none      randread-libaio          300(MiB/s)
virtiofs-cache-none-dax  randread-libaio          413(MiB/s)
virtiofs-cache-none      randread-libaio-multi    327(MiB/s)
virtiofs-cache-none-dax  randread-libaio-multi    1326(MiB/s)

virtiofs-cache-none      seqwrite-psync           34(MiB/s)
virtiofs-cache-none-dax  seqwrite-psync           494(MiB/s)
virtiofs-cache-none      seqwrite-psync-multi     223(MiB/s)
virtiofs-cache-none-dax  seqwrite-psync-multi     1680(MiB/s)
virtiofs-cache-none      seqwrite-mmap            0(KiB/s)
virtiofs-cache-none-dax  seqwrite-mmap            1217(MiB/s)
virtiofs-cache-none      seqwrite-mmap-multi      0(KiB/s)
virtiofs-cache-none-dax  seqwrite-mmap-multi      2359(MiB/s)
virtiofs-cache-none      seqwrite-libaio          282(MiB/s)
virtiofs-cache-none-dax  seqwrite-libaio          348(MiB/s)
virtiofs-cache-none      seqwrite-libaio-multi    320(MiB/s)
virtiofs-cache-none-dax  seqwrite-libaio-multi    1255(MiB/s)

virtiofs-cache-none      randwrite-psync          32(MiB/s)
virtiofs-cache-none-dax  randwrite-psync          458(MiB/s)
virtiofs-cache-none      randwrite-psync-multi    213(MiB/s)
virtiofs-cache-none-dax  randwrite-psync-multi    1343(MiB/s)
virtiofs-cache-none      randwrite-mmap           0(KiB/s)
virtiofs-cache-none-dax  randwrite-mmap           663(MiB/s)
virtiofs-cache-none      randwrite-mmap-multi     0(KiB/s)
virtiofs-cache-none-dax  randwrite-mmap-multi     1820(MiB/s)
virtiofs-cache-none      randwrite-libaio         292(MiB/s)
virtiofs-cache-none-dax  randwrite-libaio         341(MiB/s)
virtiofs-cache-none      randwrite-libaio-multi   322(MiB/s)
virtiofs-cache-none-dax  randwrite-libaio-multi   1094(MiB/s)
Conclusion
===========
- virtio-fs with dax enabled is significantly faster and more memory
efficient compared to non-dax operation.
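To put a number on "significantly faster", the speedup can be computed directly
from the results table. The sketch below is only illustrative: it uses a
handful of rows copied from the table (MiB/s), and the helper name speedup is
made up for this example.

```python
# (cache=none, cache=none + dax) bandwidth in MiB/s, copied from the table
results = {
    "seqread-psync": (35, 643),
    "seqread-psync-multi": (219, 2132),
    "seqread-libaio": (293, 425),
    "randwrite-psync": (32, 458),
    "randwrite-libaio-multi": (322, 1094),
}


def speedup(no_dax, dax):
    """Return how many times faster the dax run was than the non-dax run."""
    return dax / no_dax


for op, (no_dax, dax) in results.items():
    print(f"{op:24s} {speedup(no_dax, dax):5.1f}x")
```

For these rows the single-job psync cases gain the most (roughly 14x to 18x),
while single-job libaio gains the least (roughly 1.5x).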
Note:
Right now the dax window is 64G and the total fio file size is 32G (4
files of 8G each). That means everything fits into the dax window and no
reclaim is needed. The dax window reclaim logic is slower, and if the file
size is bigger than the dax window size, performance slows down.
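The arithmetic behind this note can be checked with a few lines. This is a
sketch only: fits_in_dax_window is a hypothetical helper, and the sizes are
the ones stated in the test setup.

```python
GIB = 1 << 30  # bytes in one GiB


def fits_in_dax_window(file_sizes, window_size):
    """True if all files can be mapped at once, so no reclaim is needed."""
    return sum(file_sizes) <= window_size


files = [8 * GIB] * 4   # 4 fio files of 8G each
window = 64 * GIB       # 64G dax cache window
print(fits_in_dax_window(files, window))
```

With a working set larger than the window, the check fails and the slower
reclaim path would come into play.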
Thanks
Vivek
Sebastien Boeuf (3):
  virtio: Add get_shm_region method
  virtio: Implement get_shm_region for PCI transport
  virtio: Implement get_shm_region for MMIO transport

Stefan Hajnoczi (2):
  virtio_fs, dax: Set up virtio_fs dax_device
  fuse,dax: add DAX mmap support

Vivek Goyal (15):
  dax: Modify bdev_dax_pgoff() to handle NULL bdev
  dax: Create a range version of dax_layout_busy_page()
  virtiofs: Provide a helper function for virtqueue initialization
  fuse: Get rid of no_mount_options
  fuse,virtiofs: Add a mount option to enable dax
  fuse,virtiofs: Keep a list of free dax memory ranges
  fuse: implement FUSE_INIT map_alignment field
  fuse: Introduce setupmapping/removemapping commands
  fuse, dax: Implement dax read/write operations
  fuse, dax: Take ->i_mmap_sem lock during dax page fault
  fuse,virtiofs: Define dax address space operations
  fuse,virtiofs: Maintain a list of busy elements
  fuse: Release file in process context
  fuse: Take inode lock for dax inode truncation
  fuse,virtiofs: Add logic to free up a memory range

 drivers/dax/super.c                |    3 +-
 drivers/virtio/virtio_mmio.c       |   32 +
 drivers/virtio/virtio_pci_modern.c |  107 +++
 fs/dax.c                           |   66 +-
 fs/fuse/dir.c                      |    2 +
 fs/fuse/file.c                     | 1162 +++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h                   |  109 ++-
 fs/fuse/inode.c                    |  148 +++-
 fs/fuse/virtio_fs.c                |  250 +++++-
 include/linux/dax.h                |    6 +
 include/linux/virtio_config.h      |   17 +
 include/uapi/linux/fuse.h          |   42 +-
 include/uapi/linux/virtio_fs.h     |    3 +
 include/uapi/linux/virtio_mmio.h   |   11 +
 include/uapi/linux/virtio_pci.h    |   11 +-
 15 files changed, 1888 insertions(+), 81 deletions(-)
--
2.20.1