Message-ID: <ZJN0pvgA2TqOQ9BC@dread.disaster.area>
Date:   Thu, 22 Jun 2023 08:07:34 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Hannes Reinecke <hare@...e.de>
Cc:     Pankaj Raghav <p.raghav@...sung.com>, willy@...radead.org,
        gost.dev@...sung.com, mcgrof@...nel.org, hch@....de,
        jwong@...nel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/4] minimum folio order support in filemap

On Wed, Jun 21, 2023 at 11:00:24AM +0200, Hannes Reinecke wrote:
> On 6/21/23 10:38, Pankaj Raghav wrote:
> > There has been a lot of discussion recently about supporting devices and
> > filesystems with bs > ps. One of the main pieces of plumbing needed for
> > buffered IO is a minimum order when allocating folios in the page cache.
> > 
> > Hannes recently sent a series[1] where he deduces the minimum folio
> > order from i_blkbits in struct inode. Based on the discussion in that
> > thread, this series takes a different approach: the minimum and maximum
> > folio order can be set individually per inode.
> > 
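(As an aside, a rough sketch of what that could look like. The helper
names, flag bits and the allocation wrapper below are made up for
illustration only and are not taken from the series itself.)

#include <linux/pagemap.h>

/*
 * Sketch only: encode a per-inode minimum and maximum folio order in
 * the address_space flags, set once before the mapping is populated.
 * Bit positions here are assumed to be free; check the real AS_* flags.
 */
#define AS_FOLIO_ORDER_BITS		5
#define AS_FOLIO_ORDER_MIN_SHIFT	16
#define AS_FOLIO_ORDER_MAX_SHIFT	(AS_FOLIO_ORDER_MIN_SHIFT + AS_FOLIO_ORDER_BITS)
#define AS_FOLIO_ORDER_MASK		((1UL << AS_FOLIO_ORDER_BITS) - 1)

static inline void mapping_set_folio_order_range(struct address_space *mapping,
						 unsigned int min_order,
						 unsigned int max_order)
{
	unsigned long flags = mapping->flags;

	/* Clear any previously stored orders, then pack the new range. */
	flags &= ~((AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN_SHIFT) |
		   (AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX_SHIFT));
	flags |= (unsigned long)min_order << AS_FOLIO_ORDER_MIN_SHIFT;
	flags |= (unsigned long)max_order << AS_FOLIO_ORDER_MAX_SHIFT;
	mapping->flags = flags;
}

static inline unsigned int mapping_min_folio_order(struct address_space *mapping)
{
	return (mapping->flags >> AS_FOLIO_ORDER_MIN_SHIFT) & AS_FOLIO_ORDER_MASK;
}

/*
 * The page cache allocation path then never hands back a folio smaller
 * than the mapping's minimum order.
 */
static struct folio *filemap_alloc_min_folio(struct address_space *mapping,
					     gfp_t gfp, unsigned int order)
{
	if (order < mapping_min_folio_order(mapping))
		order = mapping_min_folio_order(mapping);
	return filemap_alloc_folio(gfp, order);
}
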
> > This series is based on top of Christoph's patches to have iomap aops
> > for the block cache[2]. I rebased his remaining patches to
> > next-20230621. The whole tree can be found here[3].
> > 
> > With the tree compiled with CONFIG_BUFFER_HEAD=n, I am able to do buffered
> > IO on an NVMe drive with bs > ps in QEMU without any issues:
> > 
> > [root@...hlinux ~]# cat /sys/block/nvme0n2/queue/logical_block_size
> > 16384
> > [root@...hlinux ~]# fio -bs=16k -iodepth=8 -rw=write -ioengine=io_uring -size=500M
> > 		    -name=io_uring_1 -filename=/dev/nvme0n2 -verify=md5
> > io_uring_1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=io_uring, iodepth=8
> > fio-3.34
> > Starting 1 process
> > Jobs: 1 (f=1): [V(1)][100.0%][r=336MiB/s][r=21.5k IOPS][eta 00m:00s]
> > io_uring_1: (groupid=0, jobs=1): err= 0: pid=285: Wed Jun 21 07:58:29 2023
> >    read: IOPS=27.3k, BW=426MiB/s (447MB/s)(500MiB/1174msec)
> >    <snip>
> > Run status group 0 (all jobs):
> >     READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=500MiB (524MB), run=1174-1174msec
> >    WRITE: bw=198MiB/s (207MB/s), 198MiB/s-198MiB/s (207MB/s-207MB/s), io=500MiB (524MB), run=2527-2527msec
> > 
> > Disk stats (read/write):
> >    nvme0n2: ios=35614/4297, merge=0/0, ticks=11283/1441, in_queue=12725, util=96.27%
> > 
> > One of the main dependencies for working on a block device with bs > ps is
> > Christoph's work on converting the block device aops to use iomap.
> > 
> > [1] https://lwn.net/Articles/934651/
> > [2] https://lwn.net/ml/linux-kernel/20230424054926.26927-1-hch@lst.de/
> > [3] https://github.com/Panky-codes/linux/tree/next-20230523-filemap-order-generic-v1
> > 
> > Luis Chamberlain (1):
> >    block: set mapping order for the block cache in set_init_blocksize
> > 
> > Matthew Wilcox (Oracle) (1):
> >    fs: Allow fine-grained control of folio sizes
> > 
> > Pankaj Raghav (2):
> >    filemap: use minimum order while allocating folios
> >    nvme: enable logical block size > PAGE_SIZE
> > 
> >   block/bdev.c             |  9 ++++++++
> >   drivers/nvme/host/core.c |  2 +-
> >   include/linux/pagemap.h  | 46 ++++++++++++++++++++++++++++++++++++----
> >   mm/filemap.c             |  9 +++++---
> >   mm/readahead.c           | 34 ++++++++++++++++++++---------
> >   5 files changed, 82 insertions(+), 18 deletions(-)
> > 
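For the "block: set mapping order for the block cache in
set_init_blocksize" patch above, the gist (again only a sketch, reusing
the made-up mapping_set_folio_order_range() helper from earlier and
assuming MAX_PAGECACHE_ORDER as the cap; bdev_logical_block_size() and
ilog2() are the real kernel helpers) is to derive the block cache's
minimum folio order from the device's logical block size, much like the
i_blkbits-based derivation:

#include <linux/blkdev.h>
#include <linux/log2.h>
#include <linux/pagemap.h>

/*
 * Sketch only: a 16k logical block size on a 4k PAGE_SIZE machine gives
 * ilog2(16384) - PAGE_SHIFT = 14 - 12 = 2, i.e. every folio in the bdev
 * mapping is at least order 2 (16k).
 */
static void bdev_set_min_folio_order(struct block_device *bdev)
{
	unsigned int bsize = bdev_logical_block_size(bdev);
	unsigned int order = 0;

	if (bsize > PAGE_SIZE)
		order = ilog2(bsize) - PAGE_SHIFT;

	mapping_set_folio_order_range(bdev->bd_inode->i_mapping,
				      order, MAX_PAGECACHE_ORDER);
}
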
> 
> Hmm. Most unfortunate; I've just finished my own patchset (duplicating much
> of this work) to get 'brd' running with large folios.
> And it even works this time: 'fsx' from the xfstests suite runs happily on
> it.

So you've converted a filesystem to use bs > ps, too? Or is the
filesystem that fsx is running on just using a normal 4kB block size?
If the latter, then fsx is not actually testing the large folio page
cache support; it's mostly just doing 4kB-aligned IO to brd....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
