Message-ID: <4270b5c7-04b4-28e0-6181-ef98d1f5130c@suse.de>
Date: Thu, 22 Jun 2023 07:51:08 +0200
From: Hannes Reinecke <hare@...e.de>
To: Dave Chinner <david@...morbit.com>
Cc: Pankaj Raghav <p.raghav@...sung.com>, willy@...radead.org,
gost.dev@...sung.com, mcgrof@...nel.org, hch@....de,
jwong@...nel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/4] minimum folio order support in filemap
On 6/22/23 00:07, Dave Chinner wrote:
> On Wed, Jun 21, 2023 at 11:00:24AM +0200, Hannes Reinecke wrote:
>> On 6/21/23 10:38, Pankaj Raghav wrote:
>>> There has been a lot of discussion recently about supporting devices and
>>> filesystems with block size > page size (bs > ps). One of the main pieces
>>> of plumbing needed to support buffered IO is a minimum order when
>>> allocating folios in the page cache.
>>>
>>> Hannes recently sent a series[1] in which he deduces the minimum folio
>>> order from i_blkbits in struct inode. This series takes a different
>>> approach, based on the discussion in that thread, where the minimum and
>>> maximum folio order can be set individually per inode.
>>>
>>> This series is based on top of Christoph's patches to have iomap aops
>>> for the block cache[2]. I rebased his remaining patches to
>>> next-20230621. The whole tree can be found here[3].
>>>
>>> With the tree compiled with CONFIG_BUFFER_HEAD=n, I am able to do buffered
>>> IO on an NVMe drive with bs>ps in QEMU without any issues:
>>>
>>> [root@...hlinux ~]# cat /sys/block/nvme0n2/queue/logical_block_size
>>> 16384
>>> [root@...hlinux ~]# fio -bs=16k -iodepth=8 -rw=write -ioengine=io_uring -size=500M -name=io_uring_1 -filename=/dev/nvme0n2 -verify=md5
>>> io_uring_1: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=io_uring, iodepth=8
>>> fio-3.34
>>> Starting 1 process
>>> Jobs: 1 (f=1): [V(1)][100.0%][r=336MiB/s][r=21.5k IOPS][eta 00m:00s]
>>> io_uring_1: (groupid=0, jobs=1): err= 0: pid=285: Wed Jun 21 07:58:29 2023
>>> read: IOPS=27.3k, BW=426MiB/s (447MB/s)(500MiB/1174msec)
>>> <snip>
>>> Run status group 0 (all jobs):
>>> READ: bw=426MiB/s (447MB/s), 426MiB/s-426MiB/s (447MB/s-447MB/s), io=500MiB (524MB), run=1174-1174msec
>>> WRITE: bw=198MiB/s (207MB/s), 198MiB/s-198MiB/s (207MB/s-207MB/s), io=500MiB (524MB), run=2527-2527msec
>>>
>>> Disk stats (read/write):
>>> nvme0n2: ios=35614/4297, merge=0/0, ticks=11283/1441, in_queue=12725, util=96.27%
>>>
>>> One of the main dependencies for working on a block device with bs>ps is
>>> Christoph's work on converting the block device aops to use iomap.
>>>
>>> [1] https://lwn.net/Articles/934651/
>>> [2] https://lwn.net/ml/linux-kernel/20230424054926.26927-1-hch@lst.de/
>>> [3] https://github.com/Panky-codes/linux/tree/next-20230523-filemap-order-generic-v1
>>>
>>> Luis Chamberlain (1):
>>> block: set mapping order for the block cache in set_init_blocksize
>>>
>>> Matthew Wilcox (Oracle) (1):
>>> fs: Allow fine-grained control of folio sizes
>>>
>>> Pankaj Raghav (2):
>>> filemap: use minimum order while allocating folios
>>> nvme: enable logical block size > PAGE_SIZE
>>>
>>> block/bdev.c | 9 ++++++++
>>> drivers/nvme/host/core.c | 2 +-
>>> include/linux/pagemap.h | 46 ++++++++++++++++++++++++++++++++++++----
>>> mm/filemap.c | 9 +++++---
>>> mm/readahead.c | 34 ++++++++++++++++++++---------
>>> 5 files changed, 82 insertions(+), 18 deletions(-)
>>>
>>
>> Hmm. Most unfortunate; I've just finished my own patchset (duplicating much
>> of this work) to get 'brd' running with large folios.
>> And it even works this time; 'fsx' from the xfstests suite runs happily on
>> it.
>
> So you've converted a filesystem to use bs > ps, too? Or is the
> filesystem that fsx is running on just using normal 4kB block size?
> If the latter, then fsx is not actually testing the large folio page
> cache support, it's mostly just doing 4kB aligned IO to brd....
>
I have been running fsx on an XFS filesystem with bs=16k, and it worked
like a charm. I'll try to run the xfstests suite once I've finished merging
Pankaj's patches into my patchset.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman