[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <053b60a6-133e-5d59-0732-464d5160772a@amazon.com>
Date: Wed, 25 Jan 2023 16:33:54 -0800
From: "Bhatnagar, Rishabh" <risbhat@...zon.com>
To: Jan Kara <jack@...e.cz>, <tytso@....edu>,
<akpm@...ux-foundation.org>
CC: <linux-mm@...ck.org>, <linux-ext4@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <abuehaze@...zon.com>
Subject: EXT4 IOPS degradation between 4.14 and 5.10
Hi Jan
As discussed in the previous thread I'm chasing IOPS regression between
4.14 -> 5.10 kernels.
https://lore.kernel.org/lkml/20230112113820.hjwvieq3ucbwreql@quack3/T/
<https://lore.kernel.org/lkml/20230112113820.hjwvieq3ucbwreql@quack3/T/>
Last issue we discussed was difficult to resolve so keeping it on the
back burner for now.
I did some more bisecting and saw another series of patches that
potentially impacts iops score:
72b045aecdd856b083521f2a963705b4c2e59680 (mm: implement
find_get_pages_range_tag())
Running fio tests on tip as 9c19a9cb1642c074aa8bc7693cd4c038643960ae
(including the 16 patch series) vs tip as
6b4c54e3787bc03e810062bd257a3b05fd9c72d6 (without the above series)
shows an IOPS jump.
Fio with buffered io/fsync=1/randwrite
With HEAD as 9c19a9cb1642c074aa8bc7693cd4c038643960ae (with the above
series)
write: io=445360KB, bw=7418.6KB/s, *iops=463*, runt= 60033msec
clat (usec): min=4, max=32132, avg=311.90, stdev=1812.74
lat (usec): min=5, max=32132, avg=312.28, stdev=1812.74
clat percentiles (usec):
| 1.00th=[ 8], 5.00th=[ 10], 10.00th=[ 16], 20.00th=[ 25],
| 30.00th=[ 36], 40.00th=[ 47], 50.00th=[ 60], 60.00th=[ 71],
| 70.00th=[ 84], 80.00th=[ 97], 90.00th=[ 111], 95.00th=[ 118],
| 99.00th=[11840], 99.50th=[15936], 99.90th=[21888], 99.95th=[23936],
With HEAD as 6b4c54e3787bc03e810062bd257a3b05fd9c72d6(without the above
series)
write: io=455184KB, bw=7583.4KB/s, *iops=473*, runt= 60024msec
clat (usec): min=6, max=24325, avg=319.72, stdev=1694.52
lat (usec): min=6, max=24326, avg=319.99, stdev=1694.53
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 11], 10.00th=[ 17], 20.00th=[ 26],
| 30.00th=[ 38], 40.00th=[ 50], 50.00th=[ 60], 60.00th=[ 73],
| 70.00th=[ 85], 80.00th=[ 98], 90.00th=[ 111], 95.00th=[ 118],
| 99.00th=[ 9792], 99.50th=[14016], 99.90th=[21888], 99.95th=[22400],
| 99.99th=[24192]
I also see that number of handles per transaction were much higher
before this patch series
0ms waiting for transaction
0ms request delay
20ms running transaction
0ms transaction was being locked
0ms flushing data (in ordered mode)
10ms logging transaction
*13524us average transaction commit time*
*73 handles per transaction*
0 blocks per transaction
1 logged blocks per transaction
vs after the patch series.
0ms waiting for transaction
0ms request delay
20ms running transaction
0ms transaction was being locked
0ms flushing data (in ordered mode)
20ms logging transaction
*21468us average transaction commit time*
*66 handles per transaction*
1 blocks per transaction
1 logged blocks per transaction
This is probably again helping in bunching the writeback transactions
and increasing throughput.
I looked at the code to understand what might be going on.
It seems like commit 72b045aecdd856b083521f2a963705b4c2e59680 changes
the behavior of find_get_pages_range_tag.
Before this commit if find_get_pages_tag cannot find nr_pages
(PAGEVEC_SIZE) it returns the number of pages found as ret and
sets the *index to the last page it found + 1. After the commit the
behavior changes such that if we don’t find nr_pages pages
we set the index to end and not to the last found page. (added diff from
above commit)
Since pagevec_lookup_range_tag is always called in a while loop (index
<= end) the code before the commit helps in coalescing
writeback of pages if there are multiple threads doing write as it might
keep finding new dirty (tagged) pages since it doesn’t set index to end.
+ /*
+ * We come here when we got at @end. We take care to not overflow the
+ * index @index as it confuses some of the callers. This breaks the
+ * iteration when there is page at index -1 but that is already broken
+ * anyway.
+ */
+ if (end == (pgoff_t)-1)
+ *index = (pgoff_t)-1;
+ else
+ *index = end + 1;
+out:
rcu_read_unlock();
- if (ret)
- *index = pages[ret - 1]->index + 1;
-
From the description of the patch i didn't see any mention of this
functional change.
Was this change intentional and did help some usecase or general
performance improvement?
Thanks
Rishabh
Powered by blists - more mailing lists