Message-ID: <20150917121359.GD8624@ret.masoncoding.com>
Date:	Thu, 17 Sep 2015 08:13:59 -0400
From:	Chris Mason <clm@...com>
To:	Dave Chinner <david@...morbit.com>
CC:	Jan Kara <jack@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Josef Bacik <jbacik@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Neil Brown <neilb@...e.de>, Christoph Hellwig <hch@....de>,
	Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()

On Thu, Sep 17, 2015 at 02:30:08PM +1000, Dave Chinner wrote:
> On Wed, Sep 16, 2015 at 11:48:59PM -0400, Chris Mason wrote:
> > On Thu, Sep 17, 2015 at 10:37:38AM +1000, Dave Chinner wrote:
> > > [cc Tejun]
> > > 
> > > On Thu, Sep 17, 2015 at 08:07:04AM +1000, Dave Chinner wrote:
> > > #  ./fs_mark  -D  10000  -S0  -n  10000  -s  4096  -L  120  -d  /mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7
> > > #       Version 3.3, 8 thread(s) starting at Thu Sep 17 08:08:36 2015
> > > #       Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
> > > #       Directories:  Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory.
> > > #       File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
> > > #       Files info: size 4096 bytes, written with an IO size of 16384 bytes per write
> > > #       App overhead is time in microseconds spent in the test not doing file writing related system calls.
> > > 
> > > FSUse%        Count         Size    Files/sec     App Overhead
> > >      0        80000         4096     106938.0           543310
> > >      0       160000         4096     102922.7           476362
> > >      0       240000         4096     107182.9           538206
> > >      0       320000         4096     107871.7           619821
> > >      0       400000         4096      99255.6           622021
> > >      0       480000         4096     103217.8           609943
> > >      0       560000         4096      96544.2           640988
> > >      0       640000         4096     100347.3           676237
> > >      0       720000         4096      87534.8           483495
> > >      0       800000         4096      72577.5          2556920
> > >      0       880000         4096      97569.0           646996
> > > 
> > > <RAM fills here, sustained performance is now dependent on writeback>
> > 
> > I think too many variables have changed here.
> > 
> > My numbers:
> > 
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      0       160000         4096     356407.1          1458461
> >      0       320000         4096     368755.1          1030047
> >      0       480000         4096     358736.8           992123
> >      0       640000         4096     361912.5          1009566
> >      0       800000         4096     342851.4          1004152
> 
> <snip>
> 
> > I can push the dirty threshold lower to try and make sure we end up in
> > the hard dirty limits but none of this is going to be related to the
> > plugging patch.
> 
> The point of this test is to drive writeback as hard as possible,
> not to measure how fast we can create files in memory.  i.e. if the
> test isn't pushing the dirty limits on your machines, then it really
> isn't putting a meaningful load on writeback, and so the plugging
> won't make significant difference because writeback isn't IO
> bound....

It does end up IO bound on my rig, just because we do eventually hit the
dirty limits.  Otherwise there would be zero benefit in fs_mark from
any patches vs plain v4.2.

But I set up a run last night with dirty_ratio_bytes at 3G and
dirty_background_ratio_bytes at 1.5G.
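
For reference, a sketch of how those limits are typically pinned, assuming
the knobs meant here are vm.dirty_bytes and vm.dirty_background_bytes (the
byte-granularity variants of the ratio sysctls):

  # sketch: pin the dirty thresholds in bytes instead of as a ratio of RAM
  # (writing the *_bytes files zeroes the corresponding *_ratio knobs)
  echo $((3 * 1024 * 1024 * 1024)) > /proc/sys/vm/dirty_bytes
  echo $((1536 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes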

There is definitely variation, but nothing like what you saw:

FSUse%        Count         Size    Files/sec     App Overhead
     0       160000         4096     317427.9          1524951
     0       320000         4096     319723.9          1023874
     0       480000         4096     336696.4          1053884
     0       640000         4096     257113.1          1190851
     0       800000         4096     257644.2          1198054
     0       960000         4096     254896.6          1225610
     0      1120000         4096     241052.6          1203227
     0      1280000         4096     214961.2          1386236
     0      1440000         4096     239985.7          1264659
     0      1600000         4096     232174.3          1310018
     0      1760000         4096     250477.9          1227289
     0      1920000         4096     221500.9          1276223
     0      2080000         4096     235212.1          1284989
     0      2240000         4096     238580.2          1257260
     0      2400000         4096     224182.6          1326821
     0      2560000         4096     234628.7          1236402
     0      2720000         4096     244675.3          1228400
     0      2880000         4096     234364.0          1268408
     0      3040000         4096     229712.6          1306148
     0      3200000         4096     241170.5          1254490
     0      3360000         4096     220487.8          1331456
     0      3520000         4096     215831.7          1313682
     0      3680000         4096     210934.7          1235750
     0      3840000         4096     218435.4          1258077
     0      4000000         4096     232127.7          1271555
     0      4160000         4096     212017.6          1381525
     0      4320000         4096     216309.3          1370558
     0      4480000         4096     239072.4          1269086
     0      4640000         4096     221959.1          1333164
     0      4800000         4096     228396.8          1213160
     0      4960000         4096     225747.5          1318503
     0      5120000         4096     115727.0          1237327
     0      5280000         4096     184171.4          1547357
     0      5440000         4096     209917.8          1380510
     0      5600000         4096     181074.7          1391764
     0      5760000         4096     263516.7          1155172
     0      5920000         4096     236405.8          1239719
     0      6080000         4096     231587.2          1221408
     0      6240000         4096     237118.8          1244272
     0      6400000         4096     236773.2          1201428
     0      6560000         4096     243987.5          1240527
     0      6720000         4096     232428.0          1283265
     0      6880000         4096     234839.9          1209152
     0      7040000         4096     234947.3          1223456
     0      7200000         4096     231463.1          1260628
     0      7360000         4096     226750.3          1290098
     0      7520000         4096     213632.0          1236409
     0      7680000         4096     194710.2          1411595
     0      7840000         4096     213963.1          4146893
     0      8000000         4096     225109.8          1323573
     0      8160000         4096     251322.1          1380271
     0      8320000         4096     220167.2          1159390
     0      8480000         4096     210991.2          1110593
     0      8640000         4096     197922.8          1126072
     0      8800000         4096     203539.3          1143501
     0      8960000         4096     193041.7          1134329
     0      9120000         4096     184667.9          1119222
     0      9280000         4096     165968.7          1172738
     0      9440000         4096     192767.3          1098361
     0      9600000         4096     227115.7          1158097
     0      9760000         4096     232139.8          1264245
     0      9920000         4096     213320.5          1270505
     0     10080000         4096     217013.4          1324569
     0     10240000         4096     227171.6          1308668
     0     10400000         4096     208591.4          1392098
     0     10560000         4096     212006.0          1359188
     0     10720000         4096     213449.3          1352084
     0     10880000         4096     219890.1          1326240
     0     11040000         4096     215907.7          1239180
     0     11200000         4096     214207.2          1334846
     0     11360000         4096     212875.2          1338429
     0     11520000         4096     211690.0          1249519
     0     11680000         4096     217013.0          1262050
     0     11840000         4096     204730.1          1205087
     0     12000000         4096     191146.9          1188635
     0     12160000         4096     207844.6          1157033
     0     12320000         4096     208857.7          1168111
     0     12480000         4096     198256.4          1388368
     0     12640000         4096     214996.1          1305412
     0     12800000         4096     212332.9          1357814
     0     12960000         4096     210325.8          1336127
     0     13120000         4096     200292.1          1282419
     0     13280000         4096     202030.2          1412105
     0     13440000         4096     216553.7          1424076
     0     13600000         4096     218721.7          1298149
     0     13760000         4096     202037.4          1266877
     0     13920000         4096     224032.3          1198159
     0     14080000         4096     206105.6          1336489
     0     14240000         4096     227540.3          1160841
     0     14400000         4096     236921.7          1190394
     0     14560000         4096     229343.3          1147451
     0     14720000         4096     199435.1          1284374
     0     14880000         4096     215177.3          1178542
     0     15040000         4096     206194.1          1170832
     0     15200000         4096     215762.3          1125633
     0     15360000         4096     194511.0          1122947
     0     15520000         4096     179008.5          1292603
     0     15680000         4096     208636.9          1094960
     0     15840000         4096     192173.1          1237891
     0     16000000         4096     212888.9          1111551
     0     16160000         4096     218403.0          1143400
     0     16320000         4096     207260.5          1233526
     0     16480000         4096     202123.2          1151509
     0     16640000         4096     191033.0          1257706
     0     16800000         4096     196865.4          1154520
     0     16960000         4096     210361.2          1128930
     0     17120000         4096     201755.2          1160469
     0     17280000         4096     196946.6          1173529
     0     17440000         4096     199677.8          1165750
     0     17600000         4096     194248.4          1234944
     0     17760000         4096     200027.9          1256599
     0     17920000         4096     206507.0          1166820
     0     18080000         4096     215082.7          1167599
     0     18240000         4096     201475.5          1212202
     0     18400000         4096     208247.6          1252255
     0     18560000         4096     205482.7          1311436
     0     18720000         4096     200111.9          1358784
     0     18880000         4096     200028.3          1351332
     0     19040000         4096     198873.4          1287400
     0     19200000         4096     209609.3          1268400
     0     19360000         4096     203538.6          1249787
     0     19520000         4096     203427.9          1294105
     0     19680000         4096     201905.3          1280714
     0     19840000         4096     209642.9          1283281
     0     20000000         4096     203438.9          1315427
     0     20160000         4096     199690.7          1252267
     0     20320000         4096     185965.2          1398905
     0     20480000         4096     203221.6          1214029
     0     20640000         4096     208654.8          1232679
     0     20800000         4096     212488.6          1298458
     0     20960000         4096     189701.1          1356640
     0     21120000         4096     198522.1          1361240
     0     21280000         4096     203857.3          1263402
     0     21440000         4096     204616.8          1362853
     0     21600000         4096     196310.6          1266710
     0     21760000         4096     203275.4          1391150
     0     21920000         4096     205998.5          1378741
     0     22080000         4096     205434.2          1283787
     0     22240000         4096     195918.0          1415912
     0     22400000         4096     186193.0          1413623
     0     22560000         4096     192911.3          1393471
     0     22720000         4096     203726.3          1264281
     0     22880000         4096     204853.4          1221048
     0     23040000         4096     222803.2          1153031
     0     23200000         4096     198558.6          1346256
     0     23360000         4096     201001.4          1278817
     0     23520000         4096     206225.2          1270440
     0     23680000         4096     190894.2          1425299
     0     23840000         4096     198555.6          1334122
     0     24000000         4096     202386.4          1332157
     0     24160000         4096     205103.1          1313607

> 
> > I do see lower numbers if I let the test run even
> > longer, but there are a lot of things in the way that can slow it down
> > as the filesystem gets that big.
> 
> Sure, that's why I hit the dirty limits early in the test - so it
> measures steady state performance before the fs gets to any
> significant scalability limits....
> 
> > > The baseline of no plugging is a full 3 minutes faster than the
> > > plugging behaviour of Linus' patch. The IO behaviour demonstrates
> > > that, sustaining between 25-30,000 IOPS and throughput of
> > > 130-150MB/s.  Hence, while Linus' patch does change the IO patterns,
> > > it does not result in a performance improvement like the original
> > > plugging patch did.
> > 
> > How consistent is this across runs?
> 
> That's what I'm trying to work out. I didn't report it until I got
> consistently bad results - the numbers I reported were from the
> third time I ran the comparison, and they were representative and
> reproducible. I also ran my inode creation workload that is similar
> (but has no data writeback so doesn't go through the writeback paths at
> all) and that shows no change in performance, so this problem
> (whatever it is) is only manifesting itself through data
> writeback....

The big change between Linus' patch and your patch is that with Linus' patch,
kblockd is probably doing most of the actual unplug work (except for the last
super block in the list).  If a process is waiting for dirty writeout
progress, it has to wait for that context switch to kblockd.

In the VM, that's going to hurt more than on my big two-socket, mostly idle
machine.
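
One way to sanity-check where the unplug work lands (a sketch, not something
from this run): sample the block unplug tracepoint and see which tasks issue
the unplugs.  kblockd's work runs in kworker threads, so if it is doing the
flushing the events show up under kworker/* rather than the flusher threads:

  # sample unplug events system-wide for 30s, then break them down by task
  perf record -e block:block_unplug -a -- sleep 30
  perf report --sort comm --stdio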

> 
> The only measurable change I've noticed in my monitoring graphs is
> that there is a lot more iowait time than I normally see, even when
> the plugging appears to be working as desired. That's what I'm
> trying to track down now, and once I've got to the bottom of that I
> should have some idea of where the performance has gone....
> 
> As it is, there are a bunch of other things going wrong with
> 4.3-rc1+ right now that I'm working through - I haven't updated my
> kernel tree for 10 days because I've been away on holidays so I'm
> doing my usual "-rc1 is broken again" dance that I do every release
> cycle.  (e.g. every second boot hangs because systemd appears to be
> waiting for iscsi devices to appear without first starting the iscsi
> target daemon.  Never happened before today, every new kernel I've
> booted today has hung on the first cold boot of the VM).

I've been doing 4.2 plus patches because rc1 didn't boot on this strange
box.  Let me nail that down and rerun.

-chris
