Date:	Fri, 19 Feb 2010 13:25:57 -0800
From:	"Darrick J. Wong" <djwong@...ibm.com>
To:	Jiaying Zhang <jiayingz@...gle.com>
Cc:	"Theodore Ts'o" <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v4 0/3] dioread_nolock patch

On Wed, Feb 17, 2010 at 11:34:32AM -0800, Jiaying Zhang wrote:
> Hi Darrick,
> 
> Thank you for running these tests!

No problem.

> On Tue, Feb 16, 2010 at 1:07 PM, Darrick J. Wong <djwong@...ibm.com> wrote:
> > On Fri, Jan 15, 2010 at 02:30:09PM -0500, Theodore Ts'o wrote:
> >
> >> The plan is to merge this for 2.6.34.  I've looked this over pretty
> >> carefully, but another pair of eyes would be appreciated, especially if
> >
> > I don't have a high-speed disk, but it was suggested that I give this patchset a
> > whirl anyway, so down the rabbit hole I went.  I created a 16GB ext4 image in
> > an equally big tmpfs, then ran the read/readall directio tests in ffsb to see
> > if I could observe any difference.  The kernel is 2.6.33-rc8, and the machine
> > in question has 2 Xeon E5335 processors and 24GB of RAM.  I reran the test
> > several times, with varying thread counts, to produce the table below.  The
> > units are MB/s.
> >
> > For the dio_lock case, mount options were: rw,relatime,barrier=1,data=ordered.
> > For the dio_nolock case, they were: rw,relatime,barrier=1,data=ordered,dioread_nolock.
> >
> >        dio_nolock      dio_lock
> > threads read    readall read    readall
> > 1       37.6    149     39      159
> > 2       59.2    245     62.4    246
> > 4       114     453     112     445
> > 8       111     444     115     459
> > 16      109     442     113     448
> > 32      114     443     121     484
> > 64      106     422     108     434
> > 128     104     417     101     393
> > 256     101     412     90.5    366
> > 512     93.3    377     84.8    349
> > 1000    87.1    353     88.7    348
> >
> > It would seem that the old code paths are faster with a small number of
> > threads, but the new patches pull ahead when the thread counts become
> > very high.  That said, I'm not all that familiar with what exactly tmpfs does,
> > or how well it mimics an SSD (though I wouldn't be surprised to hear
> > "poorly").  This of course makes me wonder--do other people see results like
> > this, or is this particular to my harebrained setup?
> The dioread_nolock patch set eliminates the need to hold the i_mutex lock
> during DIO reads. That is why we usually see more improvement as the number
> of threads increases on high-speed SSDs. The performance difference also
> becomes more obvious as the bandwidth of the device increases.

Running my streaming profiler, it looks like I can "get" 1500MB/s off the
ramdisk.
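
(For illustration only: the contended path here is many concurrent O_DIRECT
readers on the same filesystem.  Something as simple as parallel dd with
iflag=direct approximates that kind of load; the file names and reader count
below are placeholders, not the actual ffsb profile.)

for i in $(seq 0 7); do
        # each reader does direct I/O, bypassing the page cache
        dd if=/mnt/file$i of=/dev/null bs=1M iflag=direct &
done
wait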

> I am surprised to see a roughly 6% performance drop in the single-thread case.
> The dioread_nolock patches change the ext4 buffered write code path a lot, but
> on the DIO read code path the only change is to not grab the i_mutex lock.
> I haven't seen such a difference in my tests. I mostly use fio for performance
> comparison. I will give the ffsb test a try.

Ok, I'll attach the config file and script I was using.  Make sure /mnt is the
filesystem to test, and then you can run the script via:

$ ./readwrite 1 2 4 8 16 32 64 128 256 512
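
(For anyone trying to reproduce the runs without the attachments: the steps
below are a rough reconstruction of "a 16GB ext4 image in an equally big
tmpfs" plus the two sets of mount options quoted earlier.  The paths and
sizes are guesses, not the contents of the attached script.)

mkdir -p /ramdisk /mnt
mount -t tmpfs -o size=17g tmpfs /ramdisk          # a bit of slack over 16GB
dd if=/dev/zero of=/ramdisk/ext4.img bs=1M count=16384
mkfs.ext4 -F /ramdisk/ext4.img
# dio_lock case:
mount -o loop,barrier=1,data=ordered /ramdisk/ext4.img /mnt
# dio_nolock case:
# mount -o loop,barrier=1,data=ordered,dioread_nolock /ramdisk/ext4.img /mnt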

> Meanwhile, could you also post the stdev numbers?

I don't have that spreadsheet on this computer, but I recall that the std
deviations weren't more than about 10 for the first run.

Oddly, I tried a second computer, and saw very little difference (units MB/s):

threads   lock avg   nolock avg   lock stdev   nolock stdev
1         235        214          1            5.57
2         318        316.67       3            2.52
4         589.67     581.67       8.14         22.14
8         594.67     583          15.7         4
16        596.67     576          8.96         8.72
32        578        576.67       7.81         5.69
64        570.33     575.67       1.15         7.51
128       573.67     573.67       10.69        10.69
256       575.33     570          8.14         6.08
512       539.67     544.33       3.21         4.04
1000      479.33     482          3.21         2

This one has somewhat faster RAM (ECC registered vs FBDIMMs) and 8x 2.5GHz Xeon
L5420 CPUs.

> > For that matter, do I need to have more patches than just 2.6.33-rc8 and the
> > four posted in this thread?
> >
> > I also observed that I could make the kernel spit up "Process hung for more
> > than 120s!" messages if I happened to be running ffsb on a real disk during a
> > heavy directio write load.  I'll poke around on that a little more and write
> > back when I have more details.
> 
> Did the hang happen only with dioread_nolock, or did it also happen without
> the patches applied? It is not surprising to see such messages on a slow disk
> since the processes are all waiting for I/O.

To clarify: Nothing hung; I simply got the "hung task" warning.  It
happened only with the patches applied, though for all I know without the
patches applied the tasks could be starving for 119s.
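
(As a side note, that 120s threshold is just the kernel's hung-task watchdog,
kernel.hung_task_timeout_secs; raising or zeroing it is one way to tell plain
I/O starvation apart from a real deadlock while re-testing.  These are
standard sysctls, nothing specific to this patch set.)

cat /proc/sys/kernel/hung_task_timeout_secs           # current threshold in seconds (default 120)
echo 300 > /proc/sys/kernel/hung_task_timeout_secs    # give slow I/O more headroom
echo 0   > /proc/sys/kernel/hung_task_timeout_secs    # or silence the warning entirely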

> > For poweroff testing, could one simulate a power failure by running IO
> > workloads in a VM and then SIGKILLing the VM?  I don't remember seeing any sort
> > of powerfail test suite from the Googlers, but my mail client has been drinking
> > out of firehoses lately. ;)
> As far as I know, these numbers have not been posted yet but will come out soon.

Uh... I was more curious whether anyone had a testing suite, not necessarily results.
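
(If it helps, the "SIGKILL the VM" idea quoted above could look roughly like
this.  The image names and timing are made up, and it assumes the workload
inside the guest writes to a second raw ext4 image with no partition table,
so the host can check it directly afterwards.)

qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=guest.img,format=raw \
        -drive file=test.img,format=raw,if=virtio &
QEMU_PID=$!
sleep 60                  # let the directio write workload run inside the guest
kill -9 "$QEMU_PID"       # "pull the plug"
fsck.ext4 -fn test.img    # then check what the abrupt stop left behind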

--D

View attachment "djwong-readwrite.ffsb" of type "text/plain" (1433 bytes)

Download attachment "readwrite.sh" of type "application/x-sh" (241 bytes)