Message-ID: <CAFgt=MDE9SB6TuzZ7vNF5bqQv_ugY7hk7=phidvRdrts8n4Jjw@mail.gmail.com>
Date: Fri, 19 Aug 2011 10:55:02 -0700
From: Jiaying Zhang <jiayingz@...gle.com>
To: Michael Tokarev <mjt@....msk.ru>
Cc: Tao Ma <tm@....ma>, "Ted Ts'o" <tytso@....edu>,
Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
sandeen@...hat.com
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)
On Fri, Aug 19, 2011 at 12:05 AM, Michael Tokarev <mjt@....msk.ru> wrote:
> On 19.08.2011 07:18, Tao Ma wrote:
>> Hi Michael,
>> On 08/18/2011 02:49 PM, Michael Tokarev wrote:
> []
>>> What about the current situation - do you think it should be ignored
>>> for now, bearing in mind that dioread_nolock isn't used often (but it
>>> gives a _serious_ difference in read speed), or should we, short term,
>>> fix this particular case, which already has real-life impact, while
>>> implementing a long-term solution?
>
>> So could you please share with us how you test and your test results
>> with/without dioread_nolock? A quick test with fio and an Intel SSD doesn't
>> show much improvement here.
>
> I have used my home-grown, quick-n-dirty microbenchmark for years to measure
> I/O subsystem performance. Here are the results from a 3.0 kernel on
> a Hitachi NAS (FC, on Brocade adaptors) with a 14-drive RAID10 array.
>
> The numbers are all megabytes/sec transferred (read or written), summed
> over all threads. The leftmost column is the block size; the next column
> is the number of concurrent threads of the same type. The remaining
> columns are the tests: linear read, random read, linear write, random
> write, and concurrent random read and write.
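
(For anyone who wants to approximate the concurrent random read/write column
with a standard tool rather than the home-grown one: a rough fio job file
might look like the sketch below. The filename, size and runtime are
placeholders, and this is of course not the original benchmark.)

  ; rough stand-in for the 16k-block, 4+4-thread rndR/W case, all O_DIRECT
  ; filename, size and runtime are placeholders for illustration only
  [global]
  direct=1
  bs=16k
  size=10g
  runtime=60
  time_based=1
  group_reporting=1
  filename=/path/to/testfile

  [rand-readers]
  rw=randread
  numjobs=4

  [rand-writers]
  rw=randwrite
  numjobs=4

Saved as, say, mixed.fio and run as "fio mixed.fio", it reports the aggregate
read and write bandwidth per job group, which is roughly what the rndR/W
column reports.
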
>
> For a raw device:
>
> BlkSz Trd   linRd   rndRd   linWr   rndWr      rndR/W
>    4k   1    18.3     0.8    14.5     9.6    0.1/  9.1
>         4             2.5             9.4    0.4/  8.4
>        32            10.0             9.3    4.7/  5.4
>   16k   1    59.4     2.5    49.9    35.7    0.3/ 34.7
>         4            10.3            36.1    1.5/ 31.4
>        32            38.5            36.2   17.5/ 20.4
>   64k   1   118.4     9.1   136.0   106.5    1.1/105.8
>         4            37.7           108.5    4.7/102.6
>        32           153.0           108.5   57.9/ 73.3
>  128k   1   125.9    16.5   138.8   125.8    1.1/125.6
>         4            68.7           128.7    6.3/122.8
>        32           277.0           128.7   70.3/ 98.6
> 1024k   1    89.9    81.2   138.9   134.4    5.0/132.3
>         4           254.7           137.6   19.2/127.1
>        32           390.7           137.5  117.2/ 90.1
>
> For ext4fs, 1Tb file, default mount options:
>
> BlkSz Trd   linRd   rndRd   linWr   rndWr      rndR/W
>    4k   1    15.7     0.6    15.4     9.4    0.0/  9.0
>         4             2.6             9.3    0.0/  8.9
>        32            10.0             9.3    0.0/  8.9
>   16k   1    47.6     2.5    53.2    34.6    0.1/ 33.6
>         4            10.2            34.6    0.0/ 33.5
>        32            39.9            34.8    0.1/ 33.6
>   64k   1   100.5     9.0   137.0   106.2    0.2/105.8
>         4            37.8           107.8    0.1/106.1
>        32           153.9           107.8    0.2/105.9
>  128k   1   115.4    16.3   138.6   125.2    0.3/125.3
>         4            68.8           127.8    0.2/125.6
>        32           274.6           127.8    0.2/126.2
> 1024k   1   124.5    54.2   138.9   133.6    1.0/133.3
>         4           159.5           136.6    0.2/134.3
>        32           349.7           136.5    0.3/133.6
>
> And for a 1tb file on ext4fs with dioread_nolock:
>
> BlkSz Trd   linRd   rndRd   linWr   rndWr      rndR/W
>    4k   1    15.7     0.6    14.6     9.4    0.1/  9.0
>         4             2.6             9.4    0.3/  8.6
>        32            10.0             9.4    4.5/  5.3
>   16k   1    50.9     2.4    56.7    36.0    0.3/ 35.2
>         4            10.1            36.4    1.5/ 34.6
>        32            38.7            36.4   17.3/ 21.0
>   64k   1    95.2     8.9   136.5   106.8    1.0/106.3
>         4            37.7           108.4    5.2/103.3
>        32           152.7           108.6   57.4/ 74.0
>  128k   1   115.1    16.3   138.8   125.8    1.2/126.4
>         4            68.9           128.5    5.7/124.0
>        32           276.1           128.6   70.8/ 98.5
> 1024k   1   128.5    81.9   138.9   134.4    5.1/132.3
>         4           253.4           137.4   19.1/126.8
>        32           385.1           137.4  111.7/ 92.3
>
> These are the complete test results. The first four result
> columns are nearly identical; the difference is
> in the last column. Here are those last columns side by side:
>
> BlkSz Trd         Raw  Ext4nolock    Ext4dflt
>    4k   1   0.1/  9.1   0.1/  9.0   0.0/  9.0
>         4   0.4/  8.4   0.3/  8.6   0.0/  8.9
>        32   4.7/  5.4   4.5/  5.3   0.0/  8.9
>   16k   1   0.3/ 34.7   0.3/ 35.2   0.1/ 33.6
>         4   1.5/ 31.4   1.5/ 34.6   0.0/ 33.5
>        32  17.5/ 20.4  17.3/ 21.0   0.1/ 33.6
>   64k   1   1.1/105.8   1.0/106.3   0.2/105.8
>         4   4.7/102.6   5.2/103.3   0.1/106.1
>        32  57.9/ 73.3  57.4/ 74.0   0.2/105.9
>  128k   1   1.1/125.6   1.2/126.4   0.3/125.3
>         4   6.3/122.8   5.7/124.0   0.2/125.6
>        32  70.3/ 98.6  70.8/ 98.5   0.2/126.2
> 1024k   1   5.0/132.3   5.1/132.3   1.0/133.3
>         4  19.2/127.1  19.1/126.8   0.2/134.3
>        32 117.2/ 90.1 111.7/ 92.3   0.3/133.6
>
> Ext4 with dioread_nolock (middle column) behaves close to
> the raw device. But default ext4 greatly prefers writes over
> reads; reads are almost non-existent.
>
> This is, again, more or less a microbenchmark. Where it
> comes from is my attempt to simulate an (Oracle) database
> workload (written many years ago, when the larger and now
> more standard benchmarks weren't (freely) available). And there,
> on a busy DB, the difference is quite visible.
> In short, any writer makes all readers wait. Once
> we start writing something, all users notice immediately.
> With dioread_nolock they don't complain anymore.
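
(If it helps to reproduce this without the home-grown tool: the pattern you
describe should be visible with something as simple as the sketch below,
where the path and sizes are placeholders. On a default mount the readers'
throughput should collapse as soon as the writer starts; with dioread_nolock
it should stay close to the raw-device numbers above.)

  # O_DIRECT random readers in one shell (path is a placeholder):
  fio --name=readers --filename=/path/to/testfile --rw=randread \
      --direct=1 --bs=16k --numjobs=4 --runtime=300 --time_based=1

  # ...then a single direct writer started from another shell should be
  # enough to show the reader stall on a default-mounted ext4:
  dd if=/dev/zero of=/path/to/testfile bs=1M count=2048 oflag=direct conv=notrunc
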
>
> There's some more background to all this. Right
> now I'm evaluating a new machine for our current database.
> The old hardware had 2Gb of RAM, so it was under _significant_
> memory pressure and lots of data couldn't be cached.
> The new machine has 128Gb of RAM, which will ensure that
> all the important data stays in cache. So the effect of this
> read/write imbalance will be much less visible.
>
> For example, we have a dictionary (several tables) with
> addresses - towns, streets, even buildings. When the operators
> enter customer information, they search these dictionaries.
> With the current 2Gb of memory these dictionaries can't be
> kept in memory, so they get read from disk again every
> time someone enters customer information, and that is
> what the operators do all the time. So no doubt disk access is
> very important here.
>
> On the new hardware, obviously, all these dictionaries will
> be in memory after the first access, so even if all reads
> wait until any write completes, it won't be as dramatic as
> it is now.
>
> That is to say, maybe I'm really paying too much attention
> to the wrong problem. So far, on the new machine, I don't see
> an actual, noticeable difference between dioread_nolock and
> running without that option.
>
> (BTW, I found no way to remount a filesystem to EXclude
> that option; I have to umount and mount it again in order to
> switch from using dioread_nolock to not using it. Is
> there a way?)
I think the command to do this is:
mount -o remount,dioread_lock /dev/xxx <mountpoint>
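
For example (device and mount point are placeholders), something along these
lines should switch it both ways and let you check which mode is active:

  # turn the option off, then back on, without unmounting
  mount -o remount,dioread_lock   /dev/sdXN /mnt/data
  mount -o remount,dioread_nolock /dev/sdXN /mnt/data

  # dioread_nolock should be listed here while it is active
  grep /mnt/data /proc/mounts
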
Looking at this now, I guess it is not very intuitive that the option to
turn off dioread_nolock is dioread_lock rather than nodioread_nolock,
but nodioread_nolock does look ugly. Maybe we should try to support
both spellings.
Jiaying
>
> Thanks,
>
> /mjt
>
>> We are based on RHEL6, and dioread_nolock isn't there by now and a large
>> number of our product system use direct read and buffer write. So if
>> your test proves to be promising, I guess our company can arrange some
>> resources to try to work it out.
>>
>> Thanks
>> Tao
>
>