Date:	Fri, 19 Aug 2011 11:05:21 +0400
From:	Michael Tokarev <mjt@....msk.ru>
To:	Tao Ma <tm@....ma>
CC:	Ted Ts'o <tytso@....edu>, Jiaying Zhang <jiayingz@...gle.com>,
	Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
	sandeen@...hat.com
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)

On 19.08.2011 07:18, Tao Ma wrote:
> Hi Michael,
> On 08/18/2011 02:49 PM, Michael Tokarev wrote:
[]
>> What about the current situation - what do you think: should it be
>> ignored for now, keeping in mind that dioread_nolock isn't used often
>> (but it gives a _serious_ difference in read speed), or should we, as
>> a short-term measure, fix this particular case, which already has
>> real-life impact, while implementing a long-term solution?

> So could you please share with us how you test and your test results
> with/without dioread_nolock?  A quick test with fio and an Intel SSD
> doesn't show much improvement here.

I have used my home-grown, quick-and-dirty microbenchmark for years to
measure I/O subsystem performance.  Here are the results from a 3.0
kernel on a Hitachi NAS (FC, via Brocade adapters) with a 14-drive
RAID10 array.

The numbers are all megabytes/sec transferred (read or written), summed
over all threads.  The leftmost column is the block size; the next
column is the number of concurrent threads of the same type.  The
remaining columns are the tests: linear read, random read, linear
write, random write, and concurrent random read and write.
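
To give an idea of what is being measured, a minimal standalone
equivalent of one cell of the rndRd column could look like the sketch
below.  This is only a hypothetical sketch, not the actual tool; the
file/device path, block size and duration are command-line arguments
of my own choosing, and the real tool runs the same kind of loop in
several threads and adds the linear, write and mixed variants.

/* sketch.c - hypothetical single-threaded O_DIRECT random-read worker,
 * roughly what one cell of the rndRd column measures.  Not the actual
 * tool.  Build: cc -O2 sketch.c -o sketch */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <file-or-device> <blocksize> <seconds>\n",
                argv[0]);
        return 1;
    }
    const char *path = argv[1];
    size_t bs = (size_t)atol(argv[2]);
    int seconds = atoi(argv[3]);
    if (bs < 512 || bs % 512 || seconds <= 0) {
        fprintf(stderr, "blocksize must be a multiple of 512, seconds > 0\n");
        return 1;
    }

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    off_t nblocks = lseek(fd, 0, SEEK_END) / (off_t)bs;
    if (nblocks <= 0) { fprintf(stderr, "target too small\n"); return 1; }

    /* O_DIRECT wants an aligned buffer; 4096 covers common sector sizes */
    void *buf;
    if (posix_memalign(&buf, 4096, bs)) { perror("posix_memalign"); return 1; }

    srandom((unsigned)time(NULL));
    time_t end = time(NULL) + seconds;
    unsigned long long bytes = 0;

    while (time(NULL) < end) {
        off_t off = (off_t)(random() % nblocks) * (off_t)bs;
        ssize_t r = pread(fd, buf, bs, off);
        if (r != (ssize_t)bs) { perror("pread"); return 1; }
        bytes += (unsigned long long)r;
    }

    printf("%.1f MB/s\n", bytes / 1e6 / seconds);
    free(buf);
    close(fd);
    return 0;
}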

For a raw device:

BlkSz Trd linRd rndRd linWr rndWr  rndR/W
   4k   1  18.3   0.8  14.5   9.6   0.1/  9.1
        4         2.5         9.4   0.4/  8.4
       32        10.0         9.3   4.7/  5.4
  16k   1  59.4   2.5  49.9  35.7   0.3/ 34.7
        4        10.3        36.1   1.5/ 31.4
       32        38.5        36.2  17.5/ 20.4
  64k   1 118.4   9.1 136.0 106.5   1.1/105.8
        4        37.7       108.5   4.7/102.6
       32       153.0       108.5  57.9/ 73.3
 128k   1 125.9  16.5 138.8 125.8   1.1/125.6
        4        68.7       128.7   6.3/122.8
       32       277.0       128.7  70.3/ 98.6
1024k   1  89.9  81.2 138.9 134.4   5.0/132.3
        4       254.7       137.6  19.2/127.1
       32       390.7       137.5 117.2/ 90.1

For ext4fs, a 1TB file, default mount options:

BlkSz Trd linRd rndRd linWr rndWr  rndR/W
   4k   1  15.7   0.6  15.4   9.4   0.0/  9.0
        4         2.6         9.3   0.0/  8.9
       32        10.0         9.3   0.0/  8.9
  16k   1  47.6   2.5  53.2  34.6   0.1/ 33.6
        4        10.2        34.6   0.0/ 33.5
       32        39.9        34.8   0.1/ 33.6
  64k   1 100.5   9.0 137.0 106.2   0.2/105.8
        4        37.8       107.8   0.1/106.1
       32       153.9       107.8   0.2/105.9
 128k   1 115.4  16.3 138.6 125.2   0.3/125.3
        4        68.8       127.8   0.2/125.6
       32       274.6       127.8   0.2/126.2
1024k   1 124.5  54.2 138.9 133.6   1.0/133.3
        4       159.5       136.6   0.2/134.3
       32       349.7       136.5   0.3/133.6

And for a 1TB file on ext4fs with dioread_nolock:

BlkSz Trd linRd rndRd linWr rndWr  rndR/W
   4k   1  15.7   0.6  14.6   9.4   0.1/  9.0
        4         2.6         9.4   0.3/  8.6
       32        10.0         9.4   4.5/  5.3
  16k   1  50.9   2.4  56.7  36.0   0.3/ 35.2
        4        10.1        36.4   1.5/ 34.6
       32        38.7        36.4  17.3/ 21.0
  64k   1  95.2   8.9 136.5 106.8   1.0/106.3
        4        37.7       108.4   5.2/103.3
       32       152.7       108.6  57.4/ 74.0
 128k   1 115.1  16.3 138.8 125.8   1.2/126.4
        4        68.9       128.5   5.7/124.0
       32       276.1       128.6  70.8/ 98.5
1024k   1 128.5  81.9 138.9 134.4   5.1/132.3
        4       253.4       137.4  19.1/126.8
       32       385.1       137.4 111.7/ 92.3

Those are the complete test results.  The first four result columns
are practically identical; the difference is in the last (rndR/W)
column.  Here are those columns side by side:

BlkSz Trd     Raw      Ext4nolock  Ext4dflt
   4k   1   0.1/  9.1   0.1/  9.0  0.0/  9.0
        4   0.4/  8.4   0.3/  8.6  0.0/  8.9
       32   4.7/  5.4   4.5/  5.3  0.0/  8.9
  16k   1   0.3/ 34.7   0.3/ 35.2  0.1/ 33.6
        4   1.5/ 31.4   1.5/ 34.6  0.0/ 33.5
       32  17.5/ 20.4  17.3/ 21.0  0.1/ 33.6
  64k   1   1.1/105.8   1.0/106.3  0.2/105.8
        4   4.7/102.6   5.2/103.3  0.1/106.1
       32  57.9/ 73.3  57.4/ 74.0  0.2/105.9
 128k   1   1.1/125.6   1.2/126.4  0.3/125.3
        4   6.3/122.8   5.7/124.0  0.2/125.6
       32  70.3/ 98.6  70.8/ 98.5  0.2/126.2
1024k   1   5.0/132.3   5.1/132.3  1.0/133.3
        4  19.2/127.1  19.1/126.8  0.2/134.3
       32 117.2/ 90.1 111.7/ 92.3  0.3/133.6

Ext4 with dioread_nolock (middle column) behaves close to the raw
device.  But default ext4 greatly favors writes over reads: under the
mixed load, reads are almost non-existent.

This is, again, more or less a microbenchmark.  It comes from my
attempt, many years ago, to simulate an (Oracle) database workload,
back when the larger and now more standard benchmarks weren't (freely)
available.  And there, on a busy DB, the difference is quite visible.
In short, any writer makes all readers wait: once we start writing
something, all users immediately notice.  With dioread_nolock they
don't complain anymore.
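
For anyone who wants to reproduce this without my tool, something
along these lines should show the same effect.  This is only a
hypothetical sketch, not the original benchmark: it pairs one O_DIRECT
random writer with one O_DIRECT random reader on a preallocated file
(note it overwrites the file's contents) and prints the reader's
throughput at the end; on default ext4 that number should collapse the
way the rndR/W column above does, and with dioread_nolock it shouldn't.

/* repro.c - hypothetical reproducer for the mixed (rndR/W) case: one
 * O_DIRECT random writer plus one O_DIRECT random reader on the same
 * preallocated file (whose contents get overwritten!).  Not the original
 * benchmark.  Build: cc -O2 -pthread repro.c -o repro */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BS   (16 * 1024)        /* I/O size; multiple of the sector size */
#define SECS 30                 /* run time in seconds */

static const char *path;
static off_t nblocks;
static unsigned long long read_bytes;   /* updated only by the reader */
static int writer_flag = 1;

static void *worker(void *arg)
{
    int writing = (arg != NULL);
    int fd = open(path, (writing ? O_RDWR : O_RDONLY) | O_DIRECT);
    if (fd < 0) { perror("open"); exit(1); }

    /* O_DIRECT needs an aligned buffer */
    void *buf;
    if (posix_memalign(&buf, 4096, BS)) { perror("posix_memalign"); exit(1); }

    time_t end = time(NULL) + SECS;
    while (time(NULL) < end) {
        off_t off = (off_t)(random() % nblocks) * BS;
        ssize_t r = writing ? pwrite(fd, buf, BS, off)
                            : pread(fd, buf, BS, off);
        if (r != BS) { perror(writing ? "pwrite" : "pread"); exit(1); }
        if (!writing)
            read_bytes += (unsigned long long)r;
    }
    free(buf);
    close(fd);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    path = argv[1];

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    nblocks = lseek(fd, 0, SEEK_END) / BS;
    close(fd);
    if (nblocks <= 0) { fprintf(stderr, "file too small\n"); return 1; }

    srandom((unsigned)time(NULL));
    pthread_t rd, wr;
    pthread_create(&wr, NULL, worker, &writer_flag);  /* writer */
    pthread_create(&rd, NULL, worker, NULL);          /* reader */
    pthread_join(rd, NULL);
    pthread_join(wr, NULL);

    printf("reader alongside the writer: %.1f MB/s\n",
           read_bytes / 1e6 / SECS);
    return 0;
}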

There's some more background to all this.  Right now I'm evaluating a
new machine for our current database.  The old hardware had 2GB of
RAM, so it was under _significant_ memory pressure and lots of data
couldn't be cached.  The new machine has 128GB of RAM, which will
ensure that all the important data stays in cache.  So the effect of
this read/write imbalance will be much less visible.

For example, we have a dictionary (several tables) of addresses -
towns, streets, even buildings.  When users enter customer information
they search these dictionaries.  With the current 2GB of memory these
dictionaries can't be kept in memory, so they get read from disk again
every time someone enters customer information, which is what our
users do all the time.  So no doubt disk access is very important
here.

On the new hardware, obviously, all these dictionaries will stay in
memory after the first access, so even if every read has to wait until
any write completes, it won't be as dramatic as it is now.

That is to say, maybe I'm really paying too much attention to the
wrong problem.  So far, on the new machine, I don't see an actual
noticeable difference between dioread_nolock and running without that
option.

(BTW, I found no way to remount a filesystem to EXclude that option; I
have to umount and mount it again in order to switch from using
dioread_nolock back to not using it.  Is there a way?)

Thanks,

/mjt

> We are based on RHEL6, where dioread_nolock isn't available yet, and
> a large number of our production systems use direct reads and
> buffered writes.  So if your test proves to be promising, I guess our
> company can arrange some resources to try to work it out.
> 
> Thanks
> Tao
