Message-ID: <20140110104837.GG26378@quack.suse.cz>
Date:	Fri, 10 Jan 2014 11:48:37 +0100
From:	Jan Kara <jack@...e.cz>
To:	Sergey Meirovich <rathamahata@...il.com>
Cc:	Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@...radead.org>,
	linux-scsi <linux-scsi@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Gluk <git.user@...il.com>
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN
 environment. ~3 times slower than Solaris 10 with the same HBA/Storage.

On Fri 10-01-14 12:36:22, Sergey Meirovich wrote:
> Hi Jan,
> 
> On 10 January 2014 11:36, Jan Kara <jack@...e.cz> wrote:
> > On Thu 09-01-14 12:11:16, Sergey Meirovich wrote:
> ...
> >> I've done preallocation on fnic/XtremIO as Christoph suggested.
> >>
> >> [root@...-poc-gtsxdb3 mnt]# sysbench --max-requests=0
> >> --file-extra-flags=direct  --test=fileio --num-threads=4
> >> --file-total-size=10G --file-io-mode=async --file-async-backlog=1024
> >> --file-rw-ratio=1 --file-fsync-freq=0 --max-requests=0
> >> --file-test-mode=seqwr --max-time=100 --file-block-size=4K prepare
> >> sysbench 0.4.12:  multi-threaded system evaluation benchmark
> >>
> >> 128 files, 81920Kb each, 10240Mb total
> >> Creating files for the test...
> >> [root@...-poc-gtsxdb3 mnt]# du -k test_file.* | awk '{print $1}' |sort |uniq
> >> 81920
> >> [root@...-poc-gtsxdb3 mnt]# fallocate -l 81920k test_file.*
> >>
> >>              Results: 13.042Mb/sec 3338.73 Requests/sec
> >>
> >> Probably sysbench is still triggering the append DIO scenario. Would a
> >> simple wrapper over io_submit() against an already preallocated (and
> >> even data-filled) file provide much better throughput if your theory
> >> is valid?
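  For context, a minimal sketch of the kind of wrapper being asked about here
(not the exact test that was run): one 4k O_DIRECT write in flight at a time
via libaio's io_submit(), against a file that is already fully allocated and
written, so no request should extend i_size or allocate blocks. The file name
"test_file.0" and the 80 MiB size are assumptions taken from the sysbench
output above; build with "gcc -o seqdio seqdio.c -laio".

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK   4096UL
#define FSIZE (80UL << 20)	/* 80 MiB, matches the 81920k sysbench files */

int main(void)
{
	int fd = open("test_file.0", O_WRONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	void *buf;
	if (posix_memalign(&buf, BLK, BLK)) return 1;
	memset(buf, 'a', BLK);

	io_context_t ctx = 0;
	if (io_setup(128, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;

	/* one 4k request in flight at a time, purely sequential offsets */
	for (unsigned long off = 0; off < FSIZE; off += BLK) {
		io_prep_pwrite(&cb, fd, buf, BLK, off);
		if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }
		if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) { fprintf(stderr, "io_getevents failed\n"); return 1; }
	}

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}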
> >   So I was experimenting a bit. "sysbench prepare" seems to always do
> > synchronous IO from a single thread in the 'prepare' phase regardless of
> > the arguments. So there the reported throughput isn't really relevant.
> >
> > In the 'run' phase it obeys the arguments and indeed when I run fallocate
> > to preallocate files during 'run' phase, it significantly helps the
> > throughput (from 20 MB/s to 55 MB/s on my SATA drive).
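  To make the timing concrete, here is a rough C equivalent of running
"fallocate -l 81920k test_file.*" from another shell once the 'run' phase has
started, i.e. after sysbench has recreated the files, so that the subsequent
4k direct writes land in already-allocated space instead of going through the
allocating/append DIO path. The 81920k size comes from the sysbench output
above; the test files are passed as arguments (a sketch, not part of sysbench
itself).

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long long len = 81920LL * 1024;		/* 80 MiB per sysbench file */

	for (int i = 1; i < argc; i++) {
		int fd = open(argv[i], O_WRONLY);
		if (fd < 0) { perror(argv[i]); continue; }
		/* mode 0: allocate blocks and update i_size, like fallocate(1) */
		if (fallocate(fd, 0, 0, len))
			perror("fallocate");
		close(fd);
	}
	return 0;
}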
> 
> Sorry, Jan. It seems I presented my findings in my previous mail in an
> ambiguous way. I know that the prepare phase of sysbench is synchronous
> and probably buffered IO (because I saw 512k chunks sent down to the
> HBA). I played with blktrace and saw that myself during prepare:
> 
> [root@...-poc-gtsxdb3 mnt]# sysbench --max-requests=0
> --file-extra-flags=direct  --test=fileio --num-threads=4
> --file-total-size=10G --file-io-mode=async --file-async-backlog=1024
> --file-rw-ratio=1 --file-fsync-freq=0 --max-requests=0
> --file-test-mode=seqwr --max-time=100 --file-block-size=4K prepare
> ...
> 
> Leads to:
> 
> [root@...-poc-gtsxdb3 mnt]# blktrace -d /dev/sdg -o - | blkparse -i -
> | grep 'D  W'
>   8,96  14      604    53.129805520 28114  D  WS 1116160 + 1024 [sysbench]
>   8,96  14      607    53.129843345 28114  D  WS 1120256 + 1024 [sysbench]
>   8,96  14      610    53.129873782 28114  D  WS 1124352 + 1024 [sysbench]
>   8,96  14      613    53.129903703 28114  D  WS 1128448 + 1024 [sysbench]
>   8,96  14      616    53.130957213 28114  D  WS 1132544 + 1024 [sysbench]
>   8,96  14      619    53.130988835 28114  D  WS 1136640 + 1024 [sysbench]
>   8,96  14      622    53.131018854 28114  D  WS 1140736 + 1024 [sysbench]
> ...
  Ah, OK. I misunderstood what you wrote then.

> That result "13.042Mb/sec 3338.73 Requests/sec" was from the run phase,
> and fallocate had been run before it.
> 
> The blktrace output from the run phase looks very different: 4k as expected.
> [root@...-poc-gtsxdb3 ~]# blktrace -d /dev/sdg -o - | blkparse -i -  |
> grep 'D  W'
>   8,96   5        3     0.000001874 28212  D  WS 1847296 + 8 [sysbench]
>   8,96   5        7     0.001213728 28212  D  WS 1847304 + 8 [sysbench]
>   8,96   5       11     0.002779304 28212  D  WS 1847312 + 8 [sysbench]
>   8,96   5       15     0.004486445 28212  D  WS 1847320 + 8 [sysbench]
>   8,96   5       19     0.006012133 28212  D  WS 22691864 + 8 [sysbench]
>   8,96   5       23     0.007781553 28212  D  WS 22691896 + 8 [sysbench]
>   8,96   5       27     0.009043404 28212  D  WS 22691928 + 8 [sysbench]
>   8,96   5       31     0.010546829 28212  D  WS 22691960 + 8 [sysbench]
>   8,96   5       35     0.012214468 28212  D  WS 22691992 + 8 [sysbench]
>   8,96   5       39     0.013792616 28212  D  WS 22692024 + 8 [sysbench]
> ...
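  (For reference: blkparse reports request sizes in 512-byte sectors, so the
"+ 1024" requests above are 1024 * 512 B = 512 KiB merged buffered writes from
the prepare phase, while the "+ 8" requests are 8 * 512 B = 4 KiB, matching
the --file-block-size=4K direct writes.)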
  Strange - I see:
  8,32   7        2     0.000086080     0  D  WS 1869752 + 1024 [swapper]
  8,32   7        7     0.041126425     0  D  WS 1874712 + 24 [swapper]
  8,32   7        6     0.041054543     0  D  WS 1871792 + 416 [swapper]
  8,32   6      118     0.042761949 28952  D  WS 1875416 + 528 [sysbench]
  8,32   6      143     0.042995928 28952  D  WS 1876888 + 48 [sysbench]
  8,32   5      352     0.045154160 28955  D  WS 1876936 + 168 [sysbench]
  8,32   6      444     0.045527660 28952  D  WS 1878296 + 992 [sysbench]
  ...

  Not ideal but significantly better. The only idea I have: didn't you run
fallocate(1) before you started the 'run' phase? The 'run' phase truncates
the files before doing IO to them, so a preallocation done earlier would be
thrown away. Can you check that during the run phase (after fallocate is
run) the file size stays constantly at 80MB?
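  A trivial way to check that (a sketch assuming the files match test_file.*
in the test directory): sample the file sizes once a second while the 'run'
phase is going and watch whether they stay at 80 MiB or drop back after the
truncate, e.g. "./sizecheck /mnt/test_file.*".

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	for (;;) {			/* stop with Ctrl-C */
		for (int i = 1; i < argc; i++) {
			struct stat st;
			if (stat(argv[i], &st) == 0)
				printf("%s: %lld bytes\n", argv[i],
				       (long long)st.st_size);
		}
		sleep(1);		/* re-sample once a second during 'run' */
	}
	return 0;
}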

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
