Message-ID: <52CEFE62.7070009@interlog.com>
Date: Thu, 09 Jan 2014 14:54:10 -0500
From: Douglas Gilbert <dgilbert@...erlog.com>
To: Sergey Meirovich <rathamahata@...il.com>,
James Smart <james.smart@...lex.com>
CC: Jan Kara <jack@...e.cz>, linux-scsi <linux-scsi@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Gluk <git.user@...il.com>
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN
environment. ~3 times slower then Solars 10 with the same HBA/Storage.

On 14-01-08 08:57 AM, Sergey Meirovich wrote:
> Hi James,
>
> On 7 January 2014 22:57, James Smart <james.smart@...lex.com> wrote:
>> Sergey,
>>
>> The Thor chipset is a bit old - a 4Gig adapter. Most of our performance
>> improvements, including parallelization, have gone into the 8G and 16G
>> adapters. But you still should have seen significantly beyond what you
>> reported.
>
> First of all - thanks a lot!
>
> I took Thor because we have exactly the same Thors in some of our
> Solaris servers. I've also tried 6 different qlogics (mostly 8G) and
> fnic (10G) as well. Surprisingly enough Thor was the fastest one for
> seqwr 4k. Though in most of the cases machines were from our different
> DCs and hence each one connected to yet another storage.
>
>>
>> We did a sanity check on some hardware we already had set up with a Thor
>> adapter. We saw 23555 iop/s and 92.1 MB/s without needing to do much, well
>> beyond what you've reported, and still not up to what we know the card can
>> do. There are some inefficiencies from the Linux kernel and some locking
>> deltas between our Solaris and Linux drivers - but not enough to account for
>> what you are seeing.
>>
>> I expect the Direct IO filesystem behavior is the root issue.
>
> The strangest thing to me is that this is a problem with sequential
> writes. For example, one fnic machine is zoned to EMC XtremIO and
> had these results: 14.43Mb/sec 3693.65 Requests/sec for sequential 4k.
> The same fnic machine performed rather impressively for random 4k:
> 451.11Mb/sec 115485.02 Requests/sec

You could bypass O_DIRECT and use ddpt together with
a bsg pass-through (bsg is a little faster than sg
for these purposes).

For example:

# lsscsi -g
[0:0:0:0] disk ATA INTEL SSDSC2CW12 400i /dev/sda /dev/sg0
[14:0:0:0] disk Linux scsi_debug 0004 - /dev/sg1
# ddpt if=/dev/bsg/14:0:0:0 bs=512 bpt=128 count=1m
Output file not specified so no copy, just reading input
1048576+0 records in
0+0 records out
time to read data: 0.283566 secs at 1893.28 MB/sec

bs= should match the logical block size of the storage
device, and the size of each SCSI READ is bs= multiplied
by bpt= (512 * 128 bytes, so 64 KB in this case).
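
The arithmetic behind that run can be checked with a short Python
sketch. It only reuses the figures printed in the transcript above;
the function names are made up for illustration and are not part of
ddpt.

```python
# Figures copied from the ddpt run above; nothing here talks to a device.
BLOCK_SIZE = 512            # bs=   bytes per logical block
BLOCKS_PER_TRANSFER = 128   # bpt=  blocks moved by each SCSI READ
BLOCK_COUNT = 1024 * 1024   # count=1m, reported as 1048576 records in
ELAPSED_SECS = 0.283566     # "time to read data" reported by ddpt


def bytes_per_command(bs, bpt):
    """Payload size of one pass-through READ in bytes."""
    return bs * bpt


def command_count(count, bpt):
    """Number of READ commands needed (ceiling division)."""
    return -(-count // bpt)


def throughput_mb_s(count, bs, secs):
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return count * bs / secs / 1e6


per_cmd = bytes_per_command(BLOCK_SIZE, BLOCKS_PER_TRANSFER)
commands = command_count(BLOCK_COUNT, BLOCKS_PER_TRANSFER)
rate = throughput_mb_s(BLOCK_COUNT, BLOCK_SIZE, ELAPSED_SECS)

print(f"{per_cmd} bytes per READ ({per_cmd // 1024} KB)")
print(f"{commands} READ commands issued")
print(f"{rate:.2f} MB/sec")
```

By the same arithmetic, bpt=8 with bs=512 would give 4096 bytes per
command, which is the request size discussed earlier in this thread.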

Such a test should show you where your performance problem
lies: if the pass-through numbers are also poor, it is at or
below the point where pass-through commands are injected
into the block layer; if they are good, it is above that
point.
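
As a rough illustration of that comparison, here is a minimal Python
sketch. It assumes both figures were measured with the same transfer
size and direction; the function name and the 2x threshold are
hypothetical choices for the example, not values from ddpt or from
this thread.

```python
def locate_bottleneck(passthrough_mb_s, odirect_mb_s, ratio=2.0):
    """Guess which side of the pass-through injection point is slow.

    ratio is an arbitrary, hypothetical threshold: how much faster the
    pass-through run must be before the upper layers are blamed.
    """
    if passthrough_mb_s <= 0 or odirect_mb_s <= 0:
        raise ValueError("throughput figures must be positive")
    if passthrough_mb_s >= ratio * odirect_mb_s:
        # Pass-through is clearly faster, so the path that O_DIRECT
        # takes above the injection point is the likely culprit.
        return "above the pass-through injection point"
    # Pass-through is about as slow, so look at the lower block layer,
    # the SCSI mid-level, the HBA driver, the fabric or the storage.
    return "at or below the pass-through injection point"


# Made-up numbers, purely to show the two outcomes.
print(locate_bottleneck(90.0, 15.0))
print(locate_bottleneck(16.0, 15.0))
```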

Doug Gilbert