Message-ID: <Pine.LNX.4.64.0607261440470.4568@twinlark.arctic.org>
Date:	Wed, 26 Jul 2006 14:53:17 -0700 (PDT)
From:	dean gaudet <dean@...tic.org>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
cc:	Jan Kasprzak <kas@...muni.cz>, linux-kernel@...r.kernel.org
Subject: Re: 3ware disk latency?

On Wed, 26 Jul 2006, Alan Cox wrote:

> On Wed, 2006-07-26 at 09:52 -0700, dean gaudet wrote:
> > On Mon, 10 Jul 2006, Jan Kasprzak wrote:
> > 
> > > 	Does anybody experience the similar latency problems with
> > > 3ware 95xx controllers? Thanks,
> > 
> > i think the 95xx is really bad at sharing its cache fairly amongst 
> > multiple devices... i have a 9550sx-8lp with 3 exported units: a raid1, a 
> > raid10 and a jbod.  i was zeroing the jbod with dd and it absolutely 
> > destroyed the latency of the other two units.
> 
> Even on the older 3ware I found it necessary to bump up the queue sizes
> for the 3ware. There's a thread from about 6 months ago (maybe a bit
> more) with Jeff Merkey included where it made a huge difference to his
> performance playing with the parameters. Might be worth a scan through
> the archive.

yeah i've done some experiments with a multithreaded random O_DIRECT i/o 
generator where i showed the 9550sx peaking at around 192 threads... where 
"peak" means the latency was still reasonable (as measured by iostat).  
obviously when going past 128 threads i found it necessary to increase 
nr_requests to see an improvement in io/s.  there was some magic number 
past 256 where i found the 3ware must have choked its own internal queue 
and latency became awful (i seem to recall it being 384ish, but it probably 
depends on the io workload).
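
for reference, the knobs in question are the usual block-layer sysfs ones; 
the device names below are placeholders, not my actual config, so adjust 
to taste:

    # bump the per-device request queue depth (default is 128)
    echo 512 > /sys/block/sdX/queue/nr_requests

    # try a different i/o scheduler on the device
    echo deadline > /sys/block/sdX/queue/scheduler

    # watch per-device queue depth and service times while the test runs
    iostat -x 5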

unfortunately when i did the experiment i neglected to perform 
simultaneous tests on more than one 3ware unit on the same controller.  i 
got great results from a raid1 or from a raid10 (even a raid5)... but i 
only ever tested one unit at a time.

my production config has a raid1 unit and a raid10 unit on the same 
controller, with 2 spare ports.  i popped a disk into one of the spare 
ports, told the 3ware it was JBOD and then proceeded to "dd if=/dev/zero 
of=/dev/sdX bs=1M"... and wham, my system load average shot up to 80+ 
(from ~4), with everything stuck in D state.
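
if anyone wants to reproduce this, watching iostat while the dd runs makes 
it pretty obvious (sdX is the jbod disk here; all device names are 
placeholders):

    # buffered zeroing of the jbod disk; writes go through the controller cache
    dd if=/dev/zero of=/dev/sdX bs=1M &

    # watch await on the *other* units climb while the dd runs
    iostat -x 5

    # the blocked writers pile up in D state
    ps -eo stat,pid,wchan:30,cmd | grep '^D'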

i tried a bunch of things... lowering and raising nr_requests on all the 
devices, switching between cfq, deadline, etc.  the only thing i found 
effective at keeping latencies acceptable on the raid1/raid10 was to use 
"oflag=direct" for the dd, and to tell the 3ware not to use its nvram 
(write cache) for the jbod disk.
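
for completeness, the workaround looks something like the following.  the 
dd part is straightforward; the tw_cli incantation is from memory, so 
double-check the syntax and the controller/unit numbers against the tw_cli 
manual:

    # O_DIRECT writes bypass the page cache and are issued one at a time,
    # so dd can no longer flood the controller with dirty data
    dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct

    # disable the controller write cache for just the jbod unit
    # (/c0/u2 is a guess; substitute your own controller/unit ids)
    tw_cli /c0/u2 set cache=off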

my suspicion is that the 3ware lacks any sort of "fairness" in how it 
shares buffer space between multiple units on the same controller.  
disabling write caching limits the amount of controller memory that the 
busy disk can consume.

-dean
