Message-Id: <200706241749.14338.a1426z@gawab.com>
Date:	Sun, 24 Jun 2007 17:49:14 +0300
From:	Al Boldi <a1426z@...ab.com>
To:	linux-kernel@...r.kernel.org
Subject: NCQ/TCQ performance review (was: SATA RAID5 speed drop of 100 MB/s)

Michael Tokarev wrote:
> Jeff Garzik wrote:
> > IN THEORY, RAID performance should /increase/ due to additional queued
> > commands available to be sent to the drive.  NCQ == command queueing ==
> > sending multiple commands to the drive, rather than one-at-a-time like
> > normal.
> >
> > But hdparm isn't the best test for that theory, since it does not
> > simulate the transactions like real-world MD device usage does.
> >
> > We have seen buggy NCQ firmware where performance decreases, so it is
> > possible that NCQ just isn't good on your drives.
>
> By the way, I did some testing of various drives, and NCQ/TCQ indeed
> makes a difference -- with multiple I/O processes (like a "server"
> workload), IF NCQ/TCQ is implemented properly, especially in the
> drive.
>
> For example, this is a good one:
>
> Single Seagate 74GB SCSI drive (10K RPM)
>
> BlkSz Trd linRd rndRd linWr  rndWr  linR/W     rndR/W
>    4k   1  66.4   0.5   0.6   0.5   0.6/ 0.6   0.4/ 0.2
>         2         0.6         0.6              0.5/ 0.1
>         4         0.7         0.6              0.6/ 0.2
>   16k   1  84.8   2.0   2.5   1.9   2.5/ 2.5   1.6/ 0.6
>         2         2.3         2.1              2.0/ 0.6
>         4         2.7         2.5              2.3/ 0.6
>   64k   1  84.8   7.4   9.3   7.2   9.4/ 9.3   5.8/ 2.2
>         2         8.6         7.9              7.3/ 2.1
>         4         9.9         9.1              8.1/ 2.2
>  128k   1  84.8  13.6  16.7  12.9  16.9/16.6  10.6/ 3.9
>         2        15.6        14.4             13.5/ 3.2
>         4        17.9        16.4             15.7/ 2.7
>  512k   1  84.9  34.0  41.9  33.3  29.0/27.1  22.4/13.2
>         2        36.9        34.5             30.7/ 8.1
>         4        40.5        38.1             33.2/ 8.3
> 1024k   1  83.1  36.0  55.8  34.6  28.2/27.6  20.3/19.4
>         2        45.2        44.1             36.4/ 9.9
>         4        48.1        47.6             40.7/ 7.1
>
> The tests are direct I/O over the whole drive (/dev/sdX), with
> 1, 2, or 4 threads doing sequential or random reads or writes
> in blocks of a given size.  For the R/W tests, there are 2, 4,
> or 8 threads running in total (1, 2, or 4 readers and the same
> number of writers).  Numbers are MB/sec, as totals across all
> threads.
>
> Especially interesting is the very last column -- random R/W
> in parallel.  In almost all cases, more threads give a larger
> total speed (I *guess* it's due to internal optimisations in
> the drive -- with more threads, the drive has more chances to
> reorder commands to minimize seek time, etc).
>
> The only thing I don't understand is why, with larger I/O block
> sizes, write speed drops with multiple threads.

It seems that the drive favors reads over writes.

> And in contrast to the above, here's another test run, now
> with Seagate SATA ST3250620AS ("desktop" class) 250GB
> 7200RPM drive:
>
> BlkSz Trd linRd rndRd linWr rndWr   linR/W    rndR/W
>    4k   1  47.5   0.3   0.5   0.3   0.3/ 0.3   0.1/ 0.1
>         2         0.3         0.3              0.2/ 0.1
>         4         0.3         0.3              0.2/ 0.2
>   16k   1  78.4   1.1   1.8   1.1   0.9/ 0.9   0.6/ 0.6
>         2         1.2         1.1              0.6/ 0.6
>         4         1.3         1.2              0.6/ 0.6
>   64k   1  78.4   4.3   6.7   4.0   3.5/ 3.5   2.1/ 2.2
>         2         4.5         4.1              2.2/ 2.3
>         4         4.7         4.2              2.3/ 2.4
>  128k   1  78.4   8.0  12.6   7.2   6.2/ 6.2   3.9/ 3.8
>         2         8.2         7.3              4.1/ 4.0
>         4         8.7         7.7              4.3/ 4.3
>  512k   1  78.5  23.1  34.0  20.3  17.1/17.1  11.3/10.7
>         2        23.5        20.6             11.3/11.4
>         4        24.7        21.3             11.6/11.8
> 1024k   1  78.4  34.1  33.5  24.6  19.6/19.5  16.0/12.7
>         2        33.3        24.6             15.4/13.8
>         4        34.3        25.0             14.7/15.0
>
> Here, the (total) I/O speed does not depend on the number
> of threads, from which I conclude that the drive does not
> reorder/optimize commands internally, even though NCQ is
> enabled (queue depth is 32).
>
> (And two notes.  First, to some, these tables may look
> strange, showing speeds that seem too low.  Note the block
> size, and note that I'm doing *direct* *random* I/O, without
> buffering in the kernel.  Yes, even the most advanced
> modern drives are very slow under this workload, due to
> seek time and rotational latency -- the disk is maxing
> out at its theoretical requests/second.  Take the average
> seek time plus rotational latency (usually given in the
> drive specs) and divide one second by that value -- you'll
> get about 200..250 -- that's requests/sec.  And the numbers --
> like 0.3Mb/sec write -- are very close to what those 200..250
> requests/sec allow.  In any case, this is not a typical
> workload -- a file server, for example, is not like this --
> but it more or less resembles a database workload.
>
> And second, so far I haven't seen a case where a drive
> with NCQ/TCQ enabled works worse than without.  I don't
> want to say there aren't such drives/controllers, but it
> just happens that I haven't seen any.)
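
For what it's worth, the requests/second reasoning checks out against
your desktop drive's 4k random-write column.  A quick back-of-the-envelope
calculation (the ~9 ms seek and ~4.2 ms rotational latency below are
hypothetical spec-sheet figures for a 7200 RPM drive, not measured values):

/* Back-of-the-envelope IOPS estimate for random direct I/O.
 * Seek and latency are hypothetical spec-sheet values for a
 * 7200 RPM desktop drive, not measurements of the ST3250620AS. */
#include <stdio.h>

int main(void)
{
	double seek_ms    = 9.0;   /* average seek time (assumed) */
	double latency_ms = 4.17;  /* avg rotational latency: half a rev at 7200 RPM */
	double block_kb   = 4.0;   /* block size from the 4k test rows */

	double iops = 1000.0 / (seek_ms + latency_ms);   /* requests/sec */
	double mbps = iops * block_kb / 1024.0;          /* MB/sec */

	printf("%.0f req/s -> %.2f MB/s at %.0fk blocks\n",
	       iops, mbps, block_kb);
	return 0;
}

That lands right on the 0.3Mb/sec in the 4k random-write column; the
10K RPM SCSI drive's shorter seek and latency push it correspondingly
higher.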

Maybe you can post the benchmark source so people can contribute their 
results.
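
Something like this, perhaps, as a starting point?  (A single-threaded,
read-only sketch in C -- obviously not your actual tool; device, block
size, and duration are taken as arguments:)

/* rndread.c -- minimal direct-I/O random-read benchmark sketch.
 * NOT the original tool (whose source wasn't posted); single-threaded
 * and read-only, where the original also does writes and 2/4 threads.
 * Usage: ./rndread /dev/sdX <block_kb> <seconds> */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <device> <block_kb> <seconds>\n", argv[0]);
		return 1;
	}
	size_t bs = (size_t)atol(argv[2]) * 1024;
	int secs = atoi(argv[3]);

	int fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	unsigned long long dev_bytes;
	if (ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0) { perror("ioctl"); return 1; }
	unsigned long long nblocks = dev_bytes / bs;

	void *buf;	/* O_DIRECT needs an aligned buffer... */
	if (posix_memalign(&buf, 4096, bs)) { fputs("memalign failed\n", stderr); return 1; }

	srand(time(NULL));
	unsigned long long done = 0;
	time_t end = time(NULL) + secs;
	while (time(NULL) < end) {
		/* ...and block-aligned offsets */
		off_t off = (off_t)(rand() % nblocks) * (off_t)bs;
		if (pread(fd, buf, bs, off) != (ssize_t)bs) { perror("pread"); return 1; }
		done++;
	}
	printf("%.2f MB/s (%llu reads of %zuk)\n",
	       (double)done * bs / (1024.0 * 1024.0) / secs, done, bs / 1024);
	return 0;
}

(Run as root against the raw device; spawning 2 or 4 reader/writer
threads would be the obvious extension for the other columns.)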


Thanks!

--
Al
