linux-kernel - Re: Slow disks.


Message-ID: <AANLkTi=VAJbVSzX_DH9_==a3RonLJmawYu3MzF-uqGXC@mail.gmail.com>
Date:	Mon, 20 Dec 2010 13:32:44 -0500
From:	Greg Freemyer <greg.freemyer@...il.com>
To:	Bruno Prémont <bonbons@...ux-vserver.org>
Cc:	Rogier Wolff <R.E.Wolff@...wizard.nl>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Slow disks.

On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont
<bonbons@...ux-vserver.org> wrote:
> Hi,
>
> [ccing linux-ide]
>
> Please provide the part of kernel log showing initialization of your
> disk controller(s) as well as detection of all the discs.
> Verbose lspci output for the disc controller and $(smartctl -i -A $disk)
> output might be useful as well.
>
> Did you try the individual discs on a completely different system (e.g.
> a plain desktop system), and which SATA revision does each component
> support?
>
> Bruno
>
>
> On Mon, 20 December 2010 Rogier Wolff <R.E.Wolff@...Wizard.nl> wrote:
>> Hi,
>>
>> A friend of mine has a server in a datacenter somewhere. His machine
>> is not working properly: most of his disks take 10-100 times longer
>> to process each IO request than normal.
>>
>> iostat -kx 10 output:
>> Device: rrqm/s wrqm/s r/s  w/s  rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
>> sdd     0.30   0.00   0.40 1.20 2.80  1.10  4.88     0.43  271.50 271.44  43.43
>>
>> shows that in this 10 second period, the disk was busy for 4.3 seconds
>> and serviced 15-16 requests during that time.
>>
>> Normal disks show "svctm" of around 10-20ms.
>>
>> Now you might say: It's his disk that's broken.
>> Well no: I don't believe that all four of his disks are broken.
>> (I just showed you output for one disk, but there are 4 disks in there,
>> all behaving similarly, though some are worse than others.)
>>
>> Or you might say: It's his controller that's broken. So we thought
>> too. We replaced the onboard sata controller with a 4-port sata
>> card. Now they are running off the external sata card... Slightly
>> better, but not by much.
>>
>> Or you might say: it's hardware. But suppose the disk doesn't properly
>> transfer the data 9 times out of 10, wouldn't the driver tell us
>> SOMETHING in the syslog that things are not fine and dandy? Moreover,
>> in the case above, 12kb were transferred in 4.3 seconds. If CRC errors
>> were happening, the interface would've been able to transfer over
>> 400Mb during that time. So every transfer would need to be retried on
>> average 30000 times... Not realistic. If that were the case, we'd
>> surely hit a maximum retry limit every now and then?
>>
>>
>> These symptoms started when the system was running 2.6.33, but are
>> still present now that the system has been upgraded to 2.6.36.
>>
>> Is there anything you can suggest to get to the root of this problem?
>> Could this be a software issue with the driver? Can we enable some
>> driver debugging to find out what is wrong?
>>
>> Any help will be appreciated.
>>
>>       Roger.
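
The iostat line quoted above can be cross-checked with a little
arithmetic. This is only a sketch: the column values are copied from the
quoted sdd line, while the function and variable names are made up for
illustration and are not part of iostat or any library.

```python
# Values copied from the quoted "iostat -kx 10" line for sdd.
INTERVAL_S = 10.0        # sampling interval given on the iostat command line
R_PER_S, W_PER_S = 0.40, 1.20
RKB_PER_S, WKB_PER_S = 2.80, 1.10
UTIL_PCT = 43.43
REPORTED_SVCTM_MS = 271.44


def summarize(interval_s, r_per_s, w_per_s, rkb_per_s, wkb_per_s, util_pct):
    """Derive per-interval totals from the per-second iostat columns."""
    requests = (r_per_s + w_per_s) * interval_s      # requests completed
    busy_s = util_pct / 100.0 * interval_s           # time the device was busy
    kb_moved = (rkb_per_s + wkb_per_s) * interval_s  # data read plus written
    svctm_ms = busy_s / requests * 1000.0            # busy time per request
    return requests, busy_s, kb_moved, svctm_ms


requests, busy_s, kb_moved, svctm_ms = summarize(
    INTERVAL_S, R_PER_S, W_PER_S, RKB_PER_S, WKB_PER_S, UTIL_PCT
)
print(f"requests in interval: {requests:.0f}")
print(f"busy time:            {busy_s:.2f} s")
print(f"data moved:           {kb_moved:.1f} kB")
print(f"derived svctm:        {svctm_ms:.1f} ms (reported {REPORTED_SVCTM_MS} ms)")
```

The derived figures come out at about 16 requests and roughly 4.3
seconds of busy time, so each request costs around 271 ms, which agrees
with the svctm column.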

My personal guess would definitely be hardware.  The only common
component I can think of is power.  SATA is much more sensitive to
power quality than IDE.
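
For what it's worth, the retry estimate in the quoted message is easy to
reproduce. The sketch below takes the two figures exactly as quoted and
assumes "12kb" means 12 kilobytes and "400Mb" means 400 megabytes; the
function name is hypothetical.

```python
def implied_attempts(link_capacity_bytes, delivered_bytes):
    """How many times each transfer would have to cross the link if the
    whole link capacity had been spent on retransmissions."""
    return link_capacity_bytes / delivered_bytes


# Figures as quoted; the unit interpretation is an assumption.
DELIVERED_BYTES = 12 * 1000            # "12kb" moved during the busy time
LINK_CAPACITY_BYTES = 400 * 1000**2    # "400Mb" the link could have moved

attempts = implied_attempts(LINK_CAPACITY_BYTES, DELIVERED_BYTES)
print(f"implied attempts per transfer: {attempts:,.0f}")
```

Under those assumptions the ratio is a little over 33,000, in line with
the "30000 times" in the quoted message.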

Greg