Date: Mon, 20 Dec 2010 15:15:53 +0100
From: Rogier Wolff <R.E.Wolff@...Wizard.nl>
To: linux-kernel@...r.kernel.org
Subject: Slow disks.

Hi,

A friend of mine has a server in a datacenter somewhere. His machine
is not working properly: most of his disks take 10-100 times longer
to process each IO request than normal.

iostat -kx 10 output:

Device:  rrqm/s wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdd        0.30   0.00  0.40  1.20   2.80   1.10     4.88     0.43 271.50 271.44  43.43

shows that in this 10 second period, the disk was busy for 4.3
seconds and serviced 15-16 requests during that time.

Normal disks show "svctm" of around 10-20ms.

Now you might say: It's his disk that's broken.
Well no: I don't believe that all four of his disks are broken.
(I just showed you output about one disk, but there are 4 disks in
there, all behaving similarly, though some are worse than others.)

Or you might say: It's his controller that's broken. So we thought
too. We replaced the onboard sata controller with a 4-port sata
card. Now they are running off the external sata card... Slightly
better, but not by much.

Or you might say: it's hardware. But suppose the disk doesn't
properly transfer the data 9 times out of 10, wouldn't the driver
tell us SOMETHING in the syslog that things are not fine and dandy?
Moreover, in the case above, 12kb were transferred in 4.3 seconds. If
CRC errors were happening, the interface would've been able to
transfer over 400Mb during that time. So every transfer would need to
be retried on average 30000 times... Not realistic. If that were the
case, we'd surely hit a maximum retry limit every now and then?

These symptoms started when the system was running 2.6.33, but are
still present now that the system has been upgraded to 2.6.36.

Is there anything you can suggest to get to the root of this problem?
Could this be a software issue with the driver? Can we enable some
driver debugging to find out what is wrong?

Any help will be appreciated.

	Roger.
-- 
** R.E.Wolff@...Wizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
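
The busy-time and request-count figures quoted in the message can be re-derived from the single iostat line it shows. The following is a minimal sketch, not part of the original post: the function names `parse_iostat_line` and `summarize` are hypothetical, and it assumes the column layout of the `iostat -kx` header exactly as pasted above and the 10 second interval stated in the message.

```python
# Header and sample line copied from the message (iostat -kx 10).
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await svctm %util")
SAMPLE = "sdd 0.30 0.00 0.40 1.20 2.80 1.10 4.88 0.43 271.50 271.44 43.43"
INTERVAL_S = 10  # reporting interval given on the iostat command line


def parse_iostat_line(header, line):
    """Map each column name from the header to its numeric value."""
    names = header.split()[1:]          # drop the "Device:" label
    fields = line.split()
    stats = dict(zip(names, (float(f) for f in fields[1:])))
    stats["device"] = fields[0]
    return stats


def summarize(stats, interval_s):
    """Derive per-interval totals from the per-second rates."""
    requests = (stats["r/s"] + stats["w/s"]) * interval_s
    busy_s = stats["%util"] / 100.0 * interval_s
    kb = (stats["rkB/s"] + stats["wkB/s"]) * interval_s
    # Busy time divided by completed requests should roughly match svctm.
    ms_per_request = busy_s * 1000.0 / requests if requests else float("nan")
    return {
        "requests": requests,
        "busy_s": busy_s,
        "kb": kb,
        "ms_per_request": ms_per_request,
    }


summary = summarize(parse_iostat_line(HEADER, SAMPLE), INTERVAL_S)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

For this sample the sketch gives about 16 requests and about 4.3 busy seconds, and the busy time per request comes out close to the reported svctm of 271.44 ms. The kB total it derives from the rkB/s and wkB/s columns is about 39 kB for the interval, which is not the 12kb figure quoted in the message; the message does not say how that figure was obtained.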