Message-ID: <4C0E13A7.20402@msgid.tls.msk.ru>
Date:	Tue, 08 Jun 2010 13:55:51 +0400
From:	Michael Tokarev <mjt@....msk.ru>
To:	Linux-kernel <linux-kernel@...r.kernel.org>
Subject: xfs, aacraid 2.6.27 => 2.6.32 results in 6 times slowdown

Hello.

I've got a difficult issue here, and am asking if anyone else
has some experience or information about it.

Production environment (database).  A machine with an Adaptec
RAID SCSI controller, 6 drives in a raid10 array, an XFS filesystem
and an Oracle database on top of it (with - hopefully - proper
sunit/swidth).
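
(For the record, "xfs_info <mountpoint>" reports the sunit/swidth
values actually in effect, so that part can at least be double-checked
against the array geometry.)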

After upgrading the kernel from 2.6.27 to 2.6.32, users started
screaming about very bad performance.  Iostat reports increased I/O
latencies: I/O time goes up from ~5ms to ~30ms.  Switching back to
2.6.27 brings everything back to normal (or, rather, usual).
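
(iostat -x shows this as the "await" column: the average per-request
time in milliseconds, including time spent in the queue.)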

I tried testing I/O with a sample program which performs direct
random I/O on a given device.  All speeds are actually better in .32
than in .27, except for the random concurrent read+write test, where
.27 gives reads a bit more of a chance than .32.  Judging from the
synthetic tests I'd expect .32 to be faster, but apparently it is not.
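
For reference, here's a minimal sketch of the kind of test program I
mean (not the exact one I used; device path, block size and request
count are placeholders): random-offset O_DIRECT reads of a device,
reporting the average latency.

#define _GNU_SOURCE	/* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define BLKSZ 8192	/* read size; placeholder */
#define NREQS 10000	/* number of requests; placeholder */

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/sda"; /* placeholder */
	struct timeval t0, t1;
	off_t devsz, off;
	double sec;
	void *buf;
	int i, fd;

	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }
	devsz = lseek(fd, 0, SEEK_END);	/* device size in bytes */

	/* O_DIRECT requires an aligned buffer */
	if (posix_memalign(&buf, 4096, BLKSZ)) return 1;

	srand48(getpid());
	gettimeofday(&t0, NULL);
	for (i = 0; i < NREQS; i++) {
		/* random block-aligned offset within the device */
		off = (off_t)(drand48() * (devsz / BLKSZ)) * BLKSZ;
		if (pread(fd, buf, BLKSZ, off) != BLKSZ) {
			perror("pread");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);

	sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%d reads in %.2fs, %.2f ms avg\n",
	       NREQS, sec, sec * 1000.0 / NREQS);
	close(fd);
	return 0;
}

(A concurrent r+w variant would be the same idea with several such
processes running in parallel, some reading and some writing.)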

This is the only machine here still running 2.6.27; all the rest
have been upgraded to 2.6.32, and I see good .32 performance there.
But this is also the only machine with a hardware raid controller,
which is onboard and hence not easy to get rid of, so I'm sorta forced
to use it (I prefer a software raid solution for numerous reasons).

One possible cause of this that comes to mind is block device write
barriers.  But I can't find out when they actually got implemented.

The most problematic issue here is that only one machine behaves
like this, and it is a production server, so I have very little
chance to experiment with it.

So before the next try, I'd love to have some suggestions about what
to look for.  In particular, I think it's worth the effort to look
at write barriers, but again, I don't know how to check whether
they're actually being used.
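
Two things I'm considering, if someone can confirm they make sense:
grepping dmesg for "barrier" (if I remember right, XFS logs a
"Disabling barriers, ..." line at mount time when the device refuses
cache flushes, so the absence of such a line would suggest barriers
are in use), and, during a maintenance window, an A/B test with
"mount -o remount,nobarrier" (assuming the controller cache is
battery-backed, which I still need to verify).  If .32 with nobarrier
performs like .27 did, that would point rather strongly at
barriers/cache flushes.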

Does anyone have suggestions on what to collect and what to look at?

Thank you!

/mjt
