linux-kernel - Re: howto combat highly pathologic latencies on a server?


Message-ID: <72dbd3151003101544w18afc65ubbc85d5bfc435198@mail.gmail.com>
Date:	Wed, 10 Mar 2010 15:44:54 -0800
From:	David Rees <drees76@...il.com>
To:	Hans-Peter Jansen <hpj@...la.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: howto combat highly pathologic latencies on a server?

On Wed, Mar 10, 2010 at 9:17 AM, Hans-Peter Jansen <hpj@...la.net> wrote:
> While this system usually operates fine, it suffers from delays that are
> displayed in latencytop as: "Writing page to disk:     8425,5 ms":
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
>
> From other observations, this issue "feels" like it is induced by single
> synchronisation points in the block layer, e.g. if I create heavy IO load on
> one RAID array, say resizing a VMware disk image, it can take up to a
> minute to log in by ssh, although the ssh login does not touch this area at
> all (different RAID arrays). Note that the latencytop snapshots above were
> made during normal operation, not under this kind of load.
>
> Might later kernels mitigate this problem? As this is a production system
> that is used 6.5 days a week, I cannot do dangerous experiments; switching
> to 64 bit is also a problem due to the legacy stuff described above...
> OTOH, my users suffer from this, and anything helping in this respect is
> highly appreciated.

A 2.6.32-based kernel, which has per-BDI writeback and the "CFQ low
latency mode" changes, might help a good deal.  On one of my bigger
machines (similar in specs to yours), which runs a lot of processes
doing a decent amount of IO, latency and load average have both gone
down after moving from a 2.6.31 kernel to a 2.6.32 kernel (Fedora 11
system).

As Chris suggested, I've also heard that the noop IO scheduler can
work well with Areca controllers on some kernels and workloads.  It's
worth a shot, and you can even change it at run-time.
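
A minimal sketch of the run-time switch, assuming the usual sysfs
interface at /sys/block/<device>/queue/scheduler, where the active
scheduler is shown in square brackets (e.g. "noop deadline [cfq]").
The helper names and the device name "sda" are illustrative
assumptions, not taken from the original report, and writing to the
file needs root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a sysfs scheduler line into (available, active).

    The kernel marks the active scheduler with square brackets,
    e.g. "noop anticipatory deadline [cfq]".
    """
    available = []
    active = None
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


def scheduler_path(device, sysfs_root="/sys/block"):
    """Build the path to the scheduler file for a block device."""
    return Path(sysfs_root) / device / "queue" / "scheduler"


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Select an IO scheduler for one device at run-time (needs root).

    Refuses names the kernel does not list as available, then returns
    the scheduler that sysfs reports as active after the write.
    """
    path = scheduler_path(device, sysfs_root)
    available, _ = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)
    return parse_schedulers(path.read_text())[1]
```

Calling set_scheduler("sda", "noop") as root would switch that one
device only; the setting is per block device and does not survive a
reboot, so it is easy to back out if latencies get worse.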

-Dave