linux-kernel - Re: howto combat highly pathologic latencies on a server?

Message-ID: <20100310232940.GB16344@discord.disaster>
Date:	Thu, 11 Mar 2010 10:29:40 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Hans-Peter Jansen <hpj@...la.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: howto combat highly pathologic latencies on a server?

On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> in a commercial setting, with all those evil elements at work like VMware, 
> NFS, XFS, openSUSE, diskless fat clients, you name it...
> 
> System description:
> 
> Dual socket board: Tyan S2892, 2 * AMD Opteron 285 @ 2.6 GHz, 8 GB RAM, 
> PRO/1000 MT Dual Port Server NIC, Areca ARC-1261 16 channel RAID 
> controller, with 3 sets of RAID 5 arrays attached:
> System is running from: 4 * WD Raptor 150GB (WDC WD1500ADFD-00NLR5)
> VMware (XP-) images used via NFS: 6 * WD Raptor 74 GB (WDC WD740GD-00FLA0)
> Homes, diskless clients, application data: 4 * Hitachi 1 TB (HDE721010SLA330).
> 
> All filesystems are xfs. The server serves about 20 diskless PC's, most use 
> an Intel Pro/1000 GT NIC, all attached on a 3com 3870 48-port 10/100/1000 
> switch.
> 
> OS is openSUSE 11.1/i586 with kernel 2.6.27.45 (the same kernel as SLE 11).
> 
> It serves mostly NFS, SMB, and does mild database (MySQL) and email 
> processing (Cyrus IMAP, Postfix...). It also drives an ancient (but very 
> important) terminal-based transport order management system that often 
> syncs its data. Unfortunately, it also runs a VMware Server (1.0.10) 
> XP client, which itself does simple database work (employee time 
> registration).
> 
> Users generally describe this system as slow, although the load on the 
> server stays below 1.5 most of the time. Interestingly, the previous system, 
> running ancient kernels (2.6.11, SuSE 9.3), was perceived as significantly 
> quicker (though not fast).
> 
> The diskless clients are started once in the morning (taking 60-90 sec), use 
> an aufs2 layered NFS mount for their openSUSE 11.1 system, and simple NFS 
> mounted homes and shared folders. Two thirds of them also run a VMware XP 
> client (also NFS mounted). Their CPUs range from Athlon 64 3000+ up to 
> Phenom X4 955, with 2 or 4 GB RAM.
> 
> While this system usually operates fine, it suffers from delays, which 
> latencytop reports as "Writing page to disk:     8425,5 ms": 
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec 
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png, 
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> 
> From other observations, this issue "feels" like it is induced by single 
> synchronisation points in the block layer. For example, if I create heavy IO 
> load on one RAID array, say by resizing a VMware disk image, it can take up 
> to a minute to log in via ssh, although the ssh login does not touch that 
> array at all (it lives on a different RAID array). Note that the latencytop 
> snapshots above were taken during normal operation, not under this kind of load.
> 
> The network side looks fine: the main interface rarely exceeds 40 MiB/s 
> and usually stays in the 1 KiB/s to 5 MiB/s range. 
> 
> The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota 
> (yes, I do have a BBU on the areca, and disk write cache is effectively 
> turned off). 

Make sure the filesystem has the "lazy-count=1" attribute set (use
xfs_info to check, xfs_admin to change). That removes the superblock
from most transactions and significantly reduces transaction latency,
because otherwise transactions serialise while locking it...

Cheers,

Dave
-- 
Dave Chinner
david@...morbit.com