Message-Id: <201003101817.42812.hpj@urpla.net>
Date:	Wed, 10 Mar 2010 18:17:42 +0100
From:	"Hans-Peter Jansen" <hpj@...la.net>
To:	linux-kernel@...r.kernel.org
Subject: howto combat highly pathologic latencies on a server?

in a commercial setting, with all those evil elements at work like VMware, 
NFS, XFS, openSUSE, diskless fat clients, you name it...

System description:

Dual socket board: Tyan S2892, 2 * AMD Opteron 285 @ 2.6 GHz, 8 GB RAM, 
PRO/1000 MT Dual Port Server NIC, Areca ARC-1261 16 channel RAID 
controller, with 3 sets of RAID 5 arrays attached:
System is running from: 4 * WD Raptor 150GB (WDC WD1500ADFD-00NLR5)
VMware (XP-) images used via NFS: 6 * WD Raptor 74 GB (WDC WD740GD-00FLA0)
Homes, diskless clients, appl. data: 4 * Hitachi 1 TB (HDE721010SLA330).

All filesystems are xfs. The server serves about 20 diskless PCs, most use 
an Intel Pro/1000 GT NIC, all attached to a 3com 3870 48-port 10/100/1000 
switch.

OS is openSUSE 11.1/i586 with kernel 2.6.27.45 (the same kernel as SLE 11).

It serves mostly NFS, SMB, and does mild database (MySQL) and email 
processing (Cyrus IMAP, Postfix...). It also drives an ancient (but very 
important) terminal based transport order mgmt system, that often syncs 
its data. Unfortunately, it's also used for running a VMware-Server 
(1.0.10) XP-client, that itself does simple database stuff (employee time 
registration).

Users generally describe this system as slow, although the load on the 
server is less than 1.5 most of the time. Interestingly, the former system, 
using ancient kernels (2.6.11, SuSE 9.3), was perceived as significantly 
quicker (but not fast).

The diskless clients are started once in the morning (taking 60-90 sec), use 
an aufs2 layered NFS mount for their openSUSE 11.1 system, and simple NFS 
mounted homes and shared folders. Two thirds of them also need to run a 
VMware XP client (also NFS mounted). Their CPUs range from Athlon 64 3000+ 
up to Phenom X4 955, with 2 or 4 GB RAM.

While this system usually operates fine, it suffers from delays that 
latencytop displays as: "Writing page to disk:     8425,5 ms": 
ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec 
range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png, 
ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.

From other observations, this issue "feels" like it is induced by single 
synchronisation points in the block layer, e.g. if I create heavy IO load on 
one RAID array, say resizing a VMware disk image, it can take up to a 
minute to log in by ssh, although the ssh login does not touch this area at 
all (different RAID arrays). Note that the latencytop snapshots above were 
made during normal operation, not under this kind of load.
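
A minimal sketch of how such stalls could be quantified per RAID array, 
independent of latencytop: time a few small write+fsync cycles in a 
directory on the array in question while the heavy IO runs elsewhere. The 
function name, sample count and payload size are illustrative assumptions, 
not something used on this server.

```python
import os
import sys
import tempfile
import time


def probe_fsync_latency(directory, samples=5, size=4096):
    """Time `samples` small write+fsync cycles in `directory`.

    Returns a list with the duration of each cycle in seconds.
    """
    latencies = []
    # Hypothetical scratch file; it is removed again when the probe ends.
    fd, path = tempfile.mkstemp(dir=directory, prefix="latprobe-")
    try:
        payload = b"\0" * size
        for _ in range(samples):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # force the data out to the storage stack
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    lat = probe_fsync_latency(target)
    print("max %.1f ms, mean %.1f ms"
          % (max(lat) * 1000, sum(lat) / len(lat) * 1000))
```

Running it against a directory on an otherwise idle array, once with and 
once without load on another array, would show whether the delay really 
crosses array boundaries.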

The network side looks fine, as its main interface rarely passes 40 MiB/s, 
and usually stays in the 1 KiB/s - 5 MiB/s range. 

The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota 
(yes, I do have a BBU on the Areca, and disk write cache is effectively 
turned off). 

The clients mount their system:
/:ro/rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,nolock,proto=tcp,
timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=udp
/home: similar
/shared: without nolock
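
To make the option string above easier to read, here is a small sketch that 
splits it into a dictionary and converts two values into more familiar 
units. The helper name is made up for illustration; the only outside fact 
relied on is that nfs(5) specifies timeo in tenths of a second.

```python
def parse_mount_options(optstring):
    """Split a comma-separated mount option string into a dict.

    Options of the form key=value map to their string value,
    bare flags such as "hard" map to True.
    """
    opts = {}
    for item in optstring.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


# Option string as quoted above (rw variant, line break removed).
CLIENT_OPTS = ("rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
               "nointr,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,"
               "mountvers=3,mountproto=udp")

opts = parse_mount_options(CLIENT_OPTS)
rsize_kib = int(opts["rsize"]) // 1024   # read size in KiB
timeo_sec = int(opts["timeo"]) / 10.0    # timeo is given in deciseconds
print("rsize: %d KiB, timeo: %.0f s, hard mount: %s"
      % (rsize_kib, timeo_sec, opts.get("hard", False)))
```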

Might later kernels mitigate this problem? As this is a production system 
that is used 6.5 days a week, I cannot do dangerous experiments. Switching 
to 64 bit is also a problem due to the legacy stuff described above...
OTOH, my users suffer from this, and anything helping in this respect is 
highly appreciated.

Thanks in advance,
Pete