Message-ID: <20061217223730.GW10054@mea-ext.zmailer.org>
Date:	Mon, 18 Dec 2006 00:37:30 +0200
From:	Matti Aarnio <matti.aarnio@...iler.org>
To:	Randy Dunlap <randy.dunlap@...cle.com>
Cc:	"J.H." <warthog9@...nel.org>, Andrew Morton <akpm@...l.org>,
	Pavel Machek <pavel@....cz>,
	kernel list <linux-kernel@...r.kernel.org>, hpa@...or.com,
	webmaster@...nel.org
Subject: Re: [KORG] Re: kernel.org lies about latest -mm kernel

On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> J.H. wrote:
...
> >The root cause boils down to with git, gitweb and the normal mirroring
> >on the frontend machines our basic working set no longer stays resident
> >in memory, which is forcing more and more to actively go to disk causing
> >a much higher I/O load.  You have the added problem that one of the
> >frontend machines is getting hit harder than the other due to several
> >factors: various DNS servers not round robining, people explicitly
> >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and
> >probably several other factors we aren't aware of.  This has caused the
> >average load on that machine to hover around 150-200 and if for whatever
> >reason we have to take one of the machines down the load on the
> >remaining machine will skyrocket to 2000+.  

Relying on DNS and on clients to do round-robin load-balancing is doomed.

You really, REALLY, need external L4 load-balancer switches.
(And installation help from somebody who really knows how to run this
kind of service on a cluster.)

Basic config features include, of course:
 - limiting the number of parallel active connections for each protocol
 - availability of each served protocol  (e.g. one can shut down rsync
   on one server, and new rsync connections get pushed elsewhere)
 - running load-balancing of each served protocol separately
 - server load monitoring, and letting it bias new connections toward
   nodes that are less heavily loaded
 - allowing direct access to each server in addition to the access
   via the cluster service
 - some sort of connection persistence, perhaps only for HTTP access?
   (ftp and rsync can do nicely without)
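
To make the list above a bit more concrete, here is a minimal sketch in
Python of the selection logic such a balancer applies to each new
connection. It is an illustration only, not how any particular L4 switch
is configured: the Backend class, pick_backend(), and the load_weight
value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One frontend machine behind the (hypothetical) balancer."""
    name: str
    load: float = 0.0                            # load average reported by the node
    enabled: set = field(default_factory=set)    # protocols this node currently serves
    active: dict = field(default_factory=dict)   # protocol -> open connection count


def pick_backend(backends, protocol, load_weight=0.1):
    """Pick a node for a new connection of the given protocol.

    Each protocol is balanced separately: only nodes that have the
    protocol enabled are candidates, and the score combines that
    protocol's active connection count with the node's load average,
    so heavily loaded nodes are biased against. load_weight is an
    assumed tuning knob, not a value from any real product.
    """
    candidates = [b for b in backends if protocol in b.enabled]
    if not candidates:
        return None  # protocol is shut down everywhere

    def score(b):
        return b.active.get(protocol, 0) + load_weight * b.load

    chosen = min(candidates, key=score)
    chosen.active[protocol] = chosen.active.get(protocol, 0) + 1
    return chosen
```

With two nodes where rsync is enabled on only one of them, every new
rsync connection goes to that node, while HTTP connections drift toward
whichever node has fewer HTTP connections and the lower load.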

> >Since it's apparent not everyone is aware of what we are doing, I'll
> >mention briefly some of the bigger points.
...
> >- We've cut back on the number of ftp and rsync users to the machines.
> >Basically we are cutting back where we can in an attempt to keep the
> >load from spiraling out of control, this helped a bit when we recently
> >had to take one of the machines down and instead of loads spiking into
> >the 2000+ range we peaked at about 500-600 I believe.

How about mounting the filesystems with "noatime"?  (That avoids an
access-time update, i.e. a write, every time a file is read.)
Or do you already do that?
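
One quick way to answer that question on a given machine is to scan the
mount table. The sketch below, in Python, assumes the usual Linux
/proc/mounts layout (device, mount point, filesystem type, options);
the function name and the list of skipped pseudo-filesystems are my own
choices for illustration.

```python
def mounts_without_noatime(mounts_text,
                           skip_fstypes=("proc", "sysfs", "tmpfs", "devpts")):
    """Return mount points whose option list lacks "noatime".

    mounts_text is the content of a /proc/mounts style table.
    Pseudo-filesystems named in skip_fstypes are ignored, since
    atime updates on them cause no disk I/O.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        device, mountpoint, fstype, options = fields[:4]
        if fstype in skip_fstypes:
            continue
        if "noatime" not in options.split(","):
            missing.append(mountpoint)
    return missing
```

On a live system one would pass it open("/proc/mounts").read(); any
mount point it returns is still updating access times on reads.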

> >So we know the problem is there, and we are working on it - we are
> >getting e-mails about it if not daily then every other day or so.  If
> >there are suggestions we are willing to hear them - but the general
> >feeling with the admins is that we are probably hitting the biggest
> >problems already.

/Matti Aarnio