Message-Id: <200808041118.19743.bs@q-leap.de>
Date:	Mon, 4 Aug 2008 11:18:18 +0200
From:	Bernd Schubert <bs@...eap.de>
To:	"J. Bruce Fields" <bfields@...ldses.org>
Cc:	Neil Brown <neilb@...e.de>, Michael Shuey <shuey@...due.edu>,
	Shehjar Tikoo <shehjart@....unsw.edu.au>,
	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org,
	rees@...i.umich.edu, aglo@...i.umich.edu
Subject: Re: high latency NFS

On Monday 04 August 2008 03:11:58 J. Bruce Fields wrote:
> On Mon, Aug 04, 2008 at 10:32:06AM +1000, Dave Chinner wrote:
> > On Fri, Aug 01, 2008 at 03:15:59PM -0400, J. Bruce Fields wrote:
> > > On Fri, Aug 01, 2008 at 05:23:20PM +1000, Dave Chinner wrote:
> > > > On Thu, Jul 31, 2008 at 05:03:05PM +1000, Neil Brown wrote:
> > > > > You might want to track the max length of the request queue too and
> > > > > start more threads if the queue is long, to allow a quick ramp-up.
> > > >
> > > > Right, but even request queue depth is not a good indicator. You
> > > > need to keep track of how many NFSDs are actually doing useful
> > > > work. That is, if you've got an NFSD on the CPU that is hitting
> > > > the cache and not blocking, you don't need more NFSDs to handle
> > > > that load because they can't do any more work than the NFSD
> > > > that is currently running is.
> > > >
> > > > i.e. take the solution that Greg Banks used for the CPU scheduler
> > > > overload issue (limiting the number of nfsds woken but not yet on
> > > > the CPU),
> > >
> > > I don't remember that, or wasn't watching when it happened.... Do you
> > > have a pointer?
> >
> > Ah, I thought that had been sent to mainline because it was
> > mentioned in his LCA talk at the start of the year. Slides
> > 65-67 here:
> >
> > http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/41.pdf
>
> OK, so to summarize: when the rate of incoming rpc's is very high (and,
> I guess, when we're serving everything out of cache and don't have IO
> wait), all the nfsd threads will stay runnable all the time.  That keeps
> userspace processes from running (possibly for "minutes").  And that's a
> problem even on a server dedicated only to nfs, since it affects portmap
> and rpc.mountd.

Even worse, it affects user-space HA software such as heartbeat, and everyone
with reasonable timeouts will see spurious 'failures'.
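
To make the quoted idea of "limiting the number of nfsds woken but not yet on
the CPU" concrete, here is a toy user-space model in Python. It is only a
sketch of the general throttling technique and is not taken from Greg Banks'
patch or from any kernel code. The class name, the max_woken_pending tunable
and the "drain the whole queue" behaviour are all assumptions made for
illustration.

```python
from collections import deque


class WakeThrottle:
    """Toy model: cap how many workers are woken but not yet running.

    All names here are hypothetical; this does not mirror kernel code.
    """

    def __init__(self, max_woken_pending):
        self.max_woken_pending = max_woken_pending  # assumed tunable
        self.woken_pending = 0  # workers woken, still waiting for a CPU
        self.queue = deque()    # requests not yet picked up by a worker

    def request_arrived(self, req):
        """Queue the request; return True if another worker should be woken."""
        self.queue.append(req)
        if self.woken_pending < self.max_woken_pending:
            self.woken_pending += 1
            return True
        # Enough workers are already runnable; waking more only adds
        # scheduler load without getting any more work done.
        return False

    def worker_on_cpu(self):
        """A woken worker got the CPU; in this toy it drains the queue."""
        self.woken_pending -= 1
        handled = list(self.queue)
        self.queue.clear()
        return handled
```

With a limit of 2, a burst of five requests wakes only two workers; the other
three requests stay queued until a worker that is already runnable picks them
up, so the run queue is not flooded with nfsd-like threads competing with user
space.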


-- 
Bernd Schubert
Q-Leap Networks GmbH