Date:	Wed, 17 Jan 2007 01:15:15 -0500
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH] nfs: fix congestion control

On Wed, 2007-01-17 at 03:41 +0100, Peter Zijlstra wrote:
> On Tue, 2007-01-16 at 17:27 -0500, Trond Myklebust wrote:
> > On Tue, 2007-01-16 at 23:08 +0100, Peter Zijlstra wrote:
> > > Subject: nfs: fix congestion control
> > > 
> > > The current NFS client congestion logic is severely broken, it marks the
> > > backing device congested during each nfs_writepages() call and implements
> > > its own waitqueue.
> > > 
> > > Replace this by a more regular congestion implementation that puts a cap
> > > on the number of active writeback pages and uses the bdi congestion waitqueue.
> > > 
> > > NFSv[34] commit pages are allowed to go unchecked as long as we are under 
> > > the dirty page limit and not in direct reclaim.
> 
> > 
> > What on earth is the point of adding congestion control to COMMIT?
> > Strongly NACKed.
> 
> They are dirty pages, how are we getting rid of them when we reached the
> dirty limit?

They are certainly _not_ dirty pages. They are pages that have been
written to the server but are not yet guaranteed to have hit the disk
(they were only written to the server's page cache). We don't care if
they are paged in or swapped out on the local client.

All the COMMIT does, is to ask the server to write the data from its
page cache onto disk. Once that has been done, we can release the pages.
If the commit fails, then we iterate through the whole writepage()
process again. The commit itself does, however, not even look at the
page data.
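
The page lifecycle described above can be sketched as a small state
machine. This is only an illustration: the enum, its values, and both
helper functions are hypothetical names made up for this sketch and are
not the NFS client's actual data structures.

```c
/* Hypothetical page states for illustration; not kernel definitions. */
enum sketch_page_state {
    SKETCH_NEEDS_WRITE, /* data must still be sent to the server */
    SKETCH_UNSTABLE,    /* in the server's page cache, not yet on its disk */
    SKETCH_RELEASED     /* server confirmed stable storage; page can go */
};

/* A successful WRITE moves a page from "needs write" to "unstable". */
static enum sketch_page_state sketch_after_write(enum sketch_page_state s)
{
    return s == SKETCH_NEEDS_WRITE ? SKETCH_UNSTABLE : s;
}

/*
 * COMMIT only affects unstable pages and never looks at the page data.
 * On success the page can be released; on failure the page goes back
 * through the whole write path.
 */
static enum sketch_page_state sketch_after_commit(enum sketch_page_state s,
                                                  int commit_ok)
{
    if (s != SKETCH_UNSTABLE)
        return s;
    return commit_ok ? SKETCH_RELEASED : SKETCH_NEEDS_WRITE;
}
```

In this sketch a page waiting for COMMIT is never in the "needs write"
state, which is the distinction being drawn above.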

> > Why 16MB of on-the-wire data? Why not 32, or 128, or ...
> 
> Andrew always promotes a fixed number for congestion control, I pulled
> one from a dark place. I have no problem with a more dynamic solution.
> 
> > Solaris already allows you to send 2MB of write data in a single RPC
> > request, and the RPC engine has for some time allowed you to tune the
> > number of simultaneous RPC requests you have on the wire: Chuck has
> > already shown that read/write performance is greatly improved by upping
> > that value to 64 or more in the case of RPC over TCP. Why are we then
> > suddenly telling people that they are limited to 8 simultaneous writes?
> 
> min(max RPC size * max concurrent RPC reqs, dirty threshold) then?

That would be far preferable. For instance, it allows those who have
long latency fat pipes to actually use the bandwidth optimally when
writing out the data.
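
A minimal sketch of the bound proposed above, min(max RPC size * max
concurrent RPC reqs, dirty threshold), might look like the following.
The function name and its parameters are assumptions for illustration
only; they are not taken from the patch or from any kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: cap on in-flight writeback bytes.
 * All three inputs are in bytes or counts supplied by the caller;
 * none of these names come from the kernel.
 */
static size_t sketch_congestion_cap(size_t max_rpc_bytes,
                                    size_t max_rpc_slots,
                                    size_t dirty_threshold_bytes)
{
    size_t on_wire;

    /* Saturate instead of overflowing when the product is too large. */
    if (max_rpc_slots != 0 && max_rpc_bytes > SIZE_MAX / max_rpc_slots)
        on_wire = SIZE_MAX;
    else
        on_wire = max_rpc_bytes * max_rpc_slots;

    /* Never allow more on the wire than the dirty threshold permits. */
    return on_wire < dirty_threshold_bytes ? on_wire
                                           : dirty_threshold_bytes;
}
```

With 2MB write RPCs and 64 slots this gives 128MB on the wire, as long
as the dirty threshold is at least that large; a fixed 16MB cap would
hold such a configuration well below what the transport allows.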

Trond

