linux-kernel - Re: [PATCH] nfs: fix congestion control

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1169014515.6065.5.camel@lade.trondhjem.org>
Date:	Wed, 17 Jan 2007 01:15:15 -0500
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH] nfs: fix congestion control

On Wed, 2007-01-17 at 03:41 +0100, Peter Zijlstra wrote:
> On Tue, 2007-01-16 at 17:27 -0500, Trond Myklebust wrote:
> > On Tue, 2007-01-16 at 23:08 +0100, Peter Zijlstra wrote:
> > > Subject: nfs: fix congestion control
> > > 
> > > The current NFS client congestion logic is severely broken, it marks the
> > > backing device congested during each nfs_writepages() call and implements
> > > its own waitqueue.
> > > 
> > > Replace this by a more regular congestion implementation that puts a cap
> > > on the number of active writeback pages and uses the bdi congestion waitqueue.
> > > 
> > > NFSv[34] commit pages are allowed to go unchecked as long as we are under 
> > > the dirty page limit and not in direct reclaim.
> 
> > 
> > What on earth is the point of adding congestion control to COMMIT?
> > Strongly NACKed.
> 
> They are dirty pages, how are we getting rid of them when we reached the
> dirty limit?

They are certainly _not_ dirty pages. They are pages that have been
written to the server but are not yet guaranteed to have hit the disk
(they were only written to the server's page cache). We don't care if
they are paged in or swapped out on the local client.

\All the COMMIT does, is to ask the server to write the data from its
page cache onto disk. Once that has been done, we can release the pages.
If the commit fails, then we iterate through the whole writepage()
process again. The commit itself does, however, not even look at the
page data.

> > Why 16MB of on-the-wire data? Why not 32, or 128, or ...
> 
> Andrew always promotes a fixed number for congestion control, I pulled
> one from a dark place. I have no problem with a more dynamic solution.
> 
> > Solaris already allows you to send 2MB of write data in a single RPC
> > request, and the RPC engine has for some time allowed you to tune the
> > number of simultaneous RPC requests you have on the wire: Chuck has
> > already shown that read/write performance is greatly improved by upping
> > that value to 64 or more in the case of RPC over TCP. Why are we then
> > suddenly telling people that they are limited to 8 simultaneous writes?
> 
> min(max RPC size * max concurrent RPC reqs, dirty threshold) then?

That would be far preferable. For instance, it allows those who have
long latency fat pipes to actually use the bandwidth optimally when
writing out the data.

Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/