Date:	Fri, 18 Dec 2009 20:44:03 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Steve Rago <sar@...-labs.com>
Cc:	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
	Trond.Myklebust@...app.com, Wu Fengguang <fengguang.wu@...el.com>,
	"jens.axboe" <jens.axboe@...cle.com>
Subject: Re: [PATCH] improve the performance of large sequential write NFS
 workloads

On Fri, 2009-12-18 at 14:33 -0500, Steve Rago wrote:
> > > +void nfs_wait_woutstanding(struct inode *inode)
> > > +{
> > > +   if (nfs_max_woutstanding != 0) {
> > > +           unsigned long background_thresh;
> > > +           unsigned long dirty_thresh;
> > > +           long npages;
> > > +           struct nfs_inode *nfsi = NFS_I(inode);
> > > +
> > > +           get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
> > > +           npages = global_page_state(NR_FILE_DIRTY) +
> > > +                    global_page_state(NR_UNSTABLE_NFS) +
> > > +                    global_page_state(NR_WRITEBACK);
> > > +           if (npages >= background_thresh)
> > > +                   wait_event(nfsi->writes_wq,
> > > +                      atomic_read(&nfsi->writes) < nfs_max_woutstanding);
> > > +   }
> > > +}
> > 
> > This looks utterly busted too, why the global state and not the nfs
> > client's bdi state?
> > 
> > Also, why create this extra workqueue and not simply use the congestion
> > interface that is present? If the congestion stuff doesn't work for you,
> > fix that, don't add extra muck like this.
> 
> Pages are a global resource.  Once we hit the dirty_threshold, the
> system is going to work harder to flush the pages out.  This code
> prevents the caller from creating more dirty pages in this state,
> thereby making matters worse, when eager writeback is enabled.

You misunderstand: dirty limits are per BDI, and all those npages
might be for !NFS traffic, in which case forcing NFS into sync mode
might be the wrong thing to do.

The dirty pages are no longer a global resource in the current Linux
tree.
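
A per-BDI variant of that check would look roughly like the below.
Untested sketch only: it reuses the writes/writes_wq/nfs_max_woutstanding
bits from your patch, assumes the server's bdi is reachable through
NFS_SERVER(inode)->backing_dev_info, and comparing against bdi_thresh is
just one way of expressing "this bdi is over its share":

/* needs linux/writeback.h, linux/backing-dev.h, linux/nfs_fs.h */
void nfs_wait_woutstanding(struct inode *inode)
{
	unsigned long background_thresh;
	unsigned long dirty_thresh;
	unsigned long bdi_thresh;
	struct nfs_inode *nfsi = NFS_I(inode);
	struct backing_dev_info *bdi = &NFS_SERVER(inode)->backing_dev_info;
	long npages;

	if (nfs_max_woutstanding == 0)
		return;

	/* per-BDI limits instead of the global ones */
	get_dirty_limits(&background_thresh, &dirty_thresh,
			 &bdi_thresh, bdi);

	/* dirty, unstable and writeback pages backed by this bdi only */
	npages = bdi_stat(bdi, BDI_RECLAIMABLE) +
		 bdi_stat(bdi, BDI_WRITEBACK);

	if (npages >= bdi_thresh)
		wait_event(nfsi->writes_wq,
			   atomic_read(&nfsi->writes) < nfs_max_woutstanding);
}

That way pages dirtied against other devices don't throttle NFS writers.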

> This wait queue is used for different purposes than the congestion_wait
> interface.  Here we are preventing the caller from proceeding if there
> are too many NFS writes outstanding for this thread and we are in a
> memory pressure state.  It has nothing to do with the state of the bdi
> congestion. 

I'm thinking it ought to; congestion is exactly that: the device
gets backed up and needs to get moving.
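
The NFS write path already marks the bdi congested once more than
nfs_congestion_kb worth of writes are in flight, so the throttle could
be as simple as the below (untested sketch, helper name made up):

/* needs linux/backing-dev.h, linux/nfs_fs.h */
static void nfs_throttle_on_congestion(struct inode *inode)
{
	struct backing_dev_info *bdi = &NFS_SERVER(inode)->backing_dev_info;

	/*
	 * nfs_set_page_writeback() sets this bdi congested once too many
	 * writes are outstanding and nfs_end_page_writeback() clears it
	 * again; just back off on that instead of a private wait queue.
	 */
	while (bdi_write_congested(bdi))
		congestion_wait(BLK_RW_ASYNC, HZ/10);
}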

