Date:	Fri, 28 Sep 2007 10:40:00 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Chakri n <chakriin5@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-pm <linux-pm@...ts.linux-foundation.org>,
	lkml <linux-kernel@...r.kernel.org>, nfs@...ts.sourceforge.net
Subject: Re: An unresponsive file system can hang all I/O in the system on
	linux-2.6.23-rc6 (dirty_thresh problem?)

[ please don't top-post! ]

On Fri, 2007-09-28 at 01:27 -0700, Chakri n wrote:

> On 9/27/07, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> > On Thu, 2007-09-27 at 23:50 -0700, Andrew Morton wrote:
> >
> > > What we _don't_ want to happen is for other processes which are writing to
> > > other, non-dead devices to get collaterally blocked.  We have patches which
> > > might fix that queued for 2.6.24.  Peter?
> >
> > Nasty problem, don't do that :-)
> >
> > But yeah, with per-BDI dirty limits we get stuck at whatever ratio the
> > NFS server/mount (?) has - which could be 100%. Other processes will
> > then work almost synchronously against their BDIs, but it should work.
> >
> > [ They will lower the NFS BDI's ratio, but some fancy clipping code will
> >   limit the other BDIs' dirty limits so they do not exceed the total limit.
> >   And with all these NFS pages stuck, that will still be nothing. ]
> >
> Thanks.
> 
> The BDI dirty limits sounds like a good idea.
> 
> Is there already a patch for this, which I could try?

v2.6.23-rc8-mm2

> I believe it works like this:
> 
> Each BDI will have a limit. If the dirty_thresh exceeds that limit,
> all I/O on the block device becomes synchronous.
> 
> So, if I have sda & an NFS mount, the dirty limit can be different for
> each of them.
> 
> I can set the dirty limit for
>  -  sda to be 90% and
>  -  the NFS mount to be 50%.
> 
> So, once the dirty limit exceeds 50%, NFS writes synchronously,
> but sda can keep working asynchronously until the dirty limit reaches 90%.

Not quite; the system determines the limit itself, in an adaptive
fashion.

  bdi_limit = total_limit * p_bdi

where p_bdi is a fraction in [0,1], determined by the relative writeout
speed of the current BDI vs all other BDIs.

So if you were to have 3 BDIs (sda, sdb, and an NFS mount), with sda
idle and the NFS mount getting twice as much traffic as sdb, the ratios
would look like:

 p_sda: 0
 p_sdb: 1/3
 p_nfs: 2/3
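
To make that concrete, here is a minimal userspace sketch of the limit
computation for exactly those three BDIs; the function and variable
names below are made up for illustration, not the kernel's:

  #include <stdio.h>

  /* p_bdi is carried as numerator/denominator to stay in integer math */
  static unsigned long bdi_dirty_limit(unsigned long total_limit,
                                       unsigned long numerator,
                                       unsigned long denominator)
  {
          return (unsigned long long)total_limit * numerator / denominator;
  }

  int main(void)
  {
          unsigned long total = 30000;  /* global dirty limit, in pages */

          /* sda idle, the NFS mount seeing twice sdb's traffic */
          printf("sda: %lu\n", bdi_dirty_limit(total, 0, 3)); /*     0 */
          printf("sdb: %lu\n", bdi_dirty_limit(total, 1, 3)); /* 10000 */
          printf("nfs: %lu\n", bdi_dirty_limit(total, 2, 3)); /* 20000 */
          return 0;
  }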

Once the traffic exceeds the write speed of the device, we build up a
backlog and writers get throttled, so these proportions converge to the
relative write speeds of the BDIs once they are saturated with data.
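
One simple way to picture how those proportions converge is an
exponentially decaying per-BDI event counter; the sketch below is only
a rough userspace model of that idea, not the kernel's actual code:

  #include <stdio.h>

  #define NR_BDI 3

  int main(void)
  {
          /* writeout completions per tick: sda idle, nfs twice sdb */
          const double speed[NR_BDI] = { 0.0, 1.0, 2.0 };
          const char *name[NR_BDI] = { "sda", "sdb", "nfs" };
          double events[NR_BDI] = { 0.0, 0.0, 0.0 };
          const double decay = 0.9;  /* gradually forget old history */
          double total = 0.0;
          int i, tick;

          for (tick = 0; tick < 100; tick++)
                  for (i = 0; i < NR_BDI; i++)
                          events[i] = events[i] * decay + speed[i];

          for (i = 0; i < NR_BDI; i++)
                  total += events[i];

          /* prints p converging to 0, 1/3 and 2/3 */
          for (i = 0; i < NR_BDI; i++)
                  printf("p_%s = %.3f\n", name[i], events[i] / total);
          return 0;
  }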

So what can happen in your case is that the NFS mount, being the only
one with traffic, gets a fraction of 1. If it then disconnects, as in
your case, it will still have the whole dirty limit pinned for NFS.

However, other devices will at that moment try to maintain a limit of
0, which ends up being similar to a sync mount.

So they'll not get stuck, but they will be slow.
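
In code terms, the consequence for the other devices could be pictured
like this; again just an illustrative sketch with made-up names, not
the kernel's implementation:

  #include <stdio.h>

  struct bdi_state {
          unsigned long dirty;  /* dirty pages backed by this BDI */
          unsigned long limit;  /* total_limit * p, possibly 0    */
  };

  /* a writer must wait for writeback whenever its BDI is at its limit */
  static int must_throttle(const struct bdi_state *bdi)
  {
          return bdi->dirty >= bdi->limit;
  }

  int main(void)
  {
          /* the dead NFS mount pinned the whole limit; sda's share is 0 */
          struct bdi_state nfs = { 30000, 30000 };
          struct bdi_state sda = { 1, 0 };

          /* sda throttles on every dirty page: effectively a sync
           * mount -- slow, but never stuck behind the dead NFS BDI */
          printf("nfs throttles: %d\n", must_throttle(&nfs)); /* 1 */
          printf("sda throttles: %d\n", must_throttle(&sda)); /* 1 */
          return 0;
  }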
