lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 05 Oct 2007 12:57:34 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	akpm@...ux-foundation.org, wfg@...l.ustc.edu.cn,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Trond Myklebust <trond.myklebust@....uio.no>
Subject: Re: [PATCH] remove throttle_vm_writeout()

On Fri, 2007-10-05 at 12:27 +0200, Miklos Szeredi wrote:
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 4ef4d22..eff2438 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -88,7 +88,7 @@ static inline void wait_on_inode(struct inode *inode)
> >  int wakeup_pdflush(long nr_pages);
> >  void laptop_io_completion(void);
> >  void laptop_sync_completion(void);
> > -void throttle_vm_writeout(gfp_t gfp_mask);
> > +void throttle_vm_writeout(struct zone *zone, gfp_t gfp_mask);
> >  
> >  /* These are exported to sysctl. */
> >  extern int dirty_background_ratio;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index eec1481..f949997 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -326,11 +326,8 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
> >  }
> >  EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
> >  
> > -void throttle_vm_writeout(gfp_t gfp_mask)
> > +void throttle_vm_writeout(struct zone *zone, gfp_t gfp_mask)
> >  {
> > -	long background_thresh;
> > -	long dirty_thresh;
> > -
> >  	if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> >  		/*
> >  		 * The caller might hold locks which can prevent IO completion
> > @@ -342,17 +339,16 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> >  	}
> >  
> >          for ( ; ; ) {
> > -		get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
> > +		unsigned long thresh = zone_page_state(zone, NR_ACTIVE) +
> > +			zone_page_state(zone, NR_INACTIVE);
> >  
> > -                /*
> > -                 * Boost the allowable dirty threshold a bit for page
> > -                 * allocators so they don't get DoS'ed by heavy writers
> > -                 */
> > -                dirty_thresh += dirty_thresh / 10;      /* wheeee... */
> > +		/*
> > +		 * wait when 75% of the zone's pages are under writeback
> > +		 */
> > +		thresh -= thresh >> 2;
> > +		if (zone_page_state(zone, NR_WRITEBACK) < thresh)
> > +			break;
> >  
> > -                if (global_page_state(NR_UNSTABLE_NFS) +
> > -			global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > -                        	break;
> >                  congestion_wait(WRITE, HZ/10);
> >          }
> >  }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1be5a63..7dd6bd9 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -948,7 +948,7 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
> >  		}
> >  	}
> >  
> > -	throttle_vm_writeout(sc->gfp_mask);
> > +	throttle_vm_writeout(zone, sc->gfp_mask);
> >  
> >  	atomic_dec(&zone->reclaim_in_progress);
> >  	return nr_reclaimed;
> > 
> > 
> 
> I think that's an improvement in all respects.
> 
> However it still does not generally address the deadlock scenario: if
> there's a small DMA zone, and fuse manages to put all of those pages
> under writeout, then there's trouble.
> 
> But it's not really fuse specific.  If it was a normal filesystem that
> did that, and it needed a GFP_DMA allocation for writeout, it is in
> trouble also, as that allocation would fail (at least no deadlock).
> 
> Or is GFP_DMA never used by fs/io writeout paths?

I agree that its not complete. (hence the lack of sign-off etc.)

'normally' writeback pages just need an interrupt to signal the stuff is
written back. ie. the writeback completion path is atomic. 

 [ result of the thinking below -- the above is esp. true for swap pages
   - so maybe we should ensure that !swap traffic can never exceed this
   75% - that way, swap can always us get out of a tight spot ]

Maybe we should make that a requirement, although I see how that becomes
rather hard for FUSE (which is where Andrews PF_MEMALLOC suggestion
comes from - but I really dislike PF_MEMALLOC exposed to userspace).

Limiting FUSE to say 50% (suggestion from your other email) sounds like
a horrible hack to me. - Need more time to think on this.

.. o O [ process of thinking ]

While I think, I might as well tell the problem I had with
throttle_vm_writeout(), which is nfs unstable pages. Those don't go away
by waiting for it - as throttle_vm_writeout() does. So once you have an
excess of those, you're stuck as well.

In this patch I totally ignored unstable, but I'm not sure that's the
proper thing to do, I'd need to figure out what happens to an unstable
page when passed into pageout() - or if its passed to pageout at all.

If unstable pages would be passed to pageout(), and it would properly
convert them to writeback and clean them, then there is nothing wrong.

(Trond?)



Download attachment "signature.asc" of type "application/pgp-signature" (190 bytes)

Powered by blists - more mailing lists