Date:	Fri, 05 Oct 2007 10:22:00 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Miklos Szeredi <miklos@...redi.hu>, wfg@...l.ustc.edu.cn,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] remove throttle_vm_writeout()

On Thu, 2007-10-04 at 17:48 -0700, Andrew Morton wrote:
> On Fri, 05 Oct 2007 02:12:30 +0200 Miklos Szeredi <miklos@...redi.hu> wrote:
> 
> > > 
> > > I don't think I understand that.  Sure, it _shouldn't_ be a problem.  But it
> > > _is_.  That's what we're trying to fix, isn't it?
> > 
> > The problem, I believe is in the memory allocation code, not in fuse.
> 
> fuse is trying to do something which page reclaim was not designed for. 
> Stuff broke.
> 
> > In the example, memory allocation may be blocking indefinitely,
> > because we have 4MB under writeback, even though 28MB can still be
> > made available.  And that _should_ be fixable.
> 
> Well yes.  But we need to work out how, without re-breaking the thing which
> throttle_vm_writeout() fixed.

I'm thinking the really_congested thing will also fix this, by only
allowing a limited amount of extra writeback.

> > > > So the only thing the kernel should be careful about, is not to block
> > > > on an allocation if not strictly necessary.
> > > > 
> > > > Actually a trivial fix for this problem could be to just tweak the
> > > > thresholds, so as to make the above scenario impossible.  Although I'm
> > > > still not convinced this patch is perfect, because the dirty
> > > > threshold can actually change over time...
> > > > 
> > > > Index: linux/mm/page-writeback.c
> > > > ===================================================================
> > > > --- linux.orig/mm/page-writeback.c      2007-10-05 00:31:01.000000000 +0200
> > > > +++ linux/mm/page-writeback.c   2007-10-05 00:50:11.000000000 +0200
> > > > @@ -515,6 +515,12 @@ void throttle_vm_writeout(gfp_t gfp_mask
> > > >          for ( ; ; ) {
> > > >                 get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
> > > > 
> > > > +               /*
> > > > +                * Make sure the threshold is over the hard limit of
> > > > +                * dirty_thresh + ratelimit_pages * nr_cpus
> > > > +                */
> > > > +               dirty_thresh += ratelimit_pages * num_online_cpus();
> > > > +
> > > >                  /*
> > > >                   * Boost the allowable dirty threshold a bit for page
> > > >                   * allocators so they don't get DoS'ed by heavy writers
> > > 
> > > I can probably kind of guess what you're trying to do here.  But if
> > > ratelimit_pages * num_online_cpus() exceeds the size of the offending zone
> > > then things might go bad.
> > 
> > I think the admin can do quite a bit of other damage, by setting
> > dirty_ratio too high.
> > 
> > Maybe this writeback throttling should just have a fixed limit of 80%
> > ZONE_NORMAL, and limit dirty_ratio to something like 50%.
> 
> Bear in mind that the same problem will occur for the 16MB ZONE_DMA, and
> we cannot limit the system-wide dirty-memory threshold to 12MB.
> 
> iow, throttle_vm_writeout() needs to become zone-aware.  Then it only
> throttles when, say, 80% of ZONE_FOO is under writeback.

As it stands, 110% of the dirty limit can already be larger than, say,
ZONE_DMA (and likely is), so that is not a new bug - and I don't think
it's the thing Miklos runs into.

The problem Miklos is seeing (and I am, just in a different form) is that
throttle_vm_writeout() gets stuck because balance_dirty_pages() only gets
called once every ratelimit_pages dirtied pages (per cpu). So we can have
nr_cpus * ratelimit_pages extra.....

/me thinks

ok I confused myself.

Calling balance_dirty_pages() only once every ratelimit_pages dirtied
pages (per cpu) allows for nr_cpus * ratelimit_pages extra _dirty_ pages.
But balance_dirty_pages() will enforce:
  nr_dirty + nr_unstable + nr_writeback < thresh

So even if it writes out all of the dirty pages, we still have:
  nr_unstable + nr_writeback < thresh

So at any one time nr_writeback should not exceed thresh. But it does!?
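
For reference, a rough paraphrase of the ratelimit entry point in
mm/page-writeback.c that produces that nr_cpus * ratelimit_pages slack
(abbreviated from memory, details may differ - the real thing also drops
to a much smaller ratelimit once the dirty limit has been exceeded):

void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
					unsigned long nr_pages_dirtied)
{
	static DEFINE_PER_CPU(unsigned long, ratelimits) = 0;
	unsigned long *p;

	/*
	 * Per-cpu counter of pages dirtied; we only drop into
	 * balance_dirty_pages() once it crosses ratelimit_pages, which is
	 * why up to nr_cpus * ratelimit_pages pages can be dirtied past
	 * the threshold before anyone throttles.
	 */
	preempt_disable();
	p = &__get_cpu_var(ratelimits);
	*p += nr_pages_dirtied;
	if (unlikely(*p >= ratelimit_pages)) {
		*p = 0;
		preempt_enable();
		balance_dirty_pages(mapping);
		return;
	}
	preempt_enable();
}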

So how do we end up with more writeback pages than that? Should we teach
pdflush about these limits as well?
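
Purely as a sketch of the zone-aware direction Andrew suggests above - the
helper name and the 80% figure are made up, nothing like this is in the
tree - it could look roughly like:

/*
 * Hypothetical helper: throttle_vm_writeout() would only block when some
 * zone usable by this allocation has ~80% of its pages under writeback.
 */
static int zone_writeback_exceeded(gfp_t gfp_mask)
{
	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
	struct zone *zone;

	for_each_zone(zone) {
		unsigned long limit;

		if (!populated_zone(zone) || zone_idx(zone) > high_zoneidx)
			continue;

		limit = zone->present_pages * 8 / 10;	/* 80%, illustrative */
		if (zone_page_state(zone, NR_WRITEBACK) > limit)
			return 1;
	}
	return 0;
}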

