linux-kernel - Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start

Message-Id: <20070409204654.0e6b9920.akpm@linux-foundation.org>
Date:	Mon, 9 Apr 2007 20:46:54 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com>
Cc:	linux-kernel@...r.kernel.org, Bill Davidsen <davidsen@....com>,
	yumiko.sugita.yf@...achi.com, masami.hiramatsu.pt@...achi.com,
	hidehiro.kawai.ez@...achi.com, yuji.kakutani.uw@...achi.com,
	soshima@...hat.com, haoki@...hat.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 1/2] VM throttling: Start writeback at
 dirty_writeback_start_ratio

On Tue, 10 Apr 2007 12:04:54 +0900 Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com> wrote:

> Hello Andrew,
> Thank you for your comments.
> 
> Andrew Morton wrote:
> > On Tue, 03 Apr 2007 19:46:04 +0900
> > Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com> wrote:
> >> If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
> >> dirty pages start writeback of dirty pages by themselves. At that time,
> >> these processes are not blocked in balance_dirty_pages(), but they may
> >> be blocked if the write-requests-queue of the written disk is full
> >> (that is, the length of the queue > `nr_requests'). By this behavior,
> >> we can throttle only processes which write to the disks with heavy load,
> >> and can allow processes to write to the other disks without blocking.
> >>
> >> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> >> are throttled as current Linux does, not to fill up memory with dirty
> >> pages.
> > 
> > Does this actually solve the problem?  If the request queue is sufficiently
> > large (relative to the various dirty-memory thresholds) then I'd expect
> > that a heavy-writer will be able to very quickly take the total
> > dirty+writeback memory up to the dirty_ratio (should be renamed
> > throttle_threshold, but it's too late for that).
> > 
> > I suspect the reason why this patch was successful in your testing was
> > because dirty_writeback_start_ratio happens to exceed the size of the disk
> > request queues, so the heavy writer is getting stuck on disk request queue
> > exhaustion.
> > 
> > But that won't work if we have a lot of processes writing to a lot of
> > disks, and it won't work if the request queue size is large, or if the
> > dirty-memory thresholds are small (relative to the request queue size).
> > 
> > Do the patches still work after
> > `echo 10000 > /sys/block/sda/queue/nr_requests'?
> 
> As you pointed out, this patch has no effect if nr_requests is too large,
> because it identifies heavily loaded disks by the length of each disk's
> write-requests queue.
> 
> This patch is meant to give system administrators room to avoid the
> problem by adjusting parameters appropriately, rather than to be an
> automatic solution for every possible situation.
> 
> Could you please tell me some situations in which we should set nr_requests
> that large?

It's probably not a sensible thing to do.  But it's _possible_ to do, and
the fact that the kernel will again misbehave indicates an overall weakness
in our design.

And there are other ways in which this situation could occur:

- The request queue has a fixed size (it is not scaled according to the
  amount of memory in the machine).  So if the machine is small enough
  (say, 64MB) then the problem can happen.

- The machine could have a large number of disks.

- The queue size of 128 is in units of "number of requests".  But it is
  independent of the _size_ of those requests.  If someone comes up with
  a driver which wants to use 16MB-sized requests, the problem will
  recur.

For all these sorts of reasons, we have learned that we should avoid any
dependence upon request queue exhaustion within the VM/VFS/etc.
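
To put rough numbers on the cases above, here is a back-of-the-envelope
sketch in Python.  It is not kernel code.  It only compares how many bytes
can sit in the request queues before a writer blocks against the dirty_ratio
threshold.  The 512KB request size, the 40% dirty_ratio and the machine
sizes are assumed values picked for illustration; 128 is the queue size
mentioned above.  The function names are made up for this sketch, and the
model ignores most of the real accounting.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def queue_capacity_bytes(nr_requests, request_bytes, nr_disks=1):
    """Rough upper bound on data that fits in the request queues."""
    return nr_requests * request_bytes * nr_disks


def dirty_threshold_bytes(mem_bytes, dirty_ratio_percent):
    """Simplified dirty_ratio threshold: a plain percentage of total memory."""
    return mem_bytes * dirty_ratio_percent // 100


def queue_fills_first(mem_bytes, dirty_ratio_percent,
                      nr_requests, request_bytes, nr_disks=1):
    """True if the queues are exhausted before dirty_ratio is reached.

    Only in that case can a scheme that depends on request queue
    exhaustion throttle the heavy writer.
    """
    capacity = queue_capacity_bytes(nr_requests, request_bytes, nr_disks)
    return capacity < dirty_threshold_bytes(mem_bytes, dirty_ratio_percent)


# (label, memory, dirty_ratio %, nr_requests, request size, disks)
# All values except nr_requests=128 are assumptions for illustration.
CASES = [
    ("4GB machine, one disk",    4 * GIB,  40, 128,   512 * KIB, 1),
    ("64MB machine, one disk",   64 * MIB, 40, 128,   512 * KIB, 1),
    ("4GB machine, 32 disks",    4 * GIB,  40, 128,   512 * KIB, 32),
    ("4GB machine, 16MB reqs",   4 * GIB,  40, 128,   16 * MIB,  1),
    ("4GB, nr_requests=10000",   4 * GIB,  40, 10000, 512 * KIB, 1),
]

if __name__ == "__main__":
    for label, mem, ratio, nr, size, disks in CASES:
        fills = queue_fills_first(mem, ratio, nr, size, disks)
        print(f"{label:26s} queue fills before dirty_ratio: {fills}")
```

With these assumed numbers only the first case has the queue filling before
dirty_ratio is reached.  In every other case the writer gets to dirty_ratio
first, so the queue never provides the throttling.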
