linux-kernel - Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20070405173125.c1aed842.akpm@linux-foundation.org>
Date:	Thu, 5 Apr 2007 17:31:25 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com>
Cc:	linux-kernel@...r.kernel.org, Bill Davidsen <davidsen@....com>,
	yumiko.sugita.yf@...achi.com, masami.hiramatsu.pt@...achi.com,
	hidehiro.kawai.ez@...achi.com, yuji.kakutani.uw@...achi.com,
	soshima@...hat.com, haoki@...hat.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 1/2] VM throttling: Start writeback at
 dirty_writeback_start_ratio

On Tue, 03 Apr 2007 19:46:04 +0900
Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com> wrote:

> This patchset is to avoid the problem that write(2) can be blocked for a
> long time if a system has several disks with different speed and is
> under heavy I/O pressure.
> 
> -Description of the problem:
> While Dirty+Writeback pages get more than 40%(`dirty_ratio') of memory,
> generators of dirty pages are blocked in balance_dirty_pages() until
> they start writeback of a specific number (`write_chunk', typically=1536)
> of dirty pages on the disks they write to.
> 
> Under this rule, if a process writes to the disk which has only a few
> (less than 1536) dirty pages, that process will be blocked until
> writeback of the other disks is completed and % of Dirty+Writeback goes
> below 40%.
> 
> Thus, if a slow device (such as a USB disk) has many dirty pages, the
> processes which write small data to the other disks can be blocked for
> quite a long time.
> 
> -Solution:
> This patch introduces high/low-watermark algorithm in
> balance_dirty_pages() in order to throttle only the processes which
> write to disks with heavy load.
> 
> This patch adds `dirty_start_writeback_ratio' for the low-watermark,
> and modifies get_dirty_limits() to calculate and return the writeback
> starting level of dirty pages based on `dirty_start_writeback_ratio'.
> 
> If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
> dirty pages start writeback of dirty pages by themselves. At that time,
> these processes are not blocked in balance_dirty_pages(), but they may
> be blocked if the write-requests-queue of the written disk is full
> (that is, the length of the queue > `nr_requests'). By this behavior,
> we can throttle only processes which write to the disks with heavy load,
> and can allow processes to write to the other disks without blocking.
> 
> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> are throttled as current Linux does, not to fill up memory with dirty
> pages.

Does this actually solve the problem?  If the request queue is sufficiently
large (relative to the various dirty-memory thresholds) then I'd expect
that a heavy-writer will be able to very quickly take the total
dirty+writeback memory up to the dirty_ratio (should be renamed
throttle_threshold, but it's too late for that).

I suspect the reason why this patch was successful in your testing was
because dirty_start_writeback_ratio happens to exceed the size of the disk
request queues, so the heavy writer is getting stuck on disk request queue
exhaustion.

But that won't work if we have a lot of processes writing to a lot of
disks, and it won't work if the request queue size is large, or if the
dirty-memory thresholds are small (relative to the request queue size).

Do the patches still work after
`echo 10000 > /sys/block/sda/queue/nr_requests'?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/