linux-kernel - [RFC][PATCH 0/3] VM throttling: avoid blocking occasional writers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <45DED819.9040404@hitachi.com>
Date:	Fri, 23 Feb 2007 21:03:37 +0900
From:	Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com>
To:	linux-kernel@...r.kernel.org
Cc:	akpm@...ux-foundation.org, miklos@...redi.hu,
	yumiko.sugita.yf@...achi.com, masami.hiramatsu.pt@...achi.com,
	hidehiro.kawai.ez@...achi.com, yuji.kakutani.uw@...achi.com,
	soshima@...hat.com, haoki@...hat.com
Subject: [RFC][PATCH 0/3] VM throttling: avoid blocking occasional writers

Hi,

I have observed a problem that write(2) can be blocked for a long time
if a system has several disks and is under heavy I/O pressure. This
patchset is to avoid the problem.

Example of the probrem:

There are two processes on a system which has two disks. Process-A
writes heavily to disk-a, and process-B writes small data (e.g. log
files) to disk-b occasionally. A portion of system memory, which is
depends on vm.dirty_ratio (typically 40%), is filled up with Dirty
and Writeback pages of disk-a.

In this situation, write(2) of process-B could be blocked for a very
long time (more then 60 seconds), although the load of disk-b is quite
low. In particular, the system would become quite slow, if disk-a is
slow (e.g. backup to an USB disk).

This seems to be the same problem as discussed in LKML:
http://marc.theaimsgroup.com/?t=115559902900003
and
http://marc.theaimsgroup.com/?t=117182340400003

Root cause:

I found this problem is caused by the balance_dirty_pages().

While Dirty+Writeback pages get more than 40% of memory, process-B is
blocked in balance_dirty_pages() until writeback of some (`write_chunk',
typically = 1536) dirty pages on disk-b is started.

However, because disk-b has only a few dirty pages, the process-B will
be blocked until writeback to disk-a is completed and Dirty+Writeback
goes below 40%.

Solution:

I consider that all of the dirty pages for the disk have been written
back and that the disk is clean if a process cannot write 'write_chunk'
pages in balance_dirty_pages().

To avoid using up the free memory with dirty pages by passing blocking,
this patchset adds a new threshold named vm.dirty_limit_ratio to sysctl.

It modifies balance_dirty_pages() not to block when the amount of
Dirty+Writeback is less than vm.dirty_limit_ratio percent of the memory.
In the other cases, writers are throttled as current Linux does.

In this patchset, vm.dirty_limit_ratio, instead of vm.dirty_ratio, is
used as the clamping level of Dirty+Writeback. And, vm.dirty_ratio is
used as the level at which a writers will itself start writeback of the
dirty pages.

Testing Results:

In the situation explained in "Example of the problem" section, I
measured time of write(2)ing to disk-b.
The write was completed by 30ms or less under the kernel with this
patchset.

When nr_requests is set too high (e.g. 8192), Dirty+Writeback grows near
vm.dirty_limit_ratio(45% of system memory by defaults). In that case,
write(2) sometimes took about 1 second.

This patchset can be applied to 2.6.20-mm2.
It consists of 3 pieces:

1/3 - add a sysctl variable `vm.dirty_limit_ratio'
2/3 - modify get_dirty_limits() to return the limit of dirty pages.
3/3 - break out of balance_dirty_pages() loop if the disk doesn't have
      remaining dirty pages, if Dirty+Writeback < vm.dirty_limit_ratio.

-- 
Tomoki Sekiyama
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: tomoki.sekiyama.qu@...achi.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/