[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1197354339.668.62.camel@localhost.localdomain>
Date: Tue, 11 Dec 2007 14:25:39 +0800
From: zhejiang <zhe.jiang@...el.com>
To: a.p.zijlstra@...llo.nl
Cc: linux-kernel@...r.kernel.org
Subject: Reducing the bdi proporion calculation period to speed up disk
write
The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
device dirty threshold. It works well.
However, the period for proportion calculation may be too large.
For 8G memory, the calc_period_shift() will return 19 as the shift.
When we switch writing operation between different disks, there may be
potential performance issue.
For example, we first write to disk A, then write to disk B.
The proportion for disk B will increase slowly because the denominator
is too large (It's 2^18 + (global_count & counter_mask)).
The disk B will get small dirty page quota for a long time,
it will get blocked frequently though the total dirty page is under the
dirty page limit.
Peter provided a patch to avoid this issue, this patch allow violation
of bdi limits if there is a lot of room on the system.
It looks like:
+if (nr_reclaimable + nr_writeback < (background_thresh +
dirty_thresh) / 2)
+ break;
This patch really help to avoid congestion, but if the dirty pages
exceed about 3/4 of the dirty_thresh, congestion still happens if we
write to another disk.
I think that we can reduce the period to speed up the proportion
adjustment.
diff -Nur a/page-writeback.c b/page-writeback.c
--- a/page-writeback.c 2007-12-11 13:46:30.000000000 +0800
+++ b/page-writeback.c 2007-12-11 13:47:11.000000000 +0800
@@ -128,10 +128,7 @@
*/
static int calc_period_shift(void)
{
- unsigned long dirty_total;
-
- dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
100;
- return 2 + ilog2(dirty_total - 1);
+ return 12;
}
In the 8G memory system, I did some testing with iozone.
I found that reducing the period help to increase the write speed
when switch to a new disk.
Run "./iozone -B -i 0 -i 2 -r 4k -s 1000M" twice in the disk B.
Here is the result:
1. With the patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
First Second
write 78M 173M
rewrite 112M 203M
randread 1710M 1697M
randwrite 192M 1412M
2. With Peter's patch
write 134M 169M
rewrite 134M 203M
randread 1717M 1705M
randwrite 179M 1412M
3.Adjust the shift to 12
write 260M 259M
rewrite 240M 246M
randread 1712M 1700M
randwrite 1409M 1409M
4.With Peter's patch and adjust the shift to 12
write 256M 239M
rewrite 253M 253M
randread 1704M 1716M
randwrite 1414M 1416M
Run "./iozone -B -i 0 -i 2 -r 4k -s 500M" twice in the disk B.
1. With the patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
First Second
write 821M 725M
rewrite 144M 1299M
randread 1740M 1733M
randwrite 1444M 1440M
2. With Peter's patch
write 1100M 1112M
rewrite 1295M 1313M
randread 1745M 1744M
randwrite 1452M 1449M
3.Adjust the shift to 12
write 1021M 1104M
rewrite 1314M 1311M
randread 1741M 1737M
randwrite 1448M 1445M
4.With Peter's patch and adjust the shift to 12
write 1104M 1105M
rewrite 1292M 1308M
randread 1737M 1741M
randwrite 1449M 1449M
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists