Date:   Fri,  9 Sep 2016 10:59:35 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     LKML <linux-kernel@...r.kernel.org>, Linux-MM <linux-mm@...ck.org>,
        Mel Gorman <mgorman@...hsingularity.net>
Cc:     Dave Chinner <david@...morbit.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Ying Huang <ying.huang@...el.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: [PATCH 4/4] mm, vmscan: Potentially stall direct reclaimers on tree_lock contention

If a heavy writer of a single file is forcing contention on the tree_lock
then it may be necessary to temporarily stall the direct writer to allow
kswapd to make progress. This patch marks a pgdat congested if tree_lock
is being contended on the tail of the LRU.

On a swap-intensive workload to ramdisk, the following is observed:

usemem
                              4.8.0-rc5             4.8.0-rc5
                           waitqueue-v1      directcongest-v1
Amean    System-1      179.61 (  0.00%)      202.21 (-12.58%)
Amean    System-3       68.91 (  0.00%)      105.14 (-52.59%)
Amean    System-5       93.09 (  0.00%)       80.98 ( 13.01%)
Amean    System-7       90.98 (  0.00%)       81.07 ( 10.90%)
Amean    System-8      299.81 (  0.00%)      227.08 ( 24.26%)
Amean    Elapsd-1      210.41 (  0.00%)      236.56 (-12.43%)
Amean    Elapsd-3       33.89 (  0.00%)       46.78 (-38.06%)
Amean    Elapsd-5       25.19 (  0.00%)       23.33 (  7.38%)
Amean    Elapsd-7       18.45 (  0.00%)       17.18 (  6.91%)
Amean    Elapsd-8       48.80 (  0.00%)       38.09 ( 21.93%)

Note that system CPU usage is reduced for high thread counts, but it
is not a universal win and the results are known to be highly variable.
The overall time stats look like:

           4.8.0-rc5   4.8.0-rc5
        waitqueue-v1 directcongest-v1
User          462.40      468.18
System       5127.32     4875.92
Elapsed      2364.08     2539.77

It takes longer to complete but uses less system CPU. The benefit
is more noticeable with xfs_io rewriting a file backed by ramdisk:

                                                        4.8.0-rc5             4.8.0-rc5
                                                  waitqueue-v1r24   directcongest-v1r24
Amean    pwrite-single-rewrite-async-System        3.23 (  0.00%)        3.21 (  0.80%)
Amean    pwrite-single-rewrite-async-Elapsd        3.33 (  0.00%)        3.31 (  0.67%)

           4.8.0-rc5   4.8.0-rc5
        waitqueue-v1 directcongest-v1
User            8.76        9.25
System        392.31      389.10
Elapsed       406.29      403.74

As with the previous patch, a test from Dave Chinner would be necessary
to decide whether this patch is worthwhile. It seems reasonable to favour
workloads that are heavily writing files over those that are heavily
swapping, as the former situation is normal and reasonable while the
latter situation will never be optimal.

Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
---
 mm/vmscan.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 936070b0790e..953df97abe0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -771,6 +771,15 @@ static unsigned long remove_mapping_list(struct list_head *mapping_list,
 					/* Stall kswapd once for 10ms on contention */
 					if (cmpxchg(&kswapd_exclusive, NUMA_NO_NODE, pgdat->node_id) != NUMA_NO_NODE) {
 						DEFINE_WAIT(wait);
+
+						/*
+						 * Tag the pgdat as congested as it may
+						 * indicate contention with a heavy
+						 * writer that should stall on
+						 * wait_iff_congested.
+						 */
+						set_bit(PGDAT_CONGESTED, &pgdat->flags);
+
 						prepare_to_wait(&kswapd_contended_wait,
 							&wait, TASK_INTERRUPTIBLE);
 						io_schedule_timeout(HZ/100);
-- 
2.6.4
