Message-ID: <20160512153234.GS4775@htj.duckdns.org>
Date:	Thu, 12 May 2016 11:32:34 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Miao Xie <miaoxie@...wei.com>
Cc:	Fengguang Wu <fengguang.wu@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [BUG]Writeback Cgroup/Dirty Throttle: very small buffered write
 thoughput caused by writeback cgroup and dirty thottle

Hello,

On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
> >My box has 48 cores and 188GB memory, but I set
> >vm.dirty_background_bytes = 268435456
> >vm.dirty_bytes = 536870912
> >
> >If I set vm.dirty_background_bytes and vm.dirty_bytes to a large number (vm.dirty_background_bytes = 3GB,
> >vm.dirty_bytes = 4GB), then fio throughput is more than 1500MB/s. If I then reset them to the original
> >values (the ones above), the throughput drops to 500MB/s.
> >
> >According to my debugging, fio slept for 1ms every time we dirtied a page (balance dirty pages) when
> >the throughput was down to 4MB/s. It might be a bug in dirty throttling when the writeback cgroup is enabled, I think.

Heh, so, for cgroups, the absolute byte limits can't be applied
directly and are converted to a percentage value before being applied.
You're specifying 0.27% for the threshold.  Unfortunately, the ratio is
kept as a whole-number percentage and 0.27% becomes 0, so your
cgroups are always over the limit and being throttled.
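
As a rough illustration of the truncation described above, the
following userspace sketch reproduces the pre-patch integer arithmetic.
It assumes 4 KiB pages and that all 188GB counts as globally available
memory; the function names are made up for illustration and are not
kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4 KiB pages */

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Pre-patch conversion (hypothetical helper name): turn a byte limit
 * into a whole-number percentage of globally available pages, capped
 * at 100.  Anything below 1% truncates to 0.
 */
static uint64_t old_ratio_percent(uint64_t bytes, uint64_t global_avail_pages)
{
	uint64_t r;

	assert(global_avail_pages != 0);
	r = div_round_up(bytes, PAGE_SIZE_BYTES) * 100 / global_avail_pages;
	return r < 100 ? r : 100;
}

/* Pre-patch threshold, in pages, for a domain with avail_pages available. */
static uint64_t old_thresh_pages(uint64_t ratio, uint64_t avail_pages)
{
	return ratio * avail_pages / 100;
}
```

With vm.dirty_bytes = 536870912 and 188GB (49283072 pages) available,
old_ratio_percent() yields 0, so old_thresh_pages() is 0 regardless of
how much memory the cgroup has.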

Can you please see whether the following patch fixes the issue?

Thanks.

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 999792d..a455a21 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
 	unsigned long bytes = vm_dirty_bytes;
 	unsigned long bg_bytes = dirty_background_bytes;
-	unsigned long ratio = vm_dirty_ratio;
-	unsigned long bg_ratio = dirty_background_ratio;
+	/* convert ratios to per-PAGE_SIZE for higher precision */
+	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
+	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
 	unsigned long thresh;
 	unsigned long bg_thresh;
 	struct task_struct *tsk;
@@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		/*
 		 * The byte settings can't be applied directly to memcg
 		 * domains.  Convert them to ratios by scaling against
-		 * globally available memory.
+		 * globally available memory.  As the ratios are in
+		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
+		 * pages.
 		 */
 		if (bytes)
-			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
-				    global_avail, 100UL);
+			ratio = min(DIV_ROUND_UP(bytes, global_avail),
+				    PAGE_SIZE);
 		if (bg_bytes)
-			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
-				       global_avail, 100UL);
+			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
+				       PAGE_SIZE);
 		bytes = bg_bytes = 0;
 	}
 
 	if (bytes)
 		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
 	else
-		thresh = (ratio * available_memory) / 100;
+		thresh = (ratio * available_memory) / PAGE_SIZE;
 
 	if (bg_bytes)
 		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
 	else
-		bg_thresh = (bg_ratio * available_memory) / 100;
+		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
 
 	if (bg_thresh >= thresh)
 		bg_thresh = thresh / 2;
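
For comparison, here is a userspace sketch of the arithmetic the patch
above switches to, where ratios are expressed in units of 1/PAGE_SIZE
rather than 1/100.  It assumes 4 KiB pages; the helper names and the
1 GiB cgroup used in the note below are illustrative assumptions, not
kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4 KiB pages */

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Post-patch conversion (hypothetical helper name).  Since
 * (bytes / PAGE_SIZE) / global_avail * PAGE_SIZE == bytes / global_avail,
 * dividing bytes by pages directly gives a per-PAGE_SIZE ratio.
 * It is capped at PAGE_SIZE, which corresponds to 100%.
 */
static uint64_t new_ratio(uint64_t bytes, uint64_t global_avail_pages)
{
	uint64_t r;

	assert(global_avail_pages != 0);
	r = div_round_up(bytes, global_avail_pages);
	return r < PAGE_SIZE_BYTES ? r : PAGE_SIZE_BYTES;
}

/* Post-patch threshold, in pages, for a domain with avail_pages available. */
static uint64_t new_thresh_pages(uint64_t ratio, uint64_t avail_pages)
{
	return ratio * avail_pages / PAGE_SIZE_BYTES;
}

/* Same clamp as the unchanged tail of the hunk: keep bg below thresh. */
static uint64_t clamp_bg_thresh(uint64_t bg_thresh, uint64_t thresh)
{
	return bg_thresh >= thresh ? thresh / 2 : bg_thresh;
}
```

With the reported settings and 49283072 globally available pages,
new_ratio() gives 11 for vm.dirty_bytes and 6 for
vm.dirty_background_bytes.  For a cgroup with 262144 available pages
(1 GiB), that works out to thresholds of 704 and 384 pages instead of 0.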
