lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 24 Nov 2010 15:12:33 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	"Li, Shaohua" <shaohua.li@...el.com>,
	Christoph Hellwig <hch@....de>,
	Dave Chinner <david@...morbit.com>,
	Theodore Ts'o <tytso@....edu>,
	Chris Mason <chris.mason@...cle.com>,
	Mel Gorman <mel@....ul.ie>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-mm <linux-mm@...ck.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 06/13] writeback: bdi write bandwidth estimation

On Wed, 2010-11-24 at 21:46 +0800, Wu Fengguang wrote:
> On Wed, Nov 24, 2010 at 09:42:09PM +0800, Peter Zijlstra wrote:
> > On Wed, 2010-11-24 at 21:20 +0800, Wu Fengguang wrote:
> > > >         (jiffies - bdi->write_bandwidth_update_time < elapsed)
> > > 
> > > this will be true if someone else has _done_ overlapped estimation,
> > > otherwise it will equal:
> > > 
> > >         jiffies - bdi->write_bandwidth_update_time == elapsed
> > > 
> > > Sorry the comment needs updating. 
> > 
> > Right, but its racy as hell..
> 
> Yeah, for N concurrent dirtiers, plus the background flusher, only
> one is able to update write_bandwidth[_update_time]..

Wrong, nr_cpus are, they could all observe the old value before seeing
the update of the variable.

Why not something like the below, which keeps the stamps per bdi and
serializes on a lock (trylock, you only need a single updater at any one
time anyway):

probably got the math wrong, but the idea should be clear, you can even
add an explicit bdi_update_bandwidth_stamps() function which resets the
stamps to the current situation in order to skip periods of low
throughput (that would need to do spin_lock).

---
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 4ce34fa..de690c3 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -40,6 +40,7 @@ typedef int (congested_fn)(void *, int);
 enum bdi_stat_item {
 	BDI_RECLAIMABLE,
 	BDI_WRITEBACK,
+	BDI_WRITTEN,
 	NR_BDI_STAT_ITEMS
 };
 
@@ -88,6 +89,11 @@ struct backing_dev_info {
 
 	struct timer_list laptop_mode_wb_timer;
 
+	spinlock_t bw_lock;
+	unsigned long bw_time_stamp;
+	unsigned long bw_write_stamp;
+	int write_bandwidth;
+
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 	struct dentry *debug_stats;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 027100d..a934fe9 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -661,6 +661,11 @@ int bdi_init(struct backing_dev_info *bdi)
 	bdi->dirty_exceeded = 0;
 	err = prop_local_init_percpu(&bdi->completions);
 
+	spin_lock_init(&bdi->bw_lock);
+	bdi->bw_time_stamp = jiffies;
+	bdi->bw_write_stamp = 0;
+	bdi->write_bandwidth = 100 << (20 - PAGE_SHIFT); /* 100 MB/s */
+
 	if (err) {
 err:
 		while (i--)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b840afa..f3f5c24 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -219,6 +219,7 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
  */
 static inline void __bdi_writeout_inc(struct backing_dev_info *bdi)
 {
+	__inc_bdi_state(bdi, BDI_WRITTEN);
 	__prop_inc_percpu_max(&vm_completions, &bdi->completions,
 			      bdi->max_prop_frac);
 }
@@ -238,6 +239,35 @@ void task_dirty_inc(struct task_struct *tsk)
 	prop_inc_single(&vm_dirties, &tsk->dirties);
 }
 
+void bdi_update_write_bandwidth(struct backing_dev_info *bdi)
+{
+	unsigned long time_now, write_now;
+	long time_delta, write_delta;
+	long bw;
+
+	if (!spin_try_lock(&bdi->bw_lock))
+		return;
+
+	write_now = bdi_stat(bdi, BDI_WRITTEN);
+	time_now = jiffies;
+
+	write_delta = write_now - bdi->bw_write_stamp;
+	time_delta = time_now - bdi->bw_time_stamp;
+
+	/* rate-limit, only update once every 100ms */
+	if (time_delta < HZ/10 || !write_delta)
+		goto unlock;
+
+	bdi->bw_write_stamp = write_now;
+	bdi->bw_time_stamp = time_now;
+
+	bw = write_delta * HZ / time_delta;
+	bdi->write_bandwidth = (bdi->write_bandwidth + bw + 3) / 4;
+
+unlock:
+	spin_unlock(&bdi->bw_lock);
+}
+
 /*
  * Obtain an accurate fraction of the BDI's portion.
  */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ