linux-kernel - Re: [PATCH 6/6] mm: per device dirty threshold

Date:	Wed, 04 Apr 2007 14:05:56 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, neilb@...e.de, dgc@....com,
	tomoki.sekiyama.qu@...achi.com
Subject: Re: [PATCH 6/6] mm: per device dirty threshold

On Wed, 2007-04-04 at 13:12 +0200, Miklos Szeredi wrote:
> > > > so it could be that: scale / cycle > 1
> > > > by a very small amount; however:
> > > 
> > > No, I'm worried about the case when scale is too small.  If the
> > > per-bdi threshold becomes smaller than stat_threshold, then things
> > > won't work, because dirty+writeback will never go below the threshold,
> > > possibly resulting in the deadlock we are trying to avoid.
> > 
> > /me goes refresh the deadlock details..
> > 
> > A writes to B; A exceeds the dirty limit but writeout is blocked by B
> > because the dirty limit is exceeded, right?
> > 
> > This cannot happen when we decouple the BDI dirty thresholds, even when
> > a threshold is 0.
> > 
> > A writes to B; A exceeds A's limit and writes to B, B has limit of 0, the
> > 1 dirty page gets written out (we gain ratio) and life goes on.
> > 
> > Right?
> 
> If the limit is zero, then we need the per-bdi dirty+writeback to go to
> zero, otherwise balance_dirty_pages() loops.  But the per-bdi
> writeback counter is not necessarily updated after the writeback,
> because the per-bdi per-CPU counter may not trip the update of the
> per-bdi counter.

Aaah, Doh, yeah, that makes sense. I must be dense.

Funny that that never triggered, I do run SMP boxen. Hmm, what to do?

Preferably you'd want to be able to 'flush' the per-CPU diffs into the
per-bdi counter in cases where thresh ~< NR_CPUS * stat_diff.

How about something like this:

---
 include/linux/backing-dev.h |    5 ++++
 mm/backing-dev.c            |   51 ++++++++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c         |    4 +++
 3 files changed, 60 insertions(+)

Index: linux-2.6/include/linux/backing-dev.h
===================================================================
--- linux-2.6.orig/include/linux/backing-dev.h
+++ linux-2.6/include/linux/backing-dev.h
@@ -117,6 +117,8 @@ void mod_bdi_stat(struct backing_dev_inf
 void inc_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item);
 void dec_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item);
 
+void bdi_flush_stat(struct backing_dev_info *bdi, enum bdi_stat_item item);
+void bdi_flush_all(struct backing_dev_info *bdi);
 #else /* CONFIG_SMP */
 
 static inline void __mod_bdi_stat(struct backing_dev_info *bdi,
@@ -142,6 +144,9 @@ static inline void __dec_bdi_stat(struct
 #define mod_bdi_stat __mod_bdi_stat
 #define inc_bdi_stat __inc_bdi_stat
 #define dec_bdi_stat __dec_bdi_stat
+
+#define bdi_flush_stat(bdi, item) do { } while (0)
+#define bdi_flush_all(bdi) do { } while (0)
 #endif
 
 void bdi_stat_init(struct backing_dev_info *bdi);
Index: linux-2.6/mm/backing-dev.c
===================================================================
--- linux-2.6.orig/mm/backing-dev.c
+++ linux-2.6/mm/backing-dev.c
@@ -188,4 +188,55 @@ void dec_bdi_stat(struct backing_dev_inf
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(dec_bdi_stat);
+
+void ___bdi_flush_stat(struct backing_dev_info *bdi, enum bdi_stat_item item)
+{
+	struct bdi_per_cpu_data *pcd = &bdi->pcd[smp_processor_id()];
+	s8 *p = pcd->bdi_stat_diff + item;
+
+	bdi_stat_add(*p, bdi, item);
+	*p = 0;
+}
+
+struct bdi_flush_struct {
+	struct backing_dev_info *bdi;
+	enum bdi_stat_item item;
+};
+
+void __bdi_flush_stat(struct bdi_flush_struct *flush)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	___bdi_flush_stat(flush->bdi, flush->item);
+	local_irq_restore(flags);
+}
+
+void __bdi_flush_all(struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+	int i;
+
+	local_irq_save(flags);
+	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
+		___bdi_flush_stat(bdi, i);
+	local_irq_restore(flags);
+}
+
+void bdi_flush_stat(struct backing_dev_info *bdi, enum bdi_stat_item item)
+{
+	struct bdi_flush_struct flush = {
+		bdi,
+		item
+	};
+
+	on_each_cpu(__bdi_flush_stat, &flush, 0, 1);
+}
+EXPORT_SYMBOL(bdi_flush_stat);
+
+void bdi_flush_all(struct backing_dev_info *bdi)
+{
+	on_each_cpu(__bdi_flush_all, bdi, 0, 1);
+}
+EXPORT_SYMBOL(bdi_flush_all);
 #endif
Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c
+++ linux-2.6/mm/page-writeback.c
@@ -345,6 +345,10 @@ static void balance_dirty_pages(struct a
 
 			get_dirty_limits(&background_thresh, &dirty_thresh,
 				       &bdi_thresh, bdi);
+
+			if (bdi_thresh < NR_CPUS * 8 * ilog2(NR_CPUS))
+				bdi_flush_all(bdi);
+
 			bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY) +
 						bdi_stat(bdi, BDI_UNSTABLE);
 			if (bdi_nr_reclaimable + bdi_stat(bdi, BDI_WRITEBACK) <=
