linux-kernel - Re: regression in page writeback

Message-ID: <20090922080505.GB9192@localhost>
Date:	Tue, 22 Sep 2009 16:05:05 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	"Li, Shaohua" <shaohua.li@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"richard@....demon.co.uk" <richard@....demon.co.uk>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	Chris Mason <chris.mason@...cle.com>
Subject: Re: regression in page writeback

On Tue, Sep 22, 2009 at 02:40:12PM +0800, Peter Zijlstra wrote:
> On Tue, 2009-09-22 at 13:49 +0800, Shaohua Li wrote:
> > Hi,
> > Commit d7831a0bdf06b9f722b947bb0c205ff7d77cebd8 causes a disk I/O regression
> > in my test.
> > My system has 12 disks, each with two partitions. The system runs fio
> > sequential writes on all partitions, with 8 jobs per partition.
> > 2.6.31-rc1: fio gives 460 MB/s of disk I/O.
> > 2.6.31-rc2: fio gives about 400 MB/s. Reverting the patch brings the speed
> > back to 460 MB/s.
> > 
> > Under the latest git: fio gives 450 MB/s; with the patch reverted, the speed
> > is 484 MB/s.
> > 
> > With the patch, fio reports fewer I/O merges and more interrupts. My naive
> > analysis is that the patch makes balance_dirty_pages_ratelimited_nr() limit
> > the write chunk to 8 pages and then soon go to sleep in balance_dirty_pages(),
> > because most of the time bdi_nr_reclaimable < bdi_thresh; so when the pages
> > are written out, the chunk is 8 pages long instead of 4M long. Without the
> > patch, a thread can write 8 pages, move some pages to writeback, and then
> > continue writing. The patch seems to break this.
> > 
> > Unfortunately I can't figure out a fix for this issue; hopefully you have more
> > ideas.
> 
> This whole writeback business is very fragile,

Agreed, sorry.

> the patch does indeed cure a few cases and compound a few others; a
> typical trade-off.
> 
> People are looking at it.
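
Shaohua's point about chunk size can be put in numbers. A rough sketch, assuming 4 KiB pages and reading "4M" as 4 MiB (both are assumptions; the message does not state them). The helper below is hypothetical, not a kernel function: it only counts how many separate write-out batches are needed to push out a region when each batch is capped at a given number of pages.

```c
#include <assert.h>

/* Hypothetical helper, not a kernel API: number of write-out batches
 * needed to flush region_bytes when each batch writes at most
 * pages_per_batch pages of page_size bytes. Rounds up. */
long batches_for_region(long region_bytes, long pages_per_batch,
                        long page_size)
{
    long batch_bytes;

    assert(pages_per_batch > 0 && page_size > 0);
    batch_bytes = pages_per_batch * page_size;
    return (region_bytes + batch_bytes - 1) / batch_bytes;
}
```

Under these assumptions, batches_for_region(4L << 20, 8, 4096) is 128 while batches_for_region(4L << 20, 1024, 4096) is 1, so the 8-page path submits the same 4 MiB in 128 separate pieces, which would be consistent with the fewer merges and extra interrupts fio reported.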

Staring at the changelog, I don't think balance_dirty_pages() could
"overshoot its limits and move all the dirty pages to writeback",
because it breaks out of its loop once enough pages have been written:

                if (pages_written >= write_chunk)
                        break;          /* We've done our duty */

The observed "overshooting" may well be the behavior of background_writeout(),
which keeps writing until the dirty numbers are pushed all the way down to 0.
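
The contrast drawn here can be illustrated with a deliberately simplified model. This is hypothetical code, not the 2.6.31 implementation: the throttling loop stops once it has written write_chunk pages, as in the excerpt above, whereas a writeout that runs until nothing is dirty (the way background_writeout() is characterized here) drains everything. The parameter batch is an assumed stand-in for however many pages one writeback pass manages to submit.

```c
#include <assert.h>

/* Simplified model of the throttling loop: submit batches until
 * write_chunk pages have been written, or nothing dirty is left. */
long throttled_writeout(long dirty, long write_chunk, long batch)
{
    long pages_written = 0;

    assert(batch > 0);
    while (dirty > 0) {
        long n = batch < dirty ? batch : dirty;

        dirty -= n;
        pages_written += n;
        if (pages_written >= write_chunk)
            break;          /* We've done our duty */
    }
    return pages_written;
}

/* Simplified model of a writeout that drains every dirty page. */
long drain_writeout(long dirty, long batch)
{
    long pages_written = 0;

    assert(batch > 0);
    while (dirty > 0) {
        long n = batch < dirty ? batch : dirty;

        dirty -= n;
        pages_written += n;
    }
    return pages_written;
}
```

With 10000 dirty pages, write_chunk = 12 and batch = 8, throttled_writeout() returns 16 (two batches, then the break), while drain_writeout() returns 10000.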


    mm: prevent balance_dirty_pages() from doing too much work

    balance_dirty_pages can overreact and move all of the dirty pages to
    writeback unnecessarily.

    balance_dirty_pages makes its decision to throttle based on the number of
    dirty plus writeback pages that are over the calculated limit, so it will
    continue to move pages even when there are plenty of pages in writeback
    and less than the threshold still dirty.

    This allows it to overshoot its limits and move all the dirty pages to
    writeback while waiting for the drives to catch up and empty the writeback
    list.
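
The changelog's argument rests on the throttle test counting writeback pages as well as dirty ones. A minimal sketch of that predicate as the changelog describes it; the function name is hypothetical and the parameters only loosely follow the bdi_* variables in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Throttle decision as described in the changelog: the dirtier is held
 * back while reclaimable (dirty) plus writeback pages exceed the limit. */
bool over_limit(long nr_reclaimable, long nr_writeback, long thresh)
{
    return nr_reclaimable + nr_writeback > thresh;
}
```

For example, with thresh = 1000, 100 dirty pages and 950 pages under writeback, over_limit() is true even though only 100 pages, well below the threshold, are still dirty. Per the changelog, this is the situation in which the old code kept moving dirty pages to writeback.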


I'm not sure how this patch stopped the "overshooting" behavior.
Perhaps it managed to avoid starting the background pdflush, or the
pdflush thread that was started exited because it found writeback
already in progress by someone else?

-               if (bdi_nr_reclaimable) {
+               if (bdi_nr_reclaimable > bdi_thresh) {
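
The effect of this one-line change can be seen by evaluating the old and new conditions side by side. This is a hypothetical sketch of only the `if` test, not of balance_dirty_pages() as a whole; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: start writeback whenever the bdi has any
 * reclaimable pages at all. */
bool starts_writeback_old(long bdi_nr_reclaimable, long bdi_thresh)
{
    (void)bdi_thresh;       /* not consulted by the old test */
    return bdi_nr_reclaimable != 0;
}

/* After the patch: start writeback only when reclaimable pages alone
 * exceed the bdi threshold. */
bool starts_writeback_new(long bdi_nr_reclaimable, long bdi_thresh)
{
    return bdi_nr_reclaimable > bdi_thresh;
}
```

With bdi_nr_reclaimable = 500 and bdi_thresh = 1000, the old test starts writeback and the new one does not. If, as Shaohua observes, bdi_nr_reclaimable stays below bdi_thresh most of the time, the patched loop would usually skip its own writeback step and go straight to sleeping.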

Thanks,
Fengguang