linux-kernel - Re: regression in page writeback

Message-Id: <1253670775.11656.3.camel@sli10-desk.sh.intel.com>
Date:	Wed, 23 Sep 2009 09:52:55 +0800
From:	Shaohua Li <shaohua.li@...el.com>
To:	"Wu, Fengguang" <fengguang.wu@...el.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"richard@....demon.co.uk" <richard@....demon.co.uk>,
	"a.p.zijlstra@...llo.nl" <a.p.zijlstra@...llo.nl>,
	"jens.axboe@...cle.com" <jens.axboe@...cle.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Chris Mason <chris.mason@...cle.com>
Subject: Re: regression in page writeback

On Tue, 2009-09-22 at 21:39 +0800, Wu, Fengguang wrote:
> On Tue, Sep 22, 2009 at 07:50:15PM +0800, Li, Shaohua wrote:
> > On Tue, Sep 22, 2009 at 06:49:15PM +0800, Wu, Fengguang wrote:
> > > Shaohua,
> > > 
> > > On Tue, Sep 22, 2009 at 01:49:13PM +0800, Li, Shaohua wrote:
> > > > Hi,
> > > > Commit d7831a0bdf06b9f722b947bb0c205ff7d77cebd8 causes disk io regression
> > > > in my test.
> > > > My system has 12 disks, each disk has two partitions. System runs fio sequential
> > > > write on all partitions, each partition has 8 jobs.
> > > > 2.6.31-rc1, fio gives 460m/s disk io
> > > > 2.6.31-rc2, fio gives about 400m/s disk io. Revert the patch, speed back to
> > > > 460m/s
> > > > 
> > > > Under latest git: fio gives 450m/s disk io; If reverting the patch, the speed
> > > > is 484m/s.
> > > > 
> > > > With the patch, fio reports fewer io merges and more interrupts. My naive
> > > > analysis is that the patch makes balance_dirty_pages_ratelimited_nr() limit the
> > > > write chunk to 8 pages and then soon go to sleep in balance_dirty_pages(),
> > > > because most of the time bdi_nr_reclaimable < bdi_thresh, so when the pages
> > > > are written out, the chunk is 8 pages long instead of 4M long. Without the patch,
> > > > a thread can write 8 pages, then move some pages to writeback, and then
> > > > continue writing. The patch seems to break this.
> > > 
> > > Do you have trace/numbers for above descriptions?
> > No, just a guess, because there is less io merge. Also, watching each bdi's
> > state, bdi_nr_reclaimable < bdi_thresh seems to always be true.
> 
> Ah OK.
> 
> > > > Unfortunately I can't figure out a fix for this issue, hopefully
> > > > you have more ideas.
> > > 
> > > Attached is a very verbose writeback debug patch, hope it helps and
> > > won't disturb the workload a lot :)
> > Hmm, the log buf will overflow soon, as there is > 400m/s io. I tried
> > to reproduce this issue on a system with two disks, but failed. Anyway, I'll try
> > it out tomorrow.
> 
> Thank you~  I'd recommend using netconsole or a serial line, and stopping
> local klogd, because writing the log messages could add noise.
Attached is a short log. I'll try to get a full log after finishing the
test on the latest git.
bdi_nr_reclaimable is always less than bdi_thresh in the log. This is
because, by the time bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh,
background writeback has already started, so bdi_nr_writeback should be > 0.
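
The condition discussed above can be illustrated with a small toy model.
This is only a sketch, not kernel code: struct bdi_snapshot and the three
helper functions are hypothetical names made up for illustration, and the
"before" and "after" checks are an assumption based on the description in
this thread (before the patch a throttled task pushes pages to writeback
whenever anything is reclaimable; after the patch only when
bdi_nr_reclaimable alone exceeds bdi_thresh).

```c
#include <stdbool.h>

/* Hypothetical snapshot of per-bdi counters, in pages (not a kernel struct). */
struct bdi_snapshot {
    unsigned long nr_reclaimable; /* dirty pages not yet under writeback */
    unsigned long nr_writeback;   /* pages currently under writeback */
    unsigned long thresh;         /* per-bdi dirty threshold */
};

/* The dirtying task is throttled while reclaimable + writeback exceeds the threshold. */
bool over_bdi_limit(const struct bdi_snapshot *s)
{
    return s->nr_reclaimable + s->nr_writeback > s->thresh;
}

/* Assumed pre-patch behavior: a throttled task starts writeback if any page is reclaimable. */
bool starts_writeback_before_patch(const struct bdi_snapshot *s)
{
    return over_bdi_limit(s) && s->nr_reclaimable > 0;
}

/* Assumed post-patch behavior: writeback starts only if reclaimable alone exceeds the threshold. */
bool starts_writeback_after_patch(const struct bdi_snapshot *s)
{
    return over_bdi_limit(s) && s->nr_reclaimable > s->thresh;
}
```

With, say, 900 reclaimable pages, 200 pages under writeback and a threshold
of 1000, the task is over the limit, the pre-patch check would start
writeback, and the post-patch check would not, which matches the state seen
in the log (bdi_nr_reclaimable < bdi_thresh with bdi_nr_writeback > 0).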

View attachment "msg2" of type "text/plain" (97612 bytes)