linux-kernel - Re: Bug in kernel 2.6.31, Slow wb

Message-ID: <20090801045345.GA16011@localhost>
Date:	Sat, 1 Aug 2009 12:53:46 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Martin Bligh <mbligh@...gle.com>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	Chad Talbott <ctalbott@...gle.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Michael Rubin <mrubin@...gle.com>, sandeen@...hat.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout

On Sat, Aug 01, 2009 at 12:03:13PM +0800, Wu Fengguang wrote:
> On Thu, Jul 30, 2009 at 03:48:02PM -0700, Martin Bligh wrote:
> > On Thu, Jul 30, 2009 at 3:43 PM, Jens Axboe<jens.axboe@...cle.com> wrote:
> > > On Thu, Jul 30 2009, Martin Bligh wrote:
> > >> > The test case above on a 4G machine is only generating 1G of dirty data.
> > >> > I ran the same test case on the 16G, resulting in only background
> > >> > writeout. The relevant bit here being that the background writeout
> > >> > finished quickly, writing at disk speed.
> > >> >
> > >> > I re-ran the same test, but using 300 100MB files instead. While the
> > >> > dd's are running, we are going at ~80MB/sec (this is disk speed, it's an
> > >> > x25-m). When the dd's are done, it continues doing 80MB/sec for 10
> > >> > seconds or so. Then the remainder (about 2G) is written in bursts at
> > >> > disk speeds, but with some time in between.
> > >>
> > >> OK, I think the test case is sensitive to how many files you have - if
> > >> we punt them to the back of the list, and yet we still have 299 other
> > >> ones, it may well be able to keep the disk spinning despite the bug
> > >> I outlined. Try using 30 1GB files?
> > >
> > > If this disk starts spinning, then we have bigger bugs :-)
> > >>
> > >> Though it doesn't seem to happen with just one dd streamer, and
> > >> I don't see why the bug doesn't trigger in that case either.
> > >>
> > >> I believe the bugfix is correct independent of any bdi changes?
> > >
> > > Yeah I think so too, I'll run some more tests on this tomorrow and
> > > verify it there as well.
> > 
> > There's another issue I was discussing with Peter Z. earlier that the
> > bdi changes might help with - if you look at where the dirty pages
> > get to, they are capped hard at the average of the dirty and
> > background thresholds, meaning we can only dirty about half the
> > pages we should be able to. That does very slowly go away when
> > the bdi limit catches up, but it seems to start at 0, and its progress
> > seems glacially slow (at least if you're impatient ;-))
> 
> You mean the dirty limit will start at
> (dirty_ratio+background_ratio)/2 = 15% and grow toward
> (dirty_ratio) = 20% at a very slow pace? I did observe such curves
> long ago, but they do not always show up, as in the mini bench below.
> 
> > This seems to affect some of our workloads badly: when they have
> > a sharp spike in dirty data to one device, they get throttled heavily,
> > which they wouldn't have been before the per-bdi dirty limits.
> 
> Here is a single dd on my laptop with 4G memory, kernel 2.6.30.
> 
>         root /home/wfg# echo 10 > /proc/sys/vm/dirty_ratio                 
>         root /home/wfg# echo 20 > /proc/sys/vm/dirty_background_ratio 
> 
>         wfg ~% dd if=/dev/zero of=/opt/vm/10G bs=1M count=1000  
>         1000+0 records in
>         1000+0 records out
>         1048576000 bytes (1.0 GB) copied, 12.7143 s, 82.5 MB/s
> 
> output of vmmon:
> 
>          nr_dirty     nr_writeback
>                 0                0
>                 0                0
>             56795                0
>             51655            17020
>             52071            17511
>             51648            16898
>             51655            16485
>             52369            17425
>             51648            16930
>             51470            16809
>             52630            17267
>             51287            16634
>             51260            16641
>             51310            16903
>             51281            16379
>             46073            11169
>             46086                0
>             46089                0
>              3132             9657
>                21            17677
>                 3            14107
>                14                2
>                 0                0
>                 0                0
> 
> In this case nr_dirty stays almost constant.
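
For reference, the midpoint described in the quoted text can be written out
as plain arithmetic. This is only a sketch of the rule of thumb as stated in
this thread, not the kernel's code; the function name `midpoint_percent` is
made up for illustration.

```python
def midpoint_percent(dirty_ratio, background_ratio):
    """Average of the two ratios, as described in the quoted text.

    Both arguments are percentages. This mirrors the formula quoted in the
    thread; it is not taken from the kernel source.
    """
    return (dirty_ratio + background_ratio) / 2


# The example values from the quoted text: 20% dirty, 10% background.
print(midpoint_percent(20, 10))  # 15.0
```

With the quoted values, the starting point is 15% and the remaining headroom
up to dirty_ratio is 5 percentage points.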

I can see the growth when I increase the dd size to 2GB; the dd
throughput also decreases from 82.5MB/s to 60.9MB/s.

        wfg ~% dd if=/dev/zero of=/opt/vm/10G bs=1M count=2000
        2000+0 records in
        2000+0 records out
        2097152000 bytes (2.1 GB) copied, 34.4114 s, 60.9 MB/s

         nr_dirty     nr_writeback
                0                0
            44980                0
            49929            20353
            49929            20353
            49189            17822
            54556            14852
            49191            17717
            52455            15501
            49903            19330
            50077            17293
            50040            19111
            52097             7040
            52656            16797
            53361            19455
            53551            16999
            57599            16396
            55165             6801
            57626            16534
            56193            18795
            57888            16655
            57740            18818
            65759            11304
            60015            19842
            61136            16618
            62166            17429
            62160            16782
            62036            11907
            59237            13715
            61991            18561
            66256            15111
            60574            17551
            17926            17930
            17919            17057
            17919            16379
               11            13717
            11470             4606
                2              913
                2                0
               10                0
               10                0
                0                0
                0                0
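
To put the growth in more familiar units, the page counts can be converted
to MiB. A minimal sketch, assuming 4 KiB pages (an assumption; the page size
is not stated in this message) and using two samples copied from the table
above. The helper name `pages_to_mib` is hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed, not stated in the message


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB for the given page size."""
    return pages * page_size / (1024 * 1024)


# nr_dirty + nr_writeback for an early and a late sample from the 2GB run.
early = 49929 + 20353
late = 66256 + 15111

print(f"early: {early} pages = {pages_to_mib(early):.1f} MiB")
print(f"late:  {late} pages = {pages_to_mib(late):.1f} MiB")
```

Under that page-size assumption, dirty plus writeback pages rise from about
275 MiB to about 318 MiB over the run.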

But when I redid the above test after dropping all the ~3GB caches,
the dirty limit again seems to remain constant.

        # echo 1 > /proc/sys/vm/drop_caches

        wfg ~% dd if=/dev/zero of=/opt/vm/10G bs=1M count=2000
        2000+0 records in
        2000+0 records out
        2097152000 bytes (2.1 GB) copied, 33.3299 s, 62.9 MB/s

         nr_dirty     nr_writeback
                0                0
            76425            10825
            66255            17302
            69942            15865
            65332            17305
            71207            14605
            69957            15380
            65901            18960
            66365            16233
            66040            17041
            66042            16378
            66434             2169
            67606            17143
            68660            17195
            67613            16514
            67366            17415
            65784             4620
            69053            16831
            66037            17033
            64601            19936
            64629            16922
            70459             9227
            66673            17789
            65638            20102
            65166            17662
            66255            16286
            69821            11264
            82247             4113
            64012            18060
            29585            17920
             5872            16653
             5872            14197
            25422             1913
             5884            16658
                0            12027
                2               26
                2                0
                2                0
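
For anyone who wants to collect similar numbers, here is a rough sketch of a
sampler that prints the same two columns. It is not the vmmon tool used
above; it assumes the nr_dirty and nr_writeback counters are read from
/proc/vmstat, and the one-second interval and the names `parse_vmstat`,
`sample` and `monitor` are choices made for this example.

```python
import time


def parse_vmstat(text, keys=("nr_dirty", "nr_writeback")):
    """Pick the requested counters out of /proc/vmstat-style text.

    Each line is expected to look like "name value". Names are matched
    exactly, so nr_writeback_temp is not confused with nr_writeback.
    """
    wanted = set(keys)
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def sample(path="/proc/vmstat"):
    """Read the counters once from the given file (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def monitor(interval=1.0, count=5):
    """Print `count` samples, one every `interval` seconds."""
    print(f"{'nr_dirty':>17}{'nr_writeback':>17}")
    for _ in range(count):
        s = sample()
        print(f"{s.get('nr_dirty', 0):>17}{s.get('nr_writeback', 0):>17}")
        time.sleep(interval)
```

Calling monitor() on a Linux machine while a dd is running should give a
table in the same shape as the ones above; the values are in pages.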

Thanks,
Fengguang
