linux-kernel - Re: Bug in kernel 2.6.31, Slow wb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090730010630.GA7326@localhost>
Date:	Thu, 30 Jul 2009 09:06:30 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Martin Bligh <mbligh@...gle.com>
Cc:	Chad Talbott <ctalbott@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Michael Rubin <mrubin@...gle.com>,
	Andrew Morton <akpm@...gle.com>,
	"sandeen@...hat.com" <sandeen@...hat.com>,
	Michael Davidson <md@...gle.com>
Subject: Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout

On Wed, Jul 29, 2009 at 10:11:10PM +0800, Martin Bligh wrote:
> > --- mm.orig/fs/fs-writeback.c
> > +++ mm/fs/fs-writeback.c
> > @@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode,
> >                                 * soon as the queue becomes uncongested.
> >                                 */
> >                                inode->i_state |= I_DIRTY_PAGES;
> > -                               if (wbc->nr_to_write <= 0) {
> > +                               if (wbc->nr_to_write <= 0 ||
> > +                                   wbc->encountered_congestion) {
> >                                        /*
> >                                         * slice used up: queue for next turn
> >                                         */
> >
> 
> That's not sufficient - it only the problem in the wb_kupdate path. If you want
> to be more conservative, how about we do this?

I agree on the unification of kupdate and sync paths. In fact I had a
patch for doing this. And I'd recommend to do it in two patches:
one to fix the congestion case, another to do the code unification.

The sync path don't care whether requeue_io() or redirty_tail() is
used, because they disregard the time stamps totally - only order of
inodes matters (ie. starvation), which is same for requeue_io()/redirty_tail().

Thanks,
Fengguang

> --- linux-2.6.30/fs/fs-writeback.c.old  2009-07-29 00:08:29.000000000 -0700
> +++ linux-2.6.30/fs/fs-writeback.c      2009-07-29 07:08:48.000000000 -0700
> @@ -323,43 +323,14 @@ __sync_single_inode(struct inode *inode,
>                          * We didn't write back all the pages.  nfs_writepages(
> )
>                          * sometimes bales out without doing anything. Redirty
>                          * the inode; Move it from s_io onto s_more_io/s_dirty.
> +                        * It may well have just encountered congestion
>                          */
> -                       /*
> -                        * akpm: if the caller was the kupdate function we put
> -                        * this inode at the head of s_dirty so it gets first
> -                        * consideration.  Otherwise, move it to the tail, for
> -                        * the reasons described there.  I'm not really sure
> -                        * how much sense this makes.  Presumably I had a good
> -                        * reasons for doing it this way, and I'd rather not
> -                        * muck with it at present.
> -                        */
> -                       if (wbc->for_kupdate) {
> -                               /*
> -                                * For the kupdate function we move the inode
> -                                * to s_more_io so it will get more writeout as
> -                                * soon as the queue becomes uncongested.
> -                                */
> -                               inode->i_state |= I_DIRTY_PAGES;
> -                               if (wbc->nr_to_write <= 0) {
> -                                       /*
> -                                        * slice used up: queue for next turn
> -                                        */
> -                                       requeue_io(inode);
> -                               } else {
> -                                       /*
> -                                        * somehow blocked: retry later
> -                                        */
> -                                       redirty_tail(inode);
> -                               }
> -                       } else {
> -                               /*
> -                                * Otherwise fully redirty the inode so that
> -                                * other inodes on this superblock will get som
> e
> -                                * writeout.  Otherwise heavy writing to one
> -                                * file would indefinitely suspend writeout of
> -                                * all the other files.
> -                                */
> -                               inode->i_state |= I_DIRTY_PAGES;
> +                       inode->i_state |= I_DIRTY_PAGES;
> +                       if (wbc->nr_to_write <= 0 ||     /* sliced used up */
> +                            wbc->encountered_congestion)
> +                               requeue_io(inode);
> +                       else {
> +                               /* somehow blocked: retry later */
>                                 redirty_tail(inode);
>                         }
>                 } else if (inode->i_state & I_DIRTY) {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/