lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1290538347-sup-7669@think>
Date:	Tue, 23 Nov 2010 13:58:24 -0500
From:	Chris Mason <chris.mason@...cle.com>
To:	Nick Piggin <npiggin@...nel.dk>
Cc:	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-btrfs <linux-btrfs@...r.kernel.org>,
	Jan Kara <jack@...e.cz>, Eric Sandeen <sandeen@...hat.com>,
	"Theodore Ts'o" <tytso@....edu>
Subject: Re: [patch] fs: fix deadlocks in writeback_if_idle

Excerpts from Nick Piggin's message of 2010-11-23 07:52:23 -0500:
> On Tue, Nov 23, 2010 at 07:34:07AM -0500, Chris Mason wrote:
> > Excerpts from Nick Piggin's message of 2010-11-23 05:02:39 -0500:
> > 
> > [ ... ]
> > 
> > > 
> > > Avoid both these issues by issuing completely asynchronous writeback request in
> > > writeback_inodes_sb_if_idle. Don't let that fool you into thinking these
> > > functions don't suck any more.
> > > 
> > > ext4 now passes extensive stress testing with xfstests, fs_mark, dbench,
> > > with a writeback_inodes_if_idle call added directly into ext4_da_write_begin
> > > to trigger the path frequently. Previously it would spew lockdep stuff and
> > > hang in a number of ways very quickly.
> > > 
> > > Signed-off-by: Nick Piggin <npiggin@...nel.dk>
> > > 
> > > ---
> > >  fs/fs-writeback.c |   32 ++++++++++++++++++++------------
> > >  1 file changed, 20 insertions(+), 12 deletions(-)
> > > 
> > > Index: linux-2.6/fs/fs-writeback.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/fs-writeback.c    2010-11-23 20:57:23.000000000 +1100
> > > +++ linux-2.6/fs/fs-writeback.c    2010-11-23 20:59:10.000000000 +1100
> > > @@ -1152,16 +1152,17 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> > >   *
> > >   * Invoke writeback_inodes_sb if no writeback is currently underway.
> > >   * Returns 1 if writeback was started, 0 if not.
> > > + *
> > > + * Even if 1 is returned, writeback may not be started if memory allocation
> > > + * fails. This function makes no guarantees about anything.
> > >   */
> > >  int writeback_inodes_sb_if_idle(struct super_block *sb)
> > >  {
> > >      if (!writeback_in_progress(sb->s_bdi)) {
> > > -        down_read(&sb->s_umount);
> > > -        writeback_inodes_sb(sb);
> > > -        up_read(&sb->s_umount);
> > > +        bdi_start_writeback(sb->s_bdi, get_nr_dirty_pages());
> > 
> > I'll put on my skis and channel Christoph for a minute.  This isn't
> > quite the same as the original.  writeback_inodes_sb() writes inodes on a
> > specific super block, while bdi_start_writeback() writes them for any SB
> > on that bdi.
> 
> Right.
> 
> > For btrfs there's only one bdi per SB, but for most everyone else a disk
> > with a bunch of partitions is going to have multiple filesystems on the
> > same bdi.
> > 
> > My original btrfs patch just exported the bdi_ funcs so that btrfs could
> > do the above internally.  But Christoph objected, and I think he's
> > right.  We should either give everyone a bdi or make sure the writeback
> > func kicks only one filesystem.
> 
> Well it's just kicking the writeback thread, and it will writeback
> from that particular sb.

Hmmm?  It will writeback for all the SBs on that bdi.  In the current
form that ext4 uses, that gets pretty expensive if you have a bunch of
large partitions and you're only running out of space on one of them.

For the _nr variant that btrfs uses, it's worse for the filesystems
that don't have a 1:1 bdi<->sb mapping.  It might not actually write any
of the pages from the SB that is out of space.

> It makes no further guarantees, and anyway
> the sb has to compete for normal writeback within this bdi.

> 
> I think Christoph is right because filesystems should not really
> know about how bdi writeback queueing works. But I don't know if it's
> worth doing anything more complex for this functionality?

I think we should make a writeback_inodes_sb_unlocked() that doesn't
warn when the semaphore isn't held.  That should be enough given where
btrfs and ext4 are calling it from.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ