Date:	Sat, 3 Mar 2012 21:55:58 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, Greg Thelen <gthelen@...gle.com>,
	Ying Han <yinghan@...gle.com>,
	"hannes@...xchg.org" <hannes@...xchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Minchan Kim <minchan.kim@...il.com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Adrian Hunter <ext-adrian.hunter@...ia.com>,
	Artem Bityutskiy <Artem.Bityutskiy@...ia.com>
Subject: Re: [PATCH 5/9] writeback: introduce the pageout work

On Fri, Mar 02, 2012 at 11:57:00AM -0800, Andrew Morton wrote:
> On Fri, 2 Mar 2012 18:39:51 +0800
> Fengguang Wu <fengguang.wu@...el.com> wrote:
> 
> > > And I agree it's unlikely, but given enough time and people, I
> > > believe someone will find a way to (inadvertently) trigger this.
> > 
> > Right. The pageout work could add many more iput() calls to the
> > flusher and turn some hidden, statistically impossible bugs into real ones.
> > 
> > Fortunately the "flusher deadlocks itself" case is easy to detect and
> > prevent as illustrated in another email.
> 
> It would be a heck of a lot safer and saner to avoid the iput().  We
> know how to do this, so why not do it?

My concern about the page lock is that it costs more code and feels like
hacking around something. It seems we (including me) have been trying
to shy away from the iput() problem. Since we are unlikely to get rid
of the already existing iput() calls from the flusher context, why not
face the problem, sort it out, and use it with confidence in new code?

Let me try it now. The only way iput() can deadlock the flusher is for
the iput() path to come back, queue some work and wait for it.
Here is the exhaustive list of the queue+wait paths:

writeback_inodes_sb_nr_if_idle
  ext4_nonda_switch
    ext4_page_mkwrite                   # from page fault
    ext4_da_write_begin                 # from user writes

writeback_inodes_sb_nr
  quotactl syscall                      # from syscall
  __sync_filesystem                     # from sync/umount
  shrink_liability                      # ubifs
    make_free_space
      ubifs_budget_space                # from all over ubifs:

   2    274  /c/linux/fs/ubifs/dir.c <<ubifs_create>>
   3    531  /c/linux/fs/ubifs/dir.c <<ubifs_link>>
   4    586  /c/linux/fs/ubifs/dir.c <<ubifs_unlink>>
   5    675  /c/linux/fs/ubifs/dir.c <<ubifs_rmdir>>
   6    731  /c/linux/fs/ubifs/dir.c <<ubifs_mkdir>>
   7    803  /c/linux/fs/ubifs/dir.c <<ubifs_mknod>>
   8    871  /c/linux/fs/ubifs/dir.c <<ubifs_symlink>>
   9   1006  /c/linux/fs/ubifs/dir.c <<ubifs_rename>>
  10   1009  /c/linux/fs/ubifs/dir.c <<ubifs_rename>>
  11    246  /c/linux/fs/ubifs/file.c <<write_begin_slow>>
  12    388  /c/linux/fs/ubifs/file.c <<allocate_budget>>
  13   1125  /c/linux/fs/ubifs/file.c <<do_truncation>>   <===== deadlockable
  14   1217  /c/linux/fs/ubifs/file.c <<do_setattr>>
  15   1381  /c/linux/fs/ubifs/file.c <<update_mctime>>
  16   1486  /c/linux/fs/ubifs/file.c <<ubifs_vm_page_mkwrite>>
  17    110  /c/linux/fs/ubifs/ioctl.c <<setflags>>
  19    122  /c/linux/fs/ubifs/xattr.c <<create_xattr>>
  20    201  /c/linux/fs/ubifs/xattr.c <<change_xattr>>
  21    494  /c/linux/fs/ubifs/xattr.c <<remove_xattr>>

They all seem safe except for ubifs, which may actually deadlock via
the do_truncation() caller above. However, it should be fixable: the
ubifs call to writeback_inodes_sb_nr() looks like a very brute-force
write-and-wait, and there may well be a better way out.

CCing the ubifs developers for their thoughts.

Thanks,
Fengguang

PS. I'll be traveling next week and won't have much time for replying
to emails. Sorry about that.