Message-ID: <1263985021.2528.34.camel@localhost>
Date:	Wed, 20 Jan 2010 10:57:01 +0000
From:	Steven Whitehouse <swhiteho@...hat.com>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Dave Chinner <david@...morbit.com>,
	linux-kernel@...r.kernel.org, mingo@...hat.com,
	Nick Piggin <nickpiggin@...oo.com.au>,
	linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk
Subject: Re: lockdep: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R}
 usage.

Hi,

On Tue, 2010-01-19 at 13:46 -0500, Christoph Hellwig wrote:
> On Fri, Jan 15, 2010 at 01:53:15PM +0100, Peter Zijlstra wrote:
> > Well, I don't know enough about xfs (or filesystems in general) to say
> > that with any certainty, but I can imagine inode writeback from the sync
> > that goes with umount to cause issues.
> > 
> > If this inode reclaim is past all that and the filesystem is basically
> > RO, then I don't think so and this could be considered a false positive,
> > in which case we need an annotation for this.
> 
> The issue is a bit more complicated.  In the unmount case
> invalidate_inodes() is indeed called after the filesystem is effectively
> read-only for user-originated operations.  But there's a myriad of
> other invalidate_inodes() calls:
> 
>  - fs/block_dev.c:__invalidate_device()
> 
> 	This gets called from block device code for various kinds of
> 	invalidations.  Doesn't make any sense at all to me, but hey..
> 
>  - fs/ext2/super.c:ext2_remount()
> 
> 	Appears like it's used to check for active inodes during
> 	remount.  Very fishy usage, and could just be replaced with
> 	a list walk without any I/O.
> 
>  - fs/gfs2/glock.c:gfs2_gl_hash_clear()
> 
> 	No idea.
> 
It's rather complicated, and it all comes down to using "special" inodes
to cache metadata: GFS2 has two VFS inodes per "real" inode, one as per
normal and one just to cache metadata.

This causes a circular dependency between glocks and inodes since we
have something like this (in my best ascii art):

gfs2 inode -> iopen glock
           -> inode glock -> metadata inode

So at umount time, historically we've had to invalidate inodes once,
then get rid of the inode glocks (which implied an iput() on the metadata
inode), and then invalidate inodes again to be rid of the metadata inodes.
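
To make the ordering concrete, here is a toy model of why two passes were
needed. It is only a sketch in Python, not kernel code: the classes, the
refcount field and the umount() helper are all made up for illustration,
and only the invalidate_inodes() name is borrowed from the real function.

```python
# Toy model only: hypothetical classes, not GFS2 or VFS code.

class Inode:
    def __init__(self, name):
        self.name = name
        self.refcount = 0   # references held by other objects


class Glock:
    def __init__(self, metadata_inode):
        # The inode glock pins its metadata inode (a held reference).
        self.metadata_inode = metadata_inode
        metadata_inode.refcount += 1

    def release(self):
        # Getting rid of the glock implies an iput() on the metadata inode.
        self.metadata_inode.refcount -= 1


def invalidate_inodes(inodes):
    """Evict every unreferenced inode; return the ones that are still busy."""
    return [i for i in inodes if i.refcount > 0]


def umount(n_files):
    """Return how many inodes survive the first and the second pass."""
    inodes, glocks = [], []
    for n in range(n_files):
        real = Inode(f"inode{n}")
        meta = Inode(f"meta{n}")
        glocks.append(Glock(meta))
        inodes += [real, meta]

    after_first = invalidate_inodes(inodes)        # only "real" inodes go
    for gl in glocks:                              # now drop the inode glocks
        gl.release()
    after_second = invalidate_inodes(after_first)  # metadata inodes go
    return len(after_first), len(after_second)
```

With three files, the first pass leaves the three metadata inodes behind
because the glocks still hold them, and only the second pass, after the
glocks have been released, gets rid of them.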

This has been the source of many problems at umount time. In my -nmw git
tree at the moment, there are a couple of patches which are aimed at
fixing this issue. The solution is to embed a struct address_space in
each glock which caches metadata, rather than a complete inode.
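
As a rough illustration of the idea (again a toy model in Python, not the
actual patches; AddressSpace, Glock and Inode here are hypothetical
stand-ins, and the "dinode block" payload is invented), once the cache
lives inside the glock there is no second inode left to pin, so a single
pass is enough:

```python
# Toy model only: hypothetical names, not the patches in the -nmw tree.

class AddressSpace:
    def __init__(self):
        self.pages = {}                 # block number -> cached metadata


class Glock:
    def __init__(self):
        self.mapping = AddressSpace()   # embedded cache, no separate inode

    def release(self):
        self.mapping.pages.clear()      # the cache goes away with the glock


class Inode:
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.glock = Glock()


def invalidate_inodes(inodes):
    """Evict unreferenced inodes and their glocks; return the busy ones."""
    busy = []
    for inode in inodes:
        if inode.refcount > 0:
            busy.append(inode)
        else:
            inode.glock.release()
    return busy


def umount(n_files):
    """Return how many inodes survive a single invalidation pass."""
    inodes = [Inode(f"inode{n}") for n in range(n_files)]
    for inode in inodes:
        inode.glock.mapping.pages[0] = b"dinode block"
    return len(invalidate_inodes(inodes))
```

In this model nothing holds a reference on an inode purely for caching
purposes, so umount() leaves no inodes behind after one pass.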

>  - fs/gfs2/ops_fstype.c:fill_super()
> 
> 	Tries to kill all inodes in the fill_super error path, looks
> 	very fishy.
> 
For the same reason as above.

It should be possible to remove one or even both of these calls now. The
two patches in the -nmw tree really do only the bare minimum to make the
change, so there is scope for a bit more clean up in that area.

Steve.

