lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.BSO.4.64.0804301357580.18333@citi.umich.edu>
Date:	Wed, 30 Apr 2008 14:14:47 -0400 (EDT)
From:	"david m. richter" <richterd@...i.umich.edu>
To:	"J. Bruce Fields" <bfields@...ldses.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Bryan Wu <cooloney@...nel.org>, linux-kernel@...r.kernel.org,
	torvalds@...ux-foundation.org, willy@...ian.org,
	viro@...iv.linux.org.uk, uclinux-dist-devel@...ckfin.uclinux.org
Subject: Re: [LTP/VFS] fcntl SETLEASE fails on ramfs/tmpfs

On Wed, 30 Apr 2008, J. Bruce Fields wrote:

> On Tue, Apr 29, 2008 at 07:21:02PM -0400, david m. richter wrote:
> > On Tue, 29 Apr 2008, J. Bruce Fields wrote:
> > 
> > > On Tue, Apr 29, 2008 at 01:54:54PM -0700, Andrew Morton wrote:
> > > > On Tue, 29 Apr 2008 11:42:48 +0800
> > > > "Bryan Wu" <cooloney@...nel.org> wrote:
> > > > 
> > > > > Hi folk,
> > > > > 
> > > > > This days I am digging into this LTP bug reported on our Blackfin test
> > > > > machine, but I think it is general for other system.
> > > > > https://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3743
> > > > > 
> > > > > And I also found Kumar Gala reported this similar bug before.
> > > > > http://lkml.org/lkml/2007/11/14/388
> > > > > 
> > > > > 1, when opening and creating a new on ramfs/tmpfs, the dentry->d_count
> > > > > will be added one as below:
> > > > > --
> > > > > ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> > > > > {
> > > > > |_______struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
> > > > > |_______int error = -ENOSPC;
> > > > > 
> > > > > |_______if (inode) {
> > > > > |_______|_______if (dir->i_mode & S_ISGID) {
> > > > > |_______|_______|_______inode->i_gid = dir->i_gid;
> > > > > |_______|_______|_______if (S_ISDIR(mode))
> > > > > |_______|_______|_______|_______inode->i_mode |= S_ISGID;
> > > > > |_______|_______}
> > > > > |_______|_______d_instantiate(dentry, inode);
> > > > > |_______|_______dget(dentry);|__/* Extra count - pin the dentry in core */
> > > > > |_______|_______error = 0;
> > > > > |_______|_______dir->i_mtime = dir->i_ctime = CURRENT_TIME;
> > > > > |_______}
> > > > > |_______return error;
> > > > > }
> > > > > --
> > > > > The dget(dentry) call introduces an extra count, why?
> > > > > it is the same in tmpfs.
> > > > 
> > > > Because those dentries have no backing store.  Their sole existance is in
> > > > the dentry cache which is normally reclaimable.  But we can't reclaim these
> > > > dentries because there is nowhere from where they can be reestablished.
> > > > 
> > > > > 2, when calling  fcntl(fd, F_SETLEASE,F_WRLCK), it will return -EAGAIN
> > > > > --
> > > > > |_______if ((arg == F_WRLCK)
> > > > > |_______    && ((atomic_read(&dentry->d_count) > 1)
> > > > > |_______|_______|| (atomic_read(&inode->i_count) > 1)))
> > > > > |_______|_______goto out;
> > > > > --
> > > > 
> > > > Sucky heuristic.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > because the dentry->d_count will be 2 not 1. I tested ext2 on Blackfin, it is 1.
> > > > > 
> > > > > 3, so I guess maybe the dget(dentry) of ramfs_mknod is useless. But
> > > > > after remove this dget(),
> > > > > the ramfs can not be mounted as rootfs at all.
> > > > 
> > > > Interesting.  Presumably it got reclaimed synchronously somehow.
> > > > 
> > > > > Is the bug in generic_setlease() or in the ramfs/tmpfs inode create function?
> > > > > 
> > > > > Of course, simply remove the test '((atomic_read(&dentry->d_count) >
> > > > > 1)' can workaround this issue.
> > > > 
> > > > I guess we should make the generic_setlease() heuristic smarter.
> > > > 
> > > > Of course the _reason_ for that heuristic is uncommented and lost in time. 
> > > > And one wonders what locking prevents it from being totally racy, and if
> > > > "none", what happens when the race hits.  Sigh.
> > 
> > 	i'm not sure which particular kernel version we're talking about, 
> > but i think the intent was to rely on the BKL.  i noticed a couple cases 
> > where this didn't actually hold -- e.g., bruce has a patch queued to move 
> > the kmalloc in generic_setlease() so that it precedes the 
> > d_count/i_writecount checks and covers the race he mentions below.  an 
> > earlier patch closed a similar thing with get_write_access() when handling 
> > a truncate, etc.
> > 
> > 	fyi (well, not you, bruce): relatedly, i have a set of patches to 
> > introduce per-inode lease enabling/disabling (so we can fully implement 
> > NFSv4.0 file delegations and v4.1 directory delegations) which expand the 
> > cases where leases are broken (e.g. unlink) and which make it somewhat 
> > more explicit when it's safe to lease/break.  perhaps the next merge 
> > window for the first set of them.
> > 
> > 
> > > Yes, I think the race is:
> > > 
> > > 	1. generic_setlease(., F_WRLCK, .) checks d_count and i_count,
> > > 	   both are 1.
> > > 
> > > 	2. a read open comes in, calls break_lease which finds no lease
> > > 	   and continues happily on.
> > > 
> > > 	3. generic_setlease() sets the write lease.
> > > 
> > > The most likely consequences are that a local reader gets out-of-date
> > > data for a file that a Samba client has modified.
> > >
> > > I suppose that re-checking the d_count and i_count after step 3 might
> > > close the race.
> > 
> > 	as things currently stand, i believe that race can only happen if 
> > the leaser is blocking on the kmalloc.
> 
> Neither break_lease() (the shortcut inlined function, not the full
> __break_lease()) nor any of the open code that I can see is under the
> BKL, so unless I'm missing something, that code is racy.
> 
> --b.

	sorry, obviously you're right; if ->i_flock is null, break_lease() 
is doing the bad thing.  i've been through so many reworkings of that code 
that i'd forgotten which race was where; i came into this thread midway 
and should've reacquainted myself with the actual code in question.  my 
bad :-/

> > but yeah, the d_count check is 
> > pretty frail anyway ...
> > 
> > 
> > > > I suppose a stupid fix would be to set (and later clear) a new flag in
> > > > dentry.d_flags which means
> > > > 
> > > >   this dentry is pinned by a ram-backed device, so d_count==2 means
> > > >   "unused""
> > > > 
> > > > But it would be better to work out exactly what generic_setlease() is
> > > > trying to do there, and do it in a better way.
> > > 
> > > Yes.  What it's supposed to do is provide exclusion between opens and
> > > write leases.
> > > 
> > > We already have a mechanism that provides exclusion between write opens
> > > and exec, using the i_writecount, so we're using that for read leases.
> > > I suppose it'd be possible to do something similar for write leases;
> > > would there be smp scalability problems associated with counting all the
> > > read opens of a given inode?  Other problems?

	interesting; is this feasible?  i'd like to hear what folks think.

> > > 
> > > Even with this problem solved, I'm not convinced write leases are very
> > > useful as implemented.  Their only current user is Samba, which uses
> > > them to grant exclusive access to given files to allow clients to cache
> > > writes.
> > > 
> > > Samba knows when to revoke that exclusive access because the lease
> > > subsystem signals it on a read open of the file.  It doesn't revoke on
> > > stat, however.  This causes problem.  E.g., say Samba takes out a lease
> > > and tells some client it can now cache its writes indefinitely.
> > > Meanwhile a local application (say, make) is polling that file for
> > > changes using stat.  They never see those changes.
> > > 
> > > The NFSv2/v3 server for some reason has its own one-off hack that
> > > reports the ctime as now for on any write-leased file, which leads
> > > people to complain about spurious rebuilds:
> > > 
> > > 	http://bugzilla.kernel.org/show_bug.cgi?id=9454
> > > 
> > > The one thing I suspect is *not* a really serious problem here is the
> > > reported LTP failure, since probably the only user of this is Samba,
> > > which probably doesn't do a lot of tmpfs exports, and in any case it can
> > > probably soldier on (if with degraded performance--how badly I don't
> > > know) without getting the write lease it wants.
> > > 
> > > --b.
> > > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ