lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 03 Jan 2008 11:37:49 -0800
From:	Dave Hansen <haveblue@...ibm.com>
To:	"Serge E. Hallyn" <serue@...ibm.com>
Cc:	akpm@...l.org, linux-kernel@...r.kernel.org, miklos@...redi.hu,
	hch@...radead.org
Subject: Re: [PATCH] track number of mnts writing to superblocks

On Wed, 2008-01-02 at 22:02 -0600, Serge E. Hallyn wrote:
> Ok I'm blabbing quite a bit here while trying to figure out
> the patch, and maybe there are some useful hints for where more
> comments would be useful.  But other than the fact that
> mark_mnt_has_writer() needs to the atomic_inc() even if
> cpu_writer was passed in as NULL, the patch seems good.

Yeah, I screwed that up.  Should be fixed now.

> > Signed-off-by: Dave Hansen <haveblue@...ibm.com>
> > ---
> > 
> >  linux-2.6.git-dave/fs/file_table.c       |   24 -----
> >  linux-2.6.git-dave/fs/namespace.c        |  134 +++++++++++++++++++++++++------
> >  linux-2.6.git-dave/fs/super.c            |   61 +++++++++++---
> >  linux-2.6.git-dave/include/linux/fs.h    |    5 -
> >  linux-2.6.git-dave/include/linux/mount.h |    3 
> >  5 files changed, 163 insertions(+), 64 deletions(-)
> > 
> > diff -puN fs/file_table.c~track_sb_mnt_writers fs/file_table.c
> > --- linux-2.6.git/fs/file_table.c~track_sb_mnt_writers	2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/file_table.c	2008-01-02 10:49:11.000000000 -0800
> > @@ -374,30 +374,6 @@ void file_kill(struct file *file)
> >  	}
> >  }
> > 
> > -int fs_may_remount_ro(struct super_block *sb)
> > -{
> > -	struct file *file;
> > -
> > -	/* Check that no files are currently opened for writing. */
> > -	file_list_lock();
> > -	list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
> > -		struct inode *inode = file->f_path.dentry->d_inode;
> > -
> > -		/* File with pending delete? */
> > -		if (inode->i_nlink == 0)
> > -			goto too_bad;
> > -
> > -		/* Writeable file? */
> > -		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
> 
> (why did this originally skip directories?)

I think it's more to skip device files and named pipes than directories.
I don't even know what happens offhand if you try a plain write() to an
open directory.

> > diff -puN fs/file_table.c.orig~track_sb_mnt_writers fs/file_table.c.orig
> > diff -puN fs/namespace.c~track_sb_mnt_writers fs/namespace.c
> > --- linux-2.6.git/fs/namespace.c~track_sb_mnt_writers	2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/namespace.c	2008-01-02 13:39:52.000000000 -0800
> > @@ -118,7 +118,7 @@ struct mnt_writer {
> >  	 * If holding multiple instances of this lock, they
> >  	 * must be ordered by cpu number.
> >  	 */
> > -	spinlock_t lock;
> > +	struct mutex lock;
> >  	struct lock_class_key lock_class; /* compiles out with !lockdep */
> >  	unsigned long count;
> >  	struct vfsmount *mnt;
> > @@ -130,7 +130,7 @@ static int __init init_mnt_writers(void)
> >  	int cpu;
> >  	for_each_possible_cpu(cpu) {
> >  		struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
> > -		spin_lock_init(&writer->lock);
> > +		mutex_init(&writer->lock);
> >  		lockdep_set_class(&writer->lock, &writer->lock_class);
> >  		writer->count = 0;
> >  	}
> > @@ -145,11 +145,52 @@ static void mnt_unlock_cpus(void)
> > 
> >  	for_each_possible_cpu(cpu) {
> >  		cpu_writer = &per_cpu(mnt_writers, cpu);
> > -		spin_unlock(&cpu_writer->lock);
> > +		mutex_unlock(&cpu_writer->lock);
> >  	}
> >  }
> > 
> > -static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
> > +static int mark_mnt_has_writer(struct vfsmount *mnt,
> > +			       struct mnt_writer *cpu_writer)
> > +{
> > +	/*
> > +	 * Ensure that if there are people racing to set
> > +	 * the bit that only one of them succeeds and can
> > +	 * increment the sb count.
> > +	 */
> > +	if (test_and_set_bit(ilog2(MNT_MAY_HAVE_WRITERS), &mnt->mnt_flags))
> > +		return 0;
> 
> Comment isn't entirely clear, but you're returning 0 here because
> someone else has already set the flag and incremented
> sb->__s_mnt_writers so you don't have to and you're all set to go on
> with writing?

Yeah.  This function is all about making sure that the sb is marked
properly because the mnt is writable.  If it's already marked, then
we're good to go.

> > +	if (cpu_writer == NULL)
> > +		return 0;
> > +
> > +	/*
> > +	 * Our goal here is to get exclusion from a superblock
> > +	 * remount operation (fs_may_remount_ro()).  This is
> > +	 * effectively a slow path that we must go through the
> > +	 * first time we set the bit on each mount, but never
> > +	 * again unless the writer counts get coalesced.
> > +	 */
> > +
> > +	mutex_unlock(&cpu_writer->lock);
> > +	lock_super(mnt->mnt_sb);
> > +
> > +	atomic_inc(&mnt->mnt_sb->__s_mnt_writers);
> 
> The atomic_inc of course should be done even if cpu_writer was passed
> in as NULL, you just don't need to do the locking then, and can
> return 0 in that case?

Yep.  I'll fix that.

> > +
> > +	unlock_super(mnt->mnt_sb);
> > +	mutex_lock(&cpu_writer->lock);
> > +	return -EAGAIN;
> > +}
> > +
> > +static void __mark_sb_if_writable(struct vfsmount *mnt)
> 
> This function is taking the writable state from a mnt and marking it in
> the sb.  So the name should be a shorter verison of something like
> "commit_mnt_writable_state_to_sb"

Here's what I have now:

static void __sync_mnt_writable_to_sb(struct vfsmount *mnt)

> > +{
> > +	int bitnr = ilog2(MNT_MAY_HAVE_WRITERS);
> > +
> > +	if (atomic_read(&mnt->__mnt_writers))
> > +		mark_mnt_has_writer(mnt, NULL);
> > +	else if (test_and_clear_bit(bitnr, &mnt->mnt_flags))
> > +		atomic_dec(&mnt->mnt_sb->__s_mnt_writers);
> 
> And after staring at this code for awhile it did make sense, but a
> comment above it would help saying that
> 
> 	/* If mnt has writers, mark_mnt_has_writer() will make
> 	   sure that is marked in the sb.  If mnt has no writers,
> 	   then if mnt->mnt_flags was previously set, that means
> 	   that mnt->mnt_sb->__s_mnt_writers reflects our having
> 	   writers, so we decrement it
> 	 */
> 
> Ok, maybe it's clearer to other people and the comment isn't needed...

You couldn't figure it out at first glance, and I tend to think that
most people will have the same experience.  I'll see if I can add a
better comment.

I'll send out an updated patch in a bit along with a simpler alternative
patch.

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ