[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1199389069.13731.72.camel@localhost>
Date: Thu, 03 Jan 2008 11:37:49 -0800
From: Dave Hansen <haveblue@...ibm.com>
To: "Serge E. Hallyn" <serue@...ibm.com>
Cc: akpm@...l.org, linux-kernel@...r.kernel.org, miklos@...redi.hu,
hch@...radead.org
Subject: Re: [PATCH] track number of mnts writing to superblocks
On Wed, 2008-01-02 at 22:02 -0600, Serge E. Hallyn wrote:
> Ok I'm blabbing quite a bit here while trying to figure out
> the patch, and maybe there are some useful hints for where more
> comments would be useful. But other than the fact that
> mark_mnt_has_writer() needs to the atomic_inc() even if
> cpu_writer was passed in as NULL, the patch seems good.
Yeah, I screwed that up. Should be fixed now.
> > Signed-off-by: Dave Hansen <haveblue@...ibm.com>
> > ---
> >
> > linux-2.6.git-dave/fs/file_table.c | 24 -----
> > linux-2.6.git-dave/fs/namespace.c | 134 +++++++++++++++++++++++++------
> > linux-2.6.git-dave/fs/super.c | 61 +++++++++++---
> > linux-2.6.git-dave/include/linux/fs.h | 5 -
> > linux-2.6.git-dave/include/linux/mount.h | 3
> > 5 files changed, 163 insertions(+), 64 deletions(-)
> >
> > diff -puN fs/file_table.c~track_sb_mnt_writers fs/file_table.c
> > --- linux-2.6.git/fs/file_table.c~track_sb_mnt_writers 2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/file_table.c 2008-01-02 10:49:11.000000000 -0800
> > @@ -374,30 +374,6 @@ void file_kill(struct file *file)
> > }
> > }
> >
> > -int fs_may_remount_ro(struct super_block *sb)
> > -{
> > - struct file *file;
> > -
> > - /* Check that no files are currently opened for writing. */
> > - file_list_lock();
> > - list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
> > - struct inode *inode = file->f_path.dentry->d_inode;
> > -
> > - /* File with pending delete? */
> > - if (inode->i_nlink == 0)
> > - goto too_bad;
> > -
> > - /* Writeable file? */
> > - if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
>
> (why did this originally skip directories?)
I think it's more to skip device files and named pipes than directories.
I don't even know what happens offhand if you try a plain write() to an
open directory.
> > diff -puN fs/file_table.c.orig~track_sb_mnt_writers fs/file_table.c.orig
> > diff -puN fs/namespace.c~track_sb_mnt_writers fs/namespace.c
> > --- linux-2.6.git/fs/namespace.c~track_sb_mnt_writers 2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/namespace.c 2008-01-02 13:39:52.000000000 -0800
> > @@ -118,7 +118,7 @@ struct mnt_writer {
> > * If holding multiple instances of this lock, they
> > * must be ordered by cpu number.
> > */
> > - spinlock_t lock;
> > + struct mutex lock;
> > struct lock_class_key lock_class; /* compiles out with !lockdep */
> > unsigned long count;
> > struct vfsmount *mnt;
> > @@ -130,7 +130,7 @@ static int __init init_mnt_writers(void)
> > int cpu;
> > for_each_possible_cpu(cpu) {
> > struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
> > - spin_lock_init(&writer->lock);
> > + mutex_init(&writer->lock);
> > lockdep_set_class(&writer->lock, &writer->lock_class);
> > writer->count = 0;
> > }
> > @@ -145,11 +145,52 @@ static void mnt_unlock_cpus(void)
> >
> > for_each_possible_cpu(cpu) {
> > cpu_writer = &per_cpu(mnt_writers, cpu);
> > - spin_unlock(&cpu_writer->lock);
> > + mutex_unlock(&cpu_writer->lock);
> > }
> > }
> >
> > -static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
> > +static int mark_mnt_has_writer(struct vfsmount *mnt,
> > + struct mnt_writer *cpu_writer)
> > +{
> > + /*
> > + * Ensure that if there are people racing to set
> > + * the bit that only one of them succeeds and can
> > + * increment the sb count.
> > + */
> > + if (test_and_set_bit(ilog2(MNT_MAY_HAVE_WRITERS), &mnt->mnt_flags))
> > + return 0;
>
> Comment isn't entirely clear, but you're returning 0 here because
> someone else has already set the flag and incremented
> sb->__s_mnt_writers so you don't have to and you're all set to go on
> with writing?
Yeah. This function is all about making sure that the sb is marked
properly because the mnt is writable. If it's already marked, then
we're good to go.
> > + if (cpu_writer == NULL)
> > + return 0;
> > +
> > + /*
> > + * Our goal here is to get exclusion from a superblock
> > + * remount operation (fs_may_remount_ro()). This is
> > + * effectively a slow path that we must go through the
> > + * first time we set the bit on each mount, but never
> > + * again unless the writer counts get coalesced.
> > + */
> > +
> > + mutex_unlock(&cpu_writer->lock);
> > + lock_super(mnt->mnt_sb);
> > +
> > + atomic_inc(&mnt->mnt_sb->__s_mnt_writers);
>
> The atomic_inc of course should be done even if cpu_writer was passed
> in as NULL, you just don't need to do the locking then, and can
> return 0 in that case?
Yep. I'll fix that.
> > +
> > + unlock_super(mnt->mnt_sb);
> > + mutex_lock(&cpu_writer->lock);
> > + return -EAGAIN;
> > +}
> > +
> > +static void __mark_sb_if_writable(struct vfsmount *mnt)
>
> This function is taking the writable state from a mnt and marking it in
> the sb. So the name should be a shorter verison of something like
> "commit_mnt_writable_state_to_sb"
Here's what I have now:
static void __sync_mnt_writable_to_sb(struct vfsmount *mnt)
> > +{
> > + int bitnr = ilog2(MNT_MAY_HAVE_WRITERS);
> > +
> > + if (atomic_read(&mnt->__mnt_writers))
> > + mark_mnt_has_writer(mnt, NULL);
> > + else if (test_and_clear_bit(bitnr, &mnt->mnt_flags))
> > + atomic_dec(&mnt->mnt_sb->__s_mnt_writers);
>
> And after staring at this code for awhile it did make sense, but a
> comment above it would help saying that
>
> /* If mnt has writers, mark_mnt_has_writer() will make
> sure that is marked in the sb. If mnt has no writers,
> then if mnt->mnt_flags was previously set, that means
> that mnt->mnt_sb->__s_mnt_writers reflects our having
> writers, so we decrement it
> */
>
> Ok, maybe it's clearer to other people and the comment isn't needed...
You couldn't figure it out at first glance, and I tend to think that
most people will have the same experience. I'll see if I can add a
better comment.
I'll send out an updated patch in a bit along with a simpler alternative
patch.
-- Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists