linux-kernel - Re: [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121221110333.GE17357@quack.suse.cz>
Date:	Fri, 21 Dec 2012 12:03:33 +0100
From:	Jan Kara <jack@...e.cz>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
	Linux FS Devel <linux-fsdevel@...r.kernel.org>,
	Dave Chinner <david@...morbit.com>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [RFC PATCH 2/4] mm: Update file times when inodes are written
 after mmaped writes

On Thu 20-12-12 21:42:46, Andy Lutomirski wrote:
> On Thu, Dec 20, 2012 at 4:34 PM, Jan Kara <jack@...e.cz> wrote:
> >> +/**
> >> + *   inode_update_time_writable      -       update mtime and ctime time
> >> + *   @inode: inode accessed
> >> + *
> >> + *   This is like file_update_time, but it assumes the mnt is writable
> >> + *   and takes an inode parameter instead.
> >> + */
> >> +
> >> +int inode_update_time_writable(struct inode *inode)
> >> +{
> >> +     struct timespec now;
> >> +     int sync_it = 0;
> >> +     int ret;
> >> +
> >> +     /* First try to exhaust all avenues to not sync */
> >> +     if (IS_NOCMTIME(inode))
> >> +             return 0;
> >> +
> >> +     now = current_fs_time(inode->i_sb);
> >> +     if (!timespec_equal(&inode->i_mtime, &now))
> >> +             sync_it = S_MTIME;
> >> +
> >> +     if (!timespec_equal(&inode->i_ctime, &now))
> >> +             sync_it |= S_CTIME;
> >> +
> >> +     if (IS_I_VERSION(inode))
> >> +             sync_it |= S_VERSION;
> >> +
> >> +     if (!sync_it)
> >> +             return 0;
> >> +
> >> +     ret = update_time(inode, &now, sync_it);
> >> +
> >> +     return ret;
> >> +}
> >> +EXPORT_SYMBOL(inode_update_time_writable);
> >> +
> >   So this differs from file_update_time() only by not calling
> > __mnt_want_write(). Why this special function? It is actually unsafe wrt
> > remounts read-only or filesystem freezing... For that you need to call
> > sb_start_write() / sb_end_write() around the timestamp update. Umm, or
> > better sb_start_pagefault() / sb_end_pagefault() because the call in
> > remove_vma() gets called under mmap_sem so we are in a rather similar
> > situation to ->page_mkwrite.
> 
> The important difference is that it takes an inode* as a parameter
> instead of a file*.  I don't think that inodes have a struct vfsmount,
> so I can't call __mnt_want_write.  I'll take a look at
> sb_start_pagefault.  I'll also refactor this a bit to minimize code
> duplication.  The current approach was for the v1 rfc version. :)
  I see. OK. __mnt_want_write() is the less important part and I guess we
can just skip that for timestamp updates (we are sure update was generated
because of some writeable mount so it's not really important via which
mount writing of them to disk was triggered). But freeze protection is a
must (otherwise e.g. dm snapshot could create corrupted filesystems). Also
reducing code duplication would be nice as you say.

> >> diff --git a/mm/mmap.c b/mm/mmap.c
> >> index 3913262..60301dc 100644
> >> --- a/mm/mmap.c
> >> +++ b/mm/mmap.c
> >> @@ -223,6 +223,10 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> >>       struct vm_area_struct *next = vma->vm_next;
> >>
> >>       might_sleep();
> >> +
> >> +     if (vma->vm_file)
> >> +             mapping_flush_cmtime(vma->vm_file->f_mapping);
> >> +
> >>       if (vma->vm_ops && vma->vm_ops->close)
> >>               vma->vm_ops->close(vma);
> >>       if (vma->vm_file)
> >> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >> index cdea11a..8cbb7fb 100644
> >> --- a/mm/page-writeback.c
> >> +++ b/mm/page-writeback.c
> >> @@ -1910,6 +1910,13 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> >>               ret = mapping->a_ops->writepages(mapping, wbc);
> >>       else
> >>               ret = generic_writepages(mapping, wbc);
> >> +
> >> +     /*
> >> +      * This is after writepages because the AS_CMTIME bit won't
> >> +      * bet set until writepages is called.
> >> +      */
> >> +     mapping_flush_cmtime(mapping);
> >> +
> >>       return ret;
> >>  }
> >>
> >> @@ -2117,8 +2124,17 @@ EXPORT_SYMBOL(set_page_dirty);
> >>   */
> >>  int set_page_dirty_from_pte(struct page *page)
> >>  {
> >> -     /* Doesn't do anything interesting yet. */
> >> -     return set_page_dirty(page);
> >> +     int ret = set_page_dirty(page);
> >> +
> >> +     /*
> >> +      * We may be out of memory and/or have various locks held, so
> >> +      * there isn't much we can do in here.
> >> +      */
> >> +     struct address_space *mapping = page_mapping(page);
> >   Declarations should go together please. So something like:
> >         int ret = set_page_dirty(page);
> >         struct address_space *mapping = page_mapping(page);
> >
> >         /* comment... */
> 
> Will do.  Some day I'll learn how to act less like a C99/C++
> programmer when writing kernel code.
  We are conservative sometimes ;)

> Am I correct in interpreting this as "these patches may be
> sufficiently non-insane that I should keep working on them"?  I admit
> I'm pretty far out of my depth working on vm/vfs stuff.
  Well, they look sane to me. The 'ext4 sometime hangs' part isn't that
compelling in general (although I understand it's your main motivation) but
'we should follow POSIX' part is. I suggest you CC linux-mm@...ck.org on
your next iteration as some changes involve more MM than anything else.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/