Message-ID: <CALCETrXLtLbgQH-eXm7AvxyRYa2jxZ=3jR=CfvAY+drGC_O4nA@mail.gmail.com>
Date: Wed, 4 Sep 2013 13:05:27 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Jan Kara <jack@...e.cz>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Dave Chinner <david@...morbit.com>,
"Theodore Ts'o" <tytso@....edu>,
Dave Hansen <dave.hansen@...ux.intel.com>, xfs@....sgi.com,
Tim Chen <tim.c.chen@...ux.intel.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates

On Wed, Sep 4, 2013 at 12:20 PM, Jan Kara <jack@...e.cz> wrote:
> On Wed 04-09-13 10:54:50, Andy Lutomirski wrote:
>> >> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>> >> }
>> >> EXPORT_SYMBOL(write_one_page);
>> >>
>> >> +void mapping_flush_cmtime(struct address_space *mapping)
>> >> +{
>> >> +	if (mapping_test_clear_cmtime(mapping) &&
>> >> +	    mapping->a_ops->update_cmtime_deferred)
>> >> +		mapping->a_ops->update_cmtime_deferred(mapping);
>> >> +}
>> >> +EXPORT_SYMBOL(mapping_flush_cmtime);
>> > Hum, is there a reason for update_cmtime_deferred() operation? I can
>> > hardly imagine anyone will want to do anything else than what
>> > inode_update_time_writable() does so why bother? You mention tmpfs & co.
>> > don't fit into your scheme well with which I agree so let's just keep
>> > file_update_time() in their page_mkwrite() operation. But I don't see a
>> > real need for avoiding the deferred cmtime logic...
>>
>> I think there might be odd corner cases. For example, mmap a tmpfs
>> file, write it, and unmap it. Then, an hour later, maybe the system
> If you unmap it then that will handle the update. But if you don't unmap,
> you'd get spurious timestamp updates, which would be strange.
>
>> will be under memory pressure and page out the file. This could
>> trigger a surprising time update. (I'm not sure this can actually
>> happen on tmpfs, but maybe it would on some other filesystem.)
>>
>> Does this actually matter? A flag to turn the feature on or off would
>> do the trick, but I don't think there's precedent for sticking a flag
>> in a_ops.
> Flag in a_ops is ugly. But you can have a flag in 'struct
> file_system_type' which would be reasonable.
OK, will do.
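Roughly what I have in mind, as a userspace model rather than real
kernel code: the filesystem opts in through a bit in fs_flags, and
mapping_flush_cmtime() checks that bit instead of calling through
a_ops. The flag name FS_DEFERRED_CMTIME, its bit value, and the
cmtime_pending / cmtime_updates fields are made up for illustration;
the counter only stands in for the inode_update_time_writable() call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag name and bit value; neither is from the patch. */
#define FS_DEFERRED_CMTIME (1u << 0)

/* Minimal stand-ins for the kernel structures (userspace model only). */
struct file_system_type {
	const char *name;
	unsigned int fs_flags;
};

struct super_block {
	const struct file_system_type *s_type;
};

struct inode {
	struct super_block *i_sb;
	unsigned long cmtime_updates;	/* counts modeled timestamp updates */
};

struct address_space {
	struct inode *host;
	bool cmtime_pending;		/* models the AS_CMTIME bit */
};

/* Test-and-clear of the pending bit, modeled without atomics. */
static bool mapping_test_clear_cmtime(struct address_space *mapping)
{
	bool was_set = mapping->cmtime_pending;

	mapping->cmtime_pending = false;
	return was_set;
}

/* The per-filesystem flag replaces the a_ops callback check. */
static void mapping_flush_cmtime(struct address_space *mapping)
{
	const struct file_system_type *type = mapping->host->i_sb->s_type;

	/* Filesystems that did not opt in are left untouched. */
	if (!(type->fs_flags & FS_DEFERRED_CMTIME))
		return;

	/* The increment stands in for inode_update_time_writable(). */
	if (mapping_test_clear_cmtime(mapping))
		mapping->host->cmtime_updates++;
}
```

With this shape, tmpfs and friends would simply not set the flag and
keep calling file_update_time() from page_mkwrite as you suggested.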
>
>> >> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
>> >> +{
>> >> +	/*
>> >> +	 * We get called from munmap and msync. Both calls can race
>> >> +	 * with fs freezing. If the fs is frozen after
>> >> +	 * mapping_test_clear_cmtime but before the time update, then
>> >> +	 * sync_filesystem will miss the cmtime update (because we
>> >> +	 * just cleared it) and we won't be able to write (because the
>> >> +	 * fs is frozen). On the other hand, we can't just return if
>> >> +	 * we're in the SB_FREEZE_PAGEFAULT state because our caller
>> >> +	 * expects the timestamp to be synchronously updated. So we
>> >> +	 * get write access without blocking, at the SB_FREEZE_FS
>> >> +	 * level. If the fs is already fully frozen, then we already
>> >> +	 * know we have nothing to do.
>> >> +	 */
>> >> +
>> >> +	if (!mapping_test_cmtime(mapping))
>> >> +		return; /* Optimization: nothing to do. */
>> >> +
>> >> +	if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>> >> +		mapping_flush_cmtime(mapping);
>> >> +		__sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
>> >> +	}
>> >> +}
>> > This is wrong because SB_FREEZE_FS level is targeted for filesystem
>> > internal use. Also it is racy. mapping_flush_cmtime() ends up calling
>> > mark_inode_dirty() and filesystems such as ext4 or xfs will start a
>> > transaction to store inode in the journal. This gets freeze protection at
>> > SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
>> > SB_FREEZE_FS before this second protection, things will deadlock.
>>
>> Whoops -- I assumed that it was safe to recursively take freeze
>> protection at the same level.
>>
>> I'm worried about the following race:
>>
>> Thread 1 (in munmap):
>> Check AS_CMTIME set
>> sb_start_pagefault
>>
>> Thread 2 (freezing the fs):
>> frozen = SB_FREEZE_PAGEFAULT;
>> sync_filesystem()
>>
>> Thread 1 is now stuck. It doesn't need to be, because sync_filesystem
>> will flush out the cmtime write. But there doesn't seem to be a clean
>> mechanism to wait for the freeze to finish.
> OK, I see. Frankly, I'd rather live with msync() and munmap() blocking
> while filesystem is frozen than trying to outsmart the freezing logic...
> If someone comes up with a use case where it causes trouble, we can always
> improve the logic with some clever tricks.
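For the record, here is a single-threaded userspace model of the
nested SB_FREEZE_FS problem you pointed out earlier. It is only a
sketch: struct sb_model, sb_try_start_write() and freeze_advance() are
invented names, plain ints replace the per-cpu counters, and there is
no real blocking, so "would sleep forever" shows up as a failed try.

```c
#include <stdbool.h>

/* Freeze levels, in the order the freezer walks through them. */
enum {
	SB_UNFROZEN,
	SB_FREEZE_WRITE,
	SB_FREEZE_PAGEFAULT,
	SB_FREEZE_FS,
	SB_FREEZE_COMPLETE,
};

/* Invented model of the superblock's freeze state. */
struct sb_model {
	int frozen;				/* level reached by the freezer */
	int writers[SB_FREEZE_COMPLETE];	/* active writers, indexed by level */
};

/* Non-blocking acquire: fails once the freezer has reached this level. */
static bool sb_try_start_write(struct sb_model *sb, int level)
{
	if (sb->frozen >= level)
		return false;
	sb->writers[level]++;
	return true;
}

static void sb_end_write(struct sb_model *sb, int level)
{
	sb->writers[level]--;
}

/*
 * Freezer side: publish the new level, then report whether it may
 * proceed. A false return means it must wait for writers to drain.
 */
static bool freeze_advance(struct sb_model *sb, int level)
{
	sb->frozen = level;
	return sb->writers[level] == 0;
}
```

The bad interleaving is: the outer sb_try_start_write(SB_FREEZE_FS)
succeeds, the freezer then calls freeze_advance(SB_FREEZE_FS) and has
to wait for that writer, and the nested acquisition made when the
journal transaction starts now fails. In the kernel that nested
acquisition would block instead of failing, so neither side could make
progress.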
I'll at least check that it's a shared writable mapping before doing
the flush to avoid blocking on other types of munmap.
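Something along these lines, again as a userspace model and not the
actual patch. The helper name vma_flush_cmtime_on_unmap(), the
cmtime_pending and flushes fields, and the choice to test VM_SHARED
together with VM_WRITE are assumptions for illustration; the real
check may need to look at different bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same bit values as the kernel's vm_flags, redefined for the model. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL

/* Minimal stand-ins for the kernel structures (userspace model only). */
struct address_space {
	bool cmtime_pending;	/* models the AS_CMTIME bit */
	unsigned long flushes;	/* counts modeled timestamp flushes */
};

struct file {
	struct address_space *f_mapping;
};

struct vm_area_struct {
	unsigned long vm_flags;
	struct file *vm_file;	/* NULL for anonymous mappings */
};

/* Modeled flush: the real one may block while the fs is frozen. */
static void mapping_flush_cmtime_nowb(struct address_space *mapping)
{
	if (!mapping->cmtime_pending)
		return;
	mapping->cmtime_pending = false;
	mapping->flushes++;
}

/* Hypothetical helper: only shared writable file mappings flush. */
static void vma_flush_cmtime_on_unmap(struct vm_area_struct *vma)
{
	const unsigned long need = VM_SHARED | VM_WRITE;

	if (!vma->vm_file)
		return;		/* anonymous memory has no timestamps */
	if ((vma->vm_flags & need) != need)
		return;		/* private or read-only: cannot dirty the file */

	mapping_flush_cmtime_nowb(vma->vm_file->f_mapping);
}
```

That way a munmap of a private or read-only mapping never reaches the
freeze-protected path at all.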
--Andy