[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMBkX3e8CKaDh7vdqAcTH1tnye4UusPZ_XjJdNVRZcaDqZnt2g@mail.gmail.com>
Date: Tue, 10 Jan 2012 21:38:29 -0800
From: Surbhi Palande <csurbhi@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Eric Sandeen <sandeen@...deen.net>,
Kamal Mostafa <kamal@...onical.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Randy Dunlap <rdunlap@...otime.net>,
Theodore Tso <tytso@....edu>, linux-ext4@...r.kernel.org,
Valerie Aurora <val@...consulting.com>,
Christopher Chaltain <christopher.chaltain@...onical.com>,
"Peter M. Petrakis" <peter.petrakis@...onical.com>,
Mikulas Patocka <mpatocka@...hat.com>
Subject: Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal
On second thoughts, I fail to see why there is still a race window
after this patch.
Here are the reasons why i fail to see how the data can be dirtied
when all the operations involve a journal:
----------
So here is the problem that we see
CPU1 CPU2
Task1 (write operation) Task2
---------------------------------------------------------------------------------------
t1 ext4_journal_start()
t2 ext4_journal_start_sb()
t3 vfs_check_frozen sb->frozen=SB_FREEZE_WRITE
t4 jbd2_journal_start() /* hence forth all processes calling
vfs_check_frozen will wait */
Now, our aim is to stop Task1 from dirtying the page cache ie in
starting this transaction. However if it is successful in starting
this transaction, then we want to make sure that this transaction is
flushed out.
Correct?
(with the journal freeze patch applied)
* Task1 is now executing the code of jbd2_journal_start(). Task2 is
the freezing process.
CPU1 CPU2
Task1 (write operation) Task2
t4 jbd2_journal_start()
...
tx read_lock(&journal->j_state_lock)
Now two things can happen at this point with respect to Task2:
a) either journal->j_flags is set to JBD2_FROZEN already
b) or it is not set.
Lets look at both the cases:
A) When journal->j_flags is set to JBD2_FROZEN already then
jbd2_journal_start() will get stuck on the waitqueue as long as the
journal->j_flags is JBD2_FROZEN. So we cannot create dirty data in
this case.
B) When journal->j_flags is not set to JBD2_ FROZEN then two more
things could happen:
Task2 has already finished the call to jbd2_journal_flush() by the
time Task1 calls read_lock(). So now no new task (T1) should create
any new dirty data as that will not get flushed out. So we really want
to stop the jbd2_journal_start() from succeeding.
CPU1 CPU2
Task1 (write operation) Task2
----------------------------------------------------------------------------------------------
tx read_lock(&journal->j_state_lock)
jbd2_journal_lock_updates() /*journal->j_barrier_count incremented -
so non zero ! */
tx+1 jbd2_journal_flush()
tx+2 write_lock(&journal->j_state_lock);
/* till the read_lock is relinquished by
task1 , we are stuck */
tx+3 if(journal->j_barrier_count)
read_unlock(&journal->j_state_lock);
/* so we can set the journal->j_flags now as write_lock()
succeeds here */
tx+4 goto repeat
The above case is where the writing process/jbd2_journal_start() calls
read_lock() *after* jbd2_journal_lock_updates() are called by the
freezing process and hence is protected by the j_barrier_count check.
This transaction can definitely not start!
Now following is the case where the writing
process/jbd2_journal_start() calls read_lock() *before*
jbd2_journal_lock_updates() are called by the freezing process.
However, if Task1 already finished starting the transaction as follows:
CPU1 CPU2
Task1 (write operation)
Task2
-------------------------------------------------------------------------------------------------------------------------------------------------------------
tx read_lock(&journal->j_state_lock)
tx+1
jbd2_journal_lock_updates() (will be stuck at the next instance...)
tx+3 if(journal->j_barrier_count)
write_lock(journal->j_state_lock)
/* journal->j_barrier_count = 0, so we proceed here*/
/* stuck here till Task1 relinquishes the lock*/
tx+4 read_unlock(&journal->j_state_lock);
tx+5 now start_this_handle() returns successfully
and associates this handle with the running
transaction.
tx+6
jbd2_journal_lock_updates() succeeds
tx+7
jbd2_journal_flush() /* This shall flush the running transactions */
/* There are no
outstanding transcations. No new ext4_journal_start() will succeed by
this point*/
--------------------------------
Thus:
Though a racy transaction can really be started after SB_FREEZE_WRITE,
either that transaction will be flushed by jbd2_journal_flush or it
cannot be started at all because of the barrier count.
So, I do not understand why the journaled writes should really have a
race window after this patch is applied?
Thanks!
Regards,
Surbhi
On Tue, Jan 10, 2012 at 4:13 PM, Surbhi Palande <csurbhi@...il.com> wrote:
> Hi Jan,
>
>>>
>>>
>>> If all the write operations were journaled, then this patch would not allow
>>> ext4 filesystem to have any dirty data after its frozen.
>>> (as journal_start() would block).
>>>
>>> I think the only one candidate that creates dirty data without calling
>>> ext4_journal_start() is mmapped?
>> No, the problem is in any write path. The problem is with operations
>> that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
>> operations dirty the filesystem but running sync may easily miss them.
>> During this phase journal is not frozen so that does not help you in any
>> way.
>>
>> Honza
>
> Ok! No new transaction can really start after the journal is frozen.
> But we can have dirty data after SB_FREEZE_WRITE and before
> SB_FREEZE_TRANS.
> I agree with you. However, can this be fixed by adding a
> sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
> over?
>
>
> So then essentially, when freeze_super() returns, the page cache is clean?
>
> I do definitely agree that the fix is to add a lock for mutual
> exclusion between freeze filesystem and writes to a frozen filesystem.
>
> Thanks!
>
> Regards,
> Surbhi.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists