[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 03 Mar 2009 17:36:05 +0900
From: Akira Fujita <a-fujita@...jp.nec.com>
To: Theodore Tso <tytso@....edu>
CC: linux-ext4@...r.kernel.org
Subject: Re: [RFC][PATCH 2/3] ext4: Exchange the blocks between two inodes
Hi Ted,
Thank you for comment. :-)
Theodore Tso wrote:
>> +
>> + up_write(&EXT4_I(org_inode)->i_data_sem);
>> + ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags,
>> + &page, &fsdata);
>> + down_write(&EXT4_I(org_inode)->i_data_sem);
>>
>
> This is going to be a problem. Once we release i_data_sem, there is
> the possibility that other processes which might be running and
> accessing the file at the same time that the defragger is running
> could be blocked waiting for i_data_sem to be released. The moment it
> gets released, they will grab the lock then start to modify extent
> tree --- either allocating new blocks to it, or worse, truncating or
> unlinking the target inode.
>
> This is going to be a mess to fix, since Linux doesn't have recursive
> locking primitives. We do take i_mutex, which will protect us against
> truncates, but it won't protect against a write() system call. Also,
> if there inode has delayed allocation blocks pending, those could get
> written out by the page cleaner, and i_mutex won't protect us against
> changes to i_data_sem, either.
>
>
As you said, we take i_mutex at the start of ext4_defrag()
and hold it until the end of this function,
so orig file is protected against truncates and
it never be shrunk during defrag.
On the other hand, semaphore is released/taken around write_begin()
in ext4_defrag_partial() every page, so it does not protect orig file
against a write() system call from other process.
So that defrag result (fragmentation) might not be best,
but data corruption does not occur at least.
So I think it is not a serious problem.
As above, it is not necessary to lock the whole of ext4_defrag()
with semaphore, we should just lock only a necessary point.
Therefore defrag V1's lock seems have unneeded lock points.
I will change lock point and semaphore type in the next version.
Do I overlook something?
Regards,
Akira Fujita
> We could add special-case kludgery to wrap around all of the places
> that takes or release the i_data_sem so that we get recursive locking
> support --- but that would be very ugly indeed.
>
> I'm not sure what's the best way to deal with this; maybe if we sleep
> on it someone will come up with a better suggestion --- but it's
> something that we have to figure out.
>
> - Ted
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists