[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKYAXd-DSo1R7zgu81vc9ckke+5doeEd2QPwqHt54OF-VZo34Q@mail.gmail.com>
Date: Mon, 18 Feb 2013 23:25:11 +0900
From: Namjae Jeon <linkinjeon@...il.com>
To: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Cc: Andrew Bartlett <abartlet@...ba.org>, linux-kernel@...r.kernel.org,
Ravishankar N <cyberax82@...il.com>,
Amit Sahrawat <amit.sahrawat83@...il.com>,
Nam-Jae Jeon <namjae.jeon@...sung.com>,
Ravishankar N <ravi.n1@...sung.com>,
Amit Sahrawat <a.sahrawat@...sung.com>
Subject: Re: Read support for fat_fallocate()? (was [v2] fat: editions to
support fat_fallocate())
2013/2/18 OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>:
> Andrew Bartlett <abartlet@...ba.org> writes:
>
>>> >> First, Thanks for your interest !
>>> >> A mismatch between inode size and reserved blocks can be either due to
>>> >> pre-allocation (after our changes) or due to corruption (sudden unplug
>>> >> of media etc).
>>> >> We don’t think it is right to include only read only support (i.e.
>>> >> without fallocate support) for such files because if such files are
>>> >> encountered it only means that the file is corrupted, as there is no
>>> >> current method to check if the issue is due to pre-allocation.
>>> >> If it is to be included in the kernel, then the whole patch has to go
>>> >> in.
>>> >
>>> > I don't see why that is the case.
>>> If we consider that there is no FALLOCATE support, then the condition
>>> of file size and blocks not matching can be only possible in case of
>>> corruption, right?
>>
>> Sure. I was just suggesting we transparently recover from that, by
>> using the blocks. Think of it more as an online fsck not about
>> fallocate.
>>
>> Anyway, if you don't think it's reasonable to use those blocks, or to
>> 'just fix it', then we just have to continue to do as we currently do.
>> That is on first sign of FS corruption just stop doing writes, and await
>> an FSCK.
>
> I'm not sure what is suggesting actually though. We have to consider
> about synchronous runtime fsck makes normal path enough slower.
>
> E.g. probably, in this case, all first open(2) of the inode will have to
> walk cluster chain until end of cluster mark, to verify cluster chain.
>
>>> >> But then again, since the FAT specifications do not accommodate
>>> >> for pre-allocation, then it is up to OGAWA to decide if this is
>>> >> acceptable.
>>> >> In any case, the patch will definitely break backward compatibility
>>> >> (on an older fat driver without fallocate support) and also in case
>>> >> for the two variants for the same kernel versions and only one has
>>> >> FALLOCATE enabled, in such cases also, the behavior will assume
>>> >> corruption in one case.
>>> >
>>> > I agree that the sudden unplug is a concern, but why not make the
>>> > filesystem more robust against that inevitable occurrence? If the
>>> > blocks appear to be allocated to the file, why not use them?
>>> We also agree that there should be pre-allocation feature on FAT, and
>>> we had shared the scenarios where this could be required for a TV/
>>> recorder.
>>> But there are certain drawbacks which were raised by OGAWA with
>>> respect to compatibility and we also tend to agree on them.
>>> There could possibly be an issue where we are unable to distinguish
>>> between pre-allocation and corruption. Perhaps we could set a status
>>> bit on the file to indicate whether the file has pre-allocated blocks.
>>> This will make it clear whether the allocation is genuine through the
>>> FAT Fallocate request or is a result of corruption. Depending on the
>>> status of the flag - the decision can be made regard to reading
>>> blocks.
>>> But, the main issue in this will be storing this bit in on-disk
>>> directory entry for that file. From the feature and usability point of
>>> view, we should have fallocate on FAT too.
>>>
>>> But it needs initial ACK from OGAWA to continue to work on this so
>>> that we can figure out the proper solution to move forward.
>>
>> OK. Given the need to find other approaches, I wanted to suggest some
>> ideas - some of which you may have already considered:
>>
>> What about having a shadow FAT in a file, say called 'allocated space',
>> that would contain inode -> cluster list pairs, and where that file
>> would itself contain the free space the 'belongs' to other files?
>>
>> As new clusters become needed in a file, they would simply be removed
>> from the 'allocated space' file, and assigned to the file they really
>> belong to. That way, another OS just sees a large file, nothing more.
>>
>> Or, if we cannot make any changes to the on-disk format, what about
>> keeping such a database in memory, allocating some of the existing free
>> list to files that have had fallocate() called on them? (Naturally,
>> this makes it non-persistent, and instead more of a 'hint', but could at
>> least solve our mutual performance issues).
>
> [...]
>
> Hm. My concerns are compatibility and reliability. Although We can
> change on-disk format if need, but I don't think it can be compatible
> and reliable. If so, who wants to use it? I feel there is no reason to
> use FAT if there is no compatible.
>
> Well, anyway, possible solution would be, we can pre-allocate physical
> blocks via fallocate(2) or something, but discard pre-allocated blocks
> at ->release() (or before unmount at least). This way would have
> compatibility (no on-disk change over unmount) and possible breakage
> would be same with normal extend write patterns on kernel crash
> (i.e. Windows or fsck will truncate after i_size).
Hi OGAWA.
We don't need to consider device unplugging case ?
If yes, I can rework fat fallocate patch as your suggestion.
Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists