[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b90a4183-6b22-2864-d484-9fd3e30543d0@huawei.com>
Date: Tue, 6 Feb 2018 11:21:44 +0800
From: Gao Xiang <gaoxiang25@...wei.com>
To: Gabriel Krisman Bertazi <krisman@...labora.co.uk>,
Al Viro <viro@...IV.linux.org.uk>
CC: <tytso@....edu>, <david@...morbit.com>, <olaf@....com>,
<linux-ext4@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
<alvaro.soliverez@...labora.co.uk>, <kernel@...ts.collabora.co.uk>,
hutj <hutj@...wei.com>
Subject: Re: [PATCH RFC v2 00/13] NLS/UTF-8 Case-Insensitive lookups for ext4
and VFS proposal
Hi Gabriel,
On 2018/2/6 10:24, Gabriel Krisman Bertazi wrote:
> Al Viro <viro@...IV.linux.org.uk> writes:
>
>> On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote:
>>
>>> The second proposal is related to the VFS layer:
>>>
>>> (2) Enable Insensitive lookup support on a per-mountpoint basis,
>>> via a MS_CASEFOLD flag, with the ultimate goal of supporting a
>>> case-insensitive bind mount of a subtree, side-by-side with a
>>> sensitive version of the filesystem.
>>
>> First reaction: No. With the side of HELL NO.
>>
>> Your ultimate goal sounds utterly insane - dcache tree must be shared
>> for all mounts. Moreover, "would these two names refer to the same
>> object" can not be mount-dependent. Not going to happen.
>>
>> Please, post the description of what you are planning to do.
>> Detailed. I'm not saying that it's 100% impossible to do correctly,
>> but I'm _very_ sceptical about the feasibility.
>>
>> I'm certainly not going to ACK any VFS changes until you convince
>> me that this thing can be done with sane semantics.
>
> Hi Viro,
>
> Sorry for the delay. I wanted to get a better understanding of the
> code before getting back to you.
>
> Our customer is interested in exposing a subtree of an existing
> filesystem (native Linux filesystems, xfs, ext4 and others) in an
> case-insensitive lookup manner, without paying the cost of a userspace
> getdents implementation, and, preferably, without requiring the user to
> modify data on the disk.
Could I express my opinion? I have working on case-insensitive sdcardfs
for months.
I think your problem is how we optimise a case-insensitive lookup on the
file system with a case-sensitive dcache (I mean d_add and no d_compare
and d_hash).
In that case, we could not trust the negative dentry when _creating_ a
case-insensitive file, for example:
there exists "anDroid" on-disk, but ext4's in-memory dcache only has
the negative "Android", if we lookup "Android" we will get the
_negative_ dentry, but we _cannot_ create it since "anDroid" exists on
disk. In the create case, an on-disk _iterate_ (or readdir) is necessary.
I could give another example, if we uses case-insentive ext4 and create
"Android" and "anDroid", how to deal with the case in the
case-insensitive way?
I mean in that case we should make both "Android" and "anDroid" can
access, right?
I think we need to build a special case-sensitive dcache rather than
a case-insensitive dcache following the native case-insentive fs(use
d_add_ci, d_compare and d_hash, eg. fat, ntfs...)
Finally, I agree "let the user shot herself in the foot by having two
files with the exact CI name", but I think it could not the VFS
_busniess_ itself since each customer solution "case-sensitive ext4 ->
case-insensitive lookup" has their _perfered_ way (for example,
"android" and "Android" exist, A perfers android and B perfers Android.
Finally, I think for optmization, ext4 or other fs could add some dir
inode _tag_ and supports native case-insensitive for these dirs could be
better....
Sorry to bother,
Thanks,
>
> After learning, last year, about the similar Android issue from Ted at a
> conference, I started to believe a bind mount is the appropriate
> solution for both our use cases.
>
> Like Ted mentioned, we let the user shot herself in the foot by having
> two files with the exact CI name, and which one she gets will be
> undefined. My current solution gets the exact match if it is present,
> and my second approach could be adapted to do that with a performance
> cost only if both files are not cached. That seem to be acceptable by
> Ted and shouldn't be a problem for me too.
>
> Let me start by discussing what I have now, which is a functional
> solution, to the best of my knowledge, but which still requires rework
> to better benefit from the dentry cache. Then I will discuss the
> solutions I am working on.
>
> (1) What I have now:
>
> As I mentioned in the first email, MS_CASEFOLD triggers a MNT_CASEFOLD
> flag in the vfsmount structure. This flag, which gets recalculated
> every time a mountpoint is traversed, is used to decide if we do a
> casefolded ->lookup() for each component of the path. This obviously
> allows paths with multiple nested mountpoints to work.
>
> The ->lookup() function calls d_add_ci(), making sure that only the
> exact-match dentry is inserted in the cache. This means that we don't
> have multiple dentries, differing only by case, associated to the same
> file. This is an important semantic, which I am not modifying. This
> also means that a following CI lookup with an inexact-case match will
> trigger another ->lookup(), which is bad, but is manageable (for now).
> I address this problem later in this email.
>
> Regarding negative dentries. If we have a negative dentry in the cache,
> the case-sensitive algorithm will abort the lookup and return the
> dentry, nothing new here. If, on the other hand, That component lookup
> is under a MNT_CASEFOLD search, the code does a bigger effort, ignores
> the negative dentry and still do ->lookup().
>
> This means that:
>
> - Case-sensitive (CS) search performance is not affected, except for
> the following sequence: Do a CS(foo) which returns a negative_dentry,
> then CI(FOO), and finally CS(foo). This will require the second
> CS(foo) to re-do the search and recreate the negative dentry. If
> only CS searches are done, the performance should be unaffected.
>
> - Case-insensitive (CI) benefits from the dentry cache as long as it is
> doing an exact-case name match. Otherwise, it always falls to
> ->lookup(). The exact-name dentry, if already in the cache, will be
> returned in both cases.
>
> - Negative dentries are ignored if doing CI and it always re-do the
> search.
>
> The prototype code is in the following branch. I can send
> patches if you want.
>
> https://gitlab.collabora.com/krisman/linux.git -b vfs-ms_casefold
>
> (2) What I want to do:
>
> I want to allow lookups of the inexact name to find the cached dentry,
> preventing it from going to the disk only to find out later it already
> has that dentry cached in some other cache bucket. This means that,
> whenever there is a MNT_CASEFOLD enabled, every mountpoint of that
> superblock needs to use d_hash(casefold(name)), such that every file is
> in a known bucket. Obviously this is a worse hash function, but the
> cost is only payed by filesystems using the feature. If a bind mount is
> done, we must rehash all existing entries of that filesystem at
> "bind,remount,ignorecase" time.
>
> Like the previous version, we only store exact name dentries, even
> though the bucket is different now. When doing a LOOKUP_CASEFOLD, we
> use a different ->d_compare_CI() to do the matching, such that we can
> find the right dentry or not, in both scenarios (CI and CS).
>
> We still can't trust negative dentries when doing a CI lookup. Consider
> the case where we do CS(FOO) and foo doesn't exist. Now we have a
> negative_dentry(FOO), but CI(FOO) would still need to find an existing
> 'foo' file. The first solution is to make fast_CI_lookup() walk the
> entire bucket and ignore previous matching negative dentries if a
> positive one is found. If no positive dentry is found, then fallback
> and do ->lookup(). A superior solution is to mark negative dentries as
> soft negatives and hard negatives, depending on what kind of lookup
> created the dentry, whether CS or CI, respectively. Then, we can allow
> a CI lookup to return the negative dentry only if it is a HARD negative,
> and try the ->lookup() otherwise. For this to work, any dentry creation
> on that parent requires that we invalidate hard negative children
> dentries, and make them soft, at insertion/rename time.
>
> With these changes I believe the dentry cache could be shared by code
> doing CI and CS lookups of the same dentry. My main concern is whether
> we could invalidate a negative dentry without harming parallel lookups
> using that dentry, but I believe I can work around that.
>
> I believe this covers everything I could identify, but please let me
> know if I missed something, obvious or not.
>
Powered by blists - more mailing lists