linux-kernel - Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOQ4uxhPzcX6Ti8UX4WOg9gJJn+YTuk9OgU80d9imoJ2QdXaWQ@mail.gmail.com>
Date:   Sat, 16 May 2020 09:58:11 +0300
From:   Amir Goldstein <amir73il@...il.com>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Luis Henriques <lhenriques@...e.com>,
        Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org,
        linux-kernel <linux-kernel@...r.kernel.org>,
        fstests <fstests@...r.kernel.org>,
        Dave Chinner <dchinner@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Miklos Szeredi <miklos@...redi.hu>,
        "J. Bruce Fields" <bfields@...hat.com>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>
Subject: Re: [PATCH] ceph: don't return -ESTALE if there's still an open file

[pulling in nfs guys]

> > Questions:
> > 1. Does sync() result in fully purging inodes on MDS?
>
> I don't think so, but again, that code is not trivial to follow. I do
> know that the MDS keeps around a "strays directory" which contains
> unlinked inodes that are lazily cleaned up. My suspicion is that it's
> satisfying lookups out of this cache as well.
>
> Which may be fine...the MDS is not required to be POSIX compliant after
> all. Only the fs drivers are.
>
> > 2. Is i_nlink synchronized among nodes on deferred delete?
> > IWO, can inode come back from the dead on client if another node
> > has linked it before i_nlink 0 was observed?
>
> No, that shouldn't happen. The caps mechanism should ensure that it
> can't be observed by other clients until after the change.
>
> That said, Luis' current patch doesn't ensure we have the correct caps
> to check the i_nlink. We may need to add that in before we can roll with
> this.
>
> > 3. Can an NFS client be "migrated" from one ceph node to another
> > with an open but unlinked file?
> >
>
> No. Open files in ceph are generally per-client. You can't pass around a
> fd (or equivalent).

Not sure we are talking about the same thing.
It's not ceph fd that is being passed around, it's the NFS client's fd.
If there is no case where NFS client would access ceph client2
with a handle it got from ceph client1, then there is no reason to satisfy
an open_by_handle() call for an unlinked file on client2.
If file was opened on client1, it may be "legal" to satisfy open_by_handle()
on client2, but I don't see how stopping to satisfy that can break anything.

>
> > I think what the test is trying to verify is that a "fully purged" inodes
> > cannot be opened db handle, but there is no standard way to verify
> > "fully purged", so the test resorts to sync() + another sync() + drop_caches.
> >
>
> Got it. That makes sense.
>
> > Is there anything else that needs to be done on ceph in order to flush
> > all deferred operations from this client to MDS?
>
> I'm leaning toward something like what Luis has proposed, but adding in
> appropriate cap handling.

That sounds fine.

>
> Basically, we have to consider the situation where one client has the
> file open and another client unlinks it, and then does an
> open_by_handle_at. Should it succeed in that case?
>
> I can see arguments for either way.

IMO, the behavior should be defined for a client that has the file open.
For the rest it does not really matter.

My argument is that is it easy to satisfy the test's expectation and conform
to behavior of other filesystems without breaking any real workload.

To satisfy the test's expectation, you only need to change behavior of ceph
client in i_count 1 use case. If i_count is 1 need to take all relevant caps
to check that i_nlink is "globally" 0, before returning ESTALE.
But if i_count > 1, no need to bother.

Thanks,
Amir.