[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrVxNtCumkbDzfn2+LwvAL5y18NaA8-00YDyHagBXkGX0w@mail.gmail.com>
Date: Wed, 3 Sep 2014 09:57:38 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Jeff Layton <jlayton@...chiereds.net>
Cc: Michael Kerrisk <mtk.manpages@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Pavel Emelyanov <xemul@...allels.com>,
"J. Bruce Fields" <bfields@...ldses.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH] locks: Ability to test for flock presence on fd
On Sep 3, 2014 9:04 AM, "Jeff Layton" <jlayton@...chiereds.net> wrote:
>
> On Wed, 3 Sep 2014 20:00:02 +0400
> Pavel Emelyanov <xemul@...allels.com> wrote:
>
> > On 09/03/2014 07:55 PM, Jeff Layton wrote:
> > > On Wed, 3 Sep 2014 18:38:24 +0400
> > > Pavel Emelyanov <xemul@...allels.com> wrote:
> > >
> > >> On 09/02/2014 11:53 PM, Jeff Layton wrote:
> > >>> On Tue, 2 Sep 2014 15:43:00 -0400
> > >>> "J. Bruce Fields" <bfields@...ldses.org> wrote:
> > >>>
> > >>>> On Tue, Sep 02, 2014 at 11:07:14PM +0400, Pavel Emelyanov wrote:
> > >>>>> On 09/02/2014 10:44 PM, J. Bruce Fields wrote:
> > >>>>>> On Tue, Sep 02, 2014 at 09:17:34PM +0400, Pavel Emelyanov wrote:
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> There's a problem with getting information about who has a flock on
> > >>>>>>> a specific file. The thing is that the "owner" field, that is shown in
> > >>>>>>> /proc/locks is the pid of the task who created the flock, not the one
> > >>>>>>> who _may_ hold it.
> > >>>>>>>
> > >>>>>>> If the flock creator shared the file with some other task (by forking
> > >>>>>>> or via scm_rights) and then died or closed the file, the information
> > >>>>>>> shown in proc no longer corresponds to the reality.
> > >>>>>>>
> > >>>>>>> This is critical for CRIU project, that tries to dump (and restore)
> > >>>>>>> the state of running tasks. For example, let's take two tasks A and B
> > >>>>>>> both opened a file "/foo", one of tasks places a LOCK_SH lock on the
> > >>>>>>> file and then "obfuscated" the owner field in /proc/locks. After this
> > >>>>>>> we have no ways to find out who is the lock holder.
> > >>>>>>>
> > >>>>>>> I'd like to note, that for LOCK_EX this problem is not critical -- we
> > >>>>>>> may go to both tasks and "ask" them to LOCK_EX the file again (we can
> > >>>>>>> do it in CRIU, I can tell more if required). The one who succeeds is
> > >>>>>>> the lock holder.
> > >>>>>>
> > >>>>>> It could be both, actually, right?
> > >>>>>
> > >>>>> Two succeeding with LOCK_EX? AFAIU no. Am I missing something?
> > >>>>
> > >>>> After a fork, two processes "own" the lock, right?:
> > >>>>
> > >>>> int main(int argc, char *argv[])
> > >>>> {
> > >>>> int fd, ret;
> > >>>>
> > >>>> fd = open(argv[1], O_RDWR);
> > >>>> ret = flock(fd, LOCK_EX);
> > >>>> if (ret)
> > >>>> err(1, "flock");
> > >>>> ret = fork();
> > >>>> if (ret == -1)
> > >>>> err(1, "flock");
> > >>>> ret = flock(fd, LOCK_EX);
> > >>>> if (ret)
> > >>>> err(1, "flock");
> > >>>> printf("%d got exclusive lock\n", getpid());
> > >>>> sleep(1000);
> > >>>> }
> > >>>>
> > >>>> $ touch TMP
> > >>>> $ ./test TMP
> > >>>> 15882 got exclusive lock
> > >>>> 15883 got exclusive lock
> > >>>> ^C
> > >>>>
> > >>>> I may misunderstand what you're doing.
> > >>>>
> > >>>
> > >>> Yeah, I don't understand either.
> > >>>
> > >>> Flock locks are owned by the file description. The task that set
> > >>> them is really irrelevant once they are set.
> > >>>
> > >>> In the second flock() call there, you're just "modifying" an existing
> > >>> lock (which turns out to be a noop here).
> > >>>
> > >>> So, I don't quite understand the problem this solves. I get that you're
> > >>> trying to reestablish the flock "state" after a checkpoint/restore
> > >>> event, but why does it matter what task actually sets the locks as long
> > >>> as they're set on the correct set of fds?
> > >>
> > >> Sorry for confusion. Let me try to explain it more clearly.
> > >>
> > >> First, what I meant talking about two LOCK_EX locks. Let's consider
> > >> this scenario:
> > >>
> > >> pid = fork()
> > >> fd = open("/foo"); /* both parent and child has _different_ files */
> > >> if (pid == 0)
> > >> /* child only */
> > >> flock(fd, LOCK_EX);
> > >>
> > >> at this point we have two different files pointing to "/foo" and
> > >> only one of them has LOCK_EX on it. So if try to LOCK_EX it again,
> > >> only at child's file this would succeed. So we can distinguish which
> > >> file is locked using this method.
> > >>
> > >>
> > >>
> > >> Now, what problem this patch is trying to solve. It's quite tricky,
> > >> but still. Let's imagine this scenario:
> > >>
> > >> pid = fork();
> > >> fd = open("/foo"); /* yet again -- two different files */
> > >> if (pid == 0) {
> > >> flock(fd, LOCK_SH);
> > >> pid2 = fork();
> > >> if (pid2 != 0)
> > >> exit(0);
> > >> }
> > >>
> > >> at this point we have:
> > >>
> > >> task A -- the original task with file "/foo" opened
> > >> task B -- the first child, that exited at the end
> > >> task C -- the 2nd child, that "inherited" a file with the lock from B
> > >>
> > >> Note, that file at A and file at C are two different files (struct
> > >> file-s). And it's only the C's one that is locked.
> > >>
> > >> The problem is that the /proc/locks shows the pid of B in this lock's
> > >> owner field. And we have no glue to find out who the real lock owner
> > >> is using the /proc/locks.
> > >>
> > >> If we try to do the trickery like the one we did with LOCK_EX above,
> > >> this is what we would get.
> > >>
> > >> If putting the 2nd LOCK_SH from A and from C, both attempts would succeed,
> > >> so this is not the solution.
> > >>
> > >> If we try to LOCK_EX from A and C, only C would succeed, so this seem
> > >> to be the solution, but it's actually not. If there's another pair of
> > >> A' and C' tasks holding the same "/foo" and having the LOCK_SH on C',
> > >> this trick would stop working as none of the tasks would be able to
> > >> put such lock on this file.
> > >>
> > >>
> > >> Thus, we need some way to find out whether a task X has a lock on file F.
> > >> This patch is one of the ways of doing this.
> > >>
> > >> Hope this explanation is more clear.
> > >>
> > >
> > > Yes, thanks for clarifying.
> > >
> > > I think we do need to be a bit careful when describing this though.
> > >
> > > flock locks are not owned by tasks, but by the file description. So you
> > > can't really tell whether task X has a lock on file F. Several tasks
> > > could have a reference to file F and none of them has any more "claim"
> > > to a lock on that file than another (at least from an API standpoint).
> > >
> > > What your patch really does is tell you whether that file description
> > > has a particular type of lock set on it.
> >
> > Exactly.
> >
> > > Like Bruce, I think this looks fairly reasonable. That said, I had to
> > > go through a bunch of API gyrations recently when getting the OFD lock
> > > patches merged. It would be good to accompany your kernel patch with
> > > glibc and manpage patches as well so we can make sure we have the
> > > design settled before merging anything.
> > >
> > > Sound OK?
> >
> > Sure! But I think glibc and man-pages people would first want the
> > kernel part to get finished, as it's the part that mostly drives the
> > API. Since the linux-api@ is in Cc for this patch, what else would
> > you suggest me to do to keep the process moving?
> >
> > Thanks,
> > Pavel
> >
>
> (cc'ing Michael Kerrisk, the manpages maintainer)
>
> Michael had good suggestions for me when I was doing the OFD lock work.
> I'd also consider cc'ing the glibc development list.
I would suggest writing up a short description of the exact semantics
of your proposal. That way reviewers can decide whether the semantics
make sense and then check whether the code matches the description.
This could take the form of text that would go in the appropriate manpage.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists