[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54073B02.2060707@parallels.com>
Date: Wed, 3 Sep 2014 20:00:02 +0400
From: Pavel Emelyanov <xemul@...allels.com>
To: Jeff Layton <jlayton@...chiereds.net>
CC: "J. Bruce Fields" <bfields@...ldses.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
<linux-api@...r.kernel.org>
Subject: Re: [PATCH] locks: Ability to test for flock presence on fd
On 09/03/2014 07:55 PM, Jeff Layton wrote:
> On Wed, 3 Sep 2014 18:38:24 +0400
> Pavel Emelyanov <xemul@...allels.com> wrote:
>
>> On 09/02/2014 11:53 PM, Jeff Layton wrote:
>>> On Tue, 2 Sep 2014 15:43:00 -0400
>>> "J. Bruce Fields" <bfields@...ldses.org> wrote:
>>>
>>>> On Tue, Sep 02, 2014 at 11:07:14PM +0400, Pavel Emelyanov wrote:
>>>>> On 09/02/2014 10:44 PM, J. Bruce Fields wrote:
>>>>>> On Tue, Sep 02, 2014 at 09:17:34PM +0400, Pavel Emelyanov wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> There's a problem with getting information about who has a flock on
>>>>>>> a specific file. The thing is that the "owner" field, that is shown in
>>>>>>> /proc/locks is the pid of the task who created the flock, not the one
>>>>>>> who _may_ hold it.
>>>>>>>
>>>>>>> If the flock creator shared the file with some other task (by forking
>>>>>>> or via scm_rights) and then died or closed the file, the information
>>>>>>> shown in proc no longer corresponds to the reality.
>>>>>>>
>>>>>>> This is critical for CRIU project, that tries to dump (and restore)
>>>>>>> the state of running tasks. For example, let's take two tasks A and B
>>>>>>> both opened a file "/foo", one of tasks places a LOCK_SH lock on the
>>>>>>> file and then "obfuscated" the owner field in /proc/locks. After this
>>>>>>> we have no ways to find out who is the lock holder.
>>>>>>>
>>>>>>> I'd like to note, that for LOCK_EX this problem is not critical -- we
>>>>>>> may go to both tasks and "ask" them to LOCK_EX the file again (we can
>>>>>>> do it in CRIU, I can tell more if required). The one who succeeds is
>>>>>>> the lock holder.
>>>>>>
>>>>>> It could be both, actually, right?
>>>>>
>>>>> Two succeeding with LOCK_EX? AFAIU no. Am I missing something?
>>>>
>>>> After a fork, two processes "own" the lock, right?:
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>> int fd, ret;
>>>>
>>>> fd = open(argv[1], O_RDWR);
>>>> ret = flock(fd, LOCK_EX);
>>>> if (ret)
>>>> err(1, "flock");
>>>> ret = fork();
>>>> if (ret == -1)
>>>> err(1, "flock");
>>>> ret = flock(fd, LOCK_EX);
>>>> if (ret)
>>>> err(1, "flock");
>>>> printf("%d got exclusive lock\n", getpid());
>>>> sleep(1000);
>>>> }
>>>>
>>>> $ touch TMP
>>>> $ ./test TMP
>>>> 15882 got exclusive lock
>>>> 15883 got exclusive lock
>>>> ^C
>>>>
>>>> I may misunderstand what you're doing.
>>>>
>>>
>>> Yeah, I don't understand either.
>>>
>>> Flock locks are owned by the file description. The task that set
>>> them is really irrelevant once they are set.
>>>
>>> In the second flock() call there, you're just "modifying" an existing
>>> lock (which turns out to be a noop here).
>>>
>>> So, I don't quite understand the problem this solves. I get that you're
>>> trying to reestablish the flock "state" after a checkpoint/restore
>>> event, but why does it matter what task actually sets the locks as long
>>> as they're set on the correct set of fds?
>>
>> Sorry for confusion. Let me try to explain it more clearly.
>>
>> First, what I meant talking about two LOCK_EX locks. Let's consider
>> this scenario:
>>
>> pid = fork()
>> fd = open("/foo"); /* both parent and child has _different_ files */
>> if (pid == 0)
>> /* child only */
>> flock(fd, LOCK_EX);
>>
>> at this point we have two different files pointing to "/foo" and
>> only one of them has LOCK_EX on it. So if try to LOCK_EX it again,
>> only at child's file this would succeed. So we can distinguish which
>> file is locked using this method.
>>
>>
>>
>> Now, what problem this patch is trying to solve. It's quite tricky,
>> but still. Let's imagine this scenario:
>>
>> pid = fork();
>> fd = open("/foo"); /* yet again -- two different files */
>> if (pid == 0) {
>> flock(fd, LOCK_SH);
>> pid2 = fork();
>> if (pid2 != 0)
>> exit(0);
>> }
>>
>> at this point we have:
>>
>> task A -- the original task with file "/foo" opened
>> task B -- the first child, that exited at the end
>> task C -- the 2nd child, that "inherited" a file with the lock from B
>>
>> Note, that file at A and file at C are two different files (struct
>> file-s). And it's only the C's one that is locked.
>>
>> The problem is that the /proc/locks shows the pid of B in this lock's
>> owner field. And we have no glue to find out who the real lock owner
>> is using the /proc/locks.
>>
>> If we try to do the trickery like the one we did with LOCK_EX above,
>> this is what we would get.
>>
>> If putting the 2nd LOCK_SH from A and from C, both attempts would succeed,
>> so this is not the solution.
>>
>> If we try to LOCK_EX from A and C, only C would succeed, so this seem
>> to be the solution, but it's actually not. If there's another pair of
>> A' and C' tasks holding the same "/foo" and having the LOCK_SH on C',
>> this trick would stop working as none of the tasks would be able to
>> put such lock on this file.
>>
>>
>> Thus, we need some way to find out whether a task X has a lock on file F.
>> This patch is one of the ways of doing this.
>>
>> Hope this explanation is more clear.
>>
>
> Yes, thanks for clarifying.
>
> I think we do need to be a bit careful when describing this though.
>
> flock locks are not owned by tasks, but by the file description. So you
> can't really tell whether task X has a lock on file F. Several tasks
> could have a reference to file F and none of them has any more "claim"
> to a lock on that file than another (at least from an API standpoint).
>
> What your patch really does is tell you whether that file description
> has a particular type of lock set on it.
Exactly.
> Like Bruce, I think this looks fairly reasonable. That said, I had to
> go through a bunch of API gyrations recently when getting the OFD lock
> patches merged. It would be good to accompany your kernel patch with
> glibc and manpage patches as well so we can make sure we have the
> design settled before merging anything.
>
> Sound OK?
Sure! But I think glibc and man-pages people would first want the
kernel part to get finished, as it's the part that mostly drives the
API. Since the linux-api@ is in Cc for this patch, what else would
you suggest me to do to keep the process moving?
Thanks,
Pavel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists