Message-Id: <79EA1EBC-895B-4AFC-92EB-05E46C6AAEF5@mac.com>
Date:	Wed, 21 Nov 2007 01:16:27 -0500
From:	Kyle Moffett <mrmacman_g4@....com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Hansen <haveblue@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Ulrich Drepper <drepper@...hat.com>,
	linux-kernel@...r.kernel.org,
	"Dinakar Guniguntala [imap]" <dino@...ibm.com>,
	Sripathi Kodi <sripathik@...ibm.com>
Subject: Re: Futexes and network filesystems.

On Nov 20, 2007, at 17:53:52, Eric W. Biederman wrote:
> I had a chance to think about this a bit more, and realized that  
> the problem is that futexes don't appear to work on network  
> filesystems, even if the network filesystems provide coherent  
> shared memory.
>
> It seems to me that we need to have a call that gets a unique token
> for a process, per filesystem, for use in futexes (especially robust
> futexes).  Say get_fs_task_id(const char *path);
>
> On local filesystems this could just be the pid as we use today,  
> but for filesystems that can be accessed from contexts with  
> potentially overlapping pid values this could be something else.   
> It is an extra syscall in the preparation path, but it should be
> hardly more expensive than the current getpid().
>
> Once we have fixed the futex infrastructure to be able to handle  
> futexes on network filesystems, the pid namespace case will be  
> trivial to implement.

Actually, I would think that get_vm_task_id(void *addr) would be a  
more useful interface.  The call would still be a relatively simple  
lookup to find the struct file associated with the particular virtual  
mapping, but it would be race-free from the perspective of userspace  
and would not require that we somehow figure out the file descriptor  
associated with a particular mmap() (which may be closed by this  
point in time).  Useful extensions would be get_fd_task_id(int fd)
and get_fs_task_id(const char *path), but those are less important.
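
To make the shape of the interface concrete, here is a minimal userspace
sketch.  None of these calls exist in any kernel; the names come from this
thread, and the fs_task_id_t type and the stub bodies are assumptions made
purely for illustration.  Each stub returns getpid(), which is the behaviour
suggested above for a local filesystem.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical token type; the thread does not specify one. */
typedef uint32_t fs_task_id_t;

/*
 * Hypothetical stand-in for the proposed get_vm_task_id(void *addr).
 * A real implementation would look up the struct file backing the
 * mapping that contains addr and ask its filesystem for a token.
 * Here we model only the local-filesystem case: the token is the pid.
 */
static fs_task_id_t get_vm_task_id(void *addr)
{
    (void)addr; /* unused in the local-filesystem model */
    return (fs_task_id_t)getpid();
}

/* Hypothetical extension keyed by an open file descriptor. */
static fs_task_id_t get_fd_task_id(int fd)
{
    (void)fd;
    return (fs_task_id_t)getpid();
}

/* Hypothetical extension keyed by a path, as in the quoted proposal. */
static fs_task_id_t get_fs_task_id(const char *path)
{
    assert(path != NULL);
    return (fs_task_id_t)getpid();
}
```

The point of keying on the address is that the caller only needs the pointer
it already has for the futex word, not the descriptor used for the mmap().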

The other important thing is to ensure that somehow the numbers are  
considered unique only within the particular domain of a container,  
such that you can migrate a container from one system to another even  
using a simple local ext3 filesystem (on a networked block device)  
and still be able to have things work properly even after the  
migration.  Naturally this would only work with an upgraded libc but  
I think that's a reasonable requirement to enforce for migration of  
futexes and cross-network futexes.

Even for network filesystems which don't implement coherent shared  
memory, you might add a memexcl() system call which (when used by  
multiple cooperating processes) ensures that a given page is only  
ever mapped by at most one computer accessing a given network  
filesystem.  The page-outs and page-ins when shuttling that page  
across the network would be expensive, but I believe the cost would  
be reasonable for many applications and it would allow traditional  
atomic ops on the mapped pages to take and release futexes in the  
uncontended case.
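
For reference, the uncontended case mentioned above can be sketched with C11
atomics.  This is a simplified illustration, not the glibc or kernel code:
the function names are made up, the word layout (0 for unlocked, otherwise
the owner's task id) omits the waiter and owner-died bits that real robust
futexes carry, and the contended path that would enter the kernel through
futex() is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: 0 means unlocked, any other value is the owner's id. */
#define FUTEX_UNLOCKED 0u

/*
 * Try to take the lock with a single compare-and-swap, no syscall.
 * Returns 1 if the word went from unlocked to task_id, 0 otherwise.
 */
static int futex_trylock_fast(_Atomic uint32_t *word, uint32_t task_id)
{
    uint32_t expected = FUTEX_UNLOCKED;

    assert(task_id != FUTEX_UNLOCKED); /* 0 is reserved for "unlocked" */
    return atomic_compare_exchange_strong(word, &expected, task_id);
}

/*
 * Release the lock with a single compare-and-swap, no syscall.
 * Returns 1 if task_id owned the word and it is now unlocked, 0 otherwise.
 */
static int futex_unlock_fast(_Atomic uint32_t *word, uint32_t task_id)
{
    uint32_t expected = task_id;

    return atomic_compare_exchange_strong(word, &expected, FUTEX_UNLOCKED);
}
```

Both operations only work across machines if the page holding the word is
visible on one machine at a time, which is the guarantee the memexcl() idea
is meant to provide.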

Cheers,
Kyle Moffett
