linux-kernel - Re: /proc/pid/fd && anon_inode

Message-ID: <20130825065039.GB9299@1wt.eu>
Date:	Sun, 25 Aug 2013 08:50:39 +0200
From:	Willy Tarreau <w@....eu>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Andy Lutomirski <luto@...capital.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"security@...nel.org" <security@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux FS Devel <linux-fsdevel@...r.kernel.org>,
	Brad Spengler <spender@...ecurity.net>
Subject: Re: /proc/pid/fd && anon_inode_fops

On Sun, Aug 25, 2013 at 06:23:17AM +0100, Al Viro wrote:
> On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote:
> 
> > I doubt it. It seems to me that most such entries are implemented
> > for completeness while most valid uses only concern /proc/self/fd.
> > Maybe if we had an option so that only /proc/self/fd would actually
> > allow to access the fds while all /proc/pid/fd would only show what
> > they map to, it would be a good step forward.
> 
> How?  The fundamental problem is not visibility of that stuff, it's
> new opened file for the same object (Linux behaviour) vs. new descriptor
> referring to the same opened file (*BSD and friends).  We can't get
> anon_... sanely reopened in the former semantics and they are very
> visibly different for regular files, so switching to *BSD one is not
> feasible - too high odds of userland breakage.  The difference in
> semantics, of course, is that on Linux opening /dev/stdin gives you
> a descriptor with independent current IO position; on *BSD you get
> a descriptor sharing the current IO position with stdin.  IOW, it's
> independent open() of the same file vs. dup().
> 
> We are really stuck with the current semantics here - switching to
> *BSD one would not only mean serious surgery on descriptor handling
> (it's one of the wartier areas in *BSD VFS, in large part because
> of magic-open-really-a-dup kludges they have to do), it would change
> a long-standing userland API that had been there for nearly 20 years
> _and_ one that tends to be used in corner cases of hell knows how many
> scripts.
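
As a rough illustration of the distinction described above (independent
open() of the same file vs. dup()), here is a minimal user-space sketch.
It assumes a Linux system with /proc mounted; the function name
compare_dup_and_reopen() and the temporary file name are made up for the
example. It reads 3 bytes from a descriptor, then compares the current
offset seen through a dup()ed descriptor with the offset seen through a
fresh open() of /proc/self/fd/N.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: after reading 3 bytes from a temporary file,
 * store the current offset of a dup()ed descriptor in *dup_off and the
 * current offset of a descriptor obtained by opening /proc/self/fd/N
 * in *reopen_off. Returns 0 on success, -1 on any error.
 */
int compare_dup_and_reopen(off_t *dup_off, off_t *reopen_off)
{
    char tmpl[] = "/tmp/fd-demo-XXXXXX";   /* illustrative name only */
    char path[64];
    char buf[3];
    int fd, dfd = -1, rfd = -1, ret = -1;

    fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;

    /* Fill the file, rewind, then advance the offset by 3 bytes. */
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0)
        goto out;
    if (read(fd, buf, sizeof(buf)) != 3)
        goto out;

    /* dup(): new descriptor referring to the same opened file. */
    dfd = dup(fd);

    /* open() via /proc: on Linux, a new opened file for the same object. */
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    rfd = open(path, O_RDONLY);
    if (dfd < 0 || rfd < 0)
        goto out;

    *dup_off = lseek(dfd, 0, SEEK_CUR);     /* shared position */
    *reopen_off = lseek(rfd, 0, SEEK_CUR);  /* independent position */
    ret = 0;
out:
    if (dfd >= 0)
        close(dfd);
    if (rfd >= 0)
        close(rfd);
    close(fd);
    unlink(tmpl);
    return ret;
}
```

On Linux the dup()ed descriptor should report the position left by the
read, while the reopened one should start from the beginning of the file.
With the *BSD semantics described above, both would share one position.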

Thanks for explaining, Al, that really helps me understand. However,
there's still a difference between /proc/pid accessed from the process
itself (= /proc/self) and accessed from other processes, and it seems
to suit the situation:

  willy@...pc:~$ ls -la /tmp/bash 
  -r-x--x--x 1 root users 916852 2013-08-25 08:19 /tmp/bash*
  willy@...pc:~$ exec /tmp/bash -i
  willy@...pc:~$ echo $$
  22678
  willy@...pc:~$ ls -la /proc/22678/fd
  ls: cannot open directory /proc/22678/fd: Permission denied
  willy@...pc:~$ ls -la /proc/22678/exe 
  ls: cannot read symbolic link /proc/22678/exe: Permission denied
  willy@...pc:~$ cat /proc/22678/fd/0 
  cat: /proc/22678/fd/0: Permission denied

but :
  willy@...pc:~$ read < /proc/22678/fd/0 
  azerazerazer
  willy@...pc:~$ echo $REPLY
  azerazerazer

strace clearly shows that the process was allowed to inspect itself
while other processes were not:

  willy@...pc:~$ strace -p 22678
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = 3

  willy@...pc:~$ strace cat /proc/22678/fd/0 
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = -1 EACCES (Permission denied)

It looks like this difference was introduced by this patch (which also fixes
the issue we had been having for a very long time on 2.4 and early 2.6):

    8948e11 Allow access to /proc/$PID/fd after setuid()

Thus I'm wondering if something like this could help. The idea would be
that, with the appropriate mount option, a task could only look at its
own descriptors unless it's running with privileges:

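    static int proc_fd_permission(struct inode *inode, int mask,
                                  struct nameidata *nd)
    {
            /* A task may always look at its own descriptors. */
            if (task_pid(current) == proc_pid(inode))
                    return 0;
            /* Privileged tasks keep full access. */
            if (capable(CAP_DAC_OVERRIDE))
                    return 0;
            /* Placeholder for the proposed strict mount option. */
            if (proc_mounted_with_strict_option)
                    return -EACCES;
            return generic_permission(inode, mask, NULL);
    }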

Thus it would not change the default behaviour except for people who would
mount /proc with a special option.
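
To make the order of the checks easier to reason about, the decision
logic of the sketch above can be modeled as a pure user-space function.
This is only an illustration, not kernel code: the function name and its
boolean parameters are invented here and stand in for the
task_pid()/proc_pid() comparison, capable(CAP_DAC_OVERRIDE), the proposed
mount option, and the result of generic_permission().

```c
#include <errno.h>

/*
 * User-space model of the proposed check order (hypothetical names):
 *   is_self          - caller is the task that owns the fd directory
 *   has_dac_override - caller has CAP_DAC_OVERRIDE
 *   strict_mount     - /proc was mounted with the proposed strict option
 *   generic_result   - what the generic permission check would return
 */
int proc_fd_permission_model(int is_self, int has_dac_override,
                             int strict_mount, int generic_result)
{
    if (is_self)
        return 0;               /* own descriptors: always allowed */
    if (has_dac_override)
        return 0;               /* privileged: always allowed */
    if (strict_mount)
        return -EACCES;         /* strict mode: deny everyone else */
    return generic_result;      /* default: fall back to generic check */
}
```

With strict_mount set to 0, the result for other unprivileged tasks is
whatever the generic check returns, which is the fallback already present
in the sketch.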

Thanks,
Willy
