linux-kernel - Re: [PATCH] fs: fix i_writecount on shmem and friends

Message-ID: <CA+55aFwzBoTc+09jZTGjfTQTfkgZwrN5m09snb+F9inLaDn0OA@mail.gmail.com>
Date:	Tue, 11 Mar 2014 12:05:09 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	David Herrmann <dh.herrmann@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	David Howells <dhowells@...hat.com>,
	Oleg Nesterov <oleg@...hat.com>,
	stable <stable@...r.kernel.org>
Subject: Re: [PATCH] fs: fix i_writecount on shmem and friends

Al, any comments?

David's test-program is some broken mix of C and shell scripting, but
the fixed version does show the issue he talks about:

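    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            int p[2], ro;
            char buf[128];

            pipe(p);
            /* open the pipe's inode read-only through /proc */
            sprintf(buf, "/proc/self/fd/%d", p[1]);
            ro = open(buf, O_RDONLY);
            sprintf(buf, "/proc/self/fd/%d", ro);
            /* this close drops a write reference that was never taken */
            close(p[1]);
            return open(buf, O_RDWR);
    }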

which returns ETXTBSY (most easily seen by just stracing it).

The patch would also seem to make sense, with the i_readcount_inc()
being immediately below for the FMODE_READ case.

[ Quoting the whole email for context, sorry ]

                  Linus

On Mon, Mar 3, 2014 at 7:16 AM, David Herrmann <dh.herrmann@...il.com> wrote:
> VM_DENYWRITE currently relies on i_writecount. If there's an active
> writable reference to an inode, VM_DENYWRITE is not allowed.
> Unfortunately, alloc_file() does not increase i_writecount, therefore,
> does not prevent a following VM_DENYWRITE even though the new file might
> have been opened with FMODE_WRITE. However, callers of alloc_file() expect
> the file object to be fully instantiated so they can call fput() on it. We
> could now either fix all callers to do a get_write_access() if opened
> with FMODE_WRITE, or simply fix alloc_file() to do that. I chose the
> latter.
>
> Note that this bug allows some rather subtle misbehavior. The following
> sequence of calls should work just fine, but currently fails:
>     int p[2], orig, ro, rw;
>     char buf[128];
>
>     pipe(p);
>     sprintf(buf, "/proc/self/fd/%d", p[1]);
>     ro = open("/proc/self/fd/$orig", O_RDONLY);
>     close(p[1]);
>     rw = open("/proc/self/fd/$ro", O_RDWR);
>
> The final open() cannot succeed as close(p[1]) caused an integer underflow
> on i_writecount, effectively causing VM_DENYWRITE on the inode. The open
> will fail with -ETXTBSY.
>
> It's a rather odd sequence of calls and given that open() doesn't use
> alloc_file() (and is thus not affected by this bug), it's rather unlikely
> that this is a serious issue. But stuff like anon_inode shares a *single*
> inode across a huge set of interfaces. If any of these is broken like
> pipe(), it will affect all of these (ranging from dma-buf to epoll).
>
> Cc: Al Viro <viro@...iv.linux.org.uk>
> Cc: David Howells <dhowells@...hat.com>
> Cc: Oleg Nesterov <oleg@...hat.com>
> Cc: <stable@...r.kernel.org>
> Signed-off-by: David Herrmann <dh.herrmann@...il.com>
> ---
>  fs/file_table.c | 27 ++++++++++++++++++---------
>  1 file changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 5fff903..e3c8dd0 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -167,6 +167,7 @@ struct file *alloc_file(struct path *path, fmode_t mode,
>                 const struct file_operations *fop)
>  {
>         struct file *file;
> +       int error;
>
>         file = get_empty_filp();
>         if (IS_ERR(file))
> @@ -178,15 +179,23 @@ struct file *alloc_file(struct path *path, fmode_t mode,
>         file->f_mode = mode;
>         file->f_op = fop;
>
> -       /*
> -        * These mounts don't really matter in practice
> -        * for r/o bind mounts.  They aren't userspace-
> -        * visible.  We do this for consistency, and so
> -        * that we can do debugging checks at __fput()
> -        */
> -       if ((mode & FMODE_WRITE) && !special_file(path->dentry->d_inode->i_mode)) {
> -               file_take_write(file);
> -               WARN_ON(mnt_clone_write(path->mnt));
> +       if (mode & FMODE_WRITE) {
> +               error = get_write_access(path->dentry->d_inode);
> +               if (error) {
> +                       put_filp(file);
> +                       return ERR_PTR(error);
> +               }
> +
> +               /*
> +                * These mounts don't really matter in practice
> +                * for r/o bind mounts.  They aren't userspace-
> +                * visible.  We do this for consistency, and so
> +                * that we can do debugging checks at __fput()
> +                */
> +               if (!special_file(path->dentry->d_inode->i_mode)) {
> +                       file_take_write(file);
> +                       WARN_ON(mnt_clone_write(path->mnt));
> +               }
>         }
>         if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
>                 i_readcount_inc(path->dentry->d_inode);
> --
> 1.9.0
>