[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.02.1709190339020.15590@file01.intranet.prod.int.rdu2.redhat.com>
Date: Tue, 19 Sep 2017 03:53:27 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Joe Lawrence <joe.lawrence@...hat.com>
cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
"Luis R. Rodriguez" <mcgrof@...nel.org>,
Kees Cook <keescook@...omium.org>,
Michael Kerrisk <mtk.manpages@...il.com>,
Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a
mutex
On Fri, 15 Sep 2017, Joe Lawrence wrote:
> On 09/14/2017 07:09 PM, Mikulas Patocka wrote:
> > On Tue, 5 Sep 2017, Joe Lawrence wrote:
> >> pipe_max_size is assigned directly via procfs sysctl:
> >>
> >> static struct ctl_table fs_table[] = {
> >> ...
> >> {
> >> .procname = "pipe-max-size",
> >> .data = &pipe_max_size,
> >> .maxlen = sizeof(int),
> >> .mode = 0644,
> >> .proc_handler = &pipe_proc_fn,
> >> .extra1 = &pipe_min_size,
> >> },
> >> ...
> >>
> >> int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
> >> size_t *lenp, loff_t *ppos)
> >> {
> >> ...
> >> ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
> >> ...
> >>
> >> and then later rounded in-place a few statements later:
> >>
> >> ...
> >> pipe_max_size = round_pipe_size(pipe_max_size);
> >> ...
> >>
> >> This leaves a window of time between initial assignment and rounding
> >> that may be visible to other threads. (For example, one thread sets a
> >> non-rounded value to pipe_max_size while another reads its value.)
> >>
> >> Similar reads of pipe_max_size are potentially racey:
> >>
> >> pipe.c :: alloc_pipe_info()
> >> pipe.c :: pipe_set_size()
> >>
> >> Protect them and the procfs sysctl assignment with a mutex.
> >>
> >> Reported-by: Mikulas Patocka <mpatocka@...hat.com>
> >> Signed-off-by: Joe Lawrence <joe.lawrence@...hat.com>
> >> ---
> >> fs/pipe.c | 28 ++++++++++++++++++++++++----
> >> 1 file changed, 24 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/pipe.c b/fs/pipe.c
> >> index fa28910b3c59..33bb11b0d78e 100644
> >> --- a/fs/pipe.c
> >> +++ b/fs/pipe.c
> >> @@ -35,6 +35,11 @@
> >> unsigned int pipe_max_size = 1048576;
> >>
> >> /*
> >> + * Provide mutual exclusion around access to pipe_max_size
> >> + */
> >> +static DEFINE_MUTEX(pipe_max_mutex);
> >> +
> >> +/*
> >> * Minimum pipe size, as required by POSIX
> >> */
> >> unsigned int pipe_min_size = PAGE_SIZE;
> >> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
> >> unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
> >> struct user_struct *user = get_current_user();
> >> unsigned long user_bufs;
> >> + unsigned int max_size;
> >>
> >> pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
> >> if (pipe == NULL)
> >> goto out_free_uid;
> >>
> >> - if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> >> - pipe_bufs = pipe_max_size >> PAGE_SHIFT;
> >> + mutex_lock(&pipe_max_mutex);
> >> + max_size = pipe_max_size;
> >> + mutex_unlock(&pipe_max_mutex);
> >> +
> >> + if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
> >> + pipe_bufs = max_size >> PAGE_SHIFT;
> >>
> >> user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
> >>
> >> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
> >> struct pipe_buffer *bufs;
> >> unsigned int size, nr_pages;
> >> unsigned long user_bufs;
> >> + unsigned int max_size;
> >> long ret = 0;
> >>
> >> size = round_pipe_size(arg);
> >> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
> >> * Decreasing the pipe capacity is always permitted, even
> >> * if the user is currently over a limit.
> >> */
> >> + mutex_lock(&pipe_max_mutex);
> >> + max_size = pipe_max_size;
> >> + mutex_unlock(&pipe_max_mutex);
> >> if (nr_pages > pipe->buffers &&
> >> - size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> >> + size > max_size && !capable(CAP_SYS_RESOURCE))
> >> return -EPERM;
> >>
> >> user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
> >> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
> >> unsigned int rounded_pipe_max_size;
> >> int ret;
> >>
> >> + mutex_lock(&pipe_max_mutex);
> >> orig_pipe_max_size = pipe_max_size;
> >> ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
> >> - if (ret < 0 || !write)
> >> + if (ret < 0 || !write) {
> >> + mutex_unlock(&pipe_max_mutex);
> >> return ret;
> >> + }
> >>
> >> rounded_pipe_max_size = round_pipe_size(pipe_max_size);
> >> if (rounded_pipe_max_size == 0) {
> >> pipe_max_size = orig_pipe_max_size;
> >> + mutex_unlock(&pipe_max_mutex);
> >> return -EINVAL;
> >> }
> >>
> >> pipe_max_size = rounded_pipe_max_size;
> >> + mutex_unlock(&pipe_max_mutex);
> >> +
> >> return ret;
> >> }
> >>
> >> --
> >> 1.8.3.1
> >>
> > I think this mutex is too heavy - if multiple processes simultaneously
> > create a pipe, the mutex would cause performance degradation.
> >
> > You can call do_proc_dointvec with a custom callback "conv" function that
> > does the rounding of the pipe size value.
> >
> > Mikulas
> >
>
> Hi Mikulas,
>
> I'm not strong when it comes to memory barriers, but one of the
> side-effects of using the mutex is that pipe_set_size() and
> alloc_pipe_info() should have a consistent view of pipe_max_size.
>
> If I remove the mutex (and assume that I implement a custom
> do_proc_dointvec "conv" callback), is it safe for these routines to
> directly use pipe_max_size as they had done before?
>
> If not, is it safe to alias through a temporary stack variable (ie,
> could the compiler re-read pipe_max_size multiple times in the function)?
>
> Would READ_ONCE() help in any way?
Theoretically re-reading the variable is possible and you should use
ACCESS_ONCE or READ_ONCE+WRITE_ONCE on that variable.
In practice, ACCESS_ONCE/READ_ONCE/WRITE_ONCE is missing at a lot of
kernel variables that could be modified asynchronously and no one is
complaining about it and no one is making any systematic effort to fix it.
That re-reading happens (I have some test code that makes the gcc
optimizer re-read a variable), but it happens very rarely.
Another theoretical problem is that when reading or writing a variable
without ACCESS_ONCE, the compiler could read and write the variable using
multiple smaller memory accesses. But in practice, it happens only on some
non-common architectures.
> The mutex covered up some confusion on my part here.
>
> OTOH, since pipe_max_size is read-only for pipe_set_size() and
> alloc_pipe_info() and only updated occasionally by pipe_proc_fn(), would
> rw_semaphore or RCU be a fit here?
RW semaphore causes cache-line ping-pong between CPUs, it slows down the
kernel just like a normal spinlock or mutex.
RCU would be useless here (i.e. you don't want to allocate memory and
atomically assign it with rcu_assign_pointer).
> Regards,
>
> -- Joe
Mikulas
Powered by blists - more mailing lists