netdev - Re: [PATCH bpf-next v9] bpf: add new helper get_file_path for mapping a file descriptor to a pathname

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CABtjQmao44Cy=_FdujwqzHcpX58_cgSgxQrWv2sYxKQjT=5K7Q@mail.gmail.com>
Date:   Fri, 15 Nov 2019 20:52:54 +0800
From:   Wenbo Zhang <ethercflow@...il.com>
To:     bpf <bpf@...r.kernel.org>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Yonghong Song <yhs@...com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v9] bpf: add new helper get_file_path for mapping
 a file descriptor to a pathname

>From Alexei's latest comment:

>  This is definitely very useful helper that bpf tracing community has
>  been asking for long time, but I have few concerns with implementation:
> - fdget_raw is only used inside fs/, so it doesn't look right to skip the layers.
> - accessing current->fs is not always correct, so the code should somehow
>   check that it's ok to do so, but I'm not sure if (in_irq()) would be enough.
> - some implementations of d_dname do sleep.  For example: dmabuffs_dname.
>   Though it seems to me that it's a bug in that particular FS. But I'd like
>   to hear clear yes from VFS experts that fdget_raw() + d_path() is ok
>   from preempt_disabled section.

> The other alternative is to wait for sleepable and preemptible BPF programs to
> appear. Which is probably a month or so away. Then all these issues will
> disappear.

After consulting Yonghong, I made this patch to try to solve these issues:

> - fdget_raw is only used inside fs/, so it doesn't look right to skip the layers.
As Yonghong's suggest, I use fget_raw instead as fget_raw has been
used outside fs/
(kernel/cgroup/cgroup.c, net/core/scm.c)

> - accessing current->fs is not always correct, so the code should somehow
>   check that it's ok to do so, but I'm not sure if (in_irq()) would be enough.

In addition to checking irq, we also did the following checks

if (unlikely(in_interrupt() ||
                     current->flags & (PF_KTHREAD | PF_EXITING)))
                return -EPERM;

Similar to bpf_probe_write_user

> - some implementations of d_dname do sleep.  For example: dmabuffs_dname.
>   Though it seems to me that it's a bug in that particular FS. But I'd like
>   to hear clear yes from VFS experts that fdget_raw() + d_path() is ok
>   from preempt_disabled section.

For unmountable pseudo filesystem, it seems to have no meaning to get their fake
paths as they don't have path, and to be no way to validate this
function pointer can
be always safe to call in the current context.

So I add judge before d_path

        if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname)
                return -EINVAL;

On the other hand, if some people's use case consider unmountable
pseudo files, I'll
remove this judge and wait for sleepable and preemptible BPF programs

Other than these,  I added a more detailed description for this
helper，about the return buffer's
content and error code.

 * int bpf_get_file_path(char *path, u32 size, int fd)
 *      Description
 *              Get **file** atrribute from the current task by *fd*, then call
 *              **d_path** to get it's absolute path and copy it as string into
 *              *path* of *size*. Notice the **path** don't support unmountable
 *              pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
 *              The *size* must be strictly positive. On success, the helper
 *              makes sure that the *path* is NUL-terminated, and the buffer
 *              could be:
 *              - a regular full path (include mountable fs eg: /proc, /sys)
 *              - a regular full path with "(deleted)" at the end.
 *              On failure, it is filled with zeroes.
 *      Return
 *              On success, returns the length of the copied string INCLUDING
 *              the trailing NUL.
 *
 *              On failure, the returned value is one of the following:
 *
 *              **-EPERM** if no permission to get the path (eg: in irq ctx).
 *
 *              **-EBADF** if *fd* is invalid.
 *
 *              **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
 *
 *              **-ENAMETOOLONG** if full path is longer than *size*

Thank you.

Wenbo Zhang <ethercflow@...il.com> 于2019年11月15日周五 下午8:48写道：
>
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
>
> This requirement is mainly discussed here:
>
>   https://github.com/iovisor/bcc/issues/237
>
> v8->v9:
> - format helper description
>
> v7->v8: addressed Alexei's feedback
> - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> - ensure we're in user context which is safe fot the help to run
> - filter unmountable pseudo filesystem, because they don't have real path
> - supplement the description of this helper function
>
> v6->v7:
> - fix missing signed-off-by line
>
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
>
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
> including NUL
> - update this helper description's Return bits.
>
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
>
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
>
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
>
> Signed-off-by: Wenbo Zhang <ethercflow@...il.com>
> ---
>  include/uapi/linux/bpf.h       | 29 +++++++++++++++-
>  kernel/trace/bpf_trace.c       | 63 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 28 ++++++++++++++-
>  3 files changed, 118 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index df6809a76404..71832cd3729c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2815,6 +2815,32 @@ union bpf_attr {
>   *     Return
>   *             On success, the strictly positive length of the string, including
>   *             the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *     Description
> + *             Get **file** atrribute from the current task by *fd*, then call
> + *             **d_path** to get it's absolute path and copy it as string into
> + *             *path* of *size*. Notice the **path** don't support unmountable
> + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *             The *size* must be strictly positive. On success, the helper
> + *             makes sure that the *path* is NUL-terminated, and the buffer
> + *             could be:
> + *             - a regular full path (include mountable fs eg: /proc, /sys)
> + *             - a regular full path with "(deleted)" at the end.
> + *             On failure, it is filled with zeroes.
> + *     Return
> + *             On success, returns the length of the copied string INCLUDING
> + *             the trailing NUL.
> + *
> + *             On failure, the returned value is one of the following:
> + *
> + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *             **-EBADF** if *fd* is invalid.
> + *
> + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *             **-ENAMETOOLONG** if full path is longer than *size*
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -2932,7 +2958,8 @@ union bpf_attr {
>         FN(probe_read_user),            \
>         FN(probe_read_kernel),          \
>         FN(probe_read_user_str),        \
> -       FN(probe_read_kernel_str),
> +       FN(probe_read_kernel_str),      \
> +       FN(get_file_path),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ffc91d4935ac..c77e55418f1e 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -762,6 +762,67 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
>         .arg1_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +       struct file *f;
> +       char *p;
> +       int ret = -EBADF;
> +
> +       /* Ensure we're in user context which is safe for the helper to
> +        * run. This helper has no business in a kthread.
> +        */
> +       if (unlikely(in_interrupt() ||
> +                    current->flags & (PF_KTHREAD | PF_EXITING)))
> +               return -EPERM;
> +
> +       /* Use fget_raw instead of fget to support O_PATH, and it doesn't
> +        * have any sleepable code, so it's ok to be here.
> +        */
> +       f = fget_raw(fd);
> +       if (!f)
> +               goto error;
> +
> +       /* For unmountable pseudo filesystem, it seems to have no meaning
> +        * to get their fake paths as they don't have path, and to be no
> +        * way to validate this function pointer can be always safe to call
> +        * in the current context.
> +        */
> +       if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname)
> +               return -EINVAL;
> +
> +       /* After filter unmountable pseudo filesytem, d_path won't call
> +        * dentry->d_op->d_name(), the normally path doesn't have any
> +        * sleepable code, and despite it uses the current macro to get
> +        * fs_struct (current->fs), we've already ensured we're in user
> +        * context, so it's ok to be here.
> +        */
> +       p = d_path(&f->f_path, dst, size);
> +       if (IS_ERR(p)) {
> +               ret = PTR_ERR(p);
> +               fput(f);
> +               goto error;
> +       }
> +
> +       ret = strlen(p);
> +       memmove(dst, p, ret);
> +       dst[ret++] = '\0';
> +       fput(f);
> +       return ret;
> +
> +error:
> +       memset(dst, '0', size);
> +       return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_get_file_path_proto = {
> +       .func       = bpf_get_file_path,
> +       .gpl_only   = true,
> +       .ret_type   = RET_INTEGER,
> +       .arg1_type  = ARG_PTR_TO_UNINIT_MEM,
> +       .arg2_type  = ARG_CONST_SIZE,
> +       .arg3_type  = ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -822,6 +883,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  #endif
>         case BPF_FUNC_send_signal:
>                 return &bpf_send_signal_proto;
> +       case BPF_FUNC_get_file_path:
> +               return &bpf_get_file_path_proto;
>         default:
>                 return NULL;
>         }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index df6809a76404..afe98857fa04 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2815,6 +2815,31 @@ union bpf_attr {
>   *     Return
>   *             On success, the strictly positive length of the string, including
>   *             the trailing NUL character. On error, a negative value.
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *     Description
> + *             Get **file** atrribute from the current task by *fd*, then call
> + *             **d_path** to get it's absolute path and copy it as string into
> + *             *path* of *size*. Notice the **path** don't support unmountable
> + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *             The *size* must be strictly positive. On success, the helper
> + *             makes sure that the *path* is NUL-terminated, and the buffer
> + *             could be:
> + *             - a regular full path (include mountable fs eg: /proc, /sys)
> + *             - a regular full path with "(deleted)" at the end.
> + *             On failure, it is filled with zeroes.
> + *     Return
> + *             On success, returns the length of the copied string INCLUDING
> + *             the trailing NUL.
> + *
> + *             On failure, the returned value is one of the following:
> + *
> + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *             **-EBADF** if *fd* is invalid.
> + *
> + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *             **-ENAMETOOLONG** if full path is longer than *size*
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -2932,7 +2957,8 @@ union bpf_attr {
>         FN(probe_read_user),            \
>         FN(probe_read_kernel),          \
>         FN(probe_read_user_str),        \
> -       FN(probe_read_kernel_str),
> +       FN(probe_read_kernel_str),      \
> +       FN(get_file_path),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> --
> 2.17.1
>