netdev - Re: [PATCH bpf-next v7] bpf: add new helper get_file_path for mapping a file descriptor to a pathname

Message-ID: <CABtjQmb_Pts45VwVtCbw-OoxCCGCNnCupuXPwPggBc0D4F0d2g@mail.gmail.com>
Date:   Fri, 8 Nov 2019 02:02:42 +0800
From:   Wenbo Zhang <ethercflow@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     bpf <bpf@...r.kernel.org>, Yonghong Song <yhs@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v7] bpf: add new helper get_file_path for mapping
 a file descriptor to a pathname

> - fdget_raw is only used inside fs/, so it doesn't look right to skip the layers.

Sorry, I mistakenly thought that anything not declared in internal.h
could be used from outside fs/.
Could you give me more detail on how to handle this situation, or
point me to any documentation that explains it? I would like to learn
more and then fix this.

> - accessing current->fs is not always correct, so the code should somehow
>  check that it's ok to do so, but I'm not sure if (in_irq()) would be enough.

I'll look into this over the next few days and then decide on a solution.

> - some implementations of d_dname do sleep.  For example: dmabuffs_dname.
>  Though it seems to me that it's a bug in that particular FS. But I'd like
>  to hear clear yes from VFS experts that fdget_raw() + d_path() is ok
>  from preempt_disabled section.

Sorry, I did not check all of the existing callback implementations
and so drew an imprecise conclusion.
I will do my best to learn more, and I hope to make better
contributions to bpf in the future.

> The other alternative is to wait for sleepable and preemptible BPF programs to
> appear. Which is probably a month or so away. Then all these issues will
> disappear.

I think waiting for sleepable and preemptible BPF programs to appear is
the better way to stay compatible with all kinds of d_dname
implementations.

Thank you for providing these valuable suggestions and information.

On Wed, Nov 6, 2019 at 6:19 AM, Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
>
> On Sun, Nov 03, 2019 at 02:54:17AM -0500, Wenbo Zhang wrote:
> > When people want to identify which file system files are being opened,
> > read, and written to, they can use this helper with file descriptor as
> > input to achieve this goal. Other pseudo filesystems are also supported.
> >
> > This requirement is mainly discussed here:
> >
> >   https://github.com/iovisor/bcc/issues/237
> >
> > v6->v7:
> > - fix missing signed-off-by line
> >
> > v5->v6: addressed Andrii's feedback
> > - avoid unnecessary goto end by having two explicit returns
> >
> > v4->v5: addressed Andrii and Daniel's feedback
> > - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> > helper's names
> > - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> > - remove fdput from fdget_raw's error path
> > - use IS_ERR instead of IS_ERR_OR_NULL as d_path either returns a pointer
> > into the buffer or an error code if the path was too long
> > - modify the normal path's return value to return copied string length
> > including NUL
> > - update this helper description's Return bits.
> >
> > v3->v4: addressed Daniel's feedback
> > - fix missing fdput()
> > - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> > - move fd2path's test code to another patch
> > - add comment to explain why use fdget_raw instead of fdget
> >
> > v2->v3: addressed Yonghong's feedback
> > - remove unnecessary LOCKDOWN_BPF_READ
> > - refactor error handling section for enhanced readability
> > - provide a test case in tools/testing/selftests/bpf
> >
> > v1->v2: addressed Daniel's feedback
> > - fix backward compatibility
> > - add this helper description
> > - fix signed-off name
> >
> > Signed-off-by: Wenbo Zhang <ethercflow@...il.com>
> > ---
> >  include/uapi/linux/bpf.h       | 15 ++++++++++-
> >  kernel/trace/bpf_trace.c       | 48 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 15 ++++++++++-
> >  3 files changed, 76 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index a6bf19dabaab..d618a914c6fe 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2777,6 +2777,18 @@ union bpf_attr {
> >   *           restricted to raw_tracepoint bpf programs.
> >   *   Return
> >   *           0 on success, or a negative error in case of failure.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *   Description
> > + *           Get **file** attribute from the current task by *fd*, then call
> > + *           **d_path** to get its absolute path and copy it as string into
> > + *           *path* of *size*. The **path** also supports pseudo filesystems
> > + *           (whether or not it can be mounted). The *size* must be strictly
> > + *           positive. On success, the helper makes sure that the *path* is
> > + *           NUL-terminated. On failure, it is filled with zeroes.
> > + *   Return
> > + *           On success, returns the length of the copied string INCLUDING
> > + *           the trailing NUL, or a negative error in case of failure.
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)                \
> >       FN(unspec),                     \
> > @@ -2890,7 +2902,8 @@ union bpf_attr {
> >       FN(sk_storage_delete),          \
> >       FN(send_signal),                \
> >       FN(tcp_gen_syncookie),          \
> > -     FN(skb_output),
> > +     FN(skb_output),                 \
> > +     FN(get_file_path),
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index f50bf19f7a05..41be1c5989af 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -683,6 +683,52 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
> >       .arg1_type      = ARG_ANYTHING,
> >  };
> >
> > +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> > +{
> > +     struct fd f;
> > +     char *p;
> > +     int ret = -EBADF;
> > +
> > +     /* Use fdget_raw instead of fdget to support O_PATH, and
> > +      * fdget_raw doesn't have any sleepable code, so it's ok
> > +      * to be here.
> > +      */
> > +     f = fdget_raw(fd);
> > +     if (!f.file)
> > +             goto error;
> > +
> > +     /* d_path doesn't have any sleepable code, so it's ok to
> > +      * be here. But it uses the current macro to get fs_struct
> > +      * (current->fs). So this helper shouldn't be called in
> > +      * interrupt context.
> > +      */
> > +     p = d_path(&f.file->f_path, dst, size);
> > +     if (IS_ERR(p)) {
> > +             ret = PTR_ERR(p);
> > +             fdput(f);
> > +             goto error;
> > +     }
>
> This is definitely a very useful helper that the bpf tracing community has
> been asking for for a long time, but I have a few concerns with the implementation:
> - fdget_raw is only used inside fs/, so it doesn't look right to skip the layers.
> - accessing current->fs is not always correct, so the code should somehow
>   check that it's ok to do so, but I'm not sure if (in_irq()) would be enough.
> - some implementations of d_dname do sleep.  For example: dmabuffs_dname.
>   Though it seems to me that it's a bug in that particular FS. But I'd like
>   to hear clear yes from VFS experts that fdget_raw() + d_path() is ok
>   from preempt_disabled section.
>
> The other alternative is to wait for sleepable and preemptible BPF programs to
> appear. Which is probably a month or so away. Then all these issues will
> disappear.
>