netdev - Re: [PATCH bpf-next 1/2] bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt


Message-ID: <CAPhsuW52eTurJ4pPAgZtv0giw2C+7r6aMacZXx8XkwUxBGARAQ@mail.gmail.com>
Date:   Mon, 21 Dec 2020 14:22:41 -0800
From:   Song Liu <song@...nel.org>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH bpf-next 1/2] bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt

On Thu, Dec 17, 2020 at 9:24 AM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> When we attach a bpf program to cgroup/getsockopt, any other getsockopt()
> syscall starts incurring kzalloc/kfree cost. While, in general, it's
> not an issue, sometimes it is, like in the case of TCP_ZEROCOPY_RECEIVE.
> TCP_ZEROCOPY_RECEIVE (ab)uses the getsockopt system call to implement
> a fastpath for incoming TCP, so we don't want extra allocations in
> there.
>
> Let's add a small buffer on the stack and use it for small (the majority of)
> {s,g}etsockopt values. I've started with 128 bytes to cover
> the options we care about (TCP_ZEROCOPY_RECEIVE, which is 32 bytes
> currently, with some planned extension to 64, plus some headroom
> for the future).
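The approach described above is a small-buffer optimization: option values up to a fixed size live in an inline buffer, and only larger ones go to the heap. A minimal user-space sketch, assuming calloc()/free() stand in for kzalloc()/kfree(); struct sockopt_ctx, sockopt_ctx_init() and sockopt_ctx_free() are hypothetical names, not the kernel's bpf_sockopt_kern code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue, not kernel code.
 * 128 mirrors the inline size proposed in the patch. */
#define SOCKOPT_INLINE_SIZE 128

struct sockopt_ctx {
	char buf[SOCKOPT_INLINE_SIZE]; /* lives wherever ctx lives (e.g. the stack) */
	void *optval;                  /* points at buf or at heap memory */
	int optlen;
};

/* Returns 0 on success, -1 on a bad length or failed heap fallback. */
static int sockopt_ctx_init(struct sockopt_ctx *ctx, int optlen)
{
	ctx->optval = NULL;
	ctx->optlen = optlen;
	if (optlen < 0)
		return -1;

	if (optlen <= SOCKOPT_INLINE_SIZE) {
		/* Common case: no allocation; zeroed like kzalloc() would be. */
		memset(ctx->buf, 0, sizeof(ctx->buf));
		ctx->optval = ctx->buf;
		return 0;
	}

	/* Rare case: value too large for the inline buffer. */
	ctx->optval = calloc(1, (size_t)optlen);
	return ctx->optval ? 0 : -1;
}

static void sockopt_ctx_free(struct sockopt_ctx *ctx)
{
	/* Only heap memory is released; the inline buffer is never freed. */
	if (ctx->optval != ctx->buf)
		free(ctx->optval); /* free(NULL) is a no-op */
	ctx->optval = NULL;
}
```

The free path releases memory only when optval points away from the inline buffer, which is the same `ctx->optval != ctx->buf` test that sockopt_export_buf() uses in the quoted diff.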

I don't really know the rule of thumb, but 128 bytes on the stack feels too big
to me. I would like to hear others' opinions on this. Can we solve the problem
with some other mechanism, e.g. a mempool?

[...]

>
> +static void *sockopt_export_buf(struct bpf_sockopt_kern *ctx)
> +{
> +       void *p;
> +
> +       if (ctx->optval != ctx->buf)
> +               return ctx->optval;
> +
> +       /* We've used bpf_sockopt_kern->buf as an intermediary storage,
> +        * but the BPF program indicates that we need to pass this
> +        * data to the kernel setsockopt handler. No way to export
> +        * on-stack buf, have to allocate a new buffer. The caller
> +        * is responsible for the kfree().
> +        */
> +       p = kzalloc(ctx->optlen, GFP_USER);
> +       if (!p)
> +               return ERR_PTR(-ENOMEM);
> +       memcpy(p, ctx->optval, ctx->optlen);
> +       return p;
> +}
> +
>  int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>                                        int *optname, char __user *optval,
>                                        int *optlen, char **kernel_optval)
> @@ -1389,8 +1420,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>                  * use original userspace data.
>                  */
>                 if (ctx.optlen != 0) {
> -                       *optlen = ctx.optlen;
> -                       *kernel_optval = ctx.optval;
> +                       void *buf = sockopt_export_buf(&ctx);

I found it hard to follow the logic here (when to allocate memory, how to
fail over, etc.). Do we plan to reuse sockopt_export_buf()? If not, it is
probably cleaner to put the logic in __cgroup_bpf_run_filter_setsockopt().
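A minimal user-space sketch of what folding the sockopt_export_buf() logic into the caller might look like. Assumptions: malloc() stands in for kzalloc() (the zeroing is redundant when the full length is copied immediately afterwards), and struct sockopt_ctx and run_filter_setsockopt() are hypothetical stand-ins for bpf_sockopt_kern and __cgroup_bpf_run_filter_setsockopt(). This is not the patch's code.

```c
#include <stdlib.h>
#include <string.h>

#define SOCKOPT_INLINE_SIZE 128

/* Hypothetical stand-in for bpf_sockopt_kern. */
struct sockopt_ctx {
	char buf[SOCKOPT_INLINE_SIZE];
	void *optval; /* points at buf or at heap memory */
	int optlen;
};

/* Hypothetical caller with the export step written inline.
 * On success with a non-zero length, *kernel_optval owns heap
 * memory that the caller must free(). Returns 0 or -1 (no memory). */
static int run_filter_setsockopt(struct sockopt_ctx *ctx, int *optlen,
				 char **kernel_optval)
{
	if (ctx->optlen == 0)
		return 0; /* leave outputs untouched: use original userspace data */

	if (ctx->optval == ctx->buf) {
		/* The inline buffer cannot outlive this frame, so copy it out. */
		char *p = malloc((size_t)ctx->optlen);

		if (!p)
			return -1;
		memcpy(p, ctx->optval, (size_t)ctx->optlen);
		*kernel_optval = p;
	} else {
		/* Already on the heap: hand ownership over directly. */
		*kernel_optval = ctx->optval;
	}
	*optlen = ctx->optlen;
	return 0;
}
```

With the two branches side by side, "when to allocate" is visible at the call site: only the inline-buffer case allocates, and only it can fail with an out-of-memory error.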

Thanks,
Song

[...]