Message-ID: <CAPhsuW52eTurJ4pPAgZtv0giw2C+7r6aMacZXx8XkwUxBGARAQ@mail.gmail.com>
Date: Mon, 21 Dec 2020 14:22:41 -0800
From: Song Liu <song@...nel.org>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH bpf-next 1/2] bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt
On Thu, Dec 17, 2020 at 9:24 AM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> When we attach a bpf program to cgroup/getsockopt any other getsockopt()
> syscall starts incurring kzalloc/kfree cost. While, in general, it's
> not an issue, sometimes it is, like in the case of TCP_ZEROCOPY_RECEIVE.
> TCP_ZEROCOPY_RECEIVE (ab)uses getsockopt system call to implement
> fastpath for incoming TCP, we don't want to have extra allocations in
> there.
>
> Let's add a small buffer on the stack and use it for small (majority)
> {s,g}etsockopt values. I've started with 128 bytes to cover
> the options we care about (TCP_ZEROCOPY_RECEIVE which is 32 bytes
> currently, with some planned extension to 64 + some headroom
> for the future).

I don't really know the rule of thumb, but 128 bytes on the stack feels too
big to me. I would like to hear others' opinions on this. Can we solve the
problem with some other mechanism, e.g. a mempool?

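To make sure we are talking about the same pattern, here is a rough userspace sketch of "small buffer inside the context, heap only for large values". It is an illustration under assumptions, not the patch itself: struct sockopt_ctx, ctx_alloc_buf() and ctx_free_buf() are hypothetical names, calloc()/free() stand in for kzalloc()/kfree(), and 128 is the size proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SOCKOPT_STACK_BUF 128	/* size proposed in the patch */

/* Hypothetical userspace stand-in for struct bpf_sockopt_kern. */
struct sockopt_ctx {
	void *optval;		/* points at buf or at heap memory */
	size_t optlen;
	unsigned char buf[SOCKOPT_STACK_BUF];
};

/* Use the embedded buffer for small values, the heap otherwise. */
static int ctx_alloc_buf(struct sockopt_ctx *ctx, size_t len)
{
	assert(ctx != NULL);
	ctx->optlen = len;
	if (len <= sizeof(ctx->buf)) {
		memset(ctx->buf, 0, sizeof(ctx->buf));
		ctx->optval = ctx->buf;
		return 0;
	}
	ctx->optval = calloc(1, len);	/* stands in for kzalloc() */
	return ctx->optval ? 0 : -1;
}

/* Free only what came from the heap; the embedded buffer needs nothing. */
static void ctx_free_buf(struct sockopt_ctx *ctx)
{
	if (ctx->optval != ctx->buf)
		free(ctx->optval);
	ctx->optval = NULL;
}
```

With this shape, a 32-byte value never touches the allocator, and the cost is that every context carries the 128 bytes whether they are used or not.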
[...]
>
> +static void *sockopt_export_buf(struct bpf_sockopt_kern *ctx)
> +{
> +	void *p;
> +
> +	if (ctx->optval != ctx->buf)
> +		return ctx->optval;
> +
> +	/* We've used bpf_sockopt_kern->buf as an intermediary storage,
> +	 * but the BPF program indicates that we need to pass this
> +	 * data to the kernel setsockopt handler. No way to export
> +	 * on-stack buf, have to allocate a new buffer. The caller
> +	 * is responsible for the kfree().
> +	 */
> +	p = kzalloc(ctx->optlen, GFP_USER);
> +	if (!p)
> +		return ERR_PTR(-ENOMEM);
> +	memcpy(p, ctx->optval, ctx->optlen);
> +	return p;
> +}
> +
> int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> int *optname, char __user *optval,
> int *optlen, char **kernel_optval)
> @@ -1389,8 +1420,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>  		 * use original userspace data.
>  		 */
>  		if (ctx.optlen != 0) {
> -			*optlen = ctx.optlen;
> -			*kernel_optval = ctx.optval;
> +			void *buf = sockopt_export_buf(&ctx);

I find it hard to follow the logic here (when to allocate memory, how to
fail over, etc.). Do we have a plan to reuse sockopt_export_buf()? If not, it
would probably be cleaner to put the logic in
__cgroup_bpf_run_filter_setsockopt().

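Roughly what I have in mind, as a userspace sketch with the export step written as a single block so that the allocation and the failure path sit next to each other. Everything here is hypothetical and only for illustration: struct sockopt_ctx and export_optval() are made-up names, and malloc() stands in for the kernel allocator. In the real code this block would sit directly in the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct bpf_sockopt_kern. */
struct sockopt_ctx {
	void *optval;		/* points at buf or at heap memory */
	size_t optlen;
	unsigned char buf[128];
};

/*
 * Hand the (possibly modified) value to the caller.
 * On success with optlen != 0, *kernel_optval is heap memory
 * that the caller must free(). Returns 0 on success, -1 on ENOMEM.
 */
static int export_optval(struct sockopt_ctx *ctx, void **kernel_optval,
			 size_t *optlen)
{
	void *p;

	assert(ctx && kernel_optval && optlen);

	if (ctx->optlen == 0)
		return 0;	/* keep using the original user data */

	if (ctx->optval != ctx->buf) {
		/* Already on the heap: transfer ownership to the caller. */
		p = ctx->optval;
		ctx->optval = NULL;
	} else {
		/* Embedded buffer cannot outlive the context: copy it. */
		p = malloc(ctx->optlen);
		if (!p)
			return -1;
		memcpy(p, ctx->buf, ctx->optlen);
	}

	*optlen = ctx->optlen;
	*kernel_optval = p;
	return 0;
}
```

The point is only that the three cases (nothing to export, hand over the heap buffer, copy out of the embedded buffer) are visible in one place together with the error return.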
Thanks,
Song
[...]