netdev - Re: [PATCH bpf-next] libbpf: Use dynamically allocated buffer when receiving netlink messages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEf4BzY=spmQrPX06-hrNMSaH_Sst-WTZiHSpNaCid4+ZNjB3w@mail.gmail.com>
Date:   Fri, 11 Feb 2022 14:14:30 -0800
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Zhiqian Guan <zhguan@...hat.com>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next] libbpf: Use dynamically allocated buffer when
 receiving netlink messages

On Fri, Feb 11, 2022 at 11:51 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> When receiving netlink messages, libbpf was using a statically allocated
> stack buffer of 4k bytes. This happened to work fine on systems with a 4k
> page size, but on systems with larger page sizes it can lead to truncated
> messages. The user-visible impact of this was that libbpf would insist no
> XDP program was attached to some interfaces because that bit of the netlink
> message got chopped off.
>
> Fix this by switching to a dynamically allocated buffer; we borrow the
> approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get
> the actual size of the pending message before receiving it, adjusting the
> buffer as necessary. While we're at it, also add retries on interrupted
> system calls around the recvmsg() call.
>
> Reported-by: Zhiqian Guan <zhguan@...hat.com>
> Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers")
> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
> ---
>  tools/lib/bpf/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index c39c37f99d5c..9a6e95206bf0 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -87,22 +87,70 @@ enum {
>         NL_DONE,
>  };
>
> +static int __libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, int flags)

let's not use names starting with underscored. Just call it
"netlink_recvmsg" or something like that.

> +{
> +       int len;
> +
> +       do {
> +               len = recvmsg(sock, mhdr, flags);

recvmsg returns ssize_t, is it ok to truncate to int?


> +       } while (len < 0 && (errno == EINTR || errno == EAGAIN));
> +
> +       if (len < 0)
> +               return -errno;
> +       return len;
> +}
> +
> +static int libbpf_netlink_recvmsg(int sock, struct msghdr *mhdr, char **buf)
> +{
> +       struct iovec *iov = mhdr->msg_iov;
> +       void *nbuf;
> +       int len;
> +
> +       len = __libbpf_netlink_recvmsg(sock, mhdr, MSG_PEEK | MSG_TRUNC);
> +       if (len < 0)
> +               return len;
> +
> +       if (len < 4096)
> +               len = 4096;
> +
> +       if (len > iov->iov_len) {
> +               nbuf = realloc(iov->iov_base, len);
> +               if (!nbuf) {
> +                       free(iov->iov_base);
> +                       return -ENOMEM;
> +               }
> +               iov->iov_base = nbuf;

this function both sets iov->iov_base *and* returns buf. It's quite a
convoluted contract. Seems like buf is not necessary (and also NULL
out iov->iov_base in case of error above?). But it might be cleaner to
do this MSG_PEEK  + realloc + recvmsg  in libbpf_netlink_recv()
explicitly. It's only one place.


> +               iov->iov_len = len;
> +       }
> +
> +       len = __libbpf_netlink_recvmsg(sock, mhdr, 0);
> +       if (len > 0)
> +               *buf = iov->iov_base;
> +       return len;
> +}
> +
>  static int libbpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>                                __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
>                                void *cookie)
>  {
> +       struct iovec iov = {};
> +       struct msghdr mhdr = {
> +               .msg_iov = &iov,
> +               .msg_iovlen = 1,
> +       };
>         bool multipart = true;
>         struct nlmsgerr *err;
>         struct nlmsghdr *nh;
> -       char buf[4096];
>         int len, ret;
> +       char *buf;
> +
>
>         while (multipart) {
>  start:
>                 multipart = false;
> -               len = recv(sock, buf, sizeof(buf), 0);
> +               len = libbpf_netlink_recvmsg(sock, &mhdr, &buf);
>                 if (len < 0) {
> -                       ret = -errno;
> +                       ret = len;
>                         goto done;
>                 }
>
> @@ -151,6 +199,7 @@ static int libbpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>         }
>         ret = 0;
>  done:
> +       free(iov.iov_base);

double free on -ENOMEM? And even more confusing why you bother with
buf at all...

>         return ret;
>  }
>
> --
> 2.35.1
>