Message-ID: <562A9391.1040806@redhat.com>
Date: Fri, 23 Oct 2015 22:07:45 +0200
From: Florian Weimer <fweimer@...hat.com>
To: GNU C Library <libc-alpha@...rceware.org>
Cc: Hannes Sowa <hannes@...hat.com>, netdev@...r.kernel.org
Subject: [PATCH] glibc: Terminate process on invalid netlink response from kernel [BZ #12926]
This patch revisits this glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=12926
For some reason, this particular code path is very good at picking up
file descriptors which have been reused incorrectly. This happens if
other threads contain a race condition, close the wrong file descriptor
(the one used in the glibc netlink code), and then open another file in
its place, which receives the same descriptor number.
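Because open() returns the lowest-numbered free descriptor, this failure mode can be reproduced deterministically in a single thread. The sketch below is only an illustration, not glibc code: the function name stale_descriptor_errno is hypothetical, an AF_UNIX datagram socket stands in for the netlink socket (so it also runs where AF_NETLINK is unavailable), and /dev/null plays the role of the unrelated file opened by the buggy thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: simulates, in one thread, a buggy thread closing
   the library's socket descriptor and an unrelated open() reusing the
   number.  Returns the errno seen by the library's next recv() on its
   stale descriptor, 0 if recv() unexpectedly succeeded, or -1 if the
   setup did not reproduce the reuse.  */
static int
stale_descriptor_errno (void)
{
  /* Stand-in for the library's netlink socket (AF_UNIX, so the demo
     also runs where AF_NETLINK is not available).  */
  int lib_fd = socket (AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
  if (lib_fd < 0)
    return -1;

  /* Buggy thread: closes the wrong descriptor ...  */
  close (lib_fd);

  /* ... and an open() receives the lowest free number, which is the
     one the library still holds.  */
  int other_fd = open ("/dev/null", O_RDONLY | O_CLOEXEC);
  if (other_fd != lib_fd)
    {
      if (other_fd >= 0)
        close (other_fd);
      return -1;
    }

  /* Library code carries on with its stale descriptor.  */
  char buf[64];
  ssize_t n = recv (lib_fd, buf, sizeof buf, 0);
  int err = n < 0 ? errno : 0;

  close (other_fd);
  return err;
}
```

On Linux, the recv() call fails with ENOTSOCK, one of the errors the new assert treats as impossible. Had the replacement descriptor been another socket, the call could instead have returned someone else's data without any error.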
The netlink requests we send to the kernel are:
  struct req
  {
    struct nlmsghdr nlh;
    struct rtgenmsg g;
    /* struct rtgenmsg consists of a single byte.  This means there
       are three bytes of padding included in the REQ definition.
       We make them explicit here.  */
    char pad[3];
  } req;

  /* Address dump request (RTM_GETADDR).  */
  req.nlh.nlmsg_len = sizeof (req);
  req.nlh.nlmsg_type = RTM_GETADDR;
  req.nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
  req.nlh.nlmsg_pid = 0;
  req.nlh.nlmsg_seq = time (NULL);
  req.g.rtgen_family = AF_UNSPEC;

  /* Link dump request (RTM_GETLINK).  */
  req.nlh.nlmsg_len = sizeof (req);
  req.nlh.nlmsg_type = RTM_GETLINK;
  req.nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
  req.nlh.nlmsg_pid = 0;
  req.nlh.nlmsg_seq = time (NULL);
  req.g.rtgen_family = AF_UNSPEC;
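A self-contained sketch of the request construction above, for readers who want to experiment outside glibc. The names struct dump_req, fill_dump_req and send_dump_req are hypothetical, and unlike the quoted code the socket is opened with SOCK_CLOEXEC. The _Static_assert spells out the padding comment: a 16-byte struct nlmsghdr plus the one-byte struct rtgenmsg rounds up to NLMSG_SPACE (1), i.e. 20 bytes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Same layout as the request quoted above.  */
struct dump_req
{
  struct nlmsghdr nlh;
  struct rtgenmsg g;
  char pad[3];                  /* explicit padding to a 4-byte boundary */
};

/* 16-byte header + 1-byte payload, rounded up by NLMSG_SPACE to 20.  */
_Static_assert (sizeof (struct dump_req)
                == NLMSG_SPACE (sizeof (struct rtgenmsg)),
                "unexpected padding in netlink dump request");

/* Hypothetical helper: fills REQ as in the quoted code, for TYPE
   RTM_GETADDR or RTM_GETLINK.  */
static void
fill_dump_req (struct dump_req *req, uint16_t type, uint32_t seq)
{
  memset (req, 0, sizeof *req);  /* also zeroes the pad bytes */
  req->nlh.nlmsg_len = sizeof *req;
  req->nlh.nlmsg_type = type;
  req->nlh.nlmsg_flags = NLM_F_ROOT | NLM_F_MATCH | NLM_F_REQUEST;
  req->nlh.nlmsg_pid = 0;
  req->nlh.nlmsg_seq = seq;
  req->g.rtgen_family = AF_UNSPEC;
}

/* Hypothetical helper: opens a NETLINK_ROUTE socket with SOCK_CLOEXEC
   and sends REQ to the kernel (nl_pid 0).  Returns the descriptor for
   reading the reply, or -1 on failure.  */
static int
send_dump_req (const struct dump_req *req)
{
  int fd = socket (AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
  if (fd < 0)
    return -1;

  struct sockaddr_nl kernel;
  memset (&kernel, 0, sizeof kernel);
  kernel.nl_family = AF_NETLINK;

  if (sendto (fd, req, sizeof *req, 0,
              (struct sockaddr *) &kernel, sizeof kernel) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

To mirror the quoted code, a caller would pass time (NULL) as the sequence number, e.g. fill_dump_req (&req, RTM_GETADDR, time (NULL)), and then read the dump from the returned descriptor.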
I discussed this with Hannes, and he thinks that a zero-length reply (as
received by recvmsg) is impossible at this point for these specific
types of netlink requests. The new assert triggers for zero-length
replies, but also for replies shorter than sizeof (struct nlmsghdr)
bytes, and for unexpected errors (EBADF, ENOTSOCK, ENOTCONN,
ECONNREFUSED, and EAGAIN, which is only expected on non-blocking
sockets; ours are all blocking).
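The conditions listed above can be written as a predicate kept separate from the decision to terminate, which makes the policy easy to check in isolation. This is a sketch, not the code from the attached patch: netlink_response_is_impossible is a hypothetical name, and the caller is assumed to know whether the socket is non-blocking (for example from fcntl (fd, F_GETFL)).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Hypothetical predicate: RESULT is the return value of recvmsg, ERR the
   errno value if RESULT is negative, NONBLOCKING whether the socket has
   O_NONBLOCK set.  Returns true for outcomes that should be impossible
   on an intact netlink socket used for these dump requests, which
   suggests the descriptor was closed or replaced elsewhere.  */
static bool
netlink_response_is_impossible (ssize_t result, int err, bool nonblocking)
{
  if (result < 0)
    switch (err)
      {
      case EBADF:
      case ENOTSOCK:
      case ENOTCONN:
      case ECONNREFUSED:
        return true;
      case EAGAIN:
        /* Legitimate on a non-blocking socket, or with a receive
           timeout (SO_RCVTIMEO), which this code does not set.  */
        return !nonblocking;
      default:
        /* Other errors (EINTR, ENOBUFS, ...) are ordinary failures.  */
        return false;
      }

  /* A zero-length or short reply cannot contain a netlink header.  */
  return (size_t) result < sizeof (struct nlmsghdr);
}
```

A caller following the patch's policy would terminate the process, for example with abort (), whenever the predicate returns true.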
This is purely a defense against silent data corruption and against bug
reports that incorrectly blame glibc (or at least the wrong part of
glibc). I added it to all three copies of the netlink code in glibc.
The glibc netlink code is still broken: it does not time out and retry
(needed in case the request gets lost), does not handle NLM_F_DUMP_INTR,
and does not deal with NLMSG_ERROR and ENOBUFS. It does not use
SOCK_CLOEXEC, either. But these are separate issues. If we fix them,
the assert will remain in place, except for the EAGAIN part.
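For the separate issues listed above, a dump-processing loop might classify each received buffer roughly as follows. This is a hedged sketch with hypothetical names (scan_dump_chunk, enum dump_status) that relies only on the NLMSG_OK, NLMSG_NEXT, NLMSG_DATA and NLMSG_LENGTH macros from <linux/netlink.h>. Timeouts and ENOBUFS surface as recvmsg errors rather than as message contents, so they are not handled here; a caller would typically restart the whole dump in those cases, as it would for DUMP_RETRY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

enum dump_status
{
  DUMP_MORE,   /* no terminating message yet; call recvmsg again */
  DUMP_DONE,   /* NLMSG_DONE seen; dump complete */
  DUMP_RETRY,  /* NLM_F_DUMP_INTR set; results inconsistent, restart */
  DUMP_ERROR   /* NLMSG_ERROR seen; *ERROR_OUT holds a positive errno
                  (or 0 if the error message was truncated) */
};

/* Hypothetical helper: scans one buffer returned by recvmsg.  Messages
   whose sequence number differs from SEQ are skipped as leftovers of an
   earlier request.  */
static enum dump_status
scan_dump_chunk (const void *buf, size_t len, uint32_t seq, int *error_out)
{
  const struct nlmsghdr *nh = buf;
  int remaining = (int) len;  /* NLMSG_OK and NLMSG_NEXT expect an int */

  for (; NLMSG_OK (nh, remaining); nh = NLMSG_NEXT (nh, remaining))
    {
      if (nh->nlmsg_seq != seq)
        continue;
      if (nh->nlmsg_flags & NLM_F_DUMP_INTR)
        return DUMP_RETRY;
      if (nh->nlmsg_type == NLMSG_DONE)
        return DUMP_DONE;
      if (nh->nlmsg_type == NLMSG_ERROR)
        {
          const struct nlmsgerr *e = NLMSG_DATA (nh);
          /* The kernel stores a negative errno in the error field.  */
          *error_out = nh->nlmsg_len >= NLMSG_LENGTH (sizeof *e)
                       ? -e->error : 0;
          return DUMP_ERROR;
        }
      /* An RTM_NEWLINK or RTM_NEWADDR payload would be processed here.  */
    }
  return DUMP_MORE;
}
```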
(By the way, we'd also love to have a better kernel interface to meet
the needs of getaddrinfo address sorting. The netlink requests we
currently use are much too slow if the host has many addresses configured.)
I have tested that basic getaddrinfo operations still work after the
patch, but glibc testsuite coverage in this area is very limited, and I
have yet to do full-system testing with this patch.
Florian
View attachment "0001-Terminate-process-on-invalid-netlink-response-from-k.patch" of type "text/x-patch" (8533 bytes)