[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJwJo6ax-Ltpa2xY7J7VxjDkUq_5NJqYx_g+yNn9yfrNHfWeYA@mail.gmail.com>
Date: Sat, 16 Nov 2024 00:48:17 +0000
From: Dmitry Safonov <0x7f454c46@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, David Ahern <dsahern@...nel.org>,
Ivan Delalande <colona@...sta.com>, Matthieu Baerts <matttbe@...nel.org>,
Mat Martineau <martineau@...nel.org>, Geliang Tang <geliang@...nel.org>,
John Fastabend <john.fastabend@...il.com>, Davide Caratti <dcaratti@...hat.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
mptcp@...ts.linux.dev, Johannes Berg <johannes@...solutions.net>
Subject: Re: [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken
Hi Jakub,
On Sat, 16 Nov 2024 at 00:08, Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Wed, 13 Nov 2024 18:46:39 +0000 Dmitry Safonov via B4 Relay wrote:
> > 2. Inet-diag allocates netlink message for sockets in
> > inet_diag_dump_one_icsk(), which uses a TCP-diag callback
> > .idiag_get_aux_size(), that pre-calculates the needed space for
> > TCP-diag related information. But as neither socket lock nor
> > rcu_readlock() are held between allocation and the actual TCP
> > info filling, the TCP-related space requirement may change before
> > reaching tcp_diag_put_md5sig(). I.e., the number of TCP-MD5 keys on
> > a socket. Thankfully, TCP-MD5-diag won't overwrite the skb, but will
> > return EMSGSIZE, triggering WARN_ON() in inet_diag_dump_one_icsk().
>
> Would it be too ugly if we simply retried with a 32kB skb if the initial
> dump failed with EMSGSIZE?
Yeah, I'm not sure. I thought of keeping it simple and just marking
the nlmsg "inconsistent". This is arguably a change of meaning for
NLM_F_DUMP_INTR because previously, it meant that the multi-message
dump became inconsistent between recvmsg() calls. And now, it is also
utilized in the "do" version if it raced with the socket setsockopts()
in another thread.
> Another option would be to automatically grow the skb. The size
> accounting is an endless source of bugs. We'd just need to scan
> the codebase to make sure there are no cases where someone does
>
> ptr = __nla_reserve();
> nla_put();
> *ptr = 0;
>
> Which may be too much of a project and source of bugs in itself.
This seems quite more complex than just marking the dump inconsistent
and letting the userspace deal with the result or retry if it wants
precise key information.
> Or do both, retry as a fix, and auto-grow in net-next.
>
> > In order to remove the new limit from (4) solution, my plan is to
> > convert the dump of TCP-MD5 keys from an array to
> > NL_ATTR_TYPE_NESTED_ARRAY (or alike), which should also address (1).
> > And for (3), it's needed to teach tcp-diag how-to remember not only
> > socket on which previous recvmsg() stopped, but potentially TCP-MD5
> > key as well.
>
> Just putting the same attribute type multiple times is preferable
> to array types.
Cool. I didn't know that. I think I was confused by iproute way of
parsing [which I read very briefly, so might have misunderstood]:
: while (RTA_OK(rta, len)) {
: type = rta->rta_type & ~flags;
: if ((type <= max) && (!tb[type]))
: tb[type] = rta;
: rta = RTA_NEXT(rta, len);
: }
https://github.com/iproute2/iproute2/blob/main/lib/libnetlink.c#L1526
which seems like it will just ignore duplicate attributes.
That doesn't mean iproute has to dictate new code in kernel, for sure.
Thanks,
Dmitry
Powered by blists - more mailing lists