[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241115175838.4dec771a@kernel.org>
Date: Fri, 15 Nov 2024 17:58:38 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Dmitry Safonov <0x7f454c46@...il.com>
Cc: Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
 <horms@...nel.org>, David Ahern <dsahern@...nel.org>, Ivan Delalande
 <colona@...sta.com>, Matthieu Baerts <matttbe@...nel.org>, Mat Martineau
 <martineau@...nel.org>, Geliang Tang <geliang@...nel.org>, John Fastabend
 <john.fastabend@...il.com>, Davide Caratti <dcaratti@...hat.com>, Kuniyuki
 Iwashima <kuniyu@...zon.com>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, mptcp@...ts.linux.dev, Johannes Berg
 <johannes@...solutions.net>
Subject: Re: [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken
On Sat, 16 Nov 2024 00:48:17 +0000 Dmitry Safonov wrote:
> On Sat, 16 Nov 2024 at 00:08, Jakub Kicinski <kuba@...nel.org> wrote:
> > On Wed, 13 Nov 2024 18:46:39 +0000 Dmitry Safonov via B4 Relay wrote:  
> > > 2. Inet-diag allocates netlink message for sockets in
> > >    inet_diag_dump_one_icsk(), which uses a TCP-diag callback
> > >    .idiag_get_aux_size(), that pre-calculates the needed space for
> > >    TCP-diag related information. But as neither socket lock nor
> > >    rcu_readlock() are held between allocation and the actual TCP
> > >    info filling, the TCP-related space requirement may change before
> > >    reaching tcp_diag_put_md5sig(). I.e., the number of TCP-MD5 keys on
> > >    a socket. Thankfully, TCP-MD5-diag won't overwrite the skb, but will
> > >    return EMSGSIZE, triggering WARN_ON() in inet_diag_dump_one_icsk().  
> >
> > Would it be too ugly if we simply retried with a 32kB skb if the initial
> > dump failed with EMSGSIZE?  
> 
> Yeah, I'm not sure. I thought of keeping it simple and just marking
> the nlmsg "inconsistent". This is arguably a change of meaning for
> NLM_F_DUMP_INTR because previously, it meant that the multi-message
> dump became inconsistent between recvmsg() calls. And now, it is also
> utilized in the "do" version if it raced with the socket setsockopts()
> in another thread.
NLM_F_DUMP_INTR is an interesting idea, but exactly as you say NLM_F_DUMP_INTR
was a workaround for consistency of the dump as a whole. Single message
we can re-generate quite easily in the kernel, so forcing the user to
handle INTR and retry seems unnecessarily cruel ;)
> > > In order to remove the new limit from (4) solution, my plan is to
> > > convert the dump of TCP-MD5 keys from an array to
> > > NL_ATTR_TYPE_NESTED_ARRAY (or alike), which should also address (1).
> > > And for (3), it's needed to teach tcp-diag how-to remember not only
> > > socket on which previous recvmsg() stopped, but potentially TCP-MD5
> > > key as well.  
> >
> > Just putting the same attribute type multiple times is preferable
> > to array types.  
> 
> Cool. I didn't know that. I think I was confused by iproute way of
> parsing [which I read very briefly, so might have misunderstood]:
> : while (RTA_OK(rta, len)) {
> :         type = rta->rta_type & ~flags;
> :         if ((type <= max) && (!tb[type]))
> :                 tb[type] = rta;
> :         rta = RTA_NEXT(rta, len);
> : }
> https://github.com/iproute2/iproute2/blob/main/lib/libnetlink.c#L1526
> 
> which seems like it will just ignore duplicate attributes.
> 
> That doesn't mean iproute has to dictate new code in kernel, for sure.
Right, the table based parsing doesn't work well with multi-attr,
but other table formats aren't fundamentally better. Or at least
I never came up with a good way of solving this. And the multi-attr
at least doesn't suffer from the u16 problem.
Powered by blists - more mailing lists
 
