Message-ID: <4e0e0eb18036401e942651c86a956a41@AcuMS.aculab.com>
Date: Mon, 3 Feb 2020 12:12:03 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: RE: Freeing 'temporary' IPv4 route table entries.
From: Eric Dumazet
> Sent: 31 January 2020 15:54
> On 1/31/20 2:26 AM, David Laight wrote:
> > If I call sendmsg() on a raw socket (or probably
> > an unconnected UDP one) rt_dst_alloc() is called
> > in the bowels of ip_route_output_flow() to hold
> > the remote address.
> >
> > Much later __dev_queue_xmit() calls dst_release()
> > to delete the 'dst' referenced from the skb.
> >
> > Prior to f8864972 it did just that.
> > Afterwards the actual delete is 'laundered' through the
> > rcu callbacks.
> > This is probably ok for dst that are actually attached
> > to sockets or tunnels (which aren't freed very often).
> > But it leads to horrid long rcu callback sequences
> > when a lot of messages are sent.
> > (A sample of 1 gave nearly 100 deletes in one go.)
> > There is also the additional cost of deferring the free
> > (and the extra retpoline etc).
> >
> > ISTM that the dst_alloc() done during a send should
> > set a flag so that the 'dst' can be immediately
> > freed since it is known that no one can be picking up
> > a reference as it is being freed.
> >
> > Thoughts?
> >
>
> I thought these routes were cached in per-cpu caches.
>
> At least for UDP I do not see rcu callbacks being queued.
I've done a bit more investigation.
For raw_ip sockets with inet->hdrincl set (i.e. the application
builds the IPv4 header) flowi4_flags has FLOWI_FLAG_KNOWN_NH set.
This is detected inside __mkroute_output() (in ipv4/route.c)
and forces rt_dst_alloc() to be called instead of using the
'dst' from (I think) fib_select_path().
(Is this basically the arp table entry?)
rt_set_nexthop() then calls rt_add_uncached_list().
I suspect that dst_release() is called after every
transmit - but normally just decrements the ref count.
However for raw sends it frees the uncached route.
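
For anyone wanting to reproduce this, a minimal userspace sketch of the
kind of sender described above: a raw socket with IP_HDRINCL set where
the application builds the IPv4 header itself. The helper names
(build_raw_packet(), send_raw_burst()), the payload and the use of
protocol 253 (reserved for experimentation) are my assumptions, not
taken from the original test. Opening the socket needs CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define TEST_PROTO 253 /* assumption: experimental protocol number */

/* Build an option-less IPv4 header followed by the payload.
 * Returns the total packet length, or 0 if buf is too small. */
static size_t build_raw_packet(unsigned char *buf, size_t buflen,
                               in_addr_t daddr,
                               const void *payload, size_t paylen)
{
    struct iphdr ip;
    size_t total = sizeof(ip) + paylen;

    assert(sizeof(ip) == 20);
    if (buflen < total)
        return 0;

    memset(&ip, 0, sizeof(ip));
    ip.version = 4;
    ip.ihl = 5;                        /* 5 * 4 = 20 byte header */
    ip.ttl = 64;
    ip.protocol = TEST_PROTO;
    ip.tot_len = htons((uint16_t)total);
    ip.daddr = daddr;                  /* already in network byte order */
    /* saddr and id are left zero so the kernel fills them in; on Linux
     * the checksum and total length are always filled in (see raw(7)). */

    memcpy(buf, &ip, sizeof(ip));
    memcpy(buf + sizeof(ip), payload, paylen);
    return total;
}

/* Send 'count' identical packets to dst_text.
 * Returns the number sent, or -1 on setup failure. */
static int send_raw_burst(const char *dst_text, int count)
{
    static const char payload[] = "test";
    unsigned char pkt[64];
    struct sockaddr_in dst;
    int on = 1, sent = 0;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    if (inet_pton(AF_INET, dst_text, &dst.sin_addr) != 1)
        return -1;

    size_t len = build_raw_packet(pkt, sizeof(pkt), dst.sin_addr.s_addr,
                                  payload, sizeof(payload));
    if (len == 0)
        return -1;

    int fd = socket(AF_INET, SOCK_RAW, TEST_PROTO);
    if (fd < 0)
        return -1;                     /* typically EPERM without CAP_NET_RAW */
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    for (int i = 0; i < count; i++)
        if (sendto(fd, pkt, len, 0, (struct sockaddr *)&dst,
                   sizeof(dst)) == (ssize_t)len)
            sent++;

    close(fd);
    return sent;
}
```

Calling send_raw_burst() with a reachable address and a large count
should, if the analysis above is right, exercise the rt_dst_alloc() path
once per packet.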
I think the 'fault' is down to c27c9322d which fixed
an issue where the code was using the IP address from the
pre-built packet instead of the one from the destination
address.
I think there are two issues:
1) if __mkroute_output() creates an 'uncached' route
it can be freed without waiting for an rcu grace period.
2) if a raw packet's destination address matches then
the cached route can be used.
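
To illustrate point 1, here is a toy userspace model (not kernel code)
of the difference between deferring every free and freeing 'uncached'
entries immediately. All the names (toy_dst, toy_dst_release(),
toy_grace_period(), the 'uncached' flag) are hypothetical, and the
"grace period" is just a function call that drains a list. It only
shows how the deferred batch grows with the number of sends; it says
nothing about whether the immediate free is safe in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a route entry; not the kernel's struct dst_entry. */
struct toy_dst {
    int refcnt;
    bool uncached;         /* hypothetical: nothing else can look this up */
    struct toy_dst *next;  /* link on the deferred-free list */
};

static struct toy_dst *deferred_head; /* entries waiting for a "grace period" */
static size_t freed_immediately;      /* entries freed directly on release */

static struct toy_dst *toy_dst_alloc(bool uncached)
{
    struct toy_dst *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;
    d->refcnt = 1;
    d->uncached = uncached;
    return d;
}

/* Drop one reference. With allow_fast_free set, an uncached entry is
 * freed at once; otherwise the free is queued for toy_grace_period(). */
static void toy_dst_release(struct toy_dst *d, bool allow_fast_free)
{
    assert(d->refcnt > 0);
    if (--d->refcnt > 0)
        return;

    if (allow_fast_free && d->uncached) {
        free(d);
        freed_immediately++;
        return;
    }
    d->next = deferred_head;
    deferred_head = d;
}

/* Run all queued frees in one go; returns the size of the batch. */
static size_t toy_grace_period(void)
{
    size_t batch = 0;

    while (deferred_head) {
        struct toy_dst *d = deferred_head;

        deferred_head = d->next;
        free(d);
        batch++;
    }
    return batch;
}

/* Model 'count' sends, each allocating and releasing one uncached
 * entry, then return the batch the next "grace period" has to process. */
static size_t toy_send_burst(size_t count, bool allow_fast_free)
{
    for (size_t i = 0; i < count; i++) {
        struct toy_dst *d = toy_dst_alloc(true);

        if (!d)
            break;
        toy_dst_release(d, allow_fast_free);
    }
    return toy_grace_period();
}
```

With allow_fast_free false every send adds one entry to the deferred
batch; with it true the batch stays empty and the frees happen inline.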
Oh - nothing seems to check DST_HOST any more.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)