Message-ID: <aA-gNpCWG2XJaf-X@shredder>
Date: Mon, 28 Apr 2025 18:35:18 +0300
From: Ido Schimmel <idosch@...dia.com>
To: Ferenc Fejes <ferenc@...es.dev>
Cc: dsahern@...il.com, netdev <netdev@...r.kernel.org>, kuniyu@...zon.com
Subject: Re: [question] robust netns association with fib4 lookup
On Mon, Apr 28, 2025 at 12:20:06PM +0200, Ferenc Fejes wrote:
> On Fri, 2025-04-25 at 21:17 +0300, Ido Schimmel wrote:
> > On Thu, Apr 24, 2025 at 01:33:08PM +0200, Ferenc Fejes wrote:
> > > Hi,
> > >
> > > tl;dr: I want to trace fib4 lookups within a network namespace with eBPF.
> > > This works well with fib6, as the struct net ptr is passed as an argument
> > > to fib6_table_lookup [0], so I can read the inode from it and pass it to
> > > userspace.
> > >
> > >
> > > Additional context. I'm working on a fib table and fib rule lookup tracer
> > > application that hooks fib_table_lookup/fib6_table_lookup and
> > > fib_rules_lookup with fexit eBPF probes and gathers useful data from the
> > > struct flowi4 and flowi6 used for the lookup as well as the resulting
> > > nexthop (gw, seg6, mpls tunnel) if the lookup is successful. If this
> > > works, my plan is to extend it to neighbour, fdb and mdb lookups.
> > >
> > > Tracepoints exist for fib lookups v4 [1] and v6 [2] but in my tracer I would
> > > like to have netns filtering. For example: "check unsuccessful fib4 rule and
> > > table lookups in netns foo". Unfortunately I can't find a reliable way to
> > > associate netns info with fib4 lookups. The main problems are as follows.
> > >
> > > Unlike fib6_table_lookup for v6, fib_table_lookup for v4 does not have a
> > > struct net argument. This makes sense, as struct net is not needed there.
> > > But without it, the netns association is not as easy as in the v6 case.
> > >
> > > On the other hand, fib_lookup [3], which in most cases calls
> > > fib_table_lookup, has a struct net parameter. Even better, it also has the
> > > struct fib_result ptr that fib_table_lookup fills in. This would be the
> > > perfect candidate to hook into, but unfortunately it is an inline
> > > function.
> > >
> > > If there are custom fib rules in the netns, __fib_lookup [4] is called,
> > > which is hookable. It has all the necessary info, like netns, table and
> > > result. To use it, I have to add a custom rule to the traced netns and
> > > remove it immediately. This forces the __fib_lookup codepath. But I feel
> > > that at some point this bug(?) will be fixed and the kernel will notice
> > > the absence of custom rules and switch back to the original codepath.
> > >
> > > But this option is useless for tracing unsuccessful lookups. The stack looks
> > > like this:
> > > __fib_lookup <-- netns info available
> > > fib_rules_lookup <-- losing netns info... :-(
> > > fib4_rule_action <-- unsuccessful result available
> > > fib_table_lookup <-- source of unsuccessful result
> > >
> > > My current workaround is to restore the netns info using the struct
> > > flowi4 pointer. When we have the stack above, I use an eBPF hashmap with
> > > the flowi4 pointer as the key and the netns as the value. Then in
> > > fib_table_lookup I look up the netns id based on the value of the flowi4
> > > pointer. Since this is the common case, it works, but it looks like
> > > fib_table_lookup is called from other places as well (even if rarely).
> > >
> > > Is there any other way to get the netns info for fib4 lookups? If not,
> > > would it be worth an RFC to pass the struct net argument to
> > > fib_table_lookup as well, as is currently done in fib6_table_lookup?
> >
> > I think it makes sense to make both tracepoints similar and pass the net
> > argument to trace_fib_table_lookup().
>
> Thank you for looking into it.
>
> >
> > > Unfortunately this includes some callers to fib_table_lookup. The
> > > netns id would also be presented in the existing tracepoints ([1] and
> > > [2]). Thanks in advance for any suggestion.
> >
> > By "netns id" you mean the netns cookie? It seems that some TCP trace
> > events already expose it (see include/trace/events/tcp.h). It would be
> > nice to finally have "perf" filter these FIB events based on netns.
>
> No, by netns id I mean struct net::ns::inum, which is the inode number
> associated with the netns. This is convenient since it's easy to look up this
> value in userspace with the lsns tool, or simply by running stat on the netns
> file in procfs.
>
> It looks like struct net::net_cookie serves a similar purpose and can be used
> from restricted contexts (e.g. xdp/tc/cls eBPF progs), where rich context, such
> as the struct net reachable from a fexit/fentry probe, is missing.

I'm not sure the inode number is a good identifier for a namespace. See
this comment from the namespace maintainer for a patch that tried to add
a BPF helper to read this value:

https://lore.kernel.org/all/87efzq8jbi.fsf@xmission.com/

More here:

https://lore.kernel.org/netdev/87h93xqlui.fsf@xmission.com/

Which I suspect is why Daniel added the netns cookie:

https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net/

Regarding retrieval of this cookie, there is SO_NETNS_COOKIE:

https://lore.kernel.org/all/20210623135646.1632083-1-m@lambda.lt/

Seems to work fine [1]. Maybe ip-netns can be extended to retrieve the
cookie with something like:

ip netns cookie [ NETNSNAME | PID ]

[1]
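# cat so_netns_cookie.c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	socklen_t vallen;
	uint64_t cookie;
	int sock;

	/* Any socket works; the cookie belongs to the netns it was created in. */
	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0) {
		perror("socket");
		return EXIT_FAILURE;
	}

	vallen = sizeof(cookie);
	if (getsockopt(sock, SOL_SOCKET, SO_NETNS_COOKIE, &cookie, &vallen) != 0) {
		perror("getsockopt(SO_NETNS_COOKIE)");
		close(sock);
		return EXIT_FAILURE;
	}

	/* PRIu64 keeps the format correct on 32-bit targets too. */
	printf("cookie = %" PRIu64 "\n", cookie);
	close(sock);
	return EXIT_SUCCESS;
}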
# gcc -Wall so_netns_cookie.c -o so_netns_cookie
# ip netns add ns1
# ip netns add ns2
# ./so_netns_cookie
cookie = 1
# ip netns exec ns1 ./so_netns_cookie
cookie = 2
# ip netns exec ns2 ./so_netns_cookie
cookie = 3