[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230724190934.GE11388@unreal>
Date: Mon, 24 Jul 2023 22:09:34 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
pabeni@...hat.com, mkubecek@...e.cz, lorenzo@...nel.org
Subject: Re: [PATCH net-next 1/2] net: store netdevs in an xarray
On Fri, Jul 21, 2023 at 06:42:36PM -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks" so we need to save the position
> after each chunk, so we know where to restart from. Because
> netdevs are stored in a hash table we remember which bucket
> we were in and how many devices we dumped.
>
> Since we don't hold any locks across the "chunks" - devices may
> come and go while we're dumping. If that happens we may miss
> a device (if device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right..
>
> To illustrate let's look at an example:
>
> System state:
> start: # [A, B, C]
> del: B # [A, C]
>
> with the hash table we may dump [A, B], missing C completely even
> tho it existed both before and after the "del B".
>
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example for a set of values and sequence of ops:
>
> System state:
> start: # [A, B]
> add: C # [A, C, B]
> del: B # [A, C]
>
> we may generate a dump of [A], if C got an index between A and B.
> System has never been in such state. But I'm 90% sure that's perfectly
> fine, important part is that we can't _miss_ devices which exist before
> and after. User space which wants to mirror kernel's state subscribes
> to notifications and does periodic dumps so it will know that C exists
> from the notification about its creation or from the next dump
> (next dump is _guaranteed_ to include C, if it doesn't get removed).
>
> To avoid any perf regressions keep the hash table for now. Most
> net namespaces have very few devices and microbenchmarking 1M lookups
> on Skylake I get the following results (not counting loopback
> to number of devs):
>
> #devs | hash | xa | delta
> 2 | 18.3 | 20.1 | + 9.8%
> 16 | 18.3 | 20.1 | + 9.5%
> 64 | 18.3 | 26.3 | +43.8%
> 128 | 20.4 | 26.3 | +28.6%
> 256 | 20.0 | 26.4 | +32.1%
> 1024 | 26.6 | 26.7 | + 0.2%
> 8192 |541.3 | 33.5 | -93.8%
>
> No surprises since the hash table has 256 entries.
> The microbenchmark scans indexes in order, if the pattern is more
> random xa starts to win at 512 devices already. But that's a lot
> of devices, in practice.
>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> include/net/net_namespace.h | 4 +-
> net/core/dev.c | 82 ++++++++++++++++++++++++-------------
> 2 files changed, 57 insertions(+), 29 deletions(-)
<...>
> + if (!ifindex)
> + err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
> + xa_limit_31b, &net->ifindex, GFP_KERNEL);
> + else
> + err = xa_insert(&net->dev_by_index, ifindex, NULL, GFP_KERNEL);
> + if (err)
> + return err;
Please pay attention that xa_alloc_cyclic() returns 1 if the allocation
succeeded after wrapping. So the more accurate check is "if (err < 0) ..."
Thanks
Powered by blists - more mailing lists