Message-ID: <20230724190934.GE11388@unreal>
Date: Mon, 24 Jul 2023 22:09:34 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
	pabeni@...hat.com, mkubecek@...e.cz, lorenzo@...nel.org
Subject: Re: [PATCH net-next 1/2] net: store netdevs in an xarray

On Fri, Jul 21, 2023 at 06:42:36PM -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks", so we need to save the position
> after each chunk to know where to restart from. Because
> netdevs are stored in a hash table, we remember which bucket
> we were in and how many devices we dumped.
> 
> Since we don't hold any locks across the "chunks", devices may
> come and go while we're dumping. If that happens we may miss
> a device (if a device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right...
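
A minimal user-space sketch of honoring that flag (not part of the patch;
raw netlink parsing only, process_link() is a hypothetical handler):

	#include <stdbool.h>
	#include <linux/netlink.h>

	void process_link(const struct nlmsghdr *nlh);	/* hypothetical handler */

	/* Scan one chunk of a dump; if the kernel flagged the dump as
	 * inconsistent, the caller should redo the whole request.
	 */
	static bool chunk_needs_restart(struct nlmsghdr *nlh, int len)
	{
		for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_flags & NLM_F_DUMP_INTR)
				return true;	/* state changed mid-dump */
			process_link(nlh);
		}
		return false;
	}
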
> 
> To illustrate let's look at an example:
> 
>                System state:
>   start:       # [A, B, C]
>   del:  B      # [A, C]
> 
> with the hash table we may dump [A, B], missing C completely even
> though it existed both before and after the "del B".
> 
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example given this set of devices and sequence of ops:
> 
>                System state:
>   start:       # [A, B]
>   add:  C      # [A, C, B]
>   del:  B      # [A, C]
> 
> we may generate a dump of [A], if C got an index between A and B.
> The system has never been in such a state. But I'm 90% sure that's
> perfectly fine; the important part is that we can't _miss_ devices
> which exist before and after. User space which wants to mirror the
> kernel's state subscribes to notifications and does periodic dumps,
> so it will know that C exists from the notification about its creation
> or from the next dump (the next dump is _guaranteed_ to include C,
> if it doesn't get removed).
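
The dump side then becomes an ordered walk over the xarray. A rough
sketch of what a dump callback could do with it (netdev_fill_one() is a
hypothetical fill helper, locking elided):

	struct net_device *dev;
	unsigned long ifindex;

	/* Resume from where the previous chunk stopped (cb->args[0]) and
	 * walk ifindexes in increasing order, so a device that exists both
	 * before and after the dump cannot be skipped.
	 */
	xa_for_each_start(&net->dev_by_index, ifindex, dev, cb->args[0]) {
		if (netdev_fill_one(skb, dev) < 0) {	/* e.g. skb is full */
			cb->args[0] = ifindex;		/* restart here next time */
			break;
		}
	}
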
> 
> To avoid any perf regressions, keep the hash table for now. Most
> net namespaces have very few devices; microbenchmarking 1M lookups
> on Skylake, I get the following results (loopback not counted in
> the number of devs):
> 
>  #devs |  hash |  xa  | delta
>      2 |  18.3 | 20.1 | + 9.8%
>     16 |  18.3 | 20.1 | + 9.5%
>     64 |  18.3 | 26.3 | +43.8%
>    128 |  20.4 | 26.3 | +28.6%
>    256 |  20.0 | 26.4 | +32.1%
>   1024 |  26.6 | 26.7 | + 0.2%
>   8192 | 541.3 | 33.5 | -93.8%
> 
> No surprises, since the hash table has 256 entries.
> The microbenchmark scans indexes in order; if the pattern is more
> random, xa starts to win already at 512 devices. But that's a lot
> of devices in practice.
> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
>  include/net/net_namespace.h |  4 +-
>  net/core/dev.c              | 82 ++++++++++++++++++++++++-------------
>  2 files changed, 57 insertions(+), 29 deletions(-)

<...>

> +	if (!ifindex)
> +		err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
> +				      xa_limit_31b, &net->ifindex, GFP_KERNEL);
> +	else
> +		err = xa_insert(&net->dev_by_index, ifindex, NULL, GFP_KERNEL);
> +	if (err)
> +		return err;

Please note that xa_alloc_cyclic() returns 1 if the allocation succeeded
after wrapping, so the more accurate check is "if (err < 0) ...".
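
I.e. something like:

	if (!ifindex)
		err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
				      xa_limit_31b, &net->ifindex, GFP_KERNEL);
	else
		err = xa_insert(&net->dev_by_index, ifindex, NULL, GFP_KERNEL);
	if (err < 0)	/* 1 means "allocated after wrapping", not an error */
		return err;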

Thanks
