netdev - Re: [PATCH net-next 1/2] net: store netdevs in an xarray

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20788d4df9bbcdce9453be3fd047fdf8e0465714.camel@redhat.com>
Date: Mon, 24 Jul 2023 10:18:04 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc: netdev@...r.kernel.org, edumazet@...gle.com, mkubecek@...e.cz, 
	lorenzo@...nel.org
Subject: Re: [PATCH net-next 1/2] net: store netdevs in an xarray

On Fri, 2023-07-21 at 18:42 -0700, Jakub Kicinski wrote:
> Iterating over the netdev hash table for netlink dumps is hard.
> Dumps are done in "chunks" so we need to save the position
> after each chunk, so we know where to restart from. Because
> netdevs are stored in a hash table we remember which bucket
> we were in and how many devices we dumped.
> 
> Since we don't hold any locks across the "chunks" - devices may
> come and go while we're dumping. If that happens we may miss
> a device (if device is deleted from the bucket we were in).
> We indicate to user space that this may have happened by setting
> NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
> if it sees that. Somehow I doubt most user space gets this right..
> 
> To illustrate let's look at an example:
> 
>                System state:
>   start:       # [A, B, C]
>   del:  B      # [A, C]
> 
> with the hash table we may dump [A, B], missing C completely even
> tho it existed both before and after the "del B".
> 
> Add an xarray and use it to allocate ifindexes. This way we
> can iterate ifindexes in order, without the worry that we'll
> skip one. We may still generate a dump of a state which "never
> existed", for example for a set of values and sequence of ops:
> 
>                System state:
>   start:       # [A, B]
>   add:  C      # [A, C, B]
>   del:  B      # [A, C]
> 
> we may generate a dump of [A], if C got an index between A and B.
> System has never been in such state. But I'm 90% sure that's perfectly
> fine, important part is that we can't _miss_ devices which exist before
> and after. User space which wants to mirror kernel's state subscribes
> to notifications and does periodic dumps so it will know that C exists
> from the notification about its creation or from the next dump
> (next dump is _guaranteed_ to include C, if it doesn't get removed).
> 
> To avoid any perf regressions keep the hash table for now. Most
> net namespaces have very few devices and microbenchmarking 1M lookups
> on Skylake I get the following results (not counting loopback
> to number of devs):

A possibly dumb question: why using an xarray over a plain list? It
looks like the idea is to additionally use xarray for device lookup
beyond for dumping?

WRT the above, have you considered instead replacing dev_name_head with
an rhashtable? (and add the mentioned list)

Cheers,

Paolo