Message-ID: <8c025155-7bbc-40a9-a4cd-9670d35193af@nvidia.com>
Date: Thu, 9 May 2024 06:06:18 -0700
From: William Tu <witu@...dia.com>
To: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: jiri@...dia.com, bodong@...dia.com, kuba@...nel.org
Subject: Re: [PATCH RFC net-next] net: cache the __dev_alloc_name()



On 5/9/24 12:46 AM, Paolo Abeni wrote:
> On Tue, 2024-05-07 at 11:55 -0700, William Tu wrote:
>> On 5/7/24 12:26 AM, Paolo Abeni wrote:
>>> On Mon, 2024-05-06 at 20:32 +0000, William Tu wrote:
>>>> When a system has around 1000 netdevs, adding the 1001st device becomes
>>>> very slow. The devlink command to create an SF
>>>>     $ devlink port add pci/0000:03:00.0 flavour pcisf \
>>>>       pfnum 0 sfnum 1001
>>>> takes around 5 seconds, and Linux perf and flamegraph show 19% of time
>>>> spent on __dev_alloc_name() [1].
>>>>
>>>> The reason is that devlink first requests the next available "eth%d".
>>>> __dev_alloc_name() then scans all existing netdevs for names matching
>>>> "ethN", sets bit N in an 'inuse' bitmap, and returns the first free
>>>> number, in our case eth0.
>>>>
>>>> Later on, based on a udev rule, the device is renamed from eth0 to
>>>> "en3f0pf0sf1001", with the altname shown below
>>>>     14: en3f0pf0sf1001: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
>>>>         altname enp3s0f0npf0sf1001
>>>>
>>>> So eth0 is never actually used, but as we have 1k "en3f0pf0sfN"
>>>> devices + 1k altnames, __dev_alloc_name() spends a lot of time going
>>>> through all existing netdevs to build the 'inuse' bitmap for the
>>>> pattern 'eth%d'. The bitmap barely has any bits set, and it is rebuilt
>>>> from scratch every time.
>>>>
>>>> I want to see if it makes sense to save/cache the result, or whether
>>>> there is any way to avoid the 'eth%d' pattern search. The RFC patch
>>>> adds a name_pat (name pattern) hlist and saves the 'inuse' bitmap. It
>>>> saves patterns, e.g. "eth%d" and "veth%d", with their bitmaps, and looks
>>>> them up before scanning all existing netdevs.
>>> An alternative heuristic that should be cheap and possibly reasonable
>>> would be to optimistically check <name>0..<name><very small int> for
>>> availability, possibly restricting such attempts to scenarios where the
>>> total number of hashed netdevice names is somewhat high.
>>>
>>> WDYT?
>>>
>>> Cheers,
>>>
>>> Paolo
>> Hi Paolo,
>>
>> Thanks for your suggestion!
>> I'm not clear on that idea.
>>
>> The current code has to do a full scan of all netdevs in a list, and the
>> name list is not sorted / ordered. So to learn about, e.g., eth0 .. eth10,
>> we still need to do a full scan, find the netdevs with prefix "eth", and
>> get the next available bit, 11 (10+1).
>> In another use case, where users don't install a udev rule to rename,
>> the system can actually create eth998, eth999, eth1000....
>>
>> What if we create a prefix map (maybe using an xarray)
>> idx   entry=(prefix, bitmap)
>> --------------------
>> 0      eth, 1111000000...
>> 1      veth, 1000000...
>> 2      can, 11100000...
>> 3      firewire, 00000...
>>
>> but then we need to clear the bit when a device is removed.
>> William
> Sorry for the late reply. I mean something like the following
> (completely untested!!!):
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d2ce91a334c1..0d428825f88a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1109,6 +1109,12 @@ static int __dev_alloc_name(struct net *net, const char *name, char *res)
>          if (!p || p[1] != 'd' || strchr(p + 2, '%'))
>                  return -EINVAL;
>
> +       for (i = 0; i < 4; ++i) {
> +               snprintf(buf, IFNAMSIZ, name, i);
> +               if (!__dev_get_by_name(net, buf))
> +                       goto found;
> +       }
> +
>          /* Use one page as a bit array of possible slots */
>          inuse = bitmap_zalloc(max_netdevices, GFP_ATOMIC);
>          if (!inuse)
> @@ -1144,6 +1150,7 @@ static int __dev_alloc_name(struct net *net, const char *name, char *res)
>          if (i == max_netdevices)
>                  return -ENFILE;
>
> +found:
>          /* 'res' and 'name' could overlap, use 'buf' as an intermediate buffer */
>          strscpy(buf, name, IFNAMSIZ);
>          snprintf(res, IFNAMSIZ, buf, i);
>
> ---
> plus possibly some additional check to use this heuristic only if the
> total number of devices is significantly high. That would need some
> additional bookkeeping, not added here.
>
> Cheers,
>
> Paolo
Hi Paolo,
Thanks, now I understand the idea.
Will give it a try.
William
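The prefix-map idea quoted above (one bitmap per name prefix, with the
bit cleared again when a device goes away) could look roughly like the
following userspace sketch. It is an illustration under stated
assumptions, not the RFC patch: struct prefix_map, prefix_get(),
prefix_alloc() and prefix_release() are hypothetical names, a fixed
array stands in for the xarray, and a single 64-bit word stands in for
the bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PREFIX_LEN   16  /* room for a short prefix such as "eth" or "veth" */
#define MAX_PREFIXES 8   /* hypothetical table size */
#define MAX_INDEX    64  /* one 64-bit word per prefix, for brevity */

struct prefix_entry {
    char prefix[PREFIX_LEN];
    uint64_t inuse;      /* bit N set means <prefix>N is taken */
};

/* Hypothetical stand-in for the xarray mentioned in the thread. */
struct prefix_map {
    struct prefix_entry entries[MAX_PREFIXES];
    size_t count;
};

/* Find the entry for a prefix, creating it on first use. NULL if full. */
static struct prefix_entry *prefix_get(struct prefix_map *m, const char *prefix)
{
    assert(strlen(prefix) < PREFIX_LEN);
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->entries[i].prefix, prefix) == 0)
            return &m->entries[i];
    if (m->count == MAX_PREFIXES)
        return NULL;
    struct prefix_entry *e = &m->entries[m->count++];
    strcpy(e->prefix, prefix);
    e->inuse = 0;
    return e;
}

/* Reserve the lowest free index for a prefix; -1 if none is available. */
static int prefix_alloc(struct prefix_map *m, const char *prefix)
{
    struct prefix_entry *e = prefix_get(m, prefix);
    if (!e)
        return -1;
    for (int i = 0; i < MAX_INDEX; i++) {
        if (!(e->inuse & (UINT64_C(1) << i))) {
            e->inuse |= UINT64_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Clear the bit again when the device that held <prefix><idx> is removed. */
static void prefix_release(struct prefix_map *m, const char *prefix, int idx)
{
    assert(idx >= 0 && idx < MAX_INDEX);
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->entries[i].prefix, prefix) == 0)
            m->entries[i].inuse &= ~(UINT64_C(1) << idx);
}
```

Allocation no longer walks the device list, but every path that frees a
name has to call the release helper, which is the extra bookkeeping the
message above points out.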
>
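The heuristic in the quoted patch can be tried outside the kernel with a
minimal userspace sketch: probe a few low indices first and fall back to
the full scan only if all of them are taken. Everything here is an
assumption made for illustration: struct name_table, name_in_use(),
alloc_name(), MAX_SLOTS and QUICK_PROBES are hypothetical names, a plain
prefix such as "eth" replaces the kernel's "eth%d" format string, and a
linear search stands in for the hashed __dev_get_by_name() lookup that
makes the probe cheap in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ     16    /* same value as the kernel's IFNAMSIZ */
#define MAX_SLOTS    4096  /* hypothetical cap, stands in for max_netdevices */
#define QUICK_PROBES 4     /* low indices to try before the full scan */

/* Hypothetical stand-in for the per-namespace list of device names. */
struct name_table {
    const char (*names)[IFNAMSIZ];
    size_t count;
};

/* Linear lookup for simplicity; the kernel uses a hash table here. */
static bool name_in_use(const struct name_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return true;
    return false;
}

/* Write the first free "<prefix>N" into res and return N, or -1 if full. */
static int alloc_name(const struct name_table *t, const char *prefix, char *res)
{
    static unsigned char inuse[MAX_SLOTS];  /* one byte per slot, not reentrant */
    char buf[IFNAMSIZ];
    size_t plen = strlen(prefix);
    int i;

    assert(plen > 0 && plen < IFNAMSIZ - 1);

    /* Fast path: optimistically probe <prefix>0 .. <prefix>3. */
    for (i = 0; i < QUICK_PROBES; i++) {
        snprintf(buf, sizeof(buf), "%s%d", prefix, i);
        if (!name_in_use(t, buf))
            goto found;
    }

    /* Slow path: full scan that marks every "<prefix>N" already present. */
    memset(inuse, 0, sizeof(inuse));
    for (size_t n = 0; n < t->count; n++) {
        int idx;

        if (strncmp(t->names[n], prefix, plen) != 0)
            continue;
        if (sscanf(t->names[n] + plen, "%d", &idx) != 1)
            continue;
        if (idx < 0 || idx >= MAX_SLOTS)
            continue;
        /* Count only exact matches, so "eth01" or "eth1x" are ignored. */
        snprintf(buf, sizeof(buf), "%s%d", prefix, idx);
        if (strcmp(buf, t->names[n]) == 0)
            inuse[idx] = 1;
    }
    for (i = 0; i < MAX_SLOTS; i++)
        if (!inuse[i])
            break;
    if (i == MAX_SLOTS)
        return -1;

found:
    snprintf(res, IFNAMSIZ, "%s%d", prefix, i);
    return i;
}
```

With eth0..eth3 and eth5 present, all four probes hit an existing name
and the scan returns eth4. When every device has been renamed away from
"ethN", as in the scenario described in the thread, the first probe
succeeds and the scan is skipped entirely.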

