Message-Id: <18787829-0189-40EF-9AD2-B270F3EBA2C6@gmail.com>
Date: Tue, 9 May 2023 23:16:37 +0300
From: Martin Zaharinov <micron10@...il.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: Eric Dumazet <edumazet@...gle.com>,
 netdev <netdev@...r.kernel.org>
Subject: Re: Very slow remove interface from kernel

Hi Ido

Yes, it is a physical card: an Intel 82599 dual-port 10G, on a 2-socket system with 24 cores at 3 GHz.

Here are the timings:

time ./vlanadd

real	0m12.347s
user	0m8.863s
sys	0m2.594s

time ./vlanrem

real	8m59.105s
user	0m11.931s
sys	0m0.035s
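
A rough way to read these numbers, as a sketch only: whatever part of "real" is not covered by "user" plus "sys" was spent off-CPU, that is, waiting. The helper name off_cpu_pct below is made up for illustration, and the calculation assumes a single-threaded workload (with several threads, user plus sys can exceed real).

```shell
# off_cpu_pct REAL USER SYS   (all values in seconds)
# Prints the percentage of wall-clock time that was not spent on-CPU.
# Hypothetical helper, not part of any tool mentioned in this thread.
off_cpu_pct() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (r - u - s) / r * 100 }'
}

# vlanrem run: real 8m59.105s = 539.105 s, user 11.931 s, sys 0.035 s
off_cpu_pct 539.105 11.931 0.035

# vlanadd run: real 12.347 s, user 8.863 s, sys 2.594 s
off_cpu_pct 12.347 8.863 2.594
```

For the removal run nearly all of the wall-clock time is waiting, while the add run is mostly on-CPU.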


To measure the rate I watch the interface count with:

watch -n.1 "ip a | grep UP | wc"

while vlanrem runs. About 4-5 VLANs are removed per second.

I think RCU is causing the problem.

I found a post from 2009: https://lore.kernel.org/all/20091024144610.GC6638@linux.vnet.ibm.com/T/

Yes, it is old, and many changes may have been made since then.

I see the same slow removal with ppp interfaces: when more than 800-900 users drop, removing the devices and reconnecting (re-adding) them is slow in the same way.

m.

> On 9 May 2023, at 23:08, Ido Schimmel <idosch@...sch.org> wrote:
> 
> On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
>> I tried on kernel 6.3.1
>> 
>> 
>> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>> 
>> real 4m51.633s  -- here I stopped with Ctrl+C and reran; the second part finished after 3 min
>> user 0m7.479s
>> sys 0m0.367s
> 
> You are off-CPU most of the time, the question is what is blocking. I'm
> getting the following results with net-next:
> 
> # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
> real 177.09
> user 3.85
> sys 31.26
> 
> When using a batch file to perform the deletion:
> 
> # time -p ip -b vlan_del.batch 
> real 35.25
> user 0.02
> sys 3.61
> 
> And to check where we are blocked most of the time while using the batch
> file:
> 
> # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
> [...]
>    __schedule
>    schedule
>    schedule_timeout
>    wait_for_completion
>    rcu_barrier
>    netdev_run_todo
>    rtnetlink_rcv_msg
>    netlink_rcv_skb
>    netlink_unicast
>    netlink_sendmsg
>    ____sys_sendmsg
>    ___sys_sendmsg
>    __sys_sendmsg
>    do_syscall_64
>    entry_SYSCALL_64_after_hwframe
>    -                ip (3660)
>        25089479
> [...]
> 
> We are blocked for around 70% of the time on the rcu_barrier() in
> netdev_run_todo().
> 
> Note that one big difference between my setup and yours is that in my
> case eth0 is a dummy device and in your case it's probably a physical
> device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
> so, it's possible that a non-negligible amount of time is spent talking
> to hardware/firmware to delete the 4K VIDs from the device's VLAN
> filter.
> 
>> 
>> 
>> The config is very clean; I removed a large part of the CONFIG options.
>> 
>> Are there options to debug what is happening?
>> 
>> m
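
For reference, the contents of the vlan_del.batch file used in the quoted test are not shown in this thread. A minimal sketch of how such a file could be generated is below, assuming one "link del dev NAME" line per VLAN device (ip batch mode reads one ip command per line, without the leading "ip"). The function name gen_vlan_del_batch is made up for illustration; the eth0.N device names follow the quoted loop, and a setup with devices named vlanN would pass "vlan" as the prefix instead.

```shell
# gen_vlan_del_batch PREFIX FIRST LAST
# Prints one "link del dev PREFIX<n>" line for each n in FIRST..LAST,
# in the format read by ip -batch. Hypothetical helper for illustration.
gen_vlan_del_batch() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf 'link del dev %s%s\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Devices named eth0.2 .. eth0.4094, matching the quoted deletion loop
gen_vlan_del_batch eth0. 2 4094 > vlan_del.batch

# Then, as root:
#   time -p ip -b vlan_del.batch
```

Generating the file needs no privileges; only the final ip -b step touches the interfaces.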

