Message-Id: <6A3B4C11-EF48-4CE9-9EC7-5882E330D7EA@gmail.com>
Date: Tue, 7 Sep 2021 09:42:14 +0300
From: Martin Zaharinov <micron10@...il.com>
To: Guillaume Nault <gnault@...hat.com>
Cc: Pali Rohár <pali@...nel.org>,
Greg KH <gregkh@...uxfoundation.org>,
netdev <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport
endpoint is not connected
Here is the perf top output as text:
PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
17.01%  [nf_conntrack]   [k] nf_ct_iterate_cleanup
 9.73%  [kernel]         [k] mutex_spin_on_owner
 9.07%  [pppoe]          [k] pppoe_rcv
 2.77%  [nf_nat]         [k] device_cmp
 1.66%  [kernel]         [k] osq_lock
 1.65%  [kernel]         [k] _raw_spin_lock
 1.61%  [kernel]         [k] __local_bh_enable_ip
 1.35%  [nf_nat]         [k] inet_cmp
 1.30%  [kernel]         [k] __netif_receive_skb_core.constprop.0
 1.16%  [kernel]         [k] menu_select
 0.99%  [kernel]         [k] cpuidle_enter_state
 0.96%  [ixgbe]          [k] ixgbe_clean_rx_irq
 0.86%  [kernel]         [k] __dev_queue_xmit
 0.70%  [kernel]         [k] __cond_resched
 0.69%  [sch_cake]       [k] cake_dequeue
 0.67%  [nf_tables]      [k] nft_do_chain
 0.63%  [kernel]         [k] rcu_all_qs
 0.61%  [kernel]         [k] fib_table_lookup
 0.57%  [kernel]         [k] __schedule
 0.57%  [kernel]         [k] skb_release_data
 0.54%  [kernel]         [k] sched_clock
 0.54%  [kernel]         [k] __copy_skb_header
 0.53%  [kernel]         [k] dev_queue_xmit_nit
 0.53%  [kernel]         [k] _raw_spin_lock_irqsave
 0.50%  [kernel]         [k] kmem_cache_free
 0.48%  libfrr.so.0.0.0  [.] 0x00000000000ce970
 0.47%  [ixgbe]          [k] ixgbe_clean_tx_irq
 0.45%  [kernel]         [k] timerqueue_add
 0.45%  [kernel]         [k] lapic_next_deadline
 0.45%  [kernel]         [k] csum_partial_copy_generic
 0.44%  [nf_flow_table]  [k] nf_flow_offload_ip_hook
 0.44%  [kernel]         [k] kmem_cache_alloc
 0.44%  [nf_conntrack]   [k] nf_conntrack_lock
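As a rough aid for reading the profile above, the overhead column can be totalled per module. This is a minimal Python sketch under stated assumptions: PERF_ROWS holds only a few rows copied verbatim from the output above (a hypothetical subset, not the full profile), and share_by_module is a hypothetical helper, not part of perf. On this subset, nf_conntrack (nf_ct_iterate_cleanup) plus nf_nat (device_cmp, inet_cmp) add up to more than the pppoe module itself.

```python
from collections import defaultdict

# Hypothetical subset of the perf top rows above, copied verbatim
# (overhead, module, sample type, symbol). Add more rows as needed.
PERF_ROWS = """\
17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup
9.73% [kernel] [k] mutex_spin_on_owner
9.07% [pppoe] [k] pppoe_rcv
2.77% [nf_nat] [k] device_cmp
1.66% [kernel] [k] osq_lock
1.35% [nf_nat] [k] inet_cmp
0.44% [nf_conntrack] [k] nf_conntrack_lock
"""


def share_by_module(text: str) -> dict[str, float]:
    """Sum the overhead column per module (second field of each row)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith("%"):
            continue  # skip headers, separator lines and blank lines
        totals[fields[1].strip("[]")] += float(fields[0].rstrip("%"))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(share_by_module(PERF_ROWS).items(), key=lambda kv: -kv[1])
    for module, pct in ranked:
        print(f"{module:14s} {pct:6.2f}%")
```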
> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@...il.com> wrote:
>
> Hi
> Sorry for the delay, but it was not easy to catch the right moment.
>
>
> This is the output of mpstat 1:
>
> Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU)
>
> 11:12:16  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 11:12:17  all    0.17    0.00    6.66    0.00    0.00    4.13    0.00    0.00    0.00   89.05
> 11:12:18  all    0.25    0.00    8.36    0.00    0.00    4.88    0.00    0.00    0.00   86.51
> 11:12:19  all    0.26    0.00    9.62    0.00    0.00    3.91    0.00    0.00    0.00   86.21
> 11:12:20  all    0.85    0.00    6.00    0.00    0.00    4.31    0.00    0.00    0.00   88.84
> 11:12:21  all    0.08    0.00    4.45    0.00    0.00    4.79    0.00    0.00    0.00   90.67
> 11:12:22  all    0.17    0.00    9.50    0.00    0.00    4.58    0.00    0.00    0.00   85.75
> 11:12:23  all    0.00    0.00    6.92    0.00    0.00    2.48    0.00    0.00    0.00   90.61
> 11:12:24  all    0.17    0.00    5.45    0.00    0.00    4.27    0.00    0.00    0.00   90.11
> 11:12:25  all    0.25    0.00    5.38    0.00    0.00    4.79    0.00    0.00    0.00   89.58
> 11:12:26  all    0.60    0.00    1.45    0.00    0.00    2.65    0.00    0.00    0.00   95.30
> 11:12:27  all    0.42    0.00    6.91    0.00    0.00    4.47    0.00    0.00    0.00   88.20
> 11:12:28  all    0.00    0.00    6.75    0.00    0.00    4.18    0.00    0.00    0.00   89.07
> 11:12:29  all    0.17    0.00    3.52    0.00    0.00    5.11    0.00    0.00    0.00   91.20
> 11:12:30  all    1.45    0.00   10.14    0.00    0.00    3.49    0.00    0.00    0.00   84.92
> 11:12:31  all    0.09    0.00    5.11    0.00    0.00    4.77    0.00    0.00    0.00   90.03
> 11:12:32  all    0.25    0.00    3.11    0.00    0.00    4.46    0.00    0.00    0.00   92.17
> Average:  all    0.32    0.00    6.21    0.00    0.00    4.21    0.00    0.00    0.00   89.26
>
>
> I also attached a screenshot from perf top (the screenshot was sent in the previous mail).
>
> And I see this in lsmod:
>
> pppoe        20480  8198
> pppox        16384  1 pppoe
> ppp_generic  45056  16364 pppox,pppoe
> slhc         16384  1 ppp_generic
>
> PPPoE sessions are removed too slowly.
>
> And from the log:
>
> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>
>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@...hat.com> wrote:
>>
>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote:
>>> And one more thing I noticed.
>>>
>>> The problem appears when accel starts finishing sessions.
>>> Right now the server has 2k users. One of the VLANs, with 3 OLTs and 400 users, was restarted, and this affected the other VLANs.
>>> The problem starts when the dead sessions from the VLAN with the 3 OLTs are being destroyed, and this affects all the other VLANs.
>>> Maybe the kernel destroys old sessions slowly and drags other users down by locking other sessions.
>>> Is there a way to speed up the closing of stopped/dead sessions?
>>
>> What are the CPU stats when that happens? Is it user space or kernel
>> space that keeps it busy?
>>
>> One easy way to check is to run "mpstat 1" for a few seconds when the
>> problem occurs.
>>
>
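To answer the user-space vs kernel-space question from the mpstat numbers, the Average row can be split into user time (%usr + %nice) and kernel time (%sys + %irq + %soft). This is a minimal Python sketch, assuming the header and row are copied verbatim from the mpstat 1 output quoted above; split_user_kernel is a hypothetical helper, and counting %irq and %soft as kernel time is a judgment call rather than anything mpstat itself reports.

```python
# Header and "Average:" row copied verbatim from the mpstat 1 output quoted above.
HEADER = "11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle"
AVERAGE = "Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26"


def split_user_kernel(header: str, row: str) -> tuple[float, float]:
    """Return (user, kernel) CPU percentages from one mpstat row.

    Hypothetical helper: the first field of both lines (timestamp or the
    "Average:" label) is dropped so column names line up with the values.
    """
    stats = dict(zip(header.split()[1:], row.split()[1:]))
    user = float(stats["%usr"]) + float(stats["%nice"])
    # Assumption: hard IRQ and softirq time are counted as kernel time.
    kernel = float(stats["%sys"]) + float(stats["%irq"]) + float(stats["%soft"])
    return user, kernel


if __name__ == "__main__":
    user, kernel = split_user_kernel(HEADER, AVERAGE)
    print(f"user {user:.2f}%  kernel {kernel:.2f}%")
```

With the quoted numbers this gives about 0.32% user time against about 10.42% kernel time (averaged over all 12 CPUs), so the extra load sits almost entirely in the kernel.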