lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fef628e51694ab7bae2098b762328830@nuclearcat.com>
Date:   Wed, 21 Feb 2018 21:30:49 +0200
From:   Denys Fedoryshchenko <nuclearcat@...learcat.com>
To:     Guillaume Nault <g.nault@...halink.fr>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: ppp/pppoe, still panic 4.15.3 in ppp_push

On 2018-02-21 20:55, Guillaume Nault wrote:
> On Wed, Feb 21, 2018 at 12:26:51PM +0200, Denys Fedoryshchenko wrote:
>> It seems even rebuilding seemingly stable version triggering crashes 
>> too
>> (but different ones)
> Different ones? The trace following your message looks very similar to
> your first KASAN report. Or are you refering to the lockup you posted
> on Sun, 18 Feb 2018?
> 
> Also, which stable versions are you refering to?
Trace i sent in previous email - is latest kernel, vanilla, just more 
debug options and few options disabled.
One of disabled was spitting some errors (it is obviously bug), 
CONFIG_XFRM, in nf_xfrm_me_harder (i reported about it).
And i disabled namespaces, as they are often source of trouble.

Today i will try to revert just:
drivers, net, ppp: convert asyncppp.refcnt from atomic_t to refcount_t
drivers, net, ppp: convert syncppp.refcnt from atomic_t to refcount_t
drivers, net, ppp: convert ppp_file.refcnt from atomic_t to  refcount_t

Because i suspect previously, after reverting this patches i got 
different kernel
panic (and i didn't noticed that, now too late to identify between other 
crashes),
seems it was not KASAN.
I will report results after testing, unfortunately i can't test it more 
than once per day.

"Stable" for me was 4.14.2 - but it looks like on that kernel i am 
getting different issue now.
I will paste it below.

Another observation, just hour ago, i noticed on another server, where i 
am testing 4.15, and 4.14.20
(at moment of testing 4.14.20, but no debug at that moment), when i 
killed accel-pppd (pppoe server software),
with 8k sessions online, i got some weird behaviour, accel-pppd process 
got stuck, same as
ifconfig and "ip link", and even kexec -e didn't worked(got stuck too), 
unless i did kexec -e -x
(so it wont try to make interfaces down on kexec).
I will try to reproduce this bug as well, with debug enabled (lockdep 
and so) i hope it is not related.

> 
> I'm interested in the ppp_generic.o file that produced the following
> trace. Just to be sure that the differences come from the new debugging
> options.
Also kernel config:
https://nuclearcat.com/bughunting/config.txt
https://nuclearcat.com/bughunting/ppp_generic.o

This is in 4.14.2, was seemingly stable before:

[50401.388670] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 1 timed out
[50401.389014] ------------[ cut here ]------------
[50401.389340] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 
dev_watchdog+0x15c/0x1b9
[50401.389925] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 x
t_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_na
t nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
[50401.391869] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 
4.14.2-build-0134 #4
[50401.392191] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
[50401.392513] task: ffff880434d72640 task.stack: ffffc90001914000
[50401.392836] RIP: 0010:dev_watchdog+0x15c/0x1b9
[50401.393155] RSP: 0018:ffff8804364c3e90 EFLAGS: 00010286
[50401.393470] RAX: 0000000000000039 RBX: ffff88042f6e0000 RCX: 
0000000000000000
[50401.393787] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 
ffffffff828dbc64
[50401.394103] RBP: ffff8804364c3eb0 R08: 0000000000000001 R09: 
0000000000000000
[50401.394420] R10: 0000000000000002 R11: ffff8803fa075c00 R12: 
0000000000000001
[50401.394739] R13: 0000000000000040 R14: 0000000000000003 R15: 
ffffffff81e05108
[50401.395064] FS:  0000000000000000(0000) GS:ffff8804364c0000(0000) 
knlGS:0000000000000000
[50401.395645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50401.395970] CR2: 00007fff25fc20a8 CR3: 0000000001e09005 CR4: 
00000000001606e0
[50401.396294] Call Trace:
[50401.396613]  <IRQ>
[50401.396934]  ? qdisc_rcu_free+0x3f/0x3f
[50401.397255]  call_timer_fn.isra.4+0x17/0x7b
[50401.397576]  expire_timers+0x6f/0x7e
[50401.397899]  run_timer_softirq+0x6d/0x8f
[50401.398219]  ? ktime_get+0x3b/0x8c
[50401.398540]  ? lapic_next_event+0x18/0x1c
[50401.398862]  ? clockevents_program_event+0xa3/0xbb
[50401.399186]  __do_softirq+0xbc/0x1ab
[50401.399510]  irq_exit+0x4d/0x8e
[50401.399832]  smp_apic_timer_interrupt+0x73/0x80
[50401.400157]  apic_timer_interrupt+0x8d/0xa0
[50401.400480]  </IRQ>
[50401.400801] RIP: 0010:mwait_idle+0x4e/0x61
[50401.401123] RSP: 0018:ffffc90001917ec0 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff10
[50401.401714] RAX: 0000000000000000 RBX: ffff880434d72640 RCX: 
0000000000000000
[50401.402037] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[50401.402362] RBP: ffffc90001917ec0 R08: 0000000000000000 R09: 
0000000000000001
[50401.402685] R10: ffffc90001917e58 R11: 000000000000037a R12: 
0000000000000000
[50401.403008] R13: 0000000000000000 R14: ffff880434d72640 R15: 
ffff880434d72640
[50401.403336]  arch_cpu_idle+0xa/0xc
[50401.403660]  default_idle_call+0x20/0x22
[50401.403983]  do_idle+0xb3/0x13b
[50401.404303]  cpu_startup_entry+0x1a/0x1c
[50401.404627]  start_secondary+0x102/0x105
[50401.404949]  secondary_startup_64+0xa5/0xa5
[50401.405271] Code: d9 54 58 00 00 75 38 48 89 df c6 05 cd 54 58 00 01 
e8 dc 24 fe ff 44 89 e1 48 89 c2 48 89 de 48 c7 c7 51 df d6 81 e8 db f6 
7c ff <0f> ff eb 10 41 ff c4 48 05 40 01 0
0 00 41 39 cc 75 9a eb 0d 48
[50401.406154] ---[ end trace 433f9a49b32a3b4f ]---
[50401.406489] ixgbe 0000:04:00.1 eth1: initiating reset due to tx 
timeout
[50401.406828] ixgbe 0000:04:00.1 eth1: Reset adapter

  [50416.686709] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! 
[tc:24389]
  [50416.687042] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 x
t_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_na
t nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
  [50416.687707] watchdog: BUG: soft lockup - CPU#7 stuck for 23s! 
[accel-pppd:24377]
  [50416.687708] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 x
t_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_na
t nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
  [50416.687720] CPU: 7 PID: 24377 Comm: accel-pppd Tainted: G        W   
     4.14.2-build-0134 #4
  [50416.687721] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
  [50416.687722] task: ffff8803f278d940 task.stack: ffffc90011a38000
  [50416.687727] RIP: 0010:queued_spin_lock_slowpath+0x125/0x16e
  [50416.687728] RSP: 0018:ffffc90011a3bd20 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff10
  [50416.687729] RAX: 0000000000000000 RBX: ffff8804035abfb8 RCX: 
ffff8804365d9a40
  [50416.687730] RDX: ffff880436419a60 RSI: 0000000000200000 RDI: 
ffff8804035abfa0
  [50416.687730] RBP: ffffc90011a3bd20 R08: 0000000000000001 R09: 
00000000ffffffff
  [50416.687731] R10: ffffc90011a3bda8 R11: ffff8803f278d940 R12: 
ffff88041a5fde00
  [50416.687731] R13: 0000000000008021 R14: ffff8803fc5778c0 R15: 
ffff8803fc577840
  [50416.687732] FS:  00007f7b5726e700(0000) GS:ffff8804365c0000(0000) 
knlGS:0000000000000000
  [50416.687733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [50416.687733] CR2: 0000000001d5f408 CR3: 000000042bf7c006 CR4: 
00000000001606e0
  [50416.687734] Call Trace:
  [50416.687738]  _raw_spin_lock+0x1a/0x1c
  [50416.687741]  ppp_push+0x56/0x49b [ppp_generic]
  [50416.687743]  ? __kmalloc_node_track_caller+0xba/0xdf
  [50416.687746]  __ppp_xmit_process+0x38/0x4c9 [ppp_generic]
  [50416.687747]  ppp_xmit_process+0x39/0x8d [ppp_generic]
  [50416.687749]  ppp_write+0xbb/0xc7 [ppp_generic]
  [50416.687750]  __vfs_write+0x21/0x111
  [50416.687752]  ? handle_mm_fault+0xb1/0xc0
  [50416.687753]  vfs_write+0xcd/0x176
  [50416.687754]  SyS_write+0x49/0x83
  [50416.687755]  entry_SYSCALL_64_fastpath+0x17/0x98
  [50416.687756] RIP: 0033:0x7f7b5ba4e14d
  [50416.687756] RSP: 002b:00007f7b5726dbd0 EFLAGS: 00000293 ORIG_RAX: 
0000000000000001
  [50416.687757] RAX: ffffffffffffffda RBX: 00007f7b5244aae8 RCX: 
00007f7b5ba4e14d
  [50416.687758] RDX: 000000000000000c RSI: 00007f7b4c7e6180 RDI: 
0000000000004e9e
  [50416.687758] RBP: 00007f7b5726dba0 R08: 0000000000000000 R09: 
0000000000633e00
  [50416.687758] R10: 0000000000000000 R11: 0000000000000293 R12: 
0000000000000000
  [50416.687759] R13: 00007f7b5716c4ff R14: 0000000000000003 R15: 
00007f7b5716c590
  [50416.687759] Code: 89 c2 c1 e8 12 48 c1 ea 0c ff c8 83 e2 30 48 98 48 
81 c2 40 9a 01 00 48 03 14 c5 a0 73 d7 81 48 89 0a 8b 41 08 85 c0 75 04 
f3 90 <eb> f5 48 8b 11 48 85 d2 74 03 0f 1
8 0a 44 8b 07 66 45 85 c0 0f
  [50416.687771] Kernel panic - not syncing: softlockup: hung tasks
  [50416.687772] CPU: 7 PID: 24377 Comm: accel-pppd Tainted: G        W   
  L  4.14.2-build-0134 #4
  [50416.687773] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
  [50416.687773] Call Trace:
  [50416.687774]  <IRQ>
  [50416.687776]  dump_stack+0x4d/0x63
  [50416.687778]  panic+0xcd/0x215
  [50416.687780]  watchdog_timer_fn+0x1a4/0x1c6
  [50416.687782]  __hrtimer_run_queues+0x76/0xc0
  [50416.687783]  hrtimer_interrupt+0xa0/0x162
  [50416.687785]  smp_apic_timer_interrupt+0x6e/0x80
  [50416.687786]  apic_timer_interrupt+0x8d/0xa0
  [50416.687786]  </IRQ>
  [50416.687787] RIP: 0010:queued_spin_lock_slowpath+0x125/0x16e
  [50416.687788] RSP: 0018:ffffc90011a3bd20 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff10
  [50416.687789] RAX: 0000000000000000 RBX: ffff8804035abfb8 RCX: 
ffff8804365d9a40
  [50416.687789] RDX: ffff880436419a60 RSI: 0000000000200000 RDI: 
ffff8804035abfa0
  [50416.687789] RBP: ffffc90011a3bd20 R08: 0000000000000001 R09: 
00000000ffffffff
  [50416.687790] R10: ffffc90011a3bda8 R11: ffff8803f278d940 R12: 
ffff88041a5fde00
  [50416.687790] R13: 0000000000008021 R14: ffff8803fc5778c0 R15: 
ffff8803fc577840
  [50416.687792]  _raw_spin_lock+0x1a/0x1c
  [50416.687793]  ppp_push+0x56/0x49b [ppp_generic]
  [50416.687794]  ? __kmalloc_node_track_caller+0xba/0xdf
  [50416.687795]  __ppp_xmit_process+0x38/0x4c9 [ppp_generic]
  [50416.687797]  ppp_xmit_process+0x39/0x8d [ppp_generic]
  [50416.687798]  ppp_write+0xbb/0xc7 [ppp_generic]
  [50416.687798]  __vfs_write+0x21/0x111
  [50416.687799]  ? handle_mm_fault+0xb1/0xc0
  [50416.687800]  vfs_write+0xcd/0x176
  [50416.687801]  SyS_write+0x49/0x83
  [50416.687802]  entry_SYSCALL_64_fastpath+0x17/0x98
  [50416.687802] RIP: 0033:0x7f7b5ba4e14d
  [50416.687803] RSP: 002b:00007f7b5726dbd0 EFLAGS: 00000293 ORIG_RAX: 
0000000000000001
  [50416.687803] RAX: ffffffffffffffda RBX: 00007f7b5244aae8 RCX: 
00007f7b5ba4e14d
  [50416.687804] RDX: 000000000000000c RSI: 00007f7b4c7e6180 RDI: 
0000000000004e9e
  [50416.687804] RBP: 00007f7b5726dba0 R08: 0000000000000000 R09: 
0000000000633e00
  [50416.687805] R10: 0000000000000000 R11: 0000000000000293 R12: 
0000000000000000
  [50416.687805] R13: 00007f7b5716c4ff R14: 0000000000000003 R15: 
00007f7b5716c590
  [50416.723251] CPU: 5 PID: 24389 Comm: tc Tainted: G        W    L  
4.14.2-build-0134 #4
  [50416.723831] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
  [50416.724151] task: ffff8803f0f19980 task.stack: ffffc90011a70000
  [50416.724477] RIP: 0010:queued_spin_lock_slowpath+0x54/0x16e
  [50416.724799] RSP: 0018:ffffc90011a73940 EFLAGS: 00000202 ORIG_RAX: 
ffffffffffffff10
  [50416.725384] RAX: 0000000000000101 RBX: ffff8803fc577000 RCX: 
0000000000000101
  [50416.725709] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 
ffff8803fc5778c0
  [50416.726034] RBP: ffffc90011a73940 R08: 0000000000000001 R09: 
0000000000002e88
  [50416.726355] R10: 0000000000000024 R11: 0000000000000000 R12: 
ffff8803eea1d060
  [50416.726678] R13: ffff8803fc5778c0 R14: ffffc90011a739f0 R15: 
0000000000000000
  [50416.727001] FS:  00007f72228f2700(0000) GS:ffff880436540000(0000) 
knlGS:0000000000000000
  [50416.727582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [50416.727904] CR2: 0000000001aea038 CR3: 00000003fb09a001 CR4: 
00000000001606e0
  [50416.728230] Call Trace:
  [50416.728550]  _raw_spin_lock_bh+0x25/0x27
  [50416.728872]  ppp_get_stats64+0x4c/0xaf [ppp_generic]
  [50416.729195]  dev_get_stats+0x33/0x84
  [50416.729513]  rtnl_fill_stats+0x41/0x121
  [50416.729831]  rtnl_fill_ifinfo+0x567/0xdf0
  [50416.730148]  rtnl_dump_ifinfo+0x232/0x2e0
  [50416.730470]  ? __alloc_pages_nodemask+0xd2/0xa99
  [50416.730790]  ? handle_mm_fault+0xb1/0xc0
  [50416.731109]  ? __do_page_fault+0x31c/0x3b6
  [50416.731431]  netlink_dump+0xfc/0x271
  [50416.731749]  netlink_recvmsg+0x1af/0x2d9
  [50416.732068]  sock_recvmsg+0x14/0x16
  [50416.732388]  ___sys_recvmsg+0xea/0x1a1
  [50416.732708]  ? lru_deactivate_file_fn+0x29f/0x29f
  [50416.733028]  ? pagevec_lru_move_fn+0xad/0xc1
  [50416.733348]  ? alloc_pages_vma+0x152/0x19c
  [50416.733669]  ? page_add_new_anon_rmap+0xa0/0xa9
  [50416.733988]  ? __handle_mm_fault+0xda4/0xf40
  [50416.734310]  ? sock_sendmsg+0x12/0x1d
  [50416.734628]  __sys_recvmsg+0x3d/0x5e
  [50416.734945]  ? __sys_recvmsg+0x3d/0x5e
  [50416.735264]  SyS_recvmsg+0xd/0x17
  [50416.735576]  entry_SYSCALL_64_fastpath+0x17/0x98
  [50416.735890] RIP: 0033:0x7f7221afdac0
  [50416.736202] RSP: 002b:00007ffcbbc1f328 EFLAGS: 00000246 ORIG_RAX: 
000000000000002f
  [50416.736772] RAX: ffffffffffffffda RBX: 00007f7221da8b98 RCX: 
00007f7221afdac0
  [50416.737092] RDX: 0000000000000000 RSI: 00007ffcbbc1f390 RDI: 
0000000000000003
  [50416.737411] RBP: 00007f7221da8b98 R08: 0000000000007d7c R09: 
0000000000000000
  [50416.737734] R10: 00007ffcbbc1f3d0 R11: 0000000000000246 R12: 
000000000066a520
  [50416.738055] R13: 0000000000000040 R14: 00007f7221da8b40 R15: 
0000000000002710
  [50416.738377] Code: ff ff 75 33 83 fe 01 89 ca 89 f0 41 0f 45 d0 f0 0f 
b1 17 39 c6 74 04 89 c6 eb e1 ff ca 0f 84 20 01 00 00 8b 07 84 c0 74 04 
f3 90 <eb> f6 66 c7 07 01 00 e9 0c 01 00 0
0 48 c7 c0 40 9a 01 00 65 48
Feb 21 00:04:31 10.0.252.9 [50417.724420] Shutting down cpus with NMI
Feb 21 00:04:31 10.0.252.9 [50417.724748] Kernel Offset: disabled
Feb 21 00:04:31 10.0.252.9 [50417.728891] Rebooting in 5 seconds..

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ