Date:   Sun, 18 Feb 2018 12:01:02 +0200
From:   Denys Fedoryshchenko <nuclearcat@...learcat.com>
To:     Guillaume Nault <g.nault@...halink.fr>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: ppp/pppoe, still panic 4.15.3 in ppp_push

On 2018-02-16 20:48, Guillaume Nault wrote:
> On Fri, Feb 16, 2018 at 01:13:18PM +0200, Denys Fedoryshchenko wrote:
>> On 2018-02-15 21:42, Guillaume Nault wrote:
>> > On Thu, Feb 15, 2018 at 09:34:42PM +0200, Denys Fedoryshchenko wrote:
>> > > On 2018-02-15 21:31, Guillaume Nault wrote:
>> > > > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
>> > > > > On 2018-02-15 17:55, Guillaume Nault wrote:
>> > > > > > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
>> > > > > > > Here we go:
>> > > > > > >
>> > > > > > >  <srv> [24558.921549]
>> > > > > > > ==================================================================
>> > > > > > >  <srv> [24558.922167] BUG: KASAN: use-after-free in
>> > > > > > > ppp_ioctl+0xa6a/0x1522
>> > > > > > > [ppp_generic]
>> > > > > > >  <srv> [24558.922776] Write of size 8 at addr ffff8803d35bf3f8 by task
>> > > > > > > accel-pppd/12622
>> > > > > > >  <srv> [24558.923113]
>> > > > > > >  <srv> [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
>> > > > > > > W
>> > > > > > > 4.15.3-build-0134 #1
>> > > > > > >  <srv> [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
>> > > > > > > BIOS P80
>> > > > > > > 04/02/2015
>> > > > > > >  <srv> [24558.924406] Call Trace:
>> > > > > > >  <srv> [24558.924753]  dump_stack+0x46/0x59
>> > > > > > >  <srv> [24558.925103]  print_address_description+0x6b/0x23b
>> > > > > > >  <srv> [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > > > > > >  <srv> [24558.925797]  kasan_report+0x21b/0x241
>> > > > > > >  <srv> [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > > > > > >  <srv> [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
>> > > > > > >  <srv> [24558.926829]  ? sock_sendmsg+0x89/0x99
>> > > > > > >  <srv> [24558.927176]  ? __vfs_write+0xd9/0x4ad
>> > > > > > >  <srv> [24558.927523]  ? kernel_read+0xed/0xed
>> > > > > > >  <srv> [24558.927872]  ? SyS_getpeername+0x18c/0x18c
>> > > > > > >  <srv> [24558.928213]  ? bit_waitqueue+0x2a/0x2a
>> > > > > > >  <srv> [24558.928561]  ? wake_atomic_t_function+0x115/0x115
>> > > > > > >  <srv> [24558.928898]  vfs_ioctl+0x6e/0x81
>> > > > > > >  <srv> [24558.929228]  do_vfs_ioctl+0xa00/0xb10
>> > > > > > >  <srv> [24558.929571]  ? sigprocmask+0x1a6/0x1d0
>> > > > > > >  <srv> [24558.929907]  ? sigsuspend+0x13e/0x13e
>> > > > > > >  <srv> [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
>> > > > > > >  <srv> [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
>> > > > > > >  <srv> [24558.930904]  ? sigprocmask+0x1d0/0x1d0
>> > > > > > >  <srv> [24558.931252]  SyS_ioctl+0x39/0x55
>> > > > > > >  <srv> [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
>> > > > > > >  <srv> [24558.931942]  do_syscall_64+0x1b1/0x31f
>> > > > > > >  <srv> [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > > > > > >  <srv> [24558.932627] RIP: 0033:0x7f302849d8a7
>> > > > > > >  <srv> [24558.932965] RSP: 002b:00007f3029a52af8 EFLAGS: 00000206
>> > > > > > > ORIG_RAX:
>> > > > > > > 0000000000000010
>> > > > > > >  <srv> [24558.933578] RAX: ffffffffffffffda RBX: 00007f3027d861e3 RCX:
>> > > > > > > 00007f302849d8a7
>> > > > > > >  <srv> [24558.933927] RDX: 00007f3023f49468 RSI: 000000004004743a RDI:
>> > > > > > > 0000000000003a67
>> > > > > > >  <srv> [24558.934266] RBP: 00007f3029a52b20 R08: 0000000000000000 R09:
>> > > > > > > 000055c8308d8e40
>> > > > > > >  <srv> [24558.934607] R10: 0000000000000008 R11: 0000000000000206 R12:
>> > > > > > > 00007f3023f49358
>> > > > > > >  <srv> [24558.934947] R13: 00007ffe86e5723f R14: 0000000000000000 R15:
>> > > > > > > 00007f3029a53700
>> > > > > > >  <srv> [24558.935288]
>> > > > > > >  <srv> [24558.935626] Allocated by task 12622:
>> > > > > > >  <srv> [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
>> > > > > > > [ppp_generic]
>> > > > > > >  <srv> [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
>> > > > > > >  <srv> [24558.936640]  SyS_connect+0x14b/0x1b7
>> > > > > > >  <srv> [24558.936975]  do_syscall_64+0x1b1/0x31f
>> > > > > > >  <srv> [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > > > > > >  <srv> [24558.937655]
>> > > > > > >  <srv> [24558.937993] Freed by task 12622:
>> > > > > > >  <srv> [24558.938321]  kfree+0xb0/0x11d
>> > > > > > >  <srv> [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
>> > > > > > >  <srv> [24558.938994]  __fput+0x2ba/0x51a
>> > > > > > >  <srv> [24558.939332]  task_work_run+0x11c/0x13d
>> > > > > > >  <srv> [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
>> > > > > > >  <srv> [24558.940022]  do_syscall_64+0x2ea/0x31f
>> > > > > > >  <srv> [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > > > > > >  <srv> [24558.947099]
>> > > > > >
>> > > > > > Your first guess was right. It looks like we have an issue with
>> > > > > > reference counting on the channels. Can you send me your ppp_generic.o?
>> > > > > http://nuclearcat.com/ppp_generic.o
>> > > > > Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
>> > > > >
>> > > > From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
>> > > > concurrently on the same ppp_file. Even if this ppp_file was pointed at
>> > > > by two different file descriptors, I can't see how this could defeat
>> > > > the reference counting mechanism. I'm going to think more about it.
>> > > >
>> > > > Can you test with CONFIG_REFCOUNT_FULL? (and keep
>> > > > d780cd44e3ce ("drivers, net, ppp: convert ppp_file.refcnt from
>> > > > atomic_t to refcount_t")).
>> > > Ok, i will try that tonight. On vanilla kernel or reversing
>> > > mentioned in
>> > > previous email patch?
>> > On vanilla kernel. The other is really a shot in the dark.
>> 
>> As far as i can see there is only KASAN triggered again(and server 
>> rebooted
>> shortly after that), but nothing else:
>> 
> Ok, so no refcount failure detected. Not what I expected... but that's
> still an information. It's getting even harder to find a ppp scenario
> that could lead to such symptoms.
> If that's acceptable for you, you can try reverting the few commits
> that entered after 4.14.
> 
> 02612bb05e51df8489db5e94d0cf8d1c81f87b0c pppoe: take ->needed_headroom
> of lower device into account on xmit
> 0171c41835591e9aa2e384b703ef9a6ae367c610 ppp: unlock all_ppp_mutex
> before registering device
> e6675000f9a404f7651724c0b2e2e71f7247d3a1 ppp: exit_net cleanup checks 
> added
> f02b2320b27c16b644691267ee3b5c110846f49e ppp: Destroy the mutex when 
> cleanup
> 90e229ef61fad240554f5899eb122fbe44990f78 ppp: allow usage in namespaces
> 709c89b45b874b2f81a074b8802a736009873f48 drivers, net, ppp: convert
> syncppp.refcnt from atomic_t to refcount_t
> d780cd44e3cea119a3346e6d7c04d35b9c50d54b drivers, net, ppp: convert
> ppp_file.refcnt from atomic_t to refcount_t
> 313a912155c78ed87ad6fca175dc56b75fd00a58 drivers, net, ppp: convert
> asyncppp.refcnt from atomic_t to refcount_t
> 
> Sorry, but I have nothing better to propose for now. At least that
> should help narrowing the problem space.
> I'm going to stress test ppp_generic and pppoe on my side.
> 
Quick update.
Testing with the first 5 patches reverted didn't change anything.
But reverting more, including the last 4 patches (I did them all 
together), does change things. I probably need to repeat the test for 
one more night, reverting just the refcount_t patches.

  [25222.173840] ------------[ cut here ]------------
  [25222.174259] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 3 timed 
out
  [25222.174618] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:323 
dev_watchdog+0x44a/0x555
  [25222.175212] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 
xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
  [25222.177133] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G    B   W        
4.15.3-build-0134 #6
  [25222.184121] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
  [25222.184457] RIP: 0010:dev_watchdog+0x44a/0x555
  [25222.184791] RSP: 0018:ffff8803f22c7d98 EFLAGS: 00010292
  [25222.185127] RAX: 0000000000000000 RBX: ffff8803ded00438 RCX: 
0000000000000000
  [25222.185463] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 
ffffed007e458fa8
  [25222.185797] RBP: ffff8803ded00000 R08: 0000000000000001 R09: 
0000000000000000
  [25222.186133] R10: ffff8803f22c7e30 R11: 0000000000000001 R12: 
ffff8803ded28450
  [25222.186471] R13: 0000000000000003 R14: dffffc0000000000 R15: 
ffff8803ded283c0
  [25222.186804] FS:  0000000000000000(0000) GS:ffff8803f22c0000(0000) 
knlGS:0000000000000000
  [25222.187401] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [25222.187739] CR2: 0000561f5bffc128 CR3: 0000000445a0d003 CR4: 
00000000001606e0
  [25222.188077] Call Trace:
  [25222.188410]  <IRQ>
  [25222.188740]  ? dev_graft_qdisc+0xfa/0xfa
  [25222.189072]  call_timer_fn+0x15/0x72
  [25222.189407]  ? dev_graft_qdisc+0xfa/0xfa
  [25222.189741]  expire_timers+0x1b9/0x1d5
  [25222.190072]  run_timer_softirq+0x184/0x361
  [25222.190400]  ? expire_timers+0x1d5/0x1d5
  [25222.190723]  ? enqueue_hrtimer+0xce/0xd8
  [25222.191048]  ? __hrtimer_run_queues+0x1ec/0x24d
  [25222.191373]  __do_softirq+0x17f/0x34a
  [25222.191702]  irq_exit+0x8f/0xf9
  [25222.192034]  smp_apic_timer_interrupt+0xcb/0xd6
  [25222.192365]  apic_timer_interrupt+0x92/0xa0
  [25222.192695]  </IRQ>
  [25222.193023] RIP: 0010:mwait_idle+0x99/0xac
  [25222.193355] RSP: 0018:ffff8803f030fef8 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffff11
  [25222.193956] RAX: 0000000000000000 RBX: ffff8803f02e3500 RCX: 
0000000000000000
  [25222.194290] RDX: 1ffff1007e05c6a0 RSI: 0000000000000000 RDI: 
0000000000000000
  [25222.194626] RBP: ffff8803f02e3500 R08: ffffed007ccc8eef R09: 
ffff8803e6647728
  [25222.194958] R10: ffff8803f030fdd0 R11: 0000000000000001 R12: 
0000000000000000
  [25222.195292] R13: dffffc0000000000 R14: ffffed007e05c6a0 R15: 
ffff8803f02e3500
  [25222.195627]  do_idle+0xe6/0x19a
  [25222.195963]  cpu_startup_entry+0x18/0x1a
  [25222.196295]  secondary_startup_64+0xa5/0xb0
  [25222.196625] Code: 68 87 40 01 00 75 3f 48 89 ef c6 05 5c 87 40 01 01 
e8 64 93 fa ff 44 89 e9 48 89 c2 48 89 ee 48 c7 c7 80 28 68 83 e8 25 69 
6d fe <0f> ff eb 17 41 ff c5 49 81 c4 40 01 00 00 44 3b 6c 24 04 0f 85
  [25222.197511] ---[ end trace 4b04e9c6754a1cd5 ]---

and then

  [25222.197853] ixgbe 0000:04:00.1 eth1: initiating reset due to tx 
timeout
  [25222.198194] ixgbe 0000:04:00.1 eth1: Reset adapter
  [25227.805896] ixgbe 0000:04:00.1 eth1: initiating reset due to tx 
timeout
  [25232.925944] ixgbe 0000:04:00.1 eth1: initiating reset due to tx 
timeout
  [25236.084968] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[accel-pppd:12627]
  [25236.085562] Modules linked in: pppoe pppox ppp_generic slhc 
netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre 
nf_conntrack_pptp nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 
xt_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 
xt_set xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net 
ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables 8021q garp mrp stp llc ixgbe dca
  [25236.087496] CPU: 0 PID: 12627 Comm: accel-pppd Tainted: G    B   W   
      4.15.3-build-0134 #6
  [25236.088095] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 
04/02/2015
  [25236.088430] RIP: 0010:queued_spin_lock_slowpath+0xb1/0x418
  [25236.088759] RSP: 0018:ffff8803e6457a98 EFLAGS: 00000213 ORIG_RAX: 
ffffffffffffff11
  [25236.089353] RAX: 00000000000001fb RBX: ffff880345e75fe0 RCX: 
ffffffff811aeca3
  [25236.089685] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 
ffff880345e75fe0
  [25236.090026] RBP: ffffed0068bcebfc R08: 06030a0001012180 R09: 
ffffed006cc9beb2
  [25236.090369] R10: ffffed006cc9beb3 R11: 0000000000000001 R12: 
0000000000000003
  [25236.090705] R13: 0000000000008021 R14: 0000000000008021 R15: 
00000000034e4b06
  [25236.091043] FS:  00007f94bd26c700(0000) GS:ffff8803f2200000(0000) 
knlGS:0000000000000000
  [25236.091636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [25236.091966] CR2: 00007ffc0935eff8 CR3: 00000003d709b003 CR4: 
00000000001606f0
  [25236.092304] Call Trace:
  [25236.092638]  ppp_push+0x112/0xdda [ppp_generic]
  [25236.092975]  ? enqueue_hrtimer+0xce/0xd8
  [25236.093304]  ? hrtimer_start_range_ns+0x827/0x854
  [25236.093635]  __ppp_xmit_process+0xc6a/0xdd5 [ppp_generic]
  [25236.093969]  ? __kmalloc_reserve.isra.5+0x29/0x96
  [25236.094302]  ? memset+0x1f/0x31
  [25236.094631]  ? ppp_receive_nonmp_frame+0x138c/0x138c [ppp_generic]
  [25236.094962]  ? __alloc_skb+0x2ec/0x431
  [25236.095292]  ? __kmalloc_reserve.isra.5+0x96/0x96
  [25236.095620]  ? timerfd_release+0x1d3/0x1d3
  [25236.095950]  ppp_xmit_process+0xc3/0x194 [ppp_generic]
  [25236.096284]  ppp_write+0x1b7/0x1c3 [ppp_generic]
  [25236.096617]  __vfs_write+0xd9/0x4ad
  [25236.096953]  ? kernel_read+0xed/0xed
  [25236.097283]  ? vfs_copy_file_range+0x6a8/0x6a8
  [25236.097614]  ? bit_waitqueue+0x2a/0x2a
  [25236.097946]  ? __fsnotify_inode_delete+0xc/0xc
  [25236.098276]  ? __fsnotify_inode_delete+0xc/0xc
  [25236.098610]  ? SyS_sendmmsg+0x13/0x13
  [25236.098936]  vfs_write+0x18c/0x378
  [25236.099258]  SyS_write+0xc4/0x13b
  [25236.099579]  ? SyS_read+0x13b/0x13b
  [25236.099902]  ? exit_to_usermode_loop+0x7c/0xaf
  [25236.100225]  ? SyS_read+0x13b/0x13b
  [25236.100550]  do_syscall_64+0x1b1/0x31f
  [25236.100879]  entry_SYSCALL_64_after_hwframe+0x21/0x86
  [25236.101210] RIP: 0033:0x7f94bca53b2d
  [25236.101536] RSP: 002b:00007f94bd26bb80 EFLAGS: 00000293 ORIG_RAX: 
0000000000000001
  [25236.102127] RAX: ffffffffffffffda RBX: 00007f94bb59f1e3 RCX: 
00007f94bca53b2d
  [25236.102461] RDX: 000000000000000c RSI: 00007f94b78895d0 RDI: 
0000000000002f92
  [25236.102793] RBP: 00007f94bd26bbb0 R08: 0000000000000030 R09: 
0000000000000027
  [25236.103127] R10: 0000000000000000 R11: 0000000000000293 R12: 
00007f94b6450eb8
  [25236.103460] R13: 00007ffc8c047a6f R14: 0000000000000000 R15: 
00007f94bd26c700
  [25236.103790] Code: 83 03 00 00 48 89 dd 49 89 dc 48 b8 00 00 00 00 00 
fc ff df 48 c1 ed 03 41 83 e4 07 48 01 c5 41 83 c4 03 8a 45 00 41 38 c4 
7c 0c <84> c0 74 08 48 89 df e8 31 54 17 00 8b 03 84 c0 74 04 f3 90 eb

Then the system rebooted automatically.
Maybe I am hitting some qdisc bug now...
