Message-ID: <a878bd00-a521-58b5-d54a-b2e4c542dcb0@gmail.com>
Date:   Sat, 16 Mar 2019 16:10:26 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     VDR User <user.vdr@...il.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>
Cc:     netdev@...r.kernel.org
Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
> 
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why I had little confidence that reverting commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But I
> can now confirm that reverting that commit did not solve the problem.
> I didn't ifdown/ifup after the crash, so I guess the NIC eventually
> recovered on its own. The `ethtool -S` output is:
> 
> NIC statistics:
>      tx_packets: 5370650
>      rx_packets: 57340787
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 26
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 57332905
>      broadcast: 6409
>      multicast: 1473
>      tx_aborted: 0
>      tx_underrun: 0
> 
> The dmesg log looks the same as before:
> 
> [95579.984062] ------------[ cut here ]------------
> [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
> [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1bb/0x1e0
> [95579.984276] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
> [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
> [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
> [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
> [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89 f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25 ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00 00 00
> [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
> [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
> [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
> [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
> [95579.986325] Call Trace:
> [95579.986356]  <SOFTIRQ>
> [95579.986389]  ? qdisc_put_unlocked+0x40/0x40
> [95579.986423]  call_timer_fn+0x19/0xa0
> [95579.986456]  run_timer_softirq+0x337/0x380
> [95579.986488]  ? qdisc_put_unlocked+0x40/0x40
> [95579.986521]  ? rcu_process_callbacks+0xcb/0x380
> [95579.986555]  __do_softirq+0xd6/0x21c
> [95579.986586]  ? __irqentry_text_end+0x18/0x18
> [95579.986619]  call_on_stack+0x10/0x60
> [95579.986646]  </SOFTIRQ>
> [95579.986674]  ? irq_exit+0x91/0xc0
> [95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
> [95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
> [95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
> [95579.986790]  ? cpuidle_enter_state+0x122/0x360
> [95579.986821]  ? cpuidle_enter+0xf/0x20
> [95579.986850]  ? call_cpuidle+0x1c/0x40
> [95579.986878]  ? do_idle+0x1e6/0x220
> [95579.986906]  ? cpu_startup_entry+0x25/0x40
> [95579.986934]  ? start_secondary+0x1a5/0x220
> [95579.986963]  ? startup_32_smp+0x15f/0x164
> [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
> 
> Please let me know if there's anything I can do to help.
> Derek
> 
The other changes between 4.20 and 5.0 don't look like they could cause
the issue, and the two critical ones have already been reverted.
So the next step would be to bisect the issue.

Heiner
