Date:   Fri, 8 Feb 2019 22:50:43 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Sander Eikelenboom <linux@...elenboom.it>,
        Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>
Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at
 lib/dynamic_queue_limits.c:27!

On 08.02.2019 22:45, Sander Eikelenboom wrote:
> On 08/02/2019 22:22, Heiner Kallweit wrote:
>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>> L.S.,
>>>>>
>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they don't seem related) under Xen, I got the nasty splat below,
>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>
>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>
>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>
>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>
>>> Hmm, I did some digging and I think:
>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>
>> You're right. Thought this was added in 4.20 already.
>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>> have onboard Realtek network I have quite a few testers out there.
>> Does the issue occur under specific circumstances like very high load?
> 
> Yep, the box is already quite contended with the Xen VMs, and if I remember correctly it occurred while compiling a kernel
> on the host.
> 
>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>> as author of the underlying changes.
> 
> It could also be the barriers weren't that unneeded as assumed.

The barriers were removed after adding xmit_more handling. Therefore it would be good to
test also with only 
bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
removed.
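
For reference, as far as I can tell the BUG at lib/dynamic_queue_limits.c:27 is
the check at the top of dql_completed() that refuses to complete more bytes than
have been accounted as queued. The following is only a userspace sketch of that
invariant, not the kernel code: struct toy_dql, toy_queued(), toy_completed()
and toy_out_of_order() are made-up names, and the out-of-order sequence is just
one hypothetical way the check could trip, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two BQL byte counters; NOT the kernel's struct dql. */
struct toy_dql {
	unsigned int num_queued;    /* bytes accounted on the xmit path */
	unsigned int num_completed; /* bytes accounted on the tx-clean path */
};

/* Models the accounting done when a packet is handed to the hardware. */
static void toy_queued(struct toy_dql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/*
 * Models the accounting done when the hardware reports completion.
 * Returns false in the case where the kernel check would fire:
 * completing more than (queued - already completed).
 */
static bool toy_completed(struct toy_dql *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}

/*
 * Hypothetical ordering problem: the completion for a packet is
 * processed before its bytes were accounted as queued.
 * Returns true if the early completion was rejected.
 */
static bool toy_out_of_order(unsigned int bytes)
{
	struct toy_dql q = { 0, 0 };
	bool rejected = !toy_completed(&q, bytes); /* completion seen first */

	toy_queued(&q, bytes);                     /* accounting arrives late */
	assert(q.num_completed <= q.num_queued);
	return rejected;
}
```

In the normal order (toy_queued() first, then toy_completed() with the same
byte count) the check passes. If the completion is ever processed before the
matching queue accounting, it fails, which is the kind of ordering a test with
and without the barrier change might help to narrow down.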

> Since we are almost at RC6, I took the liberty to CC Eric now.
> 
Sure, thanks.

> BTW, am I correct that these patches are merely optimizations?

Yes

> If so, and assuming they revert cleanly, perhaps it should be considered at this point in the RCs
> to revert them for 5.0 and try again for 5.1?
> 
Before reverting both, it would be good to test with only the barrier removal reverted.

> --
> Sander
> 
Heiner

> 
>>
>>> would be candidates, which were merged in 5.0.
>>>
>>> I have reverted the first two, see how that works out.
>>>
>>> --
>>> Sander
>>>
>> Heiner
>>
>>>  
>>>>> --
>>>>> Sander
>>>>>
>>>> Heiner
>>>>
>>>>>
>>>>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>>>>> [ 6466.571425] invalid opcode: 0000 [#1] SMP NOPTI
>>>>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
>>>>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>>>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140
>>>>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>>> [ 6466.648130] RSP: e02b:ffff88807d4c3e78 EFLAGS: 00010297
>>>>> [ 6466.659616] RAX: 0000000000000042 RBX: ffff8880049cf800 RCX: 0000000000000000
>>>>> [ 6466.672835] RDX: 0000000000000001 RSI: 0000000000000042 RDI: ffff8880049cf8c0
>>>>> [ 6466.684521] RBP: ffff888077df7260 R08: 0000000000000001 R09: 0000000000000000
>>>>> [ 6466.696824] R10: 00000000387c2336 R11: 00000000387c2336 R12: 0000000010000000
>>>>> [ 6466.709953] R13: ffff888077df6898 R14: ffff888077df75c0 R15: 0000000000454677
>>>>> [ 6466.722165] FS:  00007fd869147200(0000) GS:ffff88807d4c0000(0000) knlGS:0000000000000000
>>>>> [ 6466.733228] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 6466.746581] CR2: 00007fd867dfd000 CR3: 0000000074884000 CR4: 0000000000000660
>>>>> [ 6466.758366] Call Trace:
>>>>> [ 6466.768118]  <IRQ>
>>>>> [ 6466.778214]  rtl8169_poll+0x4f4/0x640
>>>>> [ 6466.789198]  net_rx_action+0x23d/0x370
>>>>> [ 6466.798467]  __do_softirq+0xed/0x229
>>>>> [ 6466.807039]  irq_exit+0xb7/0xc0
>>>>> [ 6466.815471]  xen_evtchn_do_upcall+0x27/0x40
>>>>> [ 6466.826647]  xen_do_hypervisor_callback+0x29/0x40
>>>>> [ 6466.835902]  </IRQ>
>>>>> [ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
>>>>> [ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>>>> [ 6466.874031] RSP: e02b:ffffc90003c0bdd0 EFLAGS: 00000246
>>>>> [ 6466.883452] RAX: 0000000000000000 RBX: 000000041f83bfe8 RCX: ffffffff8100102a
>>>>> [ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
>>>>> [ 6466.903402] RBP: 0000000000000fe8 R08: 000000000000000b R09: 0000000000000000
>>>>> [ 6466.911201] R10: deadbeefdeadf00d R11: 0000000000000246 R12: 800000050c346067
>>>>> [ 6466.918491] R13: ffff8880607c4fe8 R14: ffff888005082800 R15: 0000000000000000
>>>>> [ 6466.926647]  ? xen_hypercall_mmu_update+0xa/0x20
>>>>> [ 6466.938195]  ? xen_set_pte_at+0x78/0xe0
>>>>> [ 6466.947046]  ? __handle_mm_fault+0xc43/0x1060
>>>>> [ 6466.955772]  ? do_mmap+0x44b/0x5b0
>>>>> [ 6466.964410]  ? handle_mm_fault+0xf8/0x200
>>>>> [ 6466.973290]  ? __do_page_fault+0x231/0x4a0
>>>>> [ 6466.981973]  ? page_fault+0x8/0x30
>>>>> [ 6466.990904]  ? page_fault+0x1e/0x30
>>>>> [ 6466.999585] Modules linked in:
>>>>> [ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
>>>>> [ 6467.016751] RIP: e030:dql_completed+0x126/0x140
>>>>> [ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
>>>>> [ 6467.039726] RSP: e02b:ffff88807d4c3e78 EFLAGS: 00010297
>>>>> [ 6467.047243] RAX: 0000000000000042 RBX: ffff8880049cf800 RCX: 0000000000000000
>>>>> [ 6467.054202] RDX: 0000000000000001 RSI: 0000000000000042 RDI: ffff8880049cf8c0
>>>>> [ 6467.062000] RBP: ffff888077df7260 R08: 0000000000000001 R09: 0000000000000000
>>>>> [ 6467.069664] R10: 00000000387c2336 R11: 00000000387c2336 R12: 0000000010000000
>>>>> [ 6467.077715] R13: ffff888077df6898 R14: ffff888077df75c0 R15: 0000000000454677
>>>>> [ 6467.084916] FS:  00007fd869147200(0000) GS:ffff88807d4c0000(0000) knlGS:0000000000000000
>>>>> [ 6467.093352] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 6467.101492] CR2: 00007fd867dfd000 CR3: 0000000074884000 CR4: 0000000000000660
>>>>> [ 6467.110542] Kernel panic - not syncing: Fatal exception in interrupt
>>>>> [ 6467.118166] Kernel Offset: disabled
>>>>> (XEN) [2019-02-08 18:04:48.854] Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>>>>
>>>>
>>>
>>>
>>
> 
> 
