linux-kernel - Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <70e9a3fe-158a-c3a2-a427-2343bc6c9031@gmail.com>
Date:   Sat, 9 Feb 2019 10:59:40 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Sander Eikelenboom <linux@...elenboom.it>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>
Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at
 lib/dynamic_queue_limits.c:27!

On 09.02.2019 10:34, Sander Eikelenboom wrote:
> On 09/02/2019 10:02, Heiner Kallweit wrote:
>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>
>>>
>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>> L.S.,
>>>>>>>>>
>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>
>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>
>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>
>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>
>>>>>>> Hmm i did some diging and i think:
>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>
>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>
>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>> on the host.
>>>>>
>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>> as author of the underlying changes.
>>>>>
>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>
>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>> test also with only 
>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>> removed.
>>>>
>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>
>>>> Sure, thanks.
>>>>
>>>>> BTW am i correct these patches are merely optimizations ?
>>>>
>>>> Yes
>>>>
>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>
>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>
>>>
>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>
>>> You could try :
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         dma_addr_t mapping;
>>>         u32 opts[2], len;
>>>         bool stop_queue;
>>> +       bool door_bell;
>>>         int frags;
>>>  
>>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         /* Force memory writes to complete before releasing descriptor */
>>>         dma_wmb();
>>>  
>>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>> +
>>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>  
>>>         /* Force all memory writes to complete before notifying device */
>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>         if (unlikely(stop_queue))
>>>                 netif_stop_queue(dev);
>>>  
>>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>> +       if (door_bell) {
>>>                 RTL_W8(tp, TxPoll, NPQ);
>>>                 mmiowb();
>>>         }
>>>
>> Thanks a lot for checking and for the proposed fix.
>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
> 
> I have done that already during the night .. the results:
> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>   (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
> 
> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>   The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>   this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>   compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>   Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
> 
>   If I can, it is a separate issue.
>   If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>   a revert would be the right thing to do (since as you indicated these are merely optimizations), 
>   which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>   (especially since I seem to still have other issues which need to be sorted out and time is limited)
> 
>   The timeout in question:
>         [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>         [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>         [28336.893358] Modules linked in:
>         [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>         [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>         [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>         [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>         [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>         [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>         [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>         [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>         [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>         [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>         [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>         [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>         [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>         [28337.090052] Call Trace:
>         [28337.103615]  <IRQ>
>         [28337.116587]  ? qdisc_destroy+0x120/0x120
>         [28337.128905]  call_timer_fn+0x19/0x90
>         [28337.141892]  expire_timers+0x8b/0xa0
>         [28337.153354]  run_timer_softirq+0x7e/0x160
>         [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
>         [28337.176548]  ? handle_percpu_irq+0x32/0x50
>         [28337.186734]  __do_softirq+0xed/0x229
>         [28337.196404]  ? hypervisor_callback+0xa/0x20
>         [28337.207822]  irq_exit+0xb7/0xc0
>         [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
>         [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
>         [28337.241261]  </IRQ>
>         [28337.253283] RIP: e033:0xff7e62
>         [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>         [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>         [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>         [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>         [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>         [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>         [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>         [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
> 
Thanks for your efforts. As usual this tx timeout trace says basically nothing except
"timeout" and root cause could be anything. Earlier you reported a memory allocation error,
did that occur again?
If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
contribute to the issue) and just submit a patch to effectively revert
2e6eedb4813e34d8d84ac0eb3afb668966f3f356.

> --
> Sander
> 
>  
>>>
>>> .
>>>
>> Heiner
>>
> 
>