[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <824acd0b-920c-7554-4a63-d80c7de2a8b6@gmail.com>
Date: Sat, 9 Feb 2019 12:50:03 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Sander Eikelenboom <linux@...elenboom.it>,
Eric Dumazet <eric.dumazet@...il.com>,
Realtek linux nic maintainers <nic_swsd@...ltek.com>,
Eric Dumazet <edumazet@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at
lib/dynamic_queue_limits.c:27!
On 09.02.2019 11:07, Sander Eikelenboom wrote:
> On 09/02/2019 10:59, Heiner Kallweit wrote:
>> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>>> L.S.,
>>>>>>>>>>>
>>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below,
>>>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>>>
>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>>>
>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>>
>>>>>>>>> Hmm i did some diging and i think:
>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>>
>>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>>
>>>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>>>> on the host.
>>>>>>>
>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>>>> as author of the underlying changes.
>>>>>>>
>>>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>>>
>>>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>>>> test also with only
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>> removed.
>>>>>>
>>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>>>
>>>>>> Sure, thanks.
>>>>>>
>>>>>>> BTW am i correct these patches are merely optimizations ?
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>>>
>>>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>>>
>>>>>
>>>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>>>
>>>>> You could try :
>>>>>
>>>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>> dma_addr_t mapping;
>>>>> u32 opts[2], len;
>>>>> bool stop_queue;
>>>>> + bool door_bell;
>>>>> int frags;
>>>>>
>>>>> if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>> /* Force memory writes to complete before releasing descriptor */
>>>>> dma_wmb();
>>>>>
>>>>> + door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>>>> +
>>>>> txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>>>
>>>>> /* Force all memory writes to complete before notifying device */
>>>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>> if (unlikely(stop_queue))
>>>>> netif_stop_queue(dev);
>>>>>
>>>>> - if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>>>> + if (door_bell) {
>>>>> RTL_W8(tp, TxPoll, NPQ);
>>>>> mmiowb();
>>>>> }
>>>>>
>>>> Thanks a lot for checking and for the proposed fix.
>>>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
>>>
>>> I have done that already during the night .. the results:
>>> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>>> (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
>>>
>>> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>>> The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>>> this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>>> compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>>> Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
>>>
>>> If I can, it is a separate issue.
>>> If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>>> a revert would be the right thing to do (since as you indicated these are merely optimizations),
>>> which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>>> (especially since I seem to still have other issues which need to be sorted out and time is limited)
>>>
>>> The timeout in question:
>>> [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>>> [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>>> [28336.893358] Modules linked in:
>>> [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G D 5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>>> [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
>>> [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>>> [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>>> [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>>> [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>>> [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>>> [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>>> [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>>> [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>>> [28337.049832] FS: 00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>>> [28337.062524] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>>> [28337.090052] Call Trace:
>>> [28337.103615] <IRQ>
>>> [28337.116587] ? qdisc_destroy+0x120/0x120
>>> [28337.128905] call_timer_fn+0x19/0x90
>>> [28337.141892] expire_timers+0x8b/0xa0
>>> [28337.153354] run_timer_softirq+0x7e/0x160
>>> [28337.165931] ? handle_irq_event_percpu+0x4c/0x70
>>> [28337.176548] ? handle_percpu_irq+0x32/0x50
>>> [28337.186734] __do_softirq+0xed/0x229
>>> [28337.196404] ? hypervisor_callback+0xa/0x20
>>> [28337.207822] irq_exit+0xb7/0xc0
>>> [28337.218978] xen_evtchn_do_upcall+0x27/0x40
>>> [28337.230763] xen_do_hypervisor_callback+0x29/0x40
>>> [28337.241261] </IRQ>
>>> [28337.253283] RIP: e033:0xff7e62
>>> [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>>> [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>>> [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>>> [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>>> [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>>> [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>>> [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>>> [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
>>>
>> Thanks for your efforts. As usual this tx timeout trace says basically nothing except
>> "timeout" and root cause could be anything. Earlier you reported a memory allocation error,
>> did that occur again?
>> If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
>> contribute to the issue) and just submit a patch to effectively revert
>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.
>
> I can't say if that is correct, because i haven't tested that.
>
> Another thing I could test is:
> - putting all the r8169 patches (and prerequisites) that went into 5.0
> up to bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3, onto 4.20.7 and see what that does.
> If that would be feasible (not too many needed prerequisites out of r8169) and if
> you could spare me some time and prep such a branch somewhere so i can pull and compile that,
> that would be great.
>
Unfortunately there's quite a number of changes. Regarding __netdev_tx_sent_queue()
and watchdog timeout I found the following comment in drivers/net/ethernet/sfc/tx.c,
efx_enqueue_skb():
if (__netdev_tx_sent_queue(tx_queue->core_txq, skb_len, xmit_more)) {
struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
/* There could be packets left on the partner queue if those
* SKBs had skb->xmit_more set. If we do not push those they
* could be left for a long time and cause a netdev watchdog.
*/
if (txq2->xmit_more_available)
efx_nic_push_buffers(txq2);
But I'm not sure whether the situation in r8169 is comparable. The following patch
implements what I mentioned earlier: It leaves all other 5.0 changes in place and
effectively reverts 2e6eedb4813e34d8d84ac0eb3afb668966f3f356. Would be great if
you could give it a try.
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..3cca2ffb2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6192,7 +6192,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
struct device *d = tp_to_dev(tp);
dma_addr_t mapping;
u32 opts[2], len;
- bool stop_queue;
int frags;
if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -6234,6 +6233,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
txd->opts2 = cpu_to_le32(opts[1]);
+ netdev_sent_queue(dev, skb->len);
+
skb_tx_timestamp(skb);
/* Force memory writes to complete before releasing descriptor */
@@ -6246,14 +6247,14 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
tp->cur_tx += frags + 1;
- stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
- if (unlikely(stop_queue))
- netif_stop_queue(dev);
-
- if (__netdev_sent_queue(dev, skb->len, skb->xmit_more))
- RTL_W8(tp, TxPoll, NPQ);
+ RTL_W8(tp, TxPoll, NPQ);
- if (unlikely(stop_queue)) {
+ if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+ /* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
+ * not miss a ring update when it notices a stopped queue.
+ */
+ smp_wmb();
+ netif_stop_queue(dev);
/* Sync with rtl_tx:
* - publish queue status and cur_tx ring index (write barrier)
* - refresh dirty_tx ring index (read barrier).
--
2.20.1
Powered by blists - more mailing lists