Message-ID: <296232ac-e7ed-6e3c-36b9-ed430a21f632@candelatech.com>
Date: Mon, 27 Sep 2021 17:16:39 -0700
From: Ben Greear <greearb@...delatech.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: 5.15-rc3+ crash in fq-codel?

On 9/27/21 5:04 PM, Ben Greear wrote:
> On 9/27/21 4:49 PM, Eric Dumazet wrote:
>>
>>
>> On 9/27/21 4:30 PM, Ben Greear wrote:
>>> Hello,
>>>
>>> In a hacked upon kernel, I'm getting crashes in fq-codel when doing bi-directional
>>> pktgen traffic on top of mac-vlans. Unfortunately for me, I've made big changes to
>>> pktgen so I cannot easily run this test on stock kernels, and there is some chance
>>> some of my hackings have caused this issue.
>>>
>>> But, in case others have seen similar, please let me know. I shall go digging
>>> in the meantime...
>>>
>>> Looks to me like 'skb' is NULL in line 120 below.
>>
>>
>> pktgen must not be used in a mode where a single skb
>> is cloned and reused, if packet needs to be stored in a qdisc.
>>
>> qdisc of all sorts assume skb->next/prev can be used as
>> anchor in their list.
>>
>> If the same skb is queued multiple times, lists are corrupted.
>>
>> Please double check your clone_skb pktgen setup.
>>
>> I thought we had IFF_TX_SKB_SHARING for this, and that macvlan was properly clearing this bit.
>
> My pktgen config was not using any duplicated queueing in this case.
>
> I changed to pfifo_fast and so far it is stable for ~10 minutes, where before it would crash
> within a minute. I'll let it bake overnight....

Still running stable. We have also been using fq-codel for a while without hitting this
problem (the most recent earlier kernel on which we might have run a similar test would be
5.13-ish).

I'll repeat this test on our older kernels tomorrow to see whether this looks like a
regression or whether we just haven't done this exact test in a while...

Thanks,
Ben
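
For anyone following the thread, Eric's point above about skb->next being the list
anchor can be shown with a small userspace sketch. This is not kernel code and not a
diagnosis of the crash reported here. The names struct pkt, struct fifo, fifo_enqueue,
fifo_dequeue and stranded_after_double_enqueue are hypothetical stand-ins, and the queue
is a plain head/tail list with the link pointer stored inside the packet, which is the
assumption that matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: the link pointer lives inside the packet. */
struct pkt {
    struct pkt *next;
    int id;
};

/* Hypothetical head/tail FIFO, loosely in the style of a per-flow queue. */
struct fifo {
    struct pkt *head;
    struct pkt *tail;
    int qlen;
};

/* Append p at the tail. Assumes p is not already on any list. */
static void fifo_enqueue(struct fifo *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
}

/* Remove and return the head packet, or NULL if the list is empty. */
static struct pkt *fifo_dequeue(struct fifo *q)
{
    struct pkt *p = q->head;

    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    p->next = NULL;
    q->qlen--;
    return p;
}

/*
 * Queue packet a, then b, then a again. The second enqueue of a resets
 * a->next to NULL, which cuts b off from the head of the list. Returns the
 * length the queue still claims once fifo_dequeue() starts returning NULL.
 */
static int stranded_after_double_enqueue(void)
{
    struct pkt a = { NULL, 1 }, b = { NULL, 2 };
    struct fifo q = { NULL, NULL, 0 };

    fifo_enqueue(&q, &a);
    fifo_enqueue(&q, &b);
    fifo_enqueue(&q, &a);   /* same packet queued a second time */
    assert(q.qlen == 3);

    while (fifo_dequeue(&q))
        ;                   /* drain until the list looks empty */
    return q.qlen;
}
```

Calling stranded_after_double_enqueue() leaves the counter non-zero even though the list
itself is already empty. A dequeue path that trusts the counter and dereferences the head
without a NULL check would fault at that point, which is the kind of list corruption the
quoted warning describes.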