Message-ID: <1d5fc498-c783-4857-b8e5-851e00561898@candelatech.com>
Date: Thu, 30 Sep 2021 09:44:34 -0700
From: Ben Greear <greearb@...delatech.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: 5.15-rc3+ crash in fq-codel?

On 9/29/21 6:36 PM, Ben Greear wrote:
> On 9/29/21 5:40 PM, Eric Dumazet wrote:
>>
>>
>> On 9/29/21 5:29 PM, Eric Dumazet wrote:
>>>
>>>
>>> On 9/29/21 5:04 PM, Ben Greear wrote:
>>>> On 9/29/21 4:48 PM, Ben Greear wrote:
>>>>> On 9/29/21 4:42 PM, Eric Dumazet wrote:
>>>>>>
>>>>>>
>>>>>> On 9/29/21 4:28 PM, Eric Dumazet wrote:
>>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Actually the bug seems to be in pktgen, vs NET_XMIT_CN
>>>>>>>
>>>>>>> You probably would hit the same issues with other qdisc also using NET_XMIT_CN
>>>>>>>
>>>>>>
>>>>>> I would try the following patch :
>>>>>>
>>>>>> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
>>>>>> index a3d74e2704c42e3bec1aa502b911c1b952a56cf1..0a2d9534f8d08d1da5dfc68c631f3a07f95c6f77 100644
>>>>>> --- a/net/core/pktgen.c
>>>>>> +++ b/net/core/pktgen.c
>>>>>> @@ -3567,6 +3567,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>>>>>          case NET_XMIT_DROP:
>>>>>>          case NET_XMIT_CN:
>>>>>>                  /* skb has been consumed */
>>>>>> +                pkt_dev->last_ok = 1;
>>>>>>                  pkt_dev->errors++;
>>>>>>                  break;
>>>>>>          default: /* Drivers are not supposed to return other values! */
>>>>
>>>> While patching my variant of pktgen, I took a look at the 'default' case. I think
>>>> it should probably go above NET_XMIT_DROP and fallthrough into the consumed pkt path?
>>>>
>>>> Although, probably not a big deal since only bugs elsewhere would hit that path, and
>>>> we don't really know if skb would be consumed in that case or not.
>>>>
>>>
>>> This is probably dead code after commit
>>>
>>> commit f466dba1832f05006cf6caa9be41fb98d11cb848 pktgen: ndo_start_xmit can return NET_XMIT_xxx values
>>>
>>> So this does not really matter anymore.
>>>
>>>
>>
>> Alternative would be the following patch.
>> NET_XMIT_CN means the packet has been queued for transmit,
>> but that we might have dropped prior packets.
>>
>> Probably not a big deal to make the difference in pktgen.
>>
>> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
>> index a3d74e2704c42e3bec1aa502b911c1b952a56cf1..5c612cbc74c790f64aff5ce602843378284c7119 100644
>> --- a/net/core/pktgen.c
>> +++ b/net/core/pktgen.c
>> @@ -3557,6 +3557,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>          switch (ret) {
>>          case NETDEV_TX_OK:
>> +        case NET_XMIT_CN:
>>                  pkt_dev->last_ok = 1;
>>                  pkt_dev->sofar++;
>>                  pkt_dev->seq_num++;
>> @@ -3565,8 +3566,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>                          goto xmit_more;
>>                  break;
>>          case NET_XMIT_DROP:
>> -        case NET_XMIT_CN:
>>                  /* skb has been consumed */
>> +                pkt_dev->last_ok = 1;
>>                  pkt_dev->errors++;
>>                  break;
>>          default: /* Drivers are not supposed to return other values! */
>>
>
> Yes, I like that XMIT_CN then increments seq_num, though for my own purposes
> I wouldn't want to increment sofar in that case (and maybe would skip the other logic too),
> since we know at least something was dropped.
>
> For fq-codel, it seems XMIT_CN could mean that the attempted packet actually was queued
> for xmit, but at least some other packets were purged.
>
> Thanks,
> Ben
>
This does fix the crash for me (my patch in my tree is slightly different, but same idea).
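
For anyone following along, here is a rough userspace sketch of the accounting that the second (alternative) patch quoted above would give. It is not the kernel code: struct pkt_stats and account_xmit() are hypothetical stand-ins for the few pktgen_dev fields the switch touches, the skb refcount handling is left out entirely, and the NETDEV_TX_BUSY/default arms are a simplified assumption rather than something taken from the patch. The constant values are the ones I believe include/linux/netdevice.h uses.

```c
/* Return codes, with the values believed to be in include/linux/netdevice.h */
#define NETDEV_TX_OK    0x00
#define NET_XMIT_DROP   0x01
#define NET_XMIT_CN     0x02
#define NETDEV_TX_BUSY  0x10

/* Hypothetical stand-in for the pktgen_dev fields used by the switch. */
struct pkt_stats {
	int last_ok;            /* 1: previous skb was consumed, build a new one */
	unsigned long sofar;    /* packets counted as sent */
	unsigned long seq_num;  /* next sequence number */
	unsigned long errors;   /* packets counted as failed */
};

/* Sketch of the per-return-code bookkeeping only (no skb handling). */
void account_xmit(struct pkt_stats *s, int ret)
{
	switch (ret) {
	case NETDEV_TX_OK:
	case NET_XMIT_CN:
		/* CN: this packet was queued, earlier ones may have been dropped */
		s->last_ok = 1;
		s->sofar++;
		s->seq_num++;
		break;
	case NET_XMIT_DROP:
		/* skb has been consumed, so do not retry it */
		s->last_ok = 1;
		s->errors++;
		break;
	case NETDEV_TX_BUSY:
		/* Assumption: skb not consumed, retry it next time */
		s->last_ok = 0;
		break;
	default:
		/* Assumption: unexpected code, count an error and retry */
		s->errors++;
		s->last_ok = 0;
		break;
	}
}
```

With this accounting, NET_XMIT_CN bumps sofar and seq_num like a normal send, while NET_XMIT_DROP only bumps errors; both leave last_ok set, since the skb is gone either way.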
Thanks,
Ben