Message-ID: <7a896ce5-ff52-0c44-752c-f6d238d6d8d9@candelatech.com>
Date: Wed, 29 Sep 2021 18:36:50 -0700
From: Ben Greear <greearb@...delatech.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: 5.15-rc3+ crash in fq-codel?
On 9/29/21 5:40 PM, Eric Dumazet wrote:
>
>
> On 9/29/21 5:29 PM, Eric Dumazet wrote:
>>
>>
>> On 9/29/21 5:04 PM, Ben Greear wrote:
>>> On 9/29/21 4:48 PM, Ben Greear wrote:
>>>> On 9/29/21 4:42 PM, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 9/29/21 4:28 PM, Eric Dumazet wrote:
>>>>>>
>>>>>
>>>>>>
>>>>>> Actually the bug seems to be in pktgen, vs NET_XMIT_CN
>>>>>>
>>>>>> You probably would hit the same issues with other qdisc also using NET_XMIT_CN
>>>>>>
>>>>>
>>>>> I would try the following patch :
>>>>>
>>>>> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
>>>>> index a3d74e2704c42e3bec1aa502b911c1b952a56cf1..0a2d9534f8d08d1da5dfc68c631f3a07f95c6f77 100644
>>>>> --- a/net/core/pktgen.c
>>>>> +++ b/net/core/pktgen.c
>>>>> @@ -3567,6 +3567,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>>>> case NET_XMIT_DROP:
>>>>> case NET_XMIT_CN:
>>>>> /* skb has been consumed */
>>>>> + pkt_dev->last_ok = 1;
>>>>> pkt_dev->errors++;
>>>>> break;
>>>>> default: /* Drivers are not supposed to return other values! */
>>>
>>> While patching my variant of pktgen, I took a look at the 'default' case. I think
>>> it should probably go above NET_XMIT_DROP and fallthrough into the consumed pkt path?
>>>
>>> Although, probably not a big deal since only bugs elsewhere would hit that path, and
>>> we don't really know if skb would be consumed in that case or not.
>>>
>>
>> This is probably dead code after commit
>>
>> commit f466dba1832f05006cf6caa9be41fb98d11cb848 pktgen: ndo_start_xmit can return NET_XMIT_xxx values
>>
>> So this does not really matter anymore.
>>
>>
>
> Alternative would be the following patch.
> NET_XMIT_CN means the packet has been queued for transmit,
> but that we might have dropped prior packets.
>
> Probably not a big deal to make the difference in pktgen.
>
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index a3d74e2704c42e3bec1aa502b911c1b952a56cf1..5c612cbc74c790f64aff5ce602843378284c7119 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -3557,6 +3557,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>
> switch (ret) {
> case NETDEV_TX_OK:
> + case NET_XMIT_CN:
> pkt_dev->last_ok = 1;
> pkt_dev->sofar++;
> pkt_dev->seq_num++;
> @@ -3565,8 +3566,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
> goto xmit_more;
> break;
> case NET_XMIT_DROP:
> - case NET_XMIT_CN:
> /* skb has been consumed */
> + pkt_dev->last_ok = 1;
> pkt_dev->errors++;
> break;
> default: /* Drivers are not supposed to return other values! */
>
Yes, I like that NET_XMIT_CN then increments seq_num, though for my own purposes
I wouldn't want to increment sofar in that case (and maybe would skip some of the other logic too),
since we know at least something was dropped.
For fq-codel, it seems that NET_XMIT_CN could mean that the attempted packet was actually queued
for xmit, but at least some other packets were purged.
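
A minimal user-space sketch of the accounting described above, not the kernel code: the SKETCH_* constants, struct pkt_stats and account_xmit() are hypothetical stand-ins, with the constant values chosen to mirror the kernel's NETDEV_TX_* / NET_XMIT_* codes. It assumes NET_XMIT_CN advances seq_num and marks the skb as consumed, but leaves sofar alone.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel return codes. */
enum {
	SKETCH_TX_OK     = 0x00, /* packet accepted */
	SKETCH_XMIT_DROP = 0x01, /* this packet was dropped */
	SKETCH_XMIT_CN   = 0x02, /* queued, but other packets may have been purged */
	SKETCH_TX_BUSY   = 0x10, /* not consumed, caller may retry */
};

/* Hypothetical subset of the counters kept in struct pktgen_dev. */
struct pkt_stats {
	int last_ok;      /* 1 if the skb was consumed and must not be reused */
	uint64_t sofar;   /* packets counted as sent */
	uint64_t seq_num; /* sequence number for the next packet */
	uint64_t errors;  /* drops and unexpected return codes */
};

void account_xmit(struct pkt_stats *s, int ret)
{
	switch (ret) {
	case SKETCH_TX_OK:
		s->last_ok = 1;
		s->sofar++;
		s->seq_num++;
		break;
	case SKETCH_XMIT_CN:
		/* skb was queued: advance the sequence, but do not count
		 * it in sofar since something was dropped along the way.
		 */
		s->last_ok = 1;
		s->seq_num++;
		break;
	case SKETCH_XMIT_DROP:
		/* skb has been consumed */
		s->last_ok = 1;
		s->errors++;
		break;
	default: /* unexpected value: count it, then treat like busy */
		s->errors++;
		/* fall through */
	case SKETCH_TX_BUSY:
		/* skb was not consumed, so it may be retried */
		s->last_ok = 0;
		break;
	}
}
```

With this split, feeding one OK, one CN and one DROP result leaves sofar at 1, seq_num at 2 and errors at 1.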
Thanks,
Ben
--
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc http://www.candelatech.com