Message-ID: <a862beed-3361-4f78-b412-87b78095ac84@kernel.org>
Date: Mon, 27 Oct 2025 11:33:43 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>, Chris Arges <carges@...udflare.com>
Cc: netdev@...r.kernel.org, makita.toshiaki@....ntt.co.jp,
Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
<davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
ihor.solodrai@...ux.dev, toshiaki.makita1@...il.com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [PATCH net V1 2/3] veth: stop and start all TX queue in netdev
down/up
On 25/10/2025 02.54, Jakub Kicinski wrote:
> On Thu, 23 Oct 2025 16:59:37 +0200 Jesper Dangaard Brouer wrote:
>> The veth driver started manipulating TXQ states in commit
>> dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring
>> to reduce TX drops").
>>
>> Other drivers manipulating TXQ states takes care of stopping
>> and starting TXQs in NDOs. Thus, adding this to veth .ndo_open
>> and .ndo_stop.
>
> Kinda, but taking a device up or down resets the qdisc, IIRC.
>
> So stopping the qdisc for real drivers is mostly a way to make sure
> that there's nothing entering the xmit handler as the driver dismantles
> its state.
>
> I'm not sure if this is an official rule, but I'm under the impression
> that stopping the queues or carrier loss (and
> netif_tx_stop_all_queues(peer) in close() is stopping peer's Tx queue
> on carrier loss) is inadvisable as it may lead to old packets getting
> transmitted when carrier comes back.
>
> IOW based on the commit msg - I'm not sure this patch is needed..
During the incident, doing ip link set 'down' flushed all packets in
the qdisc, but the TXQs were not reset (started again) on link 'up'.
Thus, the qdisc would fill up again and block all packets on the interface.
Chris also tried to replace the qdisc, but the TXQ was still in the
stopped state (QUEUE_STATE_DRV_XOFF).
This was the origin of the patch: we could not recover the machine
from this state. Thus, the idea was that starting all queues on link 'up'
would give us a recovery mechanism. With dev_watchdog in place, this
change isn't really needed.
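To make the stuck state concrete, here is a minimal userspace sketch.
It is a toy model and not veth or kernel code: struct toy_txq and the
toy_* helpers are hypothetical names, and the single drv_xoff flag only
stands in for the QUEUE_STATE_DRV_XOFF bit. It assumes that link 'down'
flushes the qdisc but leaves the driver stop bit untouched, as described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code; every name here is hypothetical. */
struct toy_txq {
	bool drv_xoff;   /* stands in for the QUEUE_STATE_DRV_XOFF bit */
	int qdisc_len;   /* packets parked in the qdisc */
	int sent;        /* packets handed to the driver */
};

/* Queue one packet; it reaches the driver only while the TXQ is running. */
static bool toy_send(struct toy_txq *q)
{
	assert(q != NULL);
	if (q->drv_xoff) {
		q->qdisc_len++;  /* stopped TXQ: packet stays in the qdisc */
		return false;
	}
	q->sent++;
	return true;
}

/* Link 'down': the qdisc is flushed, the driver stop bit is left alone. */
static void toy_link_down(struct toy_txq *q)
{
	assert(q != NULL);
	q->qdisc_len = 0;
}

/* Link 'up': restart_queues models the idea of starting all TXQs on open. */
static void toy_link_up(struct toy_txq *q, bool restart_queues)
{
	assert(q != NULL);
	if (restart_queues)
		q->drv_xoff = false;
}
```

In this model, a queue that starts with drv_xoff set stays blocked
across toy_link_down() followed by toy_link_up(q, false), and only
toy_link_up(q, true) lets toy_send() succeed again.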
As you mention, this may lead to old packets getting transmitted when
carrier comes back, which would be a behavior change that we don't
want in a fixes patch. So, I will drop this patch.
--Jesper