Message-ID: <20250924012039.66a2411c.michal.pecio@gmail.com>
Date: Wed, 24 Sep 2025 01:20:39 +0200
From: Michal Pecio <michal.pecio@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: I Viswanath <viswanathiyyappan@...il.com>, andrew@...n.ch,
andrew+netdev@...n.ch, davem@...emloft.net, david.hunter.linux@...il.com,
edumazet@...gle.com, linux-kernel-mentees@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com, petkan@...leusys.com,
skhan@...uxfoundation.org,
syzbot+78cae3f37c62ad092caa@...kaller.appspotmail.com
Subject: Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in
rtl8150_set_multicast
On Tue, 23 Sep 2025 07:28:09 -0700, Jakub Kicinski wrote:
> Excellent, could you check if there is any adverse effect of
> repeatedly writing the RCR register under heavy Tx traffic (without
> stopping/waking the Tx queue)? The driver seems to pause Tx when RCR
> is written, seems like an odd thing to do without a reason, but
> driver authors do the darndest things.
I don't know what the point of this is, because it doesn't prevent the
async "set RCR" control request racing with an async TX URB submitted
before the queue was stopped or after it was restarted.
Such races could be prevented by net core not calling this while TX
is outstanding and not issuing TX until the control request completes,
but it doesn't look like any of that is the case?
I successfully reproduced the double submit bug as follows:
  ifconfig eth1 10.9.9.9
  arp -s 10.9.8.7 87:87:87:87:87:87     # doesn't actually exist
  ping -f 10.9.8.7
  while :; do ifconfig eth1 allmulti; ifconfig eth1 -allmulti; done
For some reason I had to use two instances of 'ping -f', not sure why.
Then the double submission warning appears in a few seconds and also
some refcount issues, probably on skbs (dev->tx_skb gets mixed up).
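
To illustrate the sequence I suspect, here is a toy userspace model in C.
It is only a sketch, not rtl8150 code: the struct, the toy_* names and the
"one TX URB protected only by the queue state" layout are assumptions made
for illustration, and the RCR write itself is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not driver code): a single TX URB whose
 * reuse is prevented only by the stopped/woken state of the TX queue. */
struct toy_dev {
	bool queue_stopped;     /* stands in for netif_stop_queue()/netif_wake_queue() */
	bool tx_urb_in_flight;  /* stands in for the one TX URB being active */
	int double_submits;     /* how often an active URB got submitted again */
};

/* Models the xmit path: stop the queue, then submit the only TX URB. */
static bool toy_start_xmit(struct toy_dev *d)
{
	if (d->queue_stopped)
		return false;           /* net core would not call xmit now */
	d->queue_stopped = true;
	if (d->tx_urb_in_flight)
		d->double_submits++;    /* URB submitted while still active */
	d->tx_urb_in_flight = true;
	return true;
}

/* Models TX completion: the URB is free again and the queue restarts. */
static void toy_tx_complete(struct toy_dev *d)
{
	assert(d->tx_urb_in_flight);    /* completion implies a prior submit */
	d->tx_urb_in_flight = false;
	d->queue_stopped = false;
}

/* Models set_multicast: wake_after=true wakes the queue unconditionally
 * (behaviour before the patch), wake_after=false leaves it alone. */
static void toy_set_multicast(struct toy_dev *d, bool wake_after)
{
	if (wake_after)
		d->queue_stopped = false;
}

/* Sequence: xmit, set_multicast while TX is in flight, xmit again,
 * then the TX completion. Returns the number of double submissions. */
static int toy_run(bool wake_after)
{
	struct toy_dev d = { 0 };

	toy_start_xmit(&d);
	toy_set_multicast(&d, wake_after);
	toy_start_xmit(&d);
	toy_tx_complete(&d);
	return d.double_submits;
}
```

In this model toy_run(true) counts one double submission and toy_run(false)
counts none, which matches what I see with and without the patch.
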
With the patch, it all goes away and doesn't show up even after a few
minutes. I also tried with two TCP streams to a real machine and only
observed a 20 KB/s decrease in throughput while the ifconfig allmulti
loop is running, probably due to USB bandwidth. So it looks OK.
But one annoying problem is that those control requests are posted
asynchronously and under my test they seem to accumulate faster than
they drain. I get brief or not so brief lockups when USB core cleans
this up on sudden disconnection. And rtl8150_disconnect() should kill
them, but it doesn't.
Not sure how this is supposed to work in a well-behaved net driver? Is
this callback expected to return without sleeping and have an immediate
effect? I can't see this working with USB.
Regards,
Michal