Message-ID: <YzTWwf/FyzBKGaww@chmeee>
Date:   Wed, 28 Sep 2022 16:20:33 -0700
From:   Kevin Mitchell <kevmitch@...sta.com>
To:     Antoine Tenart <atenart@...nel.org>
Cc:     Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: new warning caused by ("net-sysfs: update the queue counts in
 the unregistration path")

On Wed, Sep 28, 2022 at 11:46:20AM +0200, Antoine Tenart wrote:
> Quoting Kevin Mitchell (2022-09-28 03:27:46)
> > With the inclusion of d7dac083414e ("net-sysfs: update the queue counts in the
> > unregistration path"), we have started to see the following message during one of
> > our stress tests that brings an interface up and down while continuously
> > trying to send out packets on it:
> >
> > et3_11_1 selects TX queue 0, but real number of TX queues is 0
> >
> > It seems that this is a result of a race between remove_queue_kobjects() and
> > netdev_cap_txqueue() for the last packets before setting dev->flags &= ~IFF_UP
> > in __dev_close_many(). When this message is displayed, netdev_cap_txqueue()
> > selects queue 0 anyway (the noop queue at this point). As it did before the
> > above commit, that queue (which I guess is still around due to reference
> > counting) proceeds to drop the packet and return NET_XMIT_CN. So there doesn't
> > appear to be a functional change. However, the warning message seems to be
> > spurious if not slightly confusing.
>
> Do you know the call traces leading to this? Also, I'm not 100% sure I
> follow, as remove_queue_kobjects is called in the unregistration path
> while the test is setting the iface up & down. What driver is used?

Sorry, my language was imprecise. The device is being unregistered and
re-registered. The driver is out of tree for our front panel ports. I don't
think this is specific to the driver, but I'd be happy to be convinced
otherwise.

The call trace to the queue removal is

[  628.165565]  dump_stack+0x74/0x90
(remove_queue_kobjects)
[  628.165569]  netdev_unregister_kobject+0x7a/0xb3
[  628.165572]  rollback_registered_many+0x560/0x5c4
[  628.165576]  unregister_netdevice_queue+0xa3/0xfc
[  628.165578]  unregister_netdev+0x1e/0x25
[  628.165589]  fdev_free+0x26e/0x29d [strata_dma_drv]

The call trace to the warning message is

[ 1094.355489]  dump_stack+0x74/0x90
(netdev_cap_txqueue)
[ 1094.355495]  netdev_core_pick_tx+0x91/0xaf
[ 1094.355500]  __dev_queue_xmit+0x249/0x602
[ 1094.355503]  ? printk+0x58/0x6f
[ 1094.355510]  dev_queue_xmit+0x10/0x12
[ 1094.355518]  packet_sendmsg+0xe88/0xeee
[ 1094.355524]  ? update_curr+0x6b/0x15d
[ 1094.355530]  sock_sendmsg_nosec+0x12/0x1d
[ 1094.355533]  sock_write_iter+0x8a/0xb6
[ 1094.355539]  new_sync_write+0x7c/0xb4
[ 1094.355543]  vfs_write+0xfe/0x12a
[ 1094.355547]  ksys_write+0x6e/0xb9
[ 1094.355552]  ? exit_to_user_mode_prepare+0xd3/0xf0
[ 1094.355555]  __x64_sys_write+0x1a/0x1c
[ 1094.355559]  do_syscall_64+0x31/0x40
[ 1094.355564]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

>
> As you said and looking around queue 0 is somewhat special and used as a
> fallback. My suggestion would be to 1) check if the above race is
> expected 2) if yes, a possible solution would be not to warn when
> real_num_tx_queues == 0 as in such cases selecting queue 0 would be the
> expected fallback (and you might want to check places like [1]).

Yes, this is exactly where it is happening, and that sounds like a good idea to
me. As far as I can tell, the message is completely innocuous. If there really
are no cases where it is useful to have this warning for real_num_tx_queues ==
0, I could submit a patch to not emit it in that case.
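
A minimal userspace sketch of that idea, to make the intended behaviour
concrete. This is not the kernel code and not a proposed patch: the names
model_netdev, model_cap_txqueue and model_warn_count are hypothetical
stand-ins, and fprintf() replaces the kernel's rate-limited warning. The only
behaviour it assumes is what is described above: an out-of-range queue index
falls back to queue 0, and the warning is skipped when real_num_tx_queues is 0.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two struct net_device fields used here. */
struct model_netdev {
	const char *name;
	unsigned int real_num_tx_queues;
};

/* Counts emitted warnings so the behaviour can be checked. */
static unsigned int model_warn_count;

/*
 * Models the capping step: an out-of-range index falls back to queue 0.
 * Assumed change: stay silent when the device has no real TX queues,
 * since queue 0 is the expected fallback in that window.
 */
static unsigned short model_cap_txqueue(const struct model_netdev *dev,
					unsigned short queue_index)
{
	if (queue_index >= dev->real_num_tx_queues) {
		if (dev->real_num_tx_queues != 0) {
			fprintf(stderr,
				"%s selects TX queue %u, but real number of TX queues is %u\n",
				dev->name, (unsigned int)queue_index,
				dev->real_num_tx_queues);
			model_warn_count++;
		}
		return 0;
	}
	return queue_index;
}
```

With this model, a device whose queue count has already dropped to 0 still
gets queue 0 but no longer produces the message, while a device with 4 queues
asked for queue 7 still warns and falls back to queue 0.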

>
> Thanks,
> Antoine
>
> [1] https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4126
