Message-ID: <MWHPR11MB00299035ECB2E34F60BC2C74E9FE9@MWHPR11MB0029.namprd11.prod.outlook.com>
Date:   Mon, 9 Jan 2023 19:36:06 +0000
From:   "Saleem, Shiraz" <shiraz.saleem@...el.com>
To:     Jason Gunthorpe <jgg@...pe.ca>,
        Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>,
        "Ertman, David M" <david.m.ertman@...el.com>,
        "Wesierski, DawidX" <dawidx.wesierski@...el.com>
CC:     "kamalheib1@...il.com" <kamalheib1@...il.com>,
        "leon@...nel.org" <leon@...nel.org>,
        "sashal@...nel.org" <sashal@...nel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Igor Raits <igor.raits@...ddata.com>
Subject: RE: Network do not works with linux >= 6.1.2. Issue bisected to
 "425c9bd06b7a70796d880828d15c11321bdfb76d" (RDMA/irdma: Report the correct
 link speed)

> On Fri, Jan 06, 2023 at 08:55:29AM +0100, Jaroslav Pulchart wrote:
> > [  257.967099] task:NetworkManager  state:D stack:0     pid:3387 ppid:1      flags:0x00004002
> > [  257.975446] Call Trace:
> > [  257.977901]  <TASK>
> > [  257.980004]  __schedule+0x1eb/0x630
> > [  257.983498]  schedule+0x5a/0xd0
> > [  257.986641]  schedule_timeout+0x11d/0x160
> > [  257.990654]  __wait_for_common+0x90/0x1e0
> > [  257.994666]  ? usleep_range_state+0x90/0x90
> > [  257.998854]  __flush_workqueue+0x13a/0x3f0
> > [  258.002955]  ? __kernfs_remove.part.0+0x11e/0x1e0
> > [  258.007661]  ib_cache_cleanup_one+0x1c/0xe0 [ib_core]
> > [  258.012721]  __ib_unregister_device+0x62/0xa0 [ib_core]
> > [  258.017959]  ib_unregister_device+0x22/0x30 [ib_core]
> > [  258.023024]  irdma_remove+0x1a/0x60 [irdma]
> > [  258.027223]  auxiliary_bus_remove+0x18/0x30
> > [  258.031414]  device_release_driver_internal+0x1aa/0x230
> > [  258.036643]  bus_remove_device+0xd8/0x150
> > [  258.040654]  device_del+0x18b/0x3f0
> > [  258.044149]  ice_unplug_aux_dev+0x42/0x60 [ice]
> 
> We talked about this already - wasn't it on this series?

I believe this is yet another path (when ice ports are added to a bond) where the RDMA aux device
is removed while holding the RTNL lock. The recent irdma patch, 425c9bd06b7a, now exposes it
and causes a deadlock.

ice_lag_event_handler [rtnl_lock]
  -> ice_lag_changeupper_event
    -> ice_unplug_aux_dev
      -> irdma_remove
        -> ib_unregister_device
          -> ib_cache_cleanup_one
            -> flush_workqueue(ib)
              -> irdma_query_port
                -> ib_get_eth_speed [rtnl_lock]
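
The call chain above can be modelled in userspace to show why it cannot make progress. This is only an illustrative sketch in Python, not kernel code: `rtnl`, `ib_wq`, `query_port` and `remove_under_rtnl` are hypothetical stand-ins, and the timeout exists only so the example terminates (the kernel's flush has no timeout and simply blocks).

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

rtnl = threading.Lock()                    # stand-in for the non-recursive RTNL mutex
ib_wq = ThreadPoolExecutor(max_workers=1)  # stand-in for the ib workqueue


def query_port():
    # Models irdma_query_port -> ib_get_eth_speed, which needs RTNL.
    with rtnl:
        return "speed"


def remove_under_rtnl(timeout=0.2):
    # Models the ice LAG notifier path: RTNL is held for the whole removal.
    with rtnl:
        work = ib_wq.submit(query_port)   # work item already queued on the workqueue
        try:
            work.result(timeout=timeout)  # models flush_workqueue(): wait for the work
            return "flushed"
        except TimeoutError:
            # The worker is blocked on rtnl, which this thread still holds.
            return "deadlock"


print(remove_under_rtnl())
```

The waiter holds the lock the work item needs, and the work item must finish before the waiter releases it, which is the cycle shown in the chain above.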

The previous discussion was about the ethtool channel config change, https://lore.kernel.org/linux-rdma/Y5ES3kmYSINlAQhz@x130/,
which David E. is taking care of.

We are working on a patch for this issue.
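
One general way to avoid this kind of cycle is to record the request while the lock is held and do the removal later from a context that does not hold it. The sketch below is a userspace Python model of that pattern only; it is an assumption about the general shape of such a change, not the patch referred to here, and `unplug_pending`, `lag_event_handler` and `service_task` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

rtnl = threading.Lock()                    # stand-in for the RTNL mutex
ib_wq = ThreadPoolExecutor(max_workers=1)  # stand-in for the ib workqueue
unplug_pending = threading.Event()         # hypothetical "unplug requested" flag


def query_port():
    # Work item that needs RTNL, as in the reported trace.
    with rtnl:
        return "speed"


def lag_event_handler():
    # Runs with RTNL held: only note that the aux device must go away.
    with rtnl:
        unplug_pending.set()


def service_task(timeout=1.0):
    # Runs later without RTNL, so waiting for queued work can complete.
    if not unplug_pending.is_set():
        return "idle"
    unplug_pending.clear()
    ib_wq.submit(query_port).result(timeout=timeout)  # models the flush during removal
    return "removed"


lag_event_handler()
print(service_task())
```

Because the flush now happens after the lock is dropped, the queued work can take the lock, finish, and let the removal proceed.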

> 
> Don't hold locks when removing aux devices.
> 
> > [  258.048707]  ice_lag_changeupper_event+0x287/0x2a0 [ice]
> > [  258.054038]  ice_lag_event_handler+0x51/0x130 [ice]
> > [  258.058930]  raw_notifier_call_chain+0x41/0x60
> > [  258.063381]  __netdev_upper_dev_link+0x1a0/0x370
> > [  258.068008]  netdev_master_upper_dev_link+0x3d/0x60
> > [  258.072886]  bond_enslave+0xd16/0x16f0 [bonding]
> > [  258.077517]  ? nla_put+0x28/0x40
> > [  258.080756]  do_setlink+0x26c/0xc10
> > [  258.084249]  ? avc_alloc_node+0x27/0x180
> > [  258.088173]  ? __nla_validate_parse+0x141/0x190
> > [  258.092708]  __rtnl_newlink+0x53a/0x620
> > [  258.096549]  rtnl_newlink+0x44/0x70
> 
> Especially not the rtnl.
> 
> Jason
