netdev - RE: [REGRESSION] mlx5: Driver remove during hot unplug is broken

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM0PR05MB4866B2F0A712A5DEAC1AFDA0D1670@AM0PR05MB4866.eurprd05.prod.outlook.com>
Date:   Wed, 8 Jul 2020 15:44:04 +0000
From:   Parav Pandit <parav@...lanox.com>
To:     Niklas Schnelle <schnelle@...ux.ibm.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Shay Drory <shayd@...lanox.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "ubraun@...ux.ibm.com" <ubraun@...ux.ibm.com>,
        "kgraul@...ux.ibm.com" <kgraul@...ux.ibm.com>,
        "raspl@...ibm.com" <raspl@...ibm.com>
Subject: RE: [REGRESSION] mlx5: Driver remove during hot unplug is broken



> From: Niklas Schnelle <schnelle@...ux.ibm.com>
> Sent: Wednesday, July 8, 2020 5:14 PM
> Hi Parav, Hi Shay,
> 
> On 7/8/20 12:43 PM, Parav Pandit wrote:
> > Hi Niklas,
> >
> ... snip ...
> >>>
> >
> > Sorry for my late response.
> > Yes, this looks good and I also found same in my analysis.
> > With latest code mlx5_pci_close() already does drain_health_wq(), so the
> additional call in remove_one() is redundant.
> > It should be just removed.
> > If you can verify below hunk in your setup, it will be really helpful.
> > You still need patch 42ea9f1b5c6 in your tree.
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index 8b658908f044..ebec2318dbc4 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -1382,7 +1382,6 @@ static void remove_one(struct pci_dev *pdev)
> >
> >         devlink_reload_disable(devlink);
> >         mlx5_crdump_disable(dev);
> > -       mlx5_drain_health_wq(dev);
> >         mlx5_unload_one(dev, true);
> >         mlx5_pci_close(dev);
> >         mlx5_mdev_uninit(dev);
> >
> 
> thanks for looking into this. Sadly it looks like this doesn't quite work, I'm
> getting the bewlow illegal dereference after removing the
> mlx5_drain_health_wq(dev) from remove_one() on top of v5.8-rc4.
> I think this is similar to what happned when I tried recreating the reordering
> on -rc3 which is why I contacted Shay as I wasn't able to get this to work again
> with the current code.
> (Sorry if there are formatting issues with the below, plugged this out of a
> x3270).
> 

It is likely because events_cleanup() freed the memory using kvfree() that health recovery context is trying to access in notifier chain.

While reviewing I see few more errors as below.
(a) mlx5_pci_err_detected() invokes mlx5_drain_health_wq() outside of intf_state_mutex.
(b) mlx5_enter_error_state() in commit b6e0b6bebe0 read and updates dev state outside of intf_state_mutex.
(c) due to drain_health_wq() introduction in mlx5_pci_close()  in commit 42ea9f1b5c625 health_wq is flushed twice.
(d) priv->events freed is kvfree() but allocated using kzalloc().

This code certainly needs rework.

In meantime to avoid this regression, I believe below hunk eliminates error introduced in patch 41798df9bfc.
Will you please help test it?

Shay and I did the review of below patch.
If it works I will get it through Saeed's tree and internal reviews.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ebec2318dbc4..529df5703737 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -785,11 +785,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,

 static void mlx5_pci_close(struct mlx5_core_dev *dev)
 {
-       /* health work might still be active, and it needs pci bar in
-        * order to know the NIC state. Therefore, drain the health WQ
-        * before removing the pci bars
-        */
-       mlx5_drain_health_wq(dev);
        iounmap(dev->iseg);
        pci_clear_master(dev->pdev);
        release_bar(dev->pdev);
@@ -1235,6 +1230,7 @@ void mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
        if (cleanup) {
                mlx5_unregister_device(dev);
                mlx5_devlink_unregister(priv_to_devlink(dev));
+               mlx5_drain_health_wq(dev);
        } else {
                mlx5_detach_device(dev);
        }
@@ -1366,6 +1362,11 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *id)
        return 0;

 err_load_one:
+       /* health work might still be active, and it needs pci bar in
+        * order to know the NIC state. Therefore, drain the health WQ
+        * before removing the pci bars
+        */
+       mlx5_drain_health_wq(dev);
        mlx5_pci_close(dev);
 pci_init_err:
        mlx5_mdev_uninit(dev);




> 64.735421¨ Unable to handle kernel pointer dereference in virtual kernel
> address space 64.735470¨ Failing address: 6b6b6b6b6b6b6000 TEID:
> 6b6b6b6b6b6b6803 64.735476¨ Fault in home space mode while using kernel
> ASCE.
> 64.735484¨ AS:0000000088da0007 R3:0000000000000024 64.735533¨ Oops:
> 0038 ilc:3 Ý#1¨ PREEMPT SMP 64.735538¨ Modules linked in: rpcrdma sunrpc
> rdma_ucm rdma_cm iw_cm ib_cm configfs mlx5_ib ib_uverbs ib_core
> mlx5_core dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> s390_trng ghash_s390 prng ctr aes_s390 des_s390 libdes sha3_512_s390
> sha3_256_s390 sha512_s390 sha1_s390 vfio_ccw vfi 64.735421¨ Unable to
> handle kernel pointer dereference in virtual kernel address space 64.735558¨
> CPU: 0 PID: 762 Comm: kworker/u128:3 Not tainted 5.8.0-rc4-dirty #2
> 64.735561¨ Hardware name: IBM 3906 M04 704 (z/VM 7.1.0) 64.735618¨
> Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work
> mlx5_core 64.735623¨ Krnl PSW : 0704e00180000000 00000000876936c6
> (notifier_call_chain+0x3e/0xe8)01: HCPGSP2629I The virtual machine is
> placed in CP mode due to a SIGP stop from CPU 01.01: HCPGSP2629I The
> virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
> 64.735702¨            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> 64.735704¨ Krnl GPRS: 00000000a38fc246 0000000080000001
> 00000000e814db00 0000000000000080
> 64.735706¨            0000000000000001 ffffffffffffffff 0000000000000000
> 0000000000000001
> 64.735708¨            0000000000000080 0000000000000001 ffffffffffffffff
> 6b6b6b6b6b6b6b6b
> 64.735710¨            00000000e5ff0100 0000000000000000 000003e000973bf8
> 000003e000973ba0
> 64.735716¨ Krnl Code: 00000000876936ba: b9040083            lgr     %r8,%r3
> 64.735716¨            00000000876936be: b9040074            lgr     %r7,%r4
> 64.735716¨           #00000000876936c2: a7d80000            lhi     %r13,0
> 64.735716¨           >00000000876936c6: e390b0080004        lg      %r9,8(%r11)
> 64.735716¨            00000000876936cc: e320b0000004        lg      %r2,0(%r11)
> 64.735716¨            00000000876936d2: c0e5ffffd773        brasl
> %r14,000000008768e5b8
> 64.735716¨            00000000876936d8: ec280032007e        cij
> %r2,0,8,000000008769373c
> 64.735716¨            00000000876936de: e310b0000004        lg      %r1,0(%r11)
> 64.735731¨ Call Trace:
> 64.735740¨  <00000000876936c6>¨ notifier_call_chain+0x3e/0xe8 64.735764¨
> <000000008769392c>¨ __atomic_notifier_call_chain+0x9c/0x138
> 64.735766¨  <00000000876939f2>¨ atomic_notifier_call_chain+0x2a/0x38
> 64.735782¨  <000003ff801e99c4>¨ mlx5_enter_error_state+0xec/0x100
> mlx5_core 64.735797¨  <000003ff801e9a0c>¨
> mlx5_fw_fatal_reporter_err_work+0x34/0xb8 mlx5_core¨ 64.735802¨
> <000000008768784c>¨ process_one_work+0x27c/0x478 64.735805¨
> <0000000087687aae>¨ worker_thread+0x66/0x368 64.735807¨
> <0000000087690a96>¨ kthread+0x176/0x1a0 64.735811¨
> <0000000088202820>¨ ret_from_fork+0x24/0x2c 64.735812¨ INFO: lockdep is
> turned off.
> 64.735814¨ Last Breaking-Event-Address:
> 64.735816¨  <0000000087693926>¨ __atomic_notifier_call_chain+0x96/0x138
> 64.735820¨ Kernel panic - not syncing: Fatal exception: panic_on_oops
> 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000
> 8762103E