Message-ID: <BYAPR11MB33679575EACCE0581E710D83FCC19@BYAPR11MB3367.namprd11.prod.outlook.com>
Date: Mon, 2 May 2022 13:41:57 +0000
From: "G, GurucharanX" <gurucharanx.g@...el.com>
To: ivecera <ivecera@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "Saleem, Shiraz" <shiraz.saleem@...el.com>,
mschmidt <mschmidt@...hat.com>,
open list <linux-kernel@...r.kernel.org>,
"Jakub Kicinski" <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
"moderated list:INTEL ETHERNET DRIVERS"
<intel-wired-lan@...ts.osuosl.org>
Subject: RE: [Intel-wired-lan] [PATCH net v4] ice: Fix race during aux device
(un)plugging
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf Of
> Ivan Vecera
> Sent: Saturday, April 23, 2022 3:50 PM
> To: netdev@...r.kernel.org
> Cc: Saleem, Shiraz <shiraz.saleem@...el.com>; mschmidt
> <mschmidt@...hat.com>; open list <linux-kernel@...r.kernel.org>; Jakub
> Kicinski <kuba@...nel.org>; Paolo Abeni <pabeni@...hat.com>; Leon
> Romanovsky <leonro@...dia.com>; David S. Miller
> <davem@...emloft.net>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@...ts.osuosl.org>
> Subject: [Intel-wired-lan] [PATCH net v4] ice: Fix race during aux device
> (un)plugging
>
> Function ice_plug_aux_dev() assigns the pf->adev field too early, before aux
> device initialization, and on the other side ice_unplug_aux_dev() starts aux
> device deinit and only at the end assigns NULL to pf->adev.
> This is wrong because pf->adev should be non-NULL only when the aux device
> is fully initialized and ready. This wrong order causes a crash when
> ice_send_event_to_aux() is called, because that function depends on a
> non-NULL value of pf->adev and does not expect the aux device to be
> half-initialized or half-destroyed.
> After the order is corrected the race window is tiny but it is still there,
> as Leon mentioned, so manipulation of pf->adev needs to be protected by a
> mutex.
>
> Fix the (un)plugging functions so that the pf->adev field is set after aux
> device init and cleared before aux device destroy, and protect the pf->adev
> assignment with a new mutex. This mutex is also held during the
> ice_send_event_to_aux() call to ensure that the aux device is valid during
> that call.
> Note that the device lock used in ice_send_event_to_aux() needs to be kept
> to avoid a race with aux drv unload.
>
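
For readers following the ordering described above, here is a minimal
userspace sketch of the pattern, not the actual driver code. It assumes
POSIX threads in place of the kernel mutex API, and every name in it
(struct aux_dev, struct pf, adev_mutex, plug_aux_dev, unplug_aux_dev,
send_event_to_aux) is a hypothetical stand-in for the real ice structures
and functions. The pointer is published only after init is complete,
unpublished before teardown, and both the assignment and the event path
take the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the aux device; not the real kernel type. */
struct aux_dev {
	int ready;        /* set once initialization has finished */
	int events_seen;  /* counts events delivered to the device */
};

/* Hypothetical stand-in for the PF structure. */
struct pf {
	pthread_mutex_t adev_mutex; /* protects the adev pointer */
	struct aux_dev *adev;       /* non-NULL only while fully initialized */
};

/* Plug: finish initialization first, then publish the pointer. */
int plug_aux_dev(struct pf *pf)
{
	struct aux_dev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return -1;
	adev->ready = 1; /* all init work happens before publishing */

	pthread_mutex_lock(&pf->adev_mutex);
	pf->adev = adev;
	pthread_mutex_unlock(&pf->adev_mutex);
	return 0;
}

/* Unplug: unpublish the pointer first, then tear the device down. */
void unplug_aux_dev(struct pf *pf)
{
	struct aux_dev *adev;

	pthread_mutex_lock(&pf->adev_mutex);
	adev = pf->adev;
	pf->adev = NULL;
	pthread_mutex_unlock(&pf->adev_mutex);

	free(adev); /* free(NULL) is a no-op */
}

/* Event path: hold the mutex for the whole call so adev stays valid. */
int send_event_to_aux(struct pf *pf)
{
	int delivered = 0;

	pthread_mutex_lock(&pf->adev_mutex);
	if (pf->adev) {
		assert(pf->adev->ready); /* never observed half-initialized */
		pf->adev->events_seen++;
		delivered = 1;
	}
	pthread_mutex_unlock(&pf->adev_mutex);
	return delivered;
}
```

In this sketch an event sent before plugging or after unplugging is
simply dropped, and one sent in between always sees a fully initialized
device. The separate device lock mentioned in the commit message is
deliberately left out here.
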
> Reproducer:
> cycle=1
> while :; do
>         echo "#### Cycle: $cycle"
>
>         ip link set ens7f0 mtu 9000
>         ip link add bond0 type bond mode 1 miimon 100
>         ip link set bond0 up
>         ifenslave bond0 ens7f0
>         ip link set bond0 mtu 9000
>         ethtool -L ens7f0 combined 1
>         ip link del bond0
>         ip link set ens7f0 mtu 1500
>         sleep 1
>
>         let cycle++
> done
>
> In short, when the device is added to or removed from a bond, the aux device
> is unplugged or plugged, respectively. When the MTU of the device is changed,
> an event is sent to the aux device asynchronously. This can race with the
> (un)plugging operation, and because pf->adev is set too early (plug) or
> cleared too late (unplug), the function ice_send_event_to_aux() can touch
> uninitialized or destroyed fields, in the case of the crash below
> pf->adev->dev.mutex.
>
> Crash:
> [ 53.372066] bond0: (slave ens7f0): making interface the new active one
> [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> [ 54.248204] bond0: (slave ens7f0): Releasing backup interface
> [ 54.253955] bond0: (slave ens7f1): making interface the new active one
> [ 54.274875] bond0: (slave ens7f1): Releasing backup interface
> [ 54.289153] bond0 (unregistering): Released all slaves
> [ 55.383179] MII link monitoring set to 100 ms
> [ 55.398696] bond0: (slave ens7f0): making interface the new active one
> [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
> [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> [ 55.412198] #PF: supervisor write access in kernel mode
> [ 55.412200] #PF: error_code(0x0002) - not-present page
> [ 55.412201] PGD 25d2ad067 P4D 0
> [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1
> [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/2021
> [ 55.430226] Workqueue: ice ice_service_task [ice]
> [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20
> [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
> [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
> [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
> [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
> [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
> [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
> [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
> [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
> [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
> [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 55.567305] PKRU: 55555554
> [ 55.570018] Call Trace:
> [ 55.572474] <TASK>
> [ 55.574579] ice_service_task+0xaab/0xef0 [ice]
> [ 55.579130] process_one_work+0x1c5/0x390
> [ 55.583141] ? process_one_work+0x390/0x390
> [ 55.587326] worker_thread+0x30/0x360
> [ 55.590994] ? process_one_work+0x390/0x390
> [ 55.595180] kthread+0xe6/0x110
> [ 55.598325] ? kthread_complete_and_exit+0x20/0x20
> [ 55.603116] ret_from_fork+0x1f/0x30
> [ 55.606698] </TASK>
>
> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
> Reviewed-by: Leon Romanovsky <leonro@...dia.com>
> Signed-off-by: Ivan Vecera <ivecera@...hat.com>
> ---
> drivers/net/ethernet/intel/ice/ice.h | 1 +
> drivers/net/ethernet/intel/ice/ice_idc.c | 25 +++++++++++++++--------
> drivers/net/ethernet/intel/ice/ice_main.c | 2 ++
> 3 files changed, 20 insertions(+), 8 deletions(-)
>
Tested-by: Gurucharan <gurucharanx.g@...el.com> (A Contingent worker at Intel)