Message-ID: <20220423121527.79fa5efb@ceranb>
Date:   Sat, 23 Apr 2022 12:15:27 +0200
From:   Ivan Vecera <ivecera@...hat.com>
To:     "Ertman, David M" <david.m.ertman@...el.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        poros <poros@...hat.com>, mschmidt <mschmidt@...hat.com>,
        Leon Romanovsky <leon@...nel.org>,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Saleem, Shiraz" <shiraz.saleem@...el.com>,
        "moderated list:INTEL ETHERNET DRIVERS" 
        <intel-wired-lan@...ts.osuosl.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3] ice: Fix race during aux device (un)plugging

On Fri, 22 Apr 2022 20:55:10 +0000
"Ertman, David M" <david.m.ertman@...el.com> wrote:

> > -----Original Message-----
> > From: Ertman, David M
> > Sent: Friday, April 22, 2022 10:42 AM
> > To: Ivan Vecera <ivecera@...hat.com>; netdev@...r.kernel.org
> > Cc: poros <poros@...hat.com>; mschmidt <mschmidt@...hat.com>; Leon
> > Romanovsky <leon@...nel.org>; Brandeburg, Jesse
> > <jesse.brandeburg@...el.com>; Nguyen, Anthony L
> > <anthony.l.nguyen@...el.com>; David S. Miller <davem@...emloft.net>;
> > Jakub Kicinski <kuba@...nel.org>; Paolo Abeni <pabeni@...hat.com>;
> > Saleem, Shiraz <shiraz.saleem@...el.com>; moderated list:INTEL ETHERNET
> > DRIVERS <intel-wired-lan@...ts.osuosl.org>; open list <linux-kernel@...r.kernel.org>
> > Subject: RE: [PATCH net v3] ice: Fix race during aux device (un)plugging
> >   
> > > -----Original Message-----
> > > From: Ivan Vecera <ivecera@...hat.com>
> > > Sent: Wednesday, April 20, 2022 11:09 PM
> > > To: netdev@...r.kernel.org
> > > Cc: poros <poros@...hat.com>; mschmidt <mschmidt@...hat.com>; Leon
> > > Romanovsky <leon@...nel.org>; Brandeburg, Jesse
> > > <jesse.brandeburg@...el.com>; Nguyen, Anthony L
> > > <anthony.l.nguyen@...el.com>; David S. Miller <davem@...emloft.net>;
> > > Jakub Kicinski <kuba@...nel.org>; Paolo Abeni <pabeni@...hat.com>;
> > > Ertman, David M <david.m.ertman@...el.com>; Saleem, Shiraz
> > > <shiraz.saleem@...el.com>; moderated list:INTEL ETHERNET DRIVERS
> > > <intel-wired-lan@...ts.osuosl.org>; open list <linux-kernel@...r.kernel.org>
> > > Subject: [PATCH net v3] ice: Fix race during aux device (un)plugging
> > >
> > > Function ice_plug_aux_dev() assigns the pf->adev field too early, before
> > > aux device initialization, while ice_unplug_aux_dev() starts aux device
> > > deinit and assigns NULL to pf->adev only at the end.
> > > This is wrong because pf->adev should be non-NULL only when the
> > > aux device is fully initialized and ready. This wrong order causes
> > > a crash when ice_send_event_to_aux() is called, because that function
> > > depends on a non-NULL value of pf->adev and does not expect the
> > > aux device to be half-initialized or half-destroyed.
> > > After the order is corrected the race window is tiny but it is still
> > > there, as Leon mentioned, so manipulation of pf->adev needs to be
> > > protected by a mutex.
> > >
> > > Fix the (un-)plugging functions so that the pf->adev field is set after
> > > aux device init and cleared prior to aux device destroy, and protect the
> > > pf->adev assignment with a new mutex. This mutex is also held during the
> > > ice_send_event_to_aux() call to ensure that the aux device is valid
> > > during that call. The device lock used in ice_send_event_to_aux() to
> > > avoid its concurrent run can be removed, as this is secured by that mutex.
> > >
> > > Reproducer:
> > > cycle=1
> > > while :;do
> > >         echo "#### Cycle: $cycle"
> > >
> > >         ip link set ens7f0 mtu 9000
> > >         ip link add bond0 type bond mode 1 miimon 100
> > >         ip link set bond0 up
> > >         ifenslave bond0 ens7f0
> > >         ip link set bond0 mtu 9000
> > >         ethtool -L ens7f0 combined 1
> > >         ip link del bond0
> > >         ip link set ens7f0 mtu 1500
> > >         sleep 1
> > >
> > >         let cycle++
> > > done
> > >
> > > In short, when the device is added to (removed from) a bond, the aux
> > > device is unplugged (plugged). When the MTU of the device is changed, an
> > > event is sent to the aux device asynchronously. This can race with the
> > > (un)plugging operation, and because pf->adev is set too early (plug) or
> > > cleared too late (unplug), ice_send_event_to_aux() can touch uninitialized
> > > or destroyed fields; in the crash below, that field is pf->adev->dev.mutex.
> > >
> > > Crash:
> > > [   53.372066] bond0: (slave ens7f0): making interface the new active one
> > > [   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> > > [   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> > > [   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> > > [   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> > > [   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> > > [   54.248204] bond0: (slave ens7f0): Releasing backup interface
> > > [   54.253955] bond0: (slave ens7f1): making interface the new active one
> > > [   54.274875] bond0: (slave ens7f1): Releasing backup interface
> > > [   54.289153] bond0 (unregistering): Released all slaves
> > > [   55.383179] MII link monitoring set to 100 ms
> > > [   55.398696] bond0: (slave ens7f0): making interface the new active one
> > > [   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
> > > [   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> > > [   55.412198] #PF: supervisor write access in kernel mode
> > > [   55.412200] #PF: error_code(0x0002) - not-present page
> > > [   55.412201] PGD 25d2ad067 P4D 0
> > > [   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > > [   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S           5.17.0-13579-g57f2d6540f03 #1
> > > [   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> > > [   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/2021
> > > [   55.430226] Workqueue: ice ice_service_task [ice]
> > > [   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
> > > [   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
> > > [   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
> > > [   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
> > > [   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
> > > [   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
> > > [   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
> > > [   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
> > > [   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
> > > [   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
> > > [   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [   55.567305] PKRU: 55555554
> > > [   55.570018] Call Trace:
> > > [   55.572474]  <TASK>
> > > [   55.574579]  ice_service_task+0xaab/0xef0 [ice]
> > > [   55.579130]  process_one_work+0x1c5/0x390
> > > [   55.583141]  ? process_one_work+0x390/0x390
> > > [   55.587326]  worker_thread+0x30/0x360
> > > [   55.590994]  ? process_one_work+0x390/0x390
> > > [   55.595180]  kthread+0xe6/0x110
> > > [   55.598325]  ? kthread_complete_and_exit+0x20/0x20
> > > [   55.603116]  ret_from_fork+0x1f/0x30
> > > [   55.606698]  </TASK>
> > >
> > > Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
> > > Cc: Leon Romanovsky <leon@...nel.org>
> > > Signed-off-by: Ivan Vecera <ivecera@...hat.com>  
> > 
> > Sorry for previous mis-reply - hit the wrong button.
> > 
> > LGTM
> > Acked-by: Dave Ertman <david.m.ertman@...el.com>  
> 
> After thinking about this for a bit longer, I did think of one issue.
> 
> With the removal of the device_lock in ice_send_event_to_aux(), there is no guarantee that the
> function pointer will not become NULL when the auxiliary_driver unloads.  It is a very small window,
> but it could happen.
> 
> I think the device_lock should probably stay also.
> 
> DaveE
> 

The function pointer can't become NULL, but adev->dev.driver can. So yeah, you are right, the device
lock needs to be held as well.
I will submit v4.

Thx,
Ivan
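
For readers following the thread, the ordering rule from the patch description
(publish pf->adev only after init, clear it before destroy, and hold one mutex
around both the assignment and the event delivery) can be sketched in userspace
C. This is only an illustration under stated assumptions: struct pf_sketch,
struct aux_dev_sketch and all function names are hypothetical stand-ins built
on pthreads, not the ice driver's code, and the sketch does not model the
device lock that the thread concludes must stay held as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the aux device; not a kernel type. */
struct aux_dev_sketch {
	int ready;        /* set once initialization has finished */
	int events_seen;  /* counts delivered events */
};

/* Hypothetical stand-in for the PF structure. */
struct pf_sketch {
	pthread_mutex_t adev_lock;    /* protects the adev pointer */
	struct aux_dev_sketch *adev;  /* non-NULL only while fully initialized */
};

static void pf_sketch_init(struct pf_sketch *pf)
{
	pthread_mutex_init(&pf->adev_lock, NULL);
	pf->adev = NULL;
}

/* Plug: finish all initialization first, then publish the pointer. */
static int plug_aux_dev_sketch(struct pf_sketch *pf)
{
	struct aux_dev_sketch *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return -1;
	adev->ready = 1;  /* initialization completes before publication */

	pthread_mutex_lock(&pf->adev_lock);
	if (pf->adev) {   /* already plugged: keep the existing device */
		pthread_mutex_unlock(&pf->adev_lock);
		free(adev);
		return -1;
	}
	pf->adev = adev;
	pthread_mutex_unlock(&pf->adev_lock);
	return 0;
}

/* Unplug: clear the pointer under the lock, then tear the device down. */
static void unplug_aux_dev_sketch(struct pf_sketch *pf)
{
	struct aux_dev_sketch *adev;

	pthread_mutex_lock(&pf->adev_lock);
	adev = pf->adev;
	pf->adev = NULL;
	pthread_mutex_unlock(&pf->adev_lock);

	free(adev);  /* free(NULL) is a no-op */
}

/* Returns 1 if the event was delivered, 0 if no device is plugged. */
static int send_event_to_aux_sketch(struct pf_sketch *pf)
{
	int delivered = 0;

	pthread_mutex_lock(&pf->adev_lock);
	if (pf->adev) {
		/* A published pointer always refers to an initialized device. */
		assert(pf->adev->ready);
		pf->adev->events_seen++;
		delivered = 1;
	}
	pthread_mutex_unlock(&pf->adev_lock);
	return delivered;
}
```

Because the pointer is only visible between the end of init and the start of
teardown, and the sender holds the same mutex for the whole delivery, the
sender never observes a half-initialized or half-destroyed device in this
sketch.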
