Date:   Mon, 21 Feb 2022 07:27:32 +0100
From:   Juergen Gross <jgross@...e.com>
To:     Marek Marczykowski-Górecki 
        <marmarek@...isiblethingslab.com>, linux-kernel@...r.kernel.org
Cc:     stable@...r.kernel.org,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Antoine Tenart <atenart@...nel.org>,
        "moderated list:XEN HYPERVISOR INTERFACE" 
        <xen-devel@...ts.xenproject.org>,
        "open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>
Subject: Re: [PATCH] xen/netfront: destroy queues before real_num_tx_queues is
 zeroed

On 20.02.22 14:42, Marek Marczykowski-Górecki wrote:
> xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
> delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
> ("net-sysfs: update the queue counts in the unregistration path"),
> unregister_netdev() indirectly sets real_num_tx_queues to 0. Together,
> these two facts mean that xennet_destroy_queues() called from
> xennet_remove() cannot do its job, because it's called after
> unregister_netdev(). This results in kfree-ing queues that are still
> linked in napi, which ultimately crashes:
> 
>      BUG: kernel NULL pointer dereference, address: 0000000000000000
>      #PF: supervisor read access in kernel mode
>      #PF: error_code(0x0000) - not-present page
>      PGD 0 P4D 0
>      Oops: 0000 [#1] PREEMPT SMP PTI
>      CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
>      RIP: 0010:free_netdev+0xa3/0x1a0
>      Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
>      RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
>      RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
>      RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
>      RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
>      R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
>      R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
>      FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
>      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
>      Call Trace:
>       <TASK>
>       xennet_remove+0x13d/0x300 [xen_netfront]
>       xenbus_dev_remove+0x6d/0xf0
>       __device_release_driver+0x17a/0x240
>       device_release_driver+0x24/0x30
>       bus_remove_device+0xd8/0x140
>       device_del+0x18b/0x410
>       ? _raw_spin_unlock+0x16/0x30
>       ? klist_iter_exit+0x14/0x20
>       ? xenbus_dev_request_and_reply+0x80/0x80
>       device_unregister+0x13/0x60
>       xenbus_dev_changed+0x18e/0x1f0
>       xenwatch_thread+0xc0/0x1a0
>       ? do_wait_intr_irq+0xa0/0xa0
>       kthread+0x16b/0x190
>       ? set_kthread_struct+0x40/0x40
>       ret_from_fork+0x22/0x30
>       </TASK>
> 
> Fix this by calling xennet_destroy_queues() from xennet_close() too,
> while real_num_tx_queues is still valid. This ensures that queues are
> destroyed before real_num_tx_queues is set to 0, regardless of how
> unregister_netdev() was called.
> 
> Originally reported at
> https://github.com/QubesOS/qubes-issues/issues/7257
> 
> Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
> Cc: stable@...r.kernel.org # 5.16+
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>
> 
> ---
> While this fixes the issue, I'm not sure if that is the correct thing
> to do. xennet_remove() calls xennet_destroy_queues() under rtnl_lock,
> which may be important here? Just moving xennet_destroy_queues() before

I checked some of the call paths leading to xennet_close(), and all of
those contained an ASSERT_RTNL(), so it seems the rtnl_lock is already
held here. Could you test with an ASSERT_RTNL() added to
xennet_destroy_queues()?

> unregister_netdev() in xennet_remove() did not help - it crashed in
> another way (use-after-free in xennet_close()).

Yes, this would need to basically do the xennet_close() handling in
xennet_destroy() instead, which I believe is not really an option.

If your test with the added ASSERT_RTNL() doesn't show any problem,
you can add my:

Reviewed-by: Juergen Gross <jgross@...e.com>


Juergen
