netdev - Re: [PATCH v2] ax25: Fix ax25 session cleanup problem in ax25

Message-ID: <52f70cc1.3a10e.1810a40afe0.Coremail.duoming@zju.edu.cn>
Date:   Sat, 28 May 2022 18:40:20 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     linux-hams@...r.kernel.org
Cc:     jreuter@...na.de, ralf@...ux-mips.org, davem@...emloft.net,
        edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
        thomas@...erried.de, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ax25: Fix ax25 session cleanup problem in
 ax25_release

Hello,

On Fri, 27 May 2022 23:18:32 +0800 Duoming wrote:

> The timers of ax25 are used for correct session cleanup.
> If we use ax25_release() to close ax25 sessions and
> ax25_dev is not null, the del_timer_sync() functions in
> ax25_release() will execute. As a result, the sessions
> could not be cleaned up correctly, because the timers
> have stopped.
> 
> This patch adds a device_up flag in ax25_dev in order to
> judge whether the device is up. If there are sessions to
> be cleaned up, the del_timer_sync() in ax25_release() will
> not execute. What's more, we add ax25_cb_del() in
> ax25_kill_by_device(), because the timers have been stopped
> and there are no functions that could delete ax25_cb if we
> do not call ax25_release().
> 
> Fixes: 82e31755e55f ("ax25: Fix UAF bugs in ax25 timers")
> Reported-and-tested-by: Thomas Osterried <thomas@...erried.de>
> Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> ---
> Changes in v2:
>   - Add ax25_cb_del() in ax25_kill_by_device().
> 
>  include/net/ax25.h  |  1 +
>  net/ax25/af_ax25.c  | 15 ++++++++++-----
>  net/ax25/ax25_dev.c |  1 +
>  3 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/ax25.h b/include/net/ax25.h
> index 0f9790c455b..a427a05672e 100644
> --- a/include/net/ax25.h
> +++ b/include/net/ax25.h
> @@ -228,6 +228,7 @@ typedef struct ax25_dev {
>  	ax25_dama_info		dama;
>  #endif
>  	refcount_t		refcount;
> +	bool device_up;
>  } ax25_dev;
>  
>  typedef struct ax25_cb {
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 363d47f9453..92cbb08a6c5 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -81,6 +81,7 @@ static void ax25_kill_by_device(struct net_device *dev)
>  
>  	if ((ax25_dev = ax25_dev_ax25dev(dev)) == NULL)
>  		return;
> +	ax25_dev->device_up = false;
>  
>  	spin_lock_bh(&ax25_list_lock);
>  again:
> @@ -91,6 +92,7 @@ static void ax25_kill_by_device(struct net_device *dev)
>  				spin_unlock_bh(&ax25_list_lock);
>  				ax25_disconnect(s, ENETUNREACH);
>  				s->ax25_dev = NULL;
> +				ax25_cb_del(s);
>  				spin_lock_bh(&ax25_list_lock);
>  				goto again;
>  			}
> @@ -104,6 +106,7 @@ static void ax25_kill_by_device(struct net_device *dev)
>  				ax25_dev_put(ax25_dev);
>  			}
>  			release_sock(sk);
> +			ax25_cb_del(s);
>  			spin_lock_bh(&ax25_list_lock);
>  			sock_put(sk);
>  			/* The entry could have been deleted from the

There is a "refcount_t: underflow" problem with this patch. The call trace is shown below:

refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 15997 at lib/refcount.c:28 refcount_warn_saturate+0xc5/0x110
RIP: 0010:refcount_warn_saturate+0xc5/0x110
Code: 1b e0 d6 02 01 e8 46 82 1d 01 0f 0b eb 99 80 3d 08 e0 d6 02 00 75 90 48 c7 c7 80 87
RSP: 0018:ffff88800ab37db0 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed1001566fa8
RBP: ffff88800a3bb410 R08: ffffffff810ffe2f R09: ffff88800ab37a37
R10: ffffed1001566f46 R11: 0000000000000001 R12: ffff888008960000
R13: ffff88800953f2c0 R14: ffff888006500018 R15: ffff888008960080
FS:  00007f46981f3700(0000) GS:ffff88806c600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000042270a CR3: 0000000009b64000 CR4: 00000000000006e0
Call Trace:
<TASK>
__sk_destruct+0x2c/0x350
ax25_release+0x34e/0x4a0
__sock_release+0x6d/0x120
sock_close+0xf/0x20
__fput+0x10e/0x410
task_work_run+0x86/0xc0
exit_to_user_mode_prepare+0x194/0x1a0
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

The race condition is shown below:

      (Thread 1)                    |      (Thread 2)
ax25_create()                       |
  refcount_set(&ax25->refcount, 1)  |
ax25_bind()                         |
  ax25_cb_add()                     |
    ax25_cb_hold(ax25) //refcnt = 2 |
ax25_kill_by_device()               | ax25_release()
  ...                               |   ...
  release_sock();                   |     
  // no locks protect ax25_cb_del   |   lock_sock()  
  ax25_cb_del()                     |   ax25_destroy_socket()
    if (!hlist_unhashed(..))        |   ax25_cb_del()
      ...                           |    if (!hlist_unhashed(..))
      hlist_del_init()              |      
      ax25_cb_put(ax25) //refcnt = 1|      ...
                                    |      ax25_cb_put(ax25) //refcnt = 0
                                    |      ...
                                    |   sock_put(sk) 
                                    |     sk_free()
                                    |       sk_destruct()
                                    |         __sk_destruct()
                                    |           ax25_free_sock()
                                    |             ax25_cb_put(ax25) // refcount_t: underflow!
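
To make the counting in the diagram easier to follow, here is a minimal user-space sketch that replays the same interleaving step by step. It is only a model, not kernel code: struct model_cb, model_check(), model_unlink_and_put() and replay_racy_interleaving() are hypothetical names, a plain int stands in for refcount_t, and a bool stands in for the hlist membership.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for ax25_cb: a plain int replaces refcount_t. */
struct model_cb {
    int  refcount;
    bool hashed;   /* true while the node is on the modelled ax25 list */
};

/* Models the check in ax25_cb_del(): is the node still on the list? */
static bool model_check(const struct model_cb *cb)
{
    return cb->hashed;
}

/* Models the body that runs once the check has passed. */
static void model_unlink_and_put(struct model_cb *cb)
{
    cb->hashed = false;   /* hlist_del_init() */
    cb->refcount--;       /* ax25_cb_put() */
}

/* Replays the interleaving from the diagram; returns the final count. */
static int replay_racy_interleaving(void)
{
    struct model_cb cb = { .refcount = 1, .hashed = false }; /* ax25_create() */

    cb.hashed = true;     /* ax25_cb_add() */
    cb.refcount++;        /* ax25_cb_hold(): refcnt = 2 */

    /* Both threads evaluate the check before either one unlinks the node. */
    bool t1_passed = model_check(&cb);   /* thread 1: ax25_kill_by_device() */
    bool t2_passed = model_check(&cb);   /* thread 2: ax25_release() */
    assert(t1_passed && t2_passed);

    model_unlink_and_put(&cb);           /* thread 1: refcnt = 1 */
    model_unlink_and_put(&cb);           /* thread 2: refcnt = 0 */

    cb.refcount--;        /* ax25_free_sock(): one put too many */
    return cb.refcount;
}
```

In this model the function returns -1, which corresponds to the point where the real refcount_t reports the underflow.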

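Calling ax25_cb_del() while the socket lock is still held, that is,
before release_sock(), solves this problem, because there is a check
in ax25_cb_del(). Once the ax25 node has been deleted from the hlist,
the check is no longer satisfied, so the second caller skips
ax25_cb_put().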

if (!hlist_unhashed(&ax25->ax25_node)) { //check
    spin_lock_bh(&ax25_list_lock);
    hlist_del_init(&ax25->ax25_node);  //delete ax25 node
    spin_unlock_bh(&ax25_list_lock);
    ax25_cb_put(ax25);
}
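
As a rough illustration of why the check helps once the two callers are serialized, the sketch below treats check, unlink and put as one indivisible step. This is an assumption standing in for the effect of holding the socket lock; struct locked_cb, locked_cb_del() and replay_serialized() are hypothetical names, and a plain int again replaces refcount_t.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for ax25_cb, as seen by callers that are serialized. */
struct locked_cb {
    int  refcount;
    bool hashed;   /* true while the node is on the modelled ax25 list */
};

/*
 * Models ax25_cb_del() when no other caller can run between the check
 * and the unlink. Returns true if this call dropped the list reference.
 */
static bool locked_cb_del(struct locked_cb *cb)
{
    if (!cb->hashed)
        return false;     /* already unlinked: nothing to put */
    cb->hashed = false;   /* hlist_del_init() */
    cb->refcount--;       /* ax25_cb_put() */
    return true;
}

/* Replays the serialized order; returns the final count. */
static int replay_serialized(void)
{
    /* State after ax25_create() and ax25_cb_add(): refcnt = 2, on the list. */
    struct locked_cb cb = { .refcount = 2, .hashed = true };

    bool first  = locked_cb_del(&cb);   /* ax25_kill_by_device(): refcnt = 1 */
    bool second = locked_cb_del(&cb);   /* ax25_release(): check fails */
    assert(first && !second);

    cb.refcount--;        /* ax25_free_sock(): refcnt = 0 */
    return cb.refcount;
}
```

Here the count ends at 0 instead of going negative, because only the first caller drops the list reference.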

The following change passed my test:

@@ -103,6 +105,7 @@ static void ax25_kill_by_device(struct net_device *dev)
                                dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
                                ax25_dev_put(ax25_dev);
                        }
+                       ax25_cb_del(s);
                        release_sock(sk);
                        spin_lock_bh(&ax25_list_lock);
                        sock_put(sk);

Best regards,
Duoming Zhou