lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2137329.lhgpPQ2jGW@natalenko.name>
Date:   Sat, 21 Aug 2021 13:14:34 +0200
From:   Oleksandr Natalenko <oleksandr@...alenko.name>
To:     linux-kernel@...r.kernel.org
Cc:     netdev@...r.kernel.org, linux-wireless@...r.kernel.org,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Johannes Berg <johannes@...solutions.net>,
        Felix Fietkau <nbd@....name>,
        Lorenzo Bianconi <lorenzo.bianconi83@...il.com>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Kalle Valo <kvalo@...eaurora.org>
Subject: Re: Divide error in minstrel_ht_get_tp_avg()

Hello.

On pondělí 7. června 2021 16:52:30 CEST Oleksandr Natalenko wrote:
> On Sat, May 29, 2021 at 06:57:29PM +0200, Oleksandr Natalenko wrote:
> > The following woe happened on my home router running as a Wi-Fi AP with
> > MT7612:
> > 
> > ```
> > [16592.157962] divide error: 0000 [#1] PREEMPT SMP PTI
> > [16592.163169] CPU: 2 PID: 683 Comm: mt76-tx phy0 Tainted: G         C    
> >    5.12.0-pf4 #1 [16592.171745] Hardware name: To Be Filled By O.E.M. To
> > Be Filled By O.E.M./J3710-ITX, BIOS P1.50 04/16/2018 [16592.182155] RIP:
> > 0010:minstrel_ht_get_tp_avg+0xb1/0x100 [mac80211] [16592.188795] Code: 04
> > 00 00 00 7f 1c 31 c9 81 fa f1 49 02 00 0f 9c c1 8d 0c cd 08 00 00 00 eb
> > 08 8b 47 30 b9 01 00 00 00 69 c0 e8 03 00 00 31 d2 <f7> f1 ba 66 0e 00 00
> > 39 d6 0f 4f f2 49 63 d1 48 8d 0c 52 48 8d 0c [16592.208796] RSP:
> > 0018:ffffb6a601293be8 EFLAGS: 00010246
> > [16592.214292] RAX: 000000000001a5e0 RBX: ffff9c658ee95170 RCX:
> > 0000000000000000 [16592.222054] RDX: 0000000000000000 RSI:
> > 0000000000000585 RDI: ffff9c658ee94000 [16592.229620] RBP:
> > 0000000000000006 R08: 0000000000000006 R09: 0000000000000012
> > [16592.237254] R10: 0000000000000007 R11: 0000000000000000 R12:
> > ffff9c658ee94000 [16592.244828] R13: 0000000000000012 R14:
> > ffff9c658ee9534c R15: 0000000000000585 [16592.252635] FS: 
> > 0000000000000000(0000) GS:ffff9c66f8500000(0000) knlGS:0000000000000000
> > [16592.261086] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [16592.267326] CR2: 0000555d81abe7b8 CR3: 0000000116c10000 CR4:
> > 00000000001006e0 [16592.274904] Call Trace:
> > [16592.277549]  minstrel_ht_update_stats+0x1fe/0x1320 [mac80211]
> > [16592.283730]  minstrel_ht_tx_status+0x67f/0x710 [mac80211]
> > [16592.289442]  rate_control_tx_status+0x6e/0xb0 [mac80211]
> > [16592.295267]  ieee80211_tx_status_ext+0x22e/0xb00 [mac80211]
> > [16592.311290]  ieee80211_tx_status+0x7d/0xa0 [mac80211]
> > [16592.316714]  mt76_tx_status_unlock+0x83/0xa0 [mt76]
> > [16592.321999]  mt76x02_send_tx_status+0x1b7/0x400 [mt76x02_lib]
> > [16592.334516]  mt76x02_tx_worker+0x8f/0xd0 [mt76x02_lib]
> > [16592.340091]  __mt76_worker_fn+0x78/0xb0 [mt76]
> > [16592.345067]  kthread+0x183/0x1b0
> > [16592.353378]  ret_from_fork+0x22/0x30
> > ```
> > 
> > `faddr2line` says it is this:
> > 
> > ```
> > 
> >  430 int
> >  431 minstrel_ht_get_tp_avg(struct minstrel_ht_sta *mi, int group, int
> >  rate,
> >  432                int prob_avg)
> >  433 {
> > 
> > …
> > 
> >  435     unsigned int ampdu_len = 1;
> > 
> > …
> > 
> >  441     if (minstrel_ht_is_legacy_group(group))
> > 
> > …
> > 
> >  443     else
> >  444         ampdu_len = minstrel_ht_avg_ampdu_len(mi);
> > 
> > …
> > 
> >  446     nsecs = 1000 * overhead / ampdu_len;
> >  
> >          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > ```
> > 
> > So, it seems `minstrel_ht_avg_ampdu_len()` can return 0, which is not
> > really legitimate.
> > 
> > Looking at `minstrel_ht_avg_ampdu_len()`, I see the following:
> > 
> > ```
> > 16:#define MINSTREL_SCALE  12
> > …
> > 18:#define MINSTREL_TRUNC(val) ((val) >> MINSTREL_SCALE)
> > ```
> > 
> > ```
> > 
> >  401 static unsigned int
> >  402 minstrel_ht_avg_ampdu_len(struct minstrel_ht_sta *mi)
> >  403 {
> > 
> > …
> > 
> >  406     if (mi->avg_ampdu_len)
> >  407         return MINSTREL_TRUNC(mi->avg_ampdu_len);
> > 
> > ```
> > 
> > So, likely, `mi->avg_ampdu_len` is non-zero, but it's too small, hence
> > right bitshift makes it zero.
> > 
> > Should there be some protection against such a situation?
> > 
> > Felix, could you maybe check this? Looks like you were the last one to
> > overhaul that part of code.
> 
> Quick hack from me:
> 
> commit a23d3f77658de968c5bb852ba478402c93958f7f
> Author: Oleksandr Natalenko <oleksandr@...alenko.name>
> Date:   Mon Jun 7 16:45:45 2021 +0200
> 
>     mac80211: minstrel_ht: force ampdu_len to be > 0
> 
>     This is a hack.
> 
>     Work around the following crash:
> 
>     ```
>     divide error: 0000 [#1] PREEMPT SMP PTI
>     CPU: 2 PID: 683 Comm: mt76-tx phy0 Tainted: G         C       
> 5.12.0-pf4 #1 Hardware name: To Be Filled By O.E.M. To Be Filled By
> O.E.M./J3710-ITX, BIOS P1.50 04/16/2018 RIP:
> 0010:minstrel_ht_get_tp_avg+0xb1/0x100 [mac80211]
>     …
>     Call Trace:
>      minstrel_ht_update_stats+0x1fe/0x1320 [mac80211]
>      minstrel_ht_tx_status+0x67f/0x710 [mac80211]
>      rate_control_tx_status+0x6e/0xb0 [mac80211]
>      ieee80211_tx_status_ext+0x22e/0xb00 [mac80211]
>      ieee80211_tx_status+0x7d/0xa0 [mac80211]
>      mt76_tx_status_unlock+0x83/0xa0 [mt76]
>      mt76x02_send_tx_status+0x1b7/0x400 [mt76x02_lib]
>      mt76x02_tx_worker+0x8f/0xd0 [mt76x02_lib]
>      __mt76_worker_fn+0x78/0xb0 [mt76]
>      kthread+0x183/0x1b0
>      ret_from_fork+0x22/0x30
>     ```
> 
>     Link:
> https://lore.kernel.org/lkml/20210529165728.bskaozwtmwxnvucx@spock.localdom
> ain/ Signed-off-by: Oleksandr Natalenko <oleksandr@...alenko.name>
> ---
> diff --git a/net/mac80211/rc80211_minstrel_ht.c
> b/net/mac80211/rc80211_minstrel_ht.c index ecad9b10984f..6ad188c4101e
> 100644
> --- a/net/mac80211/rc80211_minstrel_ht.c
> +++ b/net/mac80211/rc80211_minstrel_ht.c
> @@ -440,8 +440,13 @@ minstrel_ht_get_tp_avg(struct minstrel_ht_sta *mi, int
> group, int rate,
> 
>  	if (minstrel_ht_is_legacy_group(group))
>  		overhead = mi->overhead_legacy;
> -	else
> +	else {
>  		ampdu_len = minstrel_ht_avg_ampdu_len(mi);
> +		if (unlikely(!ampdu_len)) {
> +			pr_err_once("minstrel_ht_get_tp_avg: ampdu_len == 0!");
> +			ampdu_len = 1;
> +		}
> +	}
> 
>  	nsecs = 1000 * overhead / ampdu_len;
>  	nsecs += minstrel_mcs_groups[group].duration[rate] <<

I've also found out that this happens exactly at midnight, IOW, at 00:00:00. 
Not every midnight, though.

Does it have something to do with timekeeping? This is strange, I wouldn't 
expect kernel to act like that. Probably, some client sends malformed frame? 
How to find out?

Thanks.

P.S. The hack works, BTW, the system survives, and AP runs as if nothing 
happened.

-- 
Oleksandr Natalenko (post-factum)


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ