linux-kernel - RE: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock


Message-ID: <20100503191756.GA3479@localhost>
Date:	Mon, 3 May 2010 21:17:56 +0200
From:	NilsRadtkelkml@...nk-Future.de
To:	linux-kernel@...r.kernel.org
Subject: RE: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some
 accesspoint == hardlock

  Hello,

  Some of the previous messages about this BUG seem to have been lost, so here it all is again in a single message:

When connecting to the local university's wireless APs, the kernel crashes rather regularly.

On connecting to one of them the system hard-locks: no panic LEDs blinking, no mouse cursor movement,
no magic SysRq key working. Powering off is the only way to revive the machine, until the next
lock-up.

What might serve as a hint: with knemo and kwifimanager running (and wpa_supplicant in the background) it feels
like it freezes more frequently. Not running kwifimanager seems to reduce the frequency without, however,
eliminating the freezes completely. Sometimes the crash occurred right after iceweasel had been touched.
Maybe some TX (or subsequent RX) provokes the crash? Another idea that came up was that access to the iwlagn
driver isn't clean when it happens from more than one place at the same time
(-> wpa_supplicant + kwifimanager)? [edit: this probably isn't it, or not solely]

This behaviour is currently only known to happen in this exact place/geolocation. So one assumption is that it
is somehow correlated with a particular type of access point or its firmware. Nonetheless, even
if some firmware is sending kill packets, the kernel should obviously not crash.

The access points in range are of the following types (obviously, fw versions are unknown):

  - one 0:1d:8b PIRELLI BROADBAND SOLUTIONS
  - a couple of 0:40:96 Cisco Systems, Inc.
  - one 0:1a:70 Cisco-Linksys

This notebook has no serial port, so we haven't been able to get a glimpse of what is happening at that
precise moment. xconsole and dmesg haven't been helpful, as the system just freezes and that's it.

Other circumstances leading to a crash (reproducibility):

  - RFKILL switch: switching the radio on is (obviously) a precondition, and the crash mainly
    happens shortly after switching the radio on,
  - it happens more frequently and sooner after switching the radio on when kde + firefox + xy is running

Please find attached an image of the bug. Finally I could catch a glimpse of what happens; I had longed
for this for quite some time.

Strangely, as stated in the previous message, this bug only happens at a specific
geographical position; so far, at least, it has only happened there. How does this correlate with the designated
code at line 2076 in iwl-agn-rs.c?

  /* Sanity-check TPT calculations */
  BUG_ON(window->average_tpt != ((window->success_ratio *
      tbl->expected_tpt[index] + 64) / 128));
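
To make the check concrete, here is a minimal sketch of the arithmetic that assertion compares. It assumes, going only by the expression above, that success_ratio is a percentage scaled by 128 and that expected_tpt[index] is the expected throughput for the current rate. The struct and function names are made up for illustration and are not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-rate history window. The field names
 * follow the BUG_ON expression; everything else is an assumption. */
struct tpt_window {
    int32_t success_ratio; /* assumed: percent * 128, e.g. 12800 = 100% */
    int32_t average_tpt;   /* cached throughput estimate */
};

/* Recompute the estimate; "+ 64" before "/ 128" rounds to nearest. */
static int32_t expected_average_tpt(int32_t success_ratio, int32_t expected_tpt)
{
    return (success_ratio * expected_tpt + 64) / 128;
}

/* True when the cached value matches the recomputed one, i.e. the
 * condition under which the BUG_ON would NOT fire. */
static bool tpt_window_consistent(const struct tpt_window *w, int32_t expected_tpt)
{
    return w->average_tpt == expected_average_tpt(w->success_ratio, expected_tpt);
}
```

Under these assumptions the check can only fail if the cached average_tpt was computed from a different success_ratio or a different expected_tpt entry than the ones visible at check time. Whether that is what happens here is not established.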

Furthermore, it seems I had already encountered this bug before. Back then I had no means
of grabbing more information about it, though, so I wrote down some notes:

2010-03-20: iwl-agn-rs.c line 2076: bug
The file where I kept my notes for this bug contains:

  - crashed during wireless AP scan using wpa_supplicant (after RFKILL switching to on..)
  - this happened right after a fresh reboot (after crashing from the same bug)

  - afterwards, auth.log contained this (bad, bad where's my data gone?):
Mar 20 18:24:01 localhost CRON[16684]: pam_unix(cron:session): session closed for user root
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@...6; it; rv:1.9.2.1) Gecko/20091211 Firefox/3.5.6 (like Firefox/3.5.11; Debian-3.5.11-1)"

Noticed the line number in the bug from 2010-03-20? Yep, the same as in today's.

2010-04-30: One crash, two oopses. The message is restored mainly from what remained in some corner of the
brain, with heavy massaging applied; mainly the Code line has been written down properly, due to outside constraints.
Unfortunately, the proper ordering of the lines got lost, so, yes, this is a poor report.
The order is as it's believed to have been shown on-screen.

Number 1: nearly lost, as the subsequent oops made it scroll off-screen; only the bottom half was visible:

in CPU idle +x3f/x90

EIP rs_tx_status +x8f/x2030    SS: ESP 0668:f749fc4c

Code: 7f 22 69 c7 80 01 00 00 8b 8c 24 fc 01 00 00 8b 84 01 a8 00 00 00 6b 04 b0 64 39 44 24 1c 0f 8e 
      d2 fe ff ff 31 db e9 cb fe ff ff <0f> 0b fe 8b 54 24 4c 8b 8c 24 fc 01 00 00 8b 82 04 00 00 00

in iwl_rx_queue restock xbd/x130   iwlcore

panic -> oops begin 

kernel panic not syncing fatal exec in int

pid 0 comm: swapper tainted GD .33.2
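
One thing that can still be read off the notes above: in an oops "Code:" line the byte in angle brackets marks where EIP pointed, and 0f 0b is the x86 ud2 opcode, which is what BUG()/BUG_ON() compiles to on x86. A rough sketch of checking that on a transcribed Code: line follows. The helper name is hypothetical, and it only looks at the marked byte and the one after it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the byte marked <xx> in an oops
 * "Code:" line, together with the byte after it, is 0f 0b (x86 ud2). */
static bool faulting_insn_is_ud2(const char *code_line)
{
    const char *mark = strchr(code_line, '<');
    unsigned int first, second;

    if (!mark)
        return false;
    /* Expect "<0f> 0b": marked byte, closing bracket, next byte. */
    if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
        return false;
    return first == 0x0f && second == 0x0b;
}
```

Fed the tail of the Code: line above ("... fe ff ff <0f> 0b fe 8b ..."), this returns true, which fits with the trap coming from a BUG_ON rather than from a stray memory access.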

Comments, ideas, fixes welcome... this BUG is highly annoying..

Happy to serve with more info if required (and available..).

  Regards,

      Nils

Download attachment "2010-04-29_kernel-oops_iwlagn_2076_cut4www.png" of type "image/png" (83116 bytes)