Message-ID: <20100503191756.GA3479@localhost>
Date:	Mon, 3 May 2010 21:17:56 +0200
From:	NilsRadtkelkml@...nk-Future.de
To:	linux-kernel@...r.kernel.org
Subject: RE: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some
 accesspoint == hardlock


  Hello,

  Some of my previous messages about this BUG seem to have been lost, so here is everything again in a single message:

When connecting to the local university's wireless APs, the kernel crashes rather regularly.

When connecting to one of those, the system hard-locks: no blinking panic LED, no moving mouse cursor,
no working magic SysRq key. Powering off is the only way to revive the machine, until the next
lock-up.

What might serve as a hint: when using knemo and kwifimanager (with wpa_supplicant in the background), it seems to
freeze more frequently. Not running kwifimanager seems to reduce the frequency, without eliminating it
completely. Sometimes the crash occurred right after iceweasel had been used.
Maybe some TX (or subsequent RX) provokes the crash? Another idea that came up was that access to the iwlagn
driver isn't clean when it happens from more than one place at the same time
(-> wpa_supplicant + kwifimanager)? [edit: this probably isn't it, or not solely]

This behaviour is currently known to happen only at this exact place/geolocation. So one assumption is that it
is somehow correlated with a particular type of access point, or rather its firmware. Nonetheless, even
if some firmware is sending kill packets, the kernel should obviously not crash.

The access points in range are of the following types (firmware versions are, obviously, unknown):

  - one 0:1d:8b PIRELLI BROADBAND SOLUTIONS 
  - couple of those 0:40:96 Cisco Systems, Inc.
  - one 0:1a:70 Cisco-Linksys

This notebook has no serial port, so we haven't been able to catch a glimpse of what is happening at that
precise moment. Neither xconsole nor dmesg has been helpful, as the system just freezes and that's it.

Other circumstances leading to a crash (reproducibility):

  - switching the radio on via the RFKILL switch is (obviously) a precondition; beyond that, the crash mainly
    happens shortly after the radio has been switched on,
  - it happens more frequently, and sooner after switching the radio on, when KDE + Firefox + xy are running.

Please find attached an image of the bug. I have finally been able to catch a glimpse of what happens; I had
been longing for this for quite some time.

Strangely, as stated in the previous message, this bug only happens at one specific geographical position;
so far, at least, it has only happened there. How does this correlate with the following code at line 2076
in iwl-agn-rs.c?

  /* Sanity-check TPT calculations */
  BUG_ON(window->average_tpt != ((window->success_ratio *
      tbl->expected_tpt[index] + 64) / 128));


Furthermore, it seems I had already encountered this bug before. At the time I had no means
of grabbing more information from it, though, so I wrote down some notes:

2010-03-20: iwl-agn-rs.c line 2076: bug
The file where I kept my note for this bug contains: 

  - crashed during a wireless AP scan using wpa_supplicant (after RFKILL switching to on)
  - this happened right after a fresh reboot (after crashing because of the same bug)

  - afterwards, auth.log contained this (bad, bad: where has my data gone?):
Mar 20 18:24:01 localhost CRON[16684]: pam_unix(cron:session): session closed for user root
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@...6; it; rv:1.9.2.1) Gecko/20091211 Firefox/3.5.6 (like Firefox/3.5.11; Debian-3.5.11-1)"

Notice the line number in the bug from 2010-03-20? Yep, it's the same as in today's.

2010-04-30: One crash, two oopses. The message was restored mainly from remainders left in some corner of my
brain, with heavy massaging applied; due to outside constraints, mainly the Code: line was written down properly.
Unfortunately, the proper ordering of the lines got lost, so, yes, this is a poor report.
The order below is as I believe it was shown on-screen.

Oops number 1: nearly lost, as the subsequent oops made it scroll off-screen; only the bottom half was visible:

in CPU idle +x3f/x90

EIP rs_tx_status +x8f/x2030    SS: ESP 0668:f749fc4c
  
Code: 7f 22 69 c7 80 01 00 00 8b 8c 24 fc 01 00 00 8b 84 01 a8 00 00 00 6b 04 b0 64 39 44 24 1c 0f 8e 
      d2 fe ff ff 31 db e9 cb fe ff ff <0f> 0b fe 8b 54 24 4c 8b 8c 24 fc 01 00 00 8b 82 04 00 00 00

in iwl_rx_queue restock xbd/x130   iwlcore

panic -> oops begin 

kernel panic not syncing fatal exec in int

pid 0 comm: swapper tainted GD .33.2


Comments, ideas and fixes are welcome... this BUG is highly annoying.

Happy to provide more info if required (and available).

  Regards,

      Nils


Download attachment "2010-04-29_kernel-oops_iwlagn_2076_cut4www.png" of type "image/png" (83116 bytes)
