Message-Id: <201204111544.27866.thomas@fjellstrom.ca>
Date:	Wed, 11 Apr 2012 15:44:27 -0600
From:	Thomas Fjellstrom <thomas@...llstrom.ca>
To:	adam radford <aradford@...il.com>
Cc:	lkml <linux-kernel@...r.kernel.org>, linux-scsi@...r.kernel.org
Subject: Re: stuck in megaraid_sas.c megasas_adp_reset_gen2

On Wed Apr 11, 2012, adam radford wrote:
> On Wed, Apr 11, 2012 at 1:17 PM, Thomas Fjellstrom <thomas@...llstrom.ca> wrote:
> >> ADP_RESET_GEN2: HostDiag=a0
> >> 
> >> followed by a bunch of:
> >> 
> >> RESET_GEN2: retry=%x, hostdiag=a4
> >> 
> >> Now I'm not sure the hostdiag should be different between the two. if
> >> this aN identifier is similar to the aN identifiers in the MegaCli
> >> tool, then it would mean its trying to reset a device that doesn't
> >> exist? I only have a single M1015 card installed.
> 
> host diag register output a0 or a4 has absolutely nothing to do with
> MegaCli -aN command line argument for specifying adapter number.
> 
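For reference, a0 and a4 differ only in bit 2 (0x04), so they read as flag bits rather than an index. A minimal sketch decoding the two values, assuming the bit names DIAG_WRITE_ENABLE (0x80) and DIAG_RESET_ADAPTER (0x04) as recalled from the driver's megaraid_sas.h; check them against your own tree. Bit 0x20 is left unnamed because nothing in this thread identifies it.

```python
# Assumed bit names, recalled from megaraid_sas.h; verify against your kernel tree.
DIAG_WRITE_ENABLE = 0x80
DIAG_RESET_ADAPTER = 0x04

KNOWN_BITS = {
    DIAG_WRITE_ENABLE: "DIAG_WRITE_ENABLE",
    DIAG_RESET_ADAPTER: "DIAG_RESET_ADAPTER",
}


def decode_hostdiag(value):
    """Return a name for every set bit in a 32-bit host diag value, high bit first."""
    names = []
    for bit in range(31, -1, -1):
        mask = 1 << bit
        if value & mask:
            names.append(KNOWN_BITS.get(mask, "bit 0x%02x (unnamed here)" % mask))
    return names


# The two values seen in the log messages quoted above.
for raw in ("a0", "a4"):
    print(raw, decode_hostdiag(int(raw, 16)))

print("differing bits: 0x%02x" % (0xA0 ^ 0xA4))
```

If those names are right, hostdiag=a4 would just mean the reset bit is still set while the driver keeps polling, which fits the repeated RESET_GEN2 retry lines.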
> > I just got a second M1015 card in today and gave it a go. Similar issues,
> > different log messages. (hand typed from picture taken of screens)
> > 
> > Lots of:
> > 
> > megasas: Waiting for 1 commands to complete
> 
> Can you try booting with kernel command line argument pcie_aspm=off

No problem.
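
To confirm the option really made it onto the running kernel's command line, /proc/cmdline can be checked after boot. A minimal sketch; the helper name has_kernel_arg is made up for illustration and is not part of any tool.

```python
def has_kernel_arg(cmdline, name, value=None):
    """Return True if `name` (optionally `name=value`) appears as a token in cmdline."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key != name:
            continue
        if value is None or (sep and val == value):
            return True
    return False


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the arguments the running kernel was booted with.
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""  # not on Linux, or /proc not mounted
    print("pcie_aspm=off present:", has_kernel_arg(current, "pcie_aspm", "off"))
```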

Things are quite similar. Startup goes like:

<detected onboard sata ports>
scsi: waiting for bus probes to complete...
Refined TSC...
Switched to clocksource tsc
<pause here>
udevd[...]: timeout: killing '/sbin/modprobe -b ...' (lots of these, so many
that I hit scroll lock to read the kernel messages as they came up)
scsi 0:0:0:0: megasas: RESET cmd=12 retries=0
megasas: [ 0] waiting for 1 commands to complete
(many more waiting messages)
<hung task kworker/u:4>
Call Trace:
  [<ffffffff810641d0>] ? async_synchronize_cookie_domain+0xb2/...c
  [<ffffffff8105f583>] ? add_wait_queue+0x3c/0x3c
  ....
megasas: [55] waiting for 1 commands to complete
....
megasas: [175] waiting for 1 commands to complete
megasas: moving cmd[0]:ffff880234bcb940:0:ffff88002339beec0 the defer queue as internal
megaraid_sas: FW detected to be in faultstate, restarting it...
ADP_RESET_GEN2: HostDiag=a0
(10s wait)
megaraid_sas: FW restarted successfully,initializing next stage...
megaraid_sas: HBA recovery state machine,state 2 starting...
(30s wait)
megasas: Waiting for FW to come to ready state
megasas: FW now in ready state
megaraid_sas: command ffff880234bcb940, ffff8802339beec0:0detected to be pending while HBA reset
megasas: ffff880234bcb940 scsi cmd [12]detected on the internal queue, issue again.
megasas: reset successful
scsi: 0:0:0:0: megasas: RESET cmd=12 retries 0
megaraid_sas: no pending cmds after reset
megasas: reset successful
(20s wait)
(device offlined message here, missed it this time)
(detected all sata devices)

And it stalled there.

> -Adam


-- 
Thomas Fjellstrom
thomas@...llstrom.ca