Message-ID: <Pine.LNX.4.64.0707141826580.16611@filesrv1.baby-dragons.com>
Date:	Sat, 14 Jul 2007 19:32:38 -0700 (PDT)
From:	"Mr. James W. Laferriere" <babydr@...y-dragons.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
cc:	linux-raid maillist <linux-raid@...r.kernel.org>,
	Linux Kernel Maillist <linux-kernel@...r.kernel.org>
Subject: Re: raid5:md3: read error corrected ,  followed by ,  Machine Check
 Exception: .

 	Hello Alan (& Justin) ,

On Sun, 15 Jul 2007, Alan Cox wrote:
> On Sat, 14 Jul 2007 17:08:27 -0700 (PDT)
> "Mr. James W. Laferriere" <babydr@...y-dragons.com> wrote:
>
>>  	Hello All ,  I was under the impression that a 'machine check' would be
>> caused by some near to the CPU hardware failure ,  Not a bad disk ?
>
> It indicates a hardware failure

 	OK ,  Be funny :-} .  Tho it ain't .  I still think I am correct in 
stating that a "'machine check' would be caused by some near to the CPU hardware 
failure" .  Ie: like physically shoved into the motherboard .  Not an 
indirectly connected device .  Such as a disk drive .


>> Jul 14 23:00:26 filesrv2 kernel: sd 2:0:1:0: SCSI error: return code = 0x08000002
>> Jul 14 23:00:26 filesrv2 kernel: sdd: Current: sense key: Medium Error
>> Jul 14 23:00:26 filesrv2 kernel:     Additional sense: Read retries exhausted
>
> So your disk throws a fit

 	Actually it's brand new .  Infant mortality ?  I at least have a cold 
spare available .  So Yes I am replacing that puppy .  I'll drop it into another 
system & give it the format command & see how much the user bad block table 
grows .  I'll bet I'll get a table full overflow on it .


>> Jul 14 23:17:06 filesrv2 kernel: raid5:md3: read error corrected (8 sectors at 72269184 on sdd)
>> Jul 14 23:17:06 filesrv2 kernel: raid5:md3: read error corrected (8 sectors at 72269192 on sdd)
>
> Raid happily recovers the mess

 	That they did .  scsi & raid .


>> Jul 14 23:20:28 filesrv2 kernel: raid5:md3: read error corrected (8 sectors at 76388336 on sdd)
>> Jul 14 23:20:28 filesrv2 kernel: raid5:md3: read error corrected (8 sectors at 76388344 on sdd)
>> Jul 14 23:38:48 filesrv2 -- MARK --
>> CPU 5: Machine Check Exception: 0000000000000004
>> CPU 4: Machine Check Exception: 0000000000000004
>
> And at some point at least 18 minutes after the raid incident you log
> CPU problems.

 	I didn't notice the 18 Minute difference .  Drats .
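
The 18 minute gap can be checked directly from the quoted log stamps. A minimal sketch in Python, assuming the year 2007 (taken from the Date: header, since syslog stamps carry no year) and an English/C locale for the month abbreviation; `syslog_time` is a hypothetical helper name, not part of any tool mentioned in this thread:

```python
from datetime import datetime

def syslog_time(line, year=2007):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is an assumption supplied by the caller; syslog omits it.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

# Last raid5 correction and the MARK line that precedes the MCE output,
# both copied from the quoted log above.
last_raid = ("Jul 14 23:20:28 filesrv2 kernel: raid5:md3: read error "
             "corrected (8 sectors at 76388344 on sdd)")
mark = "Jul 14 23:38:48 filesrv2 -- MARK --"

gap = syslog_time(mark) - syslog_time(last_raid)
print(gap)  # 0:18:20
```

The MCE lines themselves carry no timestamp, so the MARK line only gives a lower bound, which matches the "at least 18 minutes" wording above.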
 	The 'MCE's have been ongoing for some time .  I have replaced every item 
in the system except the chassis & scsi backplane & power supply(750Watts) .
 	Everything .  MB,cpu,memory,scsi controllers, ...
 	These MCE's only happen when I am trying to build or bonnie++ test the 
md3 .  It consists of (now 7+1spare) 146GB drives in the SuperMicro 
SYS-6035B-8B's backplane attached to a LSI22320 .
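
For reading the logged value itself, here is a rough sketch in Python. It assumes the 16 hex digits printed after "Machine Check Exception:" are the IA32_MCG_STATUS register, which is how some older 32-bit x86 kernels printed it; that is an assumption about this kernel, not something the log confirms. The bit names come from the Intel machine-check architecture, and `decode_mcg_status` is a hypothetical helper name:

```python
# Low-order flag bits of IA32_MCG_STATUS (Intel machine-check architecture).
MCG_STATUS_BITS = {
    0: "RIPV",  # restart instruction pointer valid
    1: "EIPV",  # error instruction pointer valid
    2: "MCIP",  # machine check in progress
}

def decode_mcg_status(value):
    """Return the names of the flag bits set in an MCG_STATUS value."""
    return [name for bit, name in MCG_STATUS_BITS.items() if value & (1 << bit)]

# Value copied from the "CPU 5" / "CPU 4" lines quoted above.
logged = int("0000000000000004", 16)
print(decode_mcg_status(logged))  # ['MCIP']
```

If that reading is right, the value only says a machine check was in progress with no valid restart address. It does not say which component raised it; that would need the per-bank status registers, which are not in the quoted log.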

> Are you sure the box or the room it is in didn't get excessively hot ?

 	The room before and at the time of the MCE was never over 71F .  Now the 
memory temp might be a tad hot but I've got a small jet engine running in that 
chassis as noisy as it is with fans so I don't see how anything in there could 
be too hot .
 	Temp this time ,  71.2F .  Every time I try that array it MCE's .
 	I have another array in a JBOD box of 14 disks & I have no problems 
with this system as long as I disconnect all the drives in the 8 drive Backplane .
 	I guess I'll just have to do without the extra 143,564,800K of data 
space .
 		Twyl ,  JimL

ps: system face diagram ...
   +-------------------------+
   |0|1|2|3|4|6|7|-----------|
 +-| | | | | | | |-----------|
 | | | | | | | | |-----------|
 | +-------------------------+
 |  \           /
 |   +---md3---+
 |
 | +-------------------------+
 | |0|1|2|3|4|6|0|1|2|3|4|5|6|
 +-| | | | | | | | | | | | | |
   | | | | | | | | | | | | | |
   +-------------------------+
    \           /\          /
     +---md4---+  +---md4--+
-- 
+-----------------------------------------------------------------+
| James   W.   Laferriere | System   Techniques | Give me VMS     |
| Network        Engineer | 663  Beaumont  Blvd |  Give me Linux  |
| babydr@...y-dragons.com | Pacifica, CA. 94044 |   only  on  AXP |
+-----------------------------------------------------------------+