Message-ID: <5096A3A4.8070602@ahsoftware.de>
Date:	Sun, 04 Nov 2012 18:19:32 +0100
From:	Alexander Holler <holler@...oftware.de>
To:	Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org
Subject: Re: AMD A10: MCE Instruction Cache Error

On 04.11.2012 16:21, Borislav Petkov wrote:
> On Sat, Nov 03, 2012 at 11:45:25AM +0100, Alexander Holler wrote:
>> Hmm, exactly that just happened twice in a row. Unfortunately the
>> screen was already disabled (screen saving mode), so I couldn't see
>> any message, if there was any. Just a dead box (not overclocked, I
>> don't do such, I even had enabled the power saving mode in the BIOS,
>> which seems to mean max. 3800 MHz). I think I should start getting
>> nervous. :(
>
> How do you know this happened twice if you couldn't see any message?

I was logged in remotely, and there aren't that many faults which lead
to a complete standstill of the hw (no reset).

But as you said, I can't know. The only thing I know is that a box with
a new mb, memory and APU came to a complete standstill, shortly after I
had received an emergency message telling me that a bit inside the CPU
had flipped unexpectedly. On top of that, the box was doing the same
thing as when it got the MCE: a backup from a SATA-attached SSD to a
USB3 HD, including compression and encryption, which keeps all cores
busy most of the time for several hours.

> Also, can you enable netconsole or serial console, if possible, and try
> to catch full dmesg from the boot and up until it happens.

As I was logged in remotely over the network, I know it wasn't the same
MCE as before (just a disconnect and dead hw). But I don't know what
else it was. And as I recently got hit by a broken RAM module, which was
a pain to find, I'm not very happy that I have to go through similar
pain again with new HW.

The probability of getting a working HW and SW combination has just
become too low in recent years. All the (IT) companies would do better
to spend the money they now give their lawyers on their QA and
engineering departments instead.

Sorry for the rant. Although I'm used to living with hw and sw errors
(as a sw-dev), I'm currently just a bit annoyed. ;)

I will set up something to monitor the box through the serial console
and will let it back itself up all the time, trying to catch some useful
information.
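
A minimal sketch of what such a serial monitor could look like, run on a
second machine attached to the box's serial port. Everything in it is an
assumption for illustration: the script name, the device path /dev/ttyS0,
the log file name, and that the port has already been configured (baud
rate etc., e.g. with stty) and the kernel on the box has been told to log
to its serial console. It only timestamps each line and syncs it to disk,
so the last output before a hang is not lost:

```python
#!/usr/bin/env python3
"""Append timestamped console lines to a log file, syncing each line."""
import os
import sys
import time


def log_console(src, dst, clock=time.time):
    """Copy lines from src to dst, prefixing each with a local timestamp.

    Returns the number of lines written.
    """
    count = 0
    for raw in src:
        line = raw.rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        dst.write(f"{stamp} {line}\n")
        dst.flush()
        # Sync to disk when dst is a real file, so the lines seen just
        # before a hang survive on the logging machine.
        try:
            os.fsync(dst.fileno())
        except (AttributeError, OSError, ValueError):
            pass
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: log_console.py /dev/ttyS0 mce-box.log
    device, logfile = sys.argv[1], sys.argv[2]
    with open(device, "r", errors="replace") as src, \
            open(logfile, "a") as dst:
        log_console(src, dst)
```

The timestamps make it possible to match the last console line against
the time the remote login dropped.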

Regards,

Alexander
