linux-kernel - Re: AMD A10: MCE Instruction Cache Error

Message-ID: <20121103044929.GB21829@liondog.tnic>
Date:	Sat, 3 Nov 2012 05:49:29 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	Alexander Holler <holler@...oftware.de>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: AMD A10: MCE Instruction Cache Error

On Fri, Nov 02, 2012 at 02:53:45PM +0100, Alexander Holler wrote:
> On 02.11.2012 11:50, Alexander Holler wrote:
> >Hello,
> >
> >I've just got the following on an AMD A10 5800K:
> >
> >------
> >[ 8395.999581] [Hardware Error]: CPU:0 MC1_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00002000010151
> >[ 8395.999586] [Hardware Error]:        MC1_ADDR: 0x0000ffffa00e1203
> >[ 8395.999588] [Hardware Error]: Instruction Cache Error: Parity error during data load from IC.
> >[ 8395.999590] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
> >------
> >
> >Kernel is 3.6.5, MB is an Asus F2A85-M with BIOS 5103 (the latest).
> >
> >Can someone enlighten me about what might be wrong with my (new) system
> >(memtest didn't show any errors)?
> >
> >What IC is meant? As far as I know, this processor doesn't support ECC,
> >so I wonder where that parity error does come from.
> 
> I assume IC means Instruction Cache. ;)

It says so earlier in the sentence: "Instruction Cache Error" :)
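
The last line of the quoted log ("cache level: L1, tx: INSN, mem-tx: IRD")
can be reproduced from the low 16 bits of the MC1_STATUS value. A minimal
sketch, assuming the usual x86 MCA memory-error code layout
0000 0001 RRRR TTLL; the function name and the returned dictionary keys are
hypothetical, and the labels simply mirror those printed in the log:

```python
# Sketch: decode the low 16 bits of an MCA status value as a memory/cache
# error code of the form 0000 0001 RRRR TTLL (assumed layout).
LEVELS = ["L0", "L1", "L2", "LG"]           # LL: cache level
TX_TYPES = ["INSN", "DATA", "GEN", "RESV"]  # TT: transaction type
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # RRRR


def decode_mem_error(status):
    """Return level/tx/mem_tx for a memory error code, else None."""
    error_code = status & 0xFFFF
    if (error_code & 0xFF00) != 0x0100:
        return None  # not a memory/cache error code
    rrrr = (error_code >> 4) & 0xF
    return {
        "level": LEVELS[error_code & 0x3],
        "tx": TX_TYPES[(error_code >> 2) & 0x3],
        # RRRR values above 8 are treated as reserved in this sketch.
        "mem_tx": MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV",
    }


print(decode_mem_error(0x8C00002000010151))
# prints {'level': 'L1', 'tx': 'INSN', 'mem_tx': 'IRD'}
```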

> As the kernel didn't reboot or halt, this seems to have been a
> correctable error.

Yes, it is (the "CE" bit in MC1_STATUS). Btw, I have reworked this code
to spit human-readable information first. It also says what the error
severity is now.
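
A minimal sketch of reading those flags from the raw value, assuming the
architectural x86 MCA status bit positions (Val=63, Over=62, UC=61, En=60,
MiscV=59, AddrV=58, PCC=57). The function names and the severity wording
are hypothetical and are not the kernel's own output:

```python
# Sketch: extract the flag bits from an MCi_STATUS value.
# Bit positions are the architectural x86 MCA ones (assumed here).
STATUS_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, "CE")
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MCi_MISC holds valid data
    "AddrV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
}


def decode_status_flags(status):
    """Map each flag name to True/False for the given status value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}


def severity(status):
    """Rough classification for illustration only."""
    flags = decode_status_flags(status)
    if flags["UC"] and flags["PCC"]:
        return "uncorrected, context corrupt"
    if flags["UC"]:
        return "uncorrected"
    return "corrected"


flags = decode_status_flags(0x8C00002000010151)
print([name for name, is_set in flags.items() if is_set])
# prints ['Val', 'MiscV', 'AddrV']
print(severity(0x8C00002000010151))
# prints corrected
```

For the value in the report, UC and PCC are both clear, which matches the
"CE" and "-" fields in the MC1_STATUS line.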

> Which leads me to another question. I have mcelog running, but it
> doesn't seem to receive the error. With my previous Intel HW and an
> older kernel, mcelog received MCE errors (trip temperature). But
> since the kernel now decodes those messages itself, that doesn't
> seem to happen anymore. mcelog is silent, but now I've seen the
> above message on all my consoles.

Yes, AMD doesn't use mcelog.

> So now I have two questions:
> 
> - First, if the error is something I should ask AMD about,

Not really. It is a single bit flip which got corrected; simply watch
out if you get more of those.
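
One way to keep an eye on this is to count corrected errors per bank in
saved kernel log text. A rough sketch, assuming the log lines have the same
"MCn_STATUS[...]: 0x..." shape as in the report above; the function name is
hypothetical and the regular expression is only a guess at what other
kernels print:

```python
# Sketch: count corrected machine check errors per bank in kernel log text.
import re
from collections import Counter

# Matches e.g. "MC1_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00002000010151"
STATUS_RE = re.compile(r"MC(\d+)_STATUS\[[^\]]*\]:\s*(0x[0-9a-fA-F]+)")
UC_BIT = 61  # architectural "uncorrected" bit in MCi_STATUS


def count_corrected(log_text):
    """Return a Counter mapping bank number to corrected-error count."""
    counts = Counter()
    for match in STATUS_RE.finditer(log_text):
        bank = int(match.group(1))
        status = int(match.group(2), 16)
        if not (status >> UC_BIT) & 1:  # UC clear means corrected
            counts[bank] += 1
    return counts


sample = (
    "[ 8395.999581] [Hardware Error]: CPU:0 "
    "MC1_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00002000010151\n"
    "[ 8395.999586] [Hardware Error]:        MC1_ADDR: 0x0000ffffa00e1203\n"
)
print(dict(count_corrected(sample)))
# prints {1: 1}
```

The same function could be fed the captured output of `dmesg`; a count
that keeps growing for one bank would be the "more of those" case.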

> - Second, if the kernel could mention that it is a recoverable
> error. And if so, and if such errors aren't something to panic about
> (e.g. it isn't unusual to receive such), if the kernel could output
> that message with another priority.

As I said above, it got corrected. If it were critical, the machine
would've either panicked or you wouldn't have seen the message at all
(probably only after reboot).

HTH.

-- 
Regards/Gruss,
    Boris.