Message-ID: <3908561D78D1C84285E8C5FCA982C28F31D31C65@ORSMSX106.amr.corp.intel.com>
Date:	Wed, 16 Oct 2013 20:47:05 +0000
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Mauro Carvalho Chehab <m.chehab@...sung.com>,
	Borislav Petkov <bp@...en8.de>
CC:	"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
	"Chen, Gong" <gong.chen@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	Aristeu Rozanski Filho <arozansk@...hat.com>,
	Steven Rostedt <srostedt@...hat.com>
Subject: RE: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

> Also, I suspect that if an error happens to affect more than one DIMM
> (e.g. part of the location is not available for a given error),
> the DIMM label will also not be properly shown.

There are a couple of cases here:

1) There are a number of DIMMs behind some flaky h/w that introduces errors,
which are then attributed to each of those DIMMs in turn.

  All we can do here is statistical correlation. Each error is reported independently,
  so it is up to some higher-level entity to notice the topology connection. There is
  enough information in the UEFI error record to do that (assuming that the BIOS filled
  in the necessary fields).
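A minimal sketch of that kind of correlation, assuming each independently reported error has already been decoded into a location. The `MemError` fields (`node`, `card`, `module`) and the `suspect_shared_hardware` helper are hypothetical simplifications loosely modeled on the location fields of a UEFI memory error record; the thresholds are arbitrary illustrative defaults. The idea is that many distinct DIMMs reporting errors under the same parent point at the shared hardware rather than at the DIMMs themselves.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MemError:
    # Hypothetical, simplified location fields (not a real record layout).
    node: int
    card: int    # shared upstream hardware, e.g. a memory controller/channel
    module: int  # the DIMM the error was blamed on


def suspect_shared_hardware(errors, min_dimms=2, min_errors=3):
    """Return {(node, card): [modules]} for groups where several distinct
    DIMMs report errors, hinting at a fault in the shared hardware."""
    dimms = defaultdict(set)
    counts = Counter()
    for e in errors:
        key = (e.node, e.card)
        dimms[key].add(e.module)
        counts[key] += 1
    return {
        key: sorted(mods)
        for key, mods in dimms.items()
        if len(mods) >= min_dimms and counts[key] >= min_errors
    }
```

For example, three errors blamed on three different DIMMs behind node 0, card 1 flag that card, while a single error elsewhere is ignored: `suspect_shared_hardware([MemError(0, 1, 0), MemError(0, 1, 1), MemError(0, 1, 2), MemError(0, 2, 0)])` returns `{(0, 1): [0, 1, 2]}`.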

2) There is a single reported error that spans more than one DIMM.

  This can happen with a UC (uncorrected) error in a pair of lockstep DIMMs. Since the
  error is UC, we know that two (or more) bits are bad. But we have no way to tell whether
  the bad bits came from the same DIMM, or one bit from each (because we don't know which
  bits are bad; if we knew that, we could fix them :-) ). In this case eMCA should log
  two subsections, one for each of the lockstep DIMMs involved. A user seeing this should
  probably just replace both DIMMs to be safe. To diagnose further, they could swap DIMMs
  around so that the pair is no longer lockstepped, then check whether they start seeing
  correctable errors from each DIMM of the split pair, or whether the UC errors move with
  one or the other of the DIMMs.
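One possible way to encode that advice, as a hypothetical sketch rather than any existing tool's logic. It assumes the record's subsections have been decoded into dicts with a `"dimm"` label, and that after the split the user has per-DIMM correctable error counts plus the labels of any DIMMs that later report UC errors; all names and labels here are illustrative.

```python
from typing import Iterable, Mapping


def suspects_from_uc_record(sections: Iterable[Mapping[str, str]]) -> set:
    # A UC error in a lockstep pair cannot be pinned on a single DIMM,
    # so every DIMM named in the record's subsections is a suspect.
    return {s["dimm"] for s in sections}


def suspects_after_split(ce_counts: Mapping[str, int], uc_dimms: Iterable[str]) -> set:
    # Hypothetical heuristic for the follow-up test once the pair is
    # no longer lockstepped.
    uc = set(uc_dimms)
    if uc:
        # UC errors that keep following a DIMM implicate that DIMM.
        return uc
    # Otherwise, DIMMs that now report correctable errors are implicated.
    return {dimm for dimm, count in ce_counts.items() if count > 0}
```

With a record naming `"DIMM_A1"` and `"DIMM_B1"`, both are suspects at first; if after the split only `"DIMM_B1"` reports correctable errors, `suspects_after_split({"DIMM_A1": 0, "DIMM_B1": 5}, [])` narrows the suspects to `{"DIMM_B1"}`.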

-Tony