Message-Id: <1327764771-28649-1-git-send-email-mchehab@redhat.com>
Date:	Sat, 28 Jan 2012 13:32:35 -0200
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	unlisted-recipients:; (no To-header on input)
Cc:	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	lwang@...hat.com, bp@...64.org, tony.luck@...el.com
Subject: [PATCH RFCv2 00/16] This is the version 2 of the HERM patches

This patch series addresses some problems with the
EDAC subsystem.

There are two groups of changes in this series:

a) a trace-based class of events for hardware errors is
added (Hardware Events Report Mechanism - HERM);

The need to move to a tracepoint-based approach has already
been widely discussed on the ML. Basically, it offers
more flexibility than message dumps on the console, allowing
event filtering and other sorts of improvements.

The long-term target is that memory errors will generate
events like:

	Corrected error: memory read error on DIMM_1A (row 1, channel 0, rank=5, cpu=0, Err=0001:0090, addr = 0x7a789f03e)
	Uncorrected error: memory write error on DIMM_2B (row 2, channel 3, rank=4, cpu=1, Err=0001:0091, addr = 0xdeadbeef)

I.e. putting the user-relevant information first, while
keeping in parentheses the technical details that could help
hardware manufacturers and those who might want to replace
a DRAM chip.
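
As an illustration only, the message layout above could be assembled
like this in plain user-space C. This is a sketch, not the code from
the series (which emits the data through tracepoints): the struct
hw_mem_event and format_mem_event() names and all field names are
hypothetical.

```c
#include <stdio.h>

/* Hypothetical record of one memory error; not the layout used by the patches. */
struct hw_mem_event {
	int corrected;           /* non-zero for a corrected error */
	const char *operation;   /* "read" or "write" */
	const char *label;       /* DIMM label, e.g. "DIMM_1A" */
	int row, channel, rank, cpu;
	unsigned int err_hi, err_lo;
	unsigned long long addr;
};

/*
 * Put the user-relevant part (severity, operation, DIMM label) first and
 * the technical details in parentheses. Returns what snprintf() returns.
 */
int format_mem_event(char *buf, size_t len, const struct hw_mem_event *ev)
{
	return snprintf(buf, len,
		"%s error: memory %s error on %s "
		"(row %d, channel %d, rank=%d, cpu=%d, Err=%04x:%04x, addr = 0x%llx)",
		ev->corrected ? "Corrected" : "Uncorrected",
		ev->operation, ev->label,
		ev->row, ev->channel, ev->rank, ev->cpu,
		ev->err_hi, ev->err_lo, ev->addr);
}
```

Keeping the label outside the parentheses means a tool that only cares
about which DIMM failed can stop parsing at the opening parenthesis.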

b) the edac core was changed to better support memory
controllers that aren't able to see csrows.

The EDAC subsystem was originally written to work with
memory controllers directly connected to the DIMM chips.
Not all memory architectures use this concept. For example,
FBDIMM memories are connected via a buffer, called AMB [1].

When an AMB is present, the memory controller only sees
its communication bus, called "channel". This has nothing
to do with the "csrow channel" concept, which is widely used in
the subsystem and is mandatory. All drivers that work with
such architectures currently need to fake data, lying to
the edac core, in order to work.

Lying to the subsystem in general is not a good idea ;)

So, this series addresses it by splitting the DIMM information
from the EDAC csrow_info struct, and creating a new set of
DIMM-oriented sysfs nodes:

/sys/devices/system/edac/mc/mc0
├── dimm0
│   ├── dimm_dev_type
│   ├── dimm_edac_mode
│   ├── dimm_label
│   ├── dimm_location
│   ├── dimm_mem_type
│   └── dimm_size
...
└── dimm3
    ├── dimm_dev_type
    ├── dimm_edac_mode
    ├── dimm_label
    ├── dimm_location
    ├── dimm_mem_type
    └── dimm_size
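
For reference, a user-space tool could read one of these nodes roughly
as sketched below. This is only an illustration under the assumption
that each node holds a single line of text; read_dimm_attr() is a
hypothetical helper, not something provided by the series.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of <dimm_dir>/<attr> into buf, without the
 * trailing newline. Returns 0 on success, -1 on any error.
 */
int read_dimm_attr(const char *dimm_dir, const char *attr,
		   char *buf, size_t len)
{
	char path[512];
	FILE *f;
	int n;

	if (len == 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/%s", dimm_dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path would have been truncated */

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the newline, if any */
	return 0;
}
```

With the layout above, a call would look like
read_dimm_attr("/sys/devices/system/edac/mc/mc0/dimm0", "dimm_label", buf, sizeof(buf)).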

The DIMM description looks like:

	dimm_dev_type:x8
	dimm_edac_mode:S8ECD8ED
	dimm_label:DIMM_3A
	dimm_location:branch 1 channel 0 dimm 1
	dimm_mem_type:Unbuffered-DDR3
	dimm_size:1024
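
To make the idea of DIMM-oriented data concrete, here is a small
user-space sketch that holds the six attributes in one record and
prints them in the same "name:value" form as above. It is not the
struct dimm_info from the patches: struct dimm_desc, describe_dimm()
and the field names are hypothetical, and the branch/channel/dimm
location layout is assumed from the example above (other controllers
may describe a location differently).

```c
#include <stdio.h>

/* Hypothetical mirror of the per-DIMM sysfs nodes; not the kernel struct. */
struct dimm_desc {
	const char *dev_type;
	const char *edac_mode;
	const char *label;
	int branch, channel, dimm;	/* components of dimm_location */
	const char *mem_type;
	unsigned int size;		/* value of dimm_size; unit not stated here */
};

/* Render one DIMM, one "name:value" pair per line. */
int describe_dimm(char *buf, size_t len, const struct dimm_desc *d)
{
	return snprintf(buf, len,
		"dimm_dev_type:%s\n"
		"dimm_edac_mode:%s\n"
		"dimm_label:%s\n"
		"dimm_location:branch %d channel %d dimm %d\n"
		"dimm_mem_type:%s\n"
		"dimm_size:%u\n",
		d->dev_type, d->edac_mode, d->label,
		d->branch, d->channel, d->dimm,
		d->mem_type, d->size);
}
```

The point of the sketch is that nothing in the record refers to a
csrow: the DIMM is identified by its label and by where it sits behind
the controller.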

So far, the existing struct has not been touched. The next step
(as indicated in the last patch of this series) is to
create the error counters.

This is still an RFC, as it is not complete and some
changes will require more testing. Also, I didn't try to compile
it on non-x86 archs yet.

[1] http://www.interfacebus.com/Memory_Module_DDR2_FB_DIMM.html 

Please review.

Thanks!
Mauro

-

Mauro Carvalho Chehab (16):
  events/hw_event: Create a Hardware Events Report Mecanism (HERM)
  events/hw_event: use __string() trace macros for events
  hw_event: Consolidate uncorrected/corrected error msgs into one
  drivers/edac: rename channel_info to csrow_channel_info
  edac: Create a dimm struct and move the labels into it
  edac_mc_sysfs: Fix error handling
  edac: Add per dimm's sysfs nodes
  edac: Prepare to push down to drivers the filling of the dimm_info
  i5400_edac: Convert it to report memory with the new location
  i7300_edac: Convert it to report memory with the new location
  edac: move dimm properties to struct dimm_info
  edac: Don't initialize csrow's first_page & friends when not needed
  edac: move nr_pages to dimm struct
  edac: Add per-dimm sysfs show nodes
  edac: DIMM location cleanup
  edac: Add an error scope logic

 drivers/edac/amd64_edac.c       |   72 +++-------
 drivers/edac/amd76x_edac.c      |   14 +-
 drivers/edac/cell_edac.c        |   18 ++-
 drivers/edac/cpc925_edac.c      |   70 +++++-----
 drivers/edac/e752x_edac.c       |   48 ++++---
 drivers/edac/e7xxx_edac.c       |   49 ++++---
 drivers/edac/edac_mc.c          |  168 ++++++++++++++++++-----
 drivers/edac/edac_mc_sysfs.c    |  283 ++++++++++++++++++++++++++++++++++++---
 drivers/edac/i3000_edac.c       |   24 ++--
 drivers/edac/i3200_edac.c       |   24 ++--
 drivers/edac/i5000_edac.c       |   31 ++---
 drivers/edac/i5100_edac.c       |   67 +++++-----
 drivers/edac/i5400_edac.c       |   46 +++----
 drivers/edac/i7300_edac.c       |   47 ++++---
 drivers/edac/i7core_edac.c      |   46 +++----
 drivers/edac/i82443bxgx_edac.c  |   15 ++-
 drivers/edac/i82860_edac.c      |   13 +-
 drivers/edac/i82875p_edac.c     |   22 ++-
 drivers/edac/i82975x_edac.c     |   28 +++--
 drivers/edac/mpc85xx_edac.c     |   16 ++-
 drivers/edac/mv64x60_edac.c     |   22 ++--
 drivers/edac/pasemi_edac.c      |   24 ++--
 drivers/edac/ppc4xx_edac.c      |   25 ++--
 drivers/edac/r82600_edac.c      |   13 +-
 drivers/edac/sb_edac.c          |   44 ++++---
 drivers/edac/tile_edac.c        |   17 +--
 drivers/edac/x38_edac.c         |   24 ++--
 include/linux/edac.h            |   90 +++++++++++--
 include/trace/events/hw_event.h |  133 ++++++++++++++++++
 29 files changed, 1018 insertions(+), 475 deletions(-)
 create mode 100644 include/trace/events/hw_event.h

-- 
1.7.8
