Message-ID: <20220203174942.31630-1-nchatrad@amd.com>
Date:   Thu, 3 Feb 2022 11:49:30 -0600
From:   Naveen Krishna Chatradhi <nchatrad@....com>
To:     <linux-edac@...r.kernel.org>, <x86@...nel.org>
CC:     <linux-kernel@...r.kernel.org>, <bp@...en8.de>, <mingo@...hat.com>,
        <mchehab@...nel.org>, <yazen.ghannam@....com>,
        Muralidhara M K <muralimk@....com>
Subject: [PATCH v7 00/12] x86/edac/amd64: Add support for GPU nodes

From: Muralidhara M K <muralimk@....com>

On heterogeneous systems made up of AMD CPUs and GPUs, the data fabrics
of the CPUs and GPUs are connected directly via custom links. The UMC MCA
banks on the GPUs can be viewed as similar to the UMC banks on the CPUs.
Hence, memory errors on GPU UMCs can be reported via the EDAC framework.

This patchset applies on top of the following series:
[v4,00/24] AMD MCA Address Translation Updates
https://patchwork.kernel.org/project/linux-edac/cover/20220127204115.384161-1-yazen.ghannam@amd.com/

Each patch was build-tested individually. The entire set was
tested for address translation and error counts on GPU
memory.

This patchset does the following:
1. edac.rst:
   a. Add documentation for heterogeneous systems

2. amd_nb.c:
   a. Add support for northbridges on Aldebaran GPU nodes
   b. Export AMD node map details for use by the edac and mce modules

3. mce_amd module:
   a. Identify the node ID where the error occurred and map it
      to the Linux-enumerated node ID

4. amd64_edac module:
   a. Refactor the code: define new family ops routines and move the
      struct fam_type variables into struct amd64_pvt, making
      struct fam_type obsolete
   b. Enumerate UMCs and HBMs on the GPU nodes

5. DF3.5 address translation support:
   a. Support Data Fabric 3.5 address translation
   b. Add fixed UMC to CS mapping for errors
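
To illustrate the node ID handling described in item 3, here is a
minimal sketch and not the code from the patches. The bit position of
the node ID inside MCA_IPID, the hardware ID at which GPU nodes start,
and all macro and function names below are assumptions made for this
example only.

```c
#include <stdint.h>

/* Assumption: the node ID sits in a 4-bit field at bits 47:44 of MCA_IPID. */
#define EX_IPID_NODE_ID_SHIFT	44
#define EX_IPID_NODE_ID_MASK	0xFULL

/* Assumption: hardware numbers GPU nodes starting at this ID. */
#define EX_HW_GPU_NODE_START	8

/* Hypothetical helper: pull the hardware node ID out of an MCA_IPID value. */
static unsigned int ex_ipid_to_hw_node(uint64_t ipid)
{
	return (unsigned int)((ipid >> EX_IPID_NODE_ID_SHIFT) &
			      EX_IPID_NODE_ID_MASK);
}

/*
 * Hypothetical helper: map a hardware node ID to a contiguous,
 * Linux-enumerated node ID. CPU nodes keep their ID; GPU nodes are
 * placed directly after the last CPU node.
 */
static unsigned int ex_hw_node_to_linux_node(unsigned int hw_node,
					     unsigned int num_cpu_nodes)
{
	if (hw_node < EX_HW_GPU_NODE_START)
		return hw_node;

	return hw_node - EX_HW_GPU_NODE_START + num_cpu_nodes;
}
```

Under these assumptions, on a system with two CPU nodes, an error whose
MCA_IPID carries hardware node ID 9 would be attributed to
Linux-enumerated node 3, while CPU node 1 stays node 1.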


Muralidhara M K (6):
  EDAC/amd64: edac.rst: Add Doc support for heterogeneous systems
  x86/amd_nb: Add support for northbridges on Aldebaran
  EDAC/amd64: Move struct fam_type variables into amd64_pvt structure
  EDAC/amd64: Define dynamic family ops routines
  EDAC/amd64: Add AMD heterogeneous family 19h Model 30h-3fh
  EDAC/amd64: Add address translation support for DF3.5

Naveen Krishna Chatradhi (3):
  EDAC/mce_amd: Extract node id from MCA_IPID
  EDAC/amd64: Enumerate Aldebaran GPU nodes by adding family ops
  EDAC/amd64: Add Family ops to update GPU csrow and channel info

Yazen Ghannam (3):
  EDAC/amd64: Add check for when to add DRAM base and hole
  EDAC/amd64: Save the number of block instances
  EDAC/amd64: Add fixed UMC to CS mapping

 Documentation/driver-api/edac.rst |    9 +
 arch/x86/include/asm/amd_nb.h     |    9 +
 arch/x86/kernel/amd_nb.c          |  149 ++-
 drivers/edac/amd64_edac.c         | 1450 ++++++++++++++++++++---------
 drivers/edac/amd64_edac.h         |  203 +++-
 drivers/edac/mce_amd.c            |   23 +-
 include/linux/pci_ids.h           |    1 +
 7 files changed, 1345 insertions(+), 499 deletions(-)

-- 
2.25.1
