Message-ID: <1369250155-12226-1-git-send-email-suravee.suthikulpanit@amd.com>
Date:	Wed, 22 May 2013 14:15:52 -0500
From:	<suravee.suthikulpanit@....com>
To:	<iommu@...ts.linux-foundation.org>, <joro@...tes.org>
CC:	<ddutile@...hat.com>, <alex.williamson@...hat.com>,
	<linux-kernel@...r.kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: [PATCH 0/3] iommu/amd: IOMMU Error Reporting/Handling/Filtering

From: Suravee Suthikulpanit <suravee.suthikulpanit@....com>

This patch set implements a framework for handling errors reported via the IOMMU
event log. It also implements a mechanism to filter/suppress error messages when
the IOMMU hardware generates a large number of event log entries, which is often
caused by devices performing invalid operations or by a misconfigured IOMMU
(e.g. IO_PAGE_FAULT and INVALID_DEVICE_REQUEST).

DEVICE vs IOMMU ERRORS:
=======================
Event types in AMD IOMMU event log can be categorized as:
    - IOMMU error : An error which is specific to IOMMU hardware
    - Device error: An error which is specific to a device
    - Non-error   : Miscellaneous events which are not classified as errors.
This patch set implements frameworks for handling "IOMMU error" and "device error".
For IOMMU error, the driver will log the event in dmesg and panic since the IOMMU 
hardware is no longer functioning. For device error, the driver will decode and 
log the error in dmesg based on the error logging level specified at boot time.
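
The split above can be pictured as a small dispatch table. The following is only
a user-space sketch, not code from the patches: the enum names, the two
"HYPOTHETICAL" event types and the action codes are assumptions made for
illustration. Only IO_PAGE_FAULT and INVALID_DEVICE_REQUEST are named in this
cover letter as device-related events, and the handling of non-error events is
not described here, so the sketch does nothing for them.

```c
/* Categories named in the cover letter. */
enum evt_class {
	EVT_CLASS_NON_ERROR,
	EVT_CLASS_DEVICE_ERROR,
	EVT_CLASS_IOMMU_ERROR
};

/* Illustrative identifiers only; these are NOT the hardware event codes. */
enum evt_type {
	EVT_IO_PAGE_FAULT,
	EVT_INVALID_DEVICE_REQUEST,
	EVT_HYPOTHETICAL_HW_ERROR,	/* stand-in for an IOMMU-specific failure */
	EVT_HYPOTHETICAL_INFO		/* stand-in for a non-error event */
};

/* What the driver does with an event of a given class (assumed names). */
enum evt_action {
	ACT_NONE,		/* assumption: non-error events need no handling */
	ACT_LOG,		/* decode and log, subject to the logging level */
	ACT_LOG_AND_PANIC	/* log, then panic: the IOMMU is unusable */
};

static enum evt_class classify_event(enum evt_type type)
{
	switch (type) {
	case EVT_IO_PAGE_FAULT:
	case EVT_INVALID_DEVICE_REQUEST:
		return EVT_CLASS_DEVICE_ERROR;
	case EVT_HYPOTHETICAL_HW_ERROR:
		return EVT_CLASS_IOMMU_ERROR;
	default:
		return EVT_CLASS_NON_ERROR;
	}
}

static enum evt_action action_for(enum evt_class cls)
{
	switch (cls) {
	case EVT_CLASS_IOMMU_ERROR:
		return ACT_LOG_AND_PANIC;
	case EVT_CLASS_DEVICE_ERROR:
		return ACT_LOG;
	default:
		return ACT_NONE;
	}
}
```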

ERROR LOGGING LEVEL:
====================
The filtering framework introduces 3 levels of event logging,
"AMD_IOMMU_LOG_[DEFAULT|VERBOSE|DEBUG]".  Users can specify the level
via a new boot option "amd_iommu_log=[default|verbose|debug]".
    - default: Each error message is truncated. Filtering is enabled.
    - verbose: Output detailed error messages. Filtering is enabled.
    - debug  : Output detailed error messages. Filtering is disabled.
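
As a rough user-space illustration of how the option value maps onto the three
levels, consider the sketch below. The AMD_IOMMU_LOG_* names come from this
cover letter; the helper names and the choice to fall back to the default level
for an unknown or missing value are assumptions, and the actual patch may parse
the option differently.

```c
#include <stdbool.h>
#include <string.h>

enum amd_iommu_log_level {
	AMD_IOMMU_LOG_DEFAULT,
	AMD_IOMMU_LOG_VERBOSE,
	AMD_IOMMU_LOG_DEBUG
};

/*
 * Map the value given to "amd_iommu_log=" onto a level.
 * Assumption: anything unrecognised (or NULL) selects the default level.
 */
static enum amd_iommu_log_level parse_log_level(const char *arg)
{
	if (arg && strcmp(arg, "verbose") == 0)
		return AMD_IOMMU_LOG_VERBOSE;
	if (arg && strcmp(arg, "debug") == 0)
		return AMD_IOMMU_LOG_DEBUG;
	return AMD_IOMMU_LOG_DEFAULT;
}

/* verbose and debug print the detailed message; default truncates it. */
static bool log_is_detailed(enum amd_iommu_log_level level)
{
	return level != AMD_IOMMU_LOG_DEFAULT;
}

/* Only debug turns the filtering off. */
static bool log_filtering_enabled(enum amd_iommu_log_level level)
{
	return level != AMD_IOMMU_LOG_DEBUG;
}
```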

ERROR THRESHOLD LEVEL:
======================
Error threshold is used by the log filtering logic to determine when to suppress 
the errors from a particular device. The threshold is defined as "the number of errors
(X) over a specified period (Y sec)". When the threshold is reached, IOMMU driver will
suppress subsequent error messages from the device for a predefined period (Z sec). 
X, Y, and Z are currently hard-coded to 10 errors, 5 sec, and 30 sec.
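
One possible reading of "X errors over Y seconds" is a fixed window that opens
at the first error and is restarted once Y seconds have elapsed. The sketch
below assumes that reading; the macro, struct and function names are
hypothetical, time is passed in as plain seconds, and the patch itself may count
differently (e.g. with jiffies or a sliding window).

```c
#include <stdbool.h>

#define ERR_THRESHOLD_COUNT	10	/* X: errors */
#define ERR_THRESHOLD_PERIOD	5	/* Y: seconds */
#define ERR_SUPPRESS_PERIOD	30	/* Z: seconds, used by the caller */

struct err_window {
	long start;	/* time of the first error in the current window */
	int count;	/* errors seen in the current window, 0 = no window */
};

/*
 * Account for one error at time "now" (in seconds).
 * Returns true when this error is the X-th one inside a Y-second window.
 */
static bool err_window_hit(struct err_window *w, long now)
{
	/* Open a new window if none is open or the old one has expired. */
	if (w->count == 0 || now - w->start >= ERR_THRESHOLD_PERIOD) {
		w->start = now;
		w->count = 0;
	}

	w->count++;
	if (w->count >= ERR_THRESHOLD_COUNT) {
		w->count = 0;	/* start afresh after reporting the threshold */
		return true;
	}
	return false;
}
```

With these values, ten errors inside one 5-second window trip the threshold,
while one error every 5 seconds never does.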

DATA STRUCTURE:
===============
A new structure "struct dte_err_info" is added. It contains error information
specific to each device table entry (DTE). The structure is allocated dynamically
per DTE the first time the IOMMU driver handles a device error for that entry.
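
The allocate-on-first-error idea can be sketched in user space as follows. Only
the name "struct dte_err_info" comes from this cover letter; its fields, the
pointer table and get_err_info() are hypothetical, and calloc() stands in for
whatever kernel allocator the patch uses (an allocation made from the interrupt
path would have to be atomic).

```c
#include <stdlib.h>

#define MAX_DEVID	0xffff	/* device IDs are 16 bits wide */

/* Field layout is an assumption; the cover letter does not list it. */
struct dte_err_info {
	int state;		/* current error state of the device */
	int err_count;		/* errors seen in the current window */
	long window_start;	/* start of the current window */
};

/* One slot per possible DTE; a slot stays NULL until the first error. */
static struct dte_err_info *err_info_table[MAX_DEVID + 1];

/*
 * Return the error info for a device, allocating it (zeroed) on first use.
 * Returns NULL if the allocation fails; the slot then stays empty.
 */
static struct dte_err_info *get_err_info(unsigned short devid)
{
	struct dte_err_info *info = err_info_table[devid];

	if (!info) {
		info = calloc(1, sizeof(*info));
		err_info_table[devid] = info;
	}
	return info;
}
```

Devices that never report an error cost only one NULL pointer each in this
layout.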

ERROR STATES and LOG FILTERING:
===============================
The filtering framework defines 3 device error states: "NONE", "PROBATION" and "SUPPRESS".
 1. From IOMMU driver initialization, all devices are in DEV_ERR_NONE state.
 2. During interrupt handling, the IOMMU driver processes each entry in the event log.
 3. If an entry is a device error, the driver tags the DTE with DEV_ERR_PROBATION and
    reports the error via dmesg.
 4. In non-debug mode, if the device threshold is reached, the device is moved into
    DEV_ERR_SUPPRESS state, in which all error messages are suppressed.
 5. After the suppress period has passed, the driver puts the device in probation state,
    and errors are reported once again. If the device continues to generate errors,
    it will be re-suppressed once the next threshold is reached.
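
The five steps above can be modelled as a small state machine. This is a
user-space sketch under stated assumptions, not the code from patch 2/3: the
DEV_ERR_* state names and the 10/5/30 values come from this cover letter, while
the struct, the function name, the fixed-window counting and the choice to leave
the SUPPRESS state lazily (on the first error after the period has passed,
rather than from a timer) are all hypothetical.

```c
#include <stdbool.h>

enum dev_err_state { DEV_ERR_NONE, DEV_ERR_PROBATION, DEV_ERR_SUPPRESS };

#define ERR_THRESHOLD_COUNT	10	/* X: errors */
#define ERR_THRESHOLD_PERIOD	5	/* Y: seconds */
#define ERR_SUPPRESS_PERIOD	30	/* Z: seconds */

struct dev_err_track {
	enum dev_err_state state;
	int count;		/* errors in the current window */
	long window_start;	/* start of the current window */
	long suppress_start;	/* when suppression began */
};

/*
 * Handle one device error at time "now" (in seconds).
 * Returns true if the error should be printed, false if it is suppressed.
 */
static bool dev_err_report(struct dev_err_track *t, long now, bool debug_mode)
{
	if (t->state == DEV_ERR_SUPPRESS) {
		if (now - t->suppress_start < ERR_SUPPRESS_PERIOD)
			return false;		/* step 4: still suppressed */
		t->state = DEV_ERR_PROBATION;	/* step 5: report again */
		t->count = 0;
	}

	if (t->state == DEV_ERR_NONE) {
		t->state = DEV_ERR_PROBATION;	/* step 3: first error seen */
		t->count = 0;
	}

	/* Open a new counting window if needed (assumed fixed window). */
	if (t->count == 0 || now - t->window_start >= ERR_THRESHOLD_PERIOD) {
		t->window_start = now;
		t->count = 0;
	}
	t->count++;

	/* Debug mode disables filtering, so the device is never suppressed. */
	if (!debug_mode && t->count >= ERR_THRESHOLD_COUNT) {
		t->state = DEV_ERR_SUPPRESS;
		t->suppress_start = now;
		t->count = 0;
	}

	/* The error that reaches the threshold is itself still printed. */
	return true;
}
```

In this model the tenth error in a window is printed and the device is then
silenced, which matches the ten IO_PAGE_FAULT lines followed by the threshold
warning in the example output.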

EXAMPLE OUTPUT:
===============
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97040 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97070 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x97060 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x4970 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98840 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98870 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x98860 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x4980 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x99040 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Event=IO_PAGE_FAULT dev=3:0.0 dom=0x1b addr=0x99060 flg=N Ex Sup M P W Pm Ill Ta
AMD-Vi: Warning: IOMMU error threshold (10) reached for device=3:0.0. Suppress for 30 secs.!!!

Suravee Suthikulpanit (3):
  iommu/amd: Adding amd_iommu_log cmdline option
  iommu/amd: Add error handling/reporting/filtering logic
  iommu/amd: Remove old event printing logic

 Documentation/kernel-parameters.txt |   10 +
 drivers/iommu/Makefile              |    2 +-
 drivers/iommu/amd_iommu.c           |   85 +-------
 drivers/iommu/amd_iommu_fault.c     |  368 +++++++++++++++++++++++++++++++++++
 drivers/iommu/amd_iommu_init.c      |   19 ++
 drivers/iommu/amd_iommu_proto.h     |    6 +
 drivers/iommu/amd_iommu_types.h     |   16 ++
 7 files changed, 426 insertions(+), 80 deletions(-)
 create mode 100644 drivers/iommu/amd_iommu_fault.c

-- 
1.7.10.4

