lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200903093822.52012-1-suravee.suthikulpanit@amd.com>
Date:   Thu,  3 Sep 2020 09:38:20 +0000
From:   Suravee Suthikulpanit <suravee.suthikulpanit@....com>
To:     linux-kernel@...r.kernel.org, iommu@...ts.linux-foundation.org
Cc:     joro@...tes.org, sean.m.osborne@...cle.com,
        james.puthukattukaran@...cle.com, joao.m.martins@...cle.com,
        boris.ostrovsky@...cle.com, jon.grimm@....com,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: [PATCH 0/2 v2] iommu: amd: Fix intremap IO_PAGE_FAULT for VMs

Interrupt remapping IO_PAGE_FAULT has been observed under system w/
large number of VMs w/ pass-through devices. This can be reproduced with
64 VMs + 64 pass-through VFs of Mellanox MT28800 Family [ConnectX-5 Ex],
where each VM runs small-packet netperf test via the pass-through device
to the netserver running on the host. All VMs are running in reboot loop,
to trigger IRTE updates.

In addition, to accelerate the failure, irqbalance is triggered periodically
(e.g. 1-5 sec), which should generate large amount of updates to IRTE.
This setup generally triggers IO_PAGE_FAULT within 3-4 hours.

Investigation has shown that the issue is in the code to update IRTE
while remapping is enabled. Please see patch 2/2 for detail discussion.

This serires has been tested running in the setup mentioned above
upto 96 hours w/o seeing issues.

Changes from v1 (https://lkml.org/lkml/2020/9/2/26)
  * Fix typo in comments and commit messages
  * Fix logic to check for X86_FEATURE_CX16 support in patch 2/2

Thanks,
Suravee

Suravee Suthikulpanit (2):
  iommu: amd: Restore IRTE.RemapEn bit after programming IRTE
  iommu: amd: Use cmpxchg_double() when updating 128-bit IRTE

 drivers/iommu/amd/Kconfig |  2 +-
 drivers/iommu/amd/init.c  | 21 +++++++++++++++++++--
 drivers/iommu/amd/iommu.c | 19 +++++++++++++++----
 3 files changed, 35 insertions(+), 7 deletions(-)

-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ