linux-kernel - Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

Message-ID: <20180213163542.tdqhazwfjgqk3zuu@8bytes.org>
Date:   Tue, 13 Feb 2018 17:35:42 +0100
From:   Joerg Roedel <joro@...tes.org>
To:     Dmitry Safonov <dima@...sta.com>
Cc:     linux-kernel@...r.kernel.org, 0x7f454c46@...il.com,
        Alex Williamson <alex.williamson@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        iommu@...ts.linux-foundation.org
Subject: Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> dmar_fault() reports/handles/cleans DMAR faults in a cycle one-by-one.
> The nuisance is that it's set as a irq handler and runs with disabled
> interrupts - which works OK if you have only a couple of DMAR faults,
> but becomes a problem if your intel iommu has a plenty of mappings.

I don't think a workqueue is the right solution here: it adds a long
delay until the log is processed, and with high fault rates the error
log will overflow during that delay.

Here is what I think you should do instead to fix the soft-lockups:

First, unmask the fault reporting irq so that you will get subsequent
irqs. Then:

	* For Primary Fault Reporting just cycle once through all
	  supported fault recording registers.

	* For Advanced Fault Reporting, read the start and end pointers
	  of the log and process all entries between them.

After that return from the fault handler and let the next irq handle
additional faults that might have been recorded while the previous
handler was running.
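
To illustrate the Primary Fault Reporting case, here is a rough userspace
sketch of a single bounded pass. It is not the driver code: `fake_iommu`,
`fault_rec`, `NUM_FRR` and `handle_faults_once()` are hypothetical names
made up for this example, and the "registers" are just an array in memory.
The point is only that the handler visits each register at most once and
then returns, so its run time is bounded no matter how fast new faults
arrive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FRR 8	/* hypothetical number of fault recording registers */

/* Hypothetical stand-in for one fault recording register. */
struct fault_rec {
	bool     fault;	/* set by the "hardware" when a fault is recorded */
	uint64_t addr;	/* faulting address (unused here, for illustration) */
};

/* Hypothetical stand-in for the per-IOMMU state. */
struct fake_iommu {
	struct fault_rec frr[NUM_FRR];
};

/*
 * One bounded pass: look at every register exactly once, clear the
 * ones that have a fault recorded, and return how many were handled.
 * A register that gets set again after we passed it is deliberately
 * left alone; the next irq will pick it up.
 */
static unsigned int handle_faults_once(struct fake_iommu *iommu)
{
	unsigned int handled = 0;
	size_t i;

	for (i = 0; i < NUM_FRR; i++) {
		struct fault_rec *rec = &iommu->frr[i];

		if (!rec->fault)
			continue;

		/* Report the fault here (ideally ratelimited), then clear it. */
		rec->fault = false;
		handled++;
	}

	return handled;
}
```

Because the loop never restarts, a fault recorded behind the current
position stays pending until the next invocation, which is exactly the
"let the next irq handle it" behaviour described above.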

And of course, ratelimiting the fault printouts is always a good idea.
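
The ratelimiting idea can be sketched in plain C as a burst-per-interval
check. This is only an illustration under assumed names (`rl_state` and
`rl_allow()` are hypothetical, and the caller passes in the current time
in whatever tick unit it likes); in the kernel you would use the existing
ratelimit helpers instead of open-coding it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ratelimit state: at most 'burst' messages per 'interval'. */
struct rl_state {
	uint64_t     interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t     begin;	/* start of the current window */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed so far */
};

/* Return true if a message may be printed at time 'now'. */
static bool rl_allow(struct rl_state *rs, uint64_t now)
{
	/* Start a new window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget for this window: count it and suppress. */
	rs->missed++;
	return false;
}
```

The fault handler would call something like this before each printout and
skip the message when it returns false, so a fault storm costs a counter
increment per entry instead of a console write.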

Regards,

	Joerg