Date:   Thu, 15 Feb 2018 19:09:46 +0000
From:   Dmitry Safonov <dima@...sta.com>
To:     Joerg Roedel <joro@...tes.org>
Cc:     linux-kernel@...r.kernel.org, 0x7f454c46@...il.com,
        Alex Williamson <alex.williamson@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        iommu@...ts.linux-foundation.org
Subject: Re: [PATCHv2 4/6] iommu/intel: Handle DMAR faults on workqueue

On Tue, 2018-02-13 at 17:38 +0000, Dmitry Safonov wrote:
> On Tue, 2018-02-13 at 17:35 +0100, Joerg Roedel wrote:
> > On Mon, Feb 12, 2018 at 04:48:23PM +0000, Dmitry Safonov wrote:
> > > dmar_fault() reports/handles/cleans DMAR faults in a cycle,
> > > one by one. The nuisance is that it's set as an irq handler and
> > > runs with interrupts disabled - which works OK if you have only
> > > a couple of DMAR faults, but becomes a problem if your intel
> > > iommu has plenty of mappings.
> > 
> > I don't think that a work-queue is the right solution here: it adds
> > a long delay until the log is processed, and with high fault rates
> > the error log will overflow during that delay.
> > 
> > Here is what I think you should do instead to fix the soft-lockups:
> > 
> > First, unmask the fault reporting irq so that you will get
> > subsequent irqs. Then:
> > 
> > 	* For Primary Fault Reporting just cycle once through all
> > 	  supported fault recording registers.
> > 
> > 	* For Advanced Fault Reporting, read start and end pointer of
> > 	  the log and process all entries.
> > 
> > After that, return from the fault handler and let the next irq
> > handle additional faults that might have been recorded while the
> > previous handler was running.
> 
> Ok, will re-do this way, thanks.
> 
> > And of course, ratelimiting the fault printouts is always a good
> > idea.
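
(For reference, the bounded loop you describe would look roughly like
the sketch below. This is only an illustration: cap_num_fault_regs(),
DMAR_FSTS_REG and the DMA_FSTS_* bits are the driver's real
definitions, while struct frcd and the read_frcd()/clear_frcd()/
report_fault() helpers are placeholders for the actual
fault-recording-register accessors.)

/* Sketch of a primary-fault handler that walks the fault recording
 * registers exactly once per irq instead of looping until empty. */
static irqreturn_t dmar_fault_sketch(int irq, void *dev_id)
{
        struct intel_iommu *iommu = dev_id;
        int i, nr = cap_num_fault_regs(iommu->cap);

        /* Cycle once through all supported fault recording registers. */
        for (i = 0; i < nr; i++) {
                struct frcd frcd;

                if (!read_frcd(iommu, i, &frcd))   /* fault bit not set */
                        continue;

                report_fault(iommu, &frcd);        /* ratelimited printk */
                clear_frcd(iommu, i);              /* write-1-to-clear */
        }

        /* Ack the fault status bits; faults recorded after this point
         * raise a new irq and get handled by the next run. */
        writel(DMA_FSTS_PFO | DMA_FSTS_PPF, iommu->reg + DMAR_FSTS_REG);

        return IRQ_HANDLED;
}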

I've looked at this more.
It turns out that the fault handler is used only to printk() errors
about faults that have happened. So we shouldn't care about the delay
added by the workqueue, as it's effectively just another ratelimit on
printing the faults.
I also measured how much time it takes to read/clean a fault and how
much time it takes to print info about it.

Heh, anyway I'll resend v3 of the 6/6 patch, which ratelimits printks
per fault rather than per irq - it works for me with only that patch
applied.
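
The per-fault ratelimit boils down to something like this (a sketch of
the idea only, not the actual 6/6 patch; dmar_print_fault() is a
placeholder for wherever the printk happens):

/* Ratelimit each fault printout instead of skipping the whole irq:
 * pr_err_ratelimited() keeps its own ratelimit_state for this call
 * site, so excess faults are counted and suppressed per message. */
static void dmar_print_fault(int type, u16 source_id, u64 addr, int reason)
{
        pr_err_ratelimited("DMAR: %s fault, dev %02x:%02x.%d, addr %llx, reason %d\n",
                           type ? "read" : "write",
                           source_id >> 8, PCI_SLOT(source_id & 0xFF),
                           PCI_FUNC(source_id & 0xFF), addr, reason);
}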

-- 
            Dima
