Message-ID: <20191224015926.GC13083@ming.t460p>
Date:   Tue, 24 Dec 2019 09:59:26 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     John Garry <john.garry@...wei.com>, tglx@...utronix.de,
        "chenxiang (M)" <chenxiang66@...ilicon.com>, bigeasy@...utronix.de,
        linux-kernel@...r.kernel.org, hare@...e.com, hch@....de,
        axboe@...nel.dk, bvanassche@....org, peterz@...radead.org,
        mingo@...hat.com, Zhang Yi <yi.zhang@...hat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity
  for managed interrupt

On Mon, Dec 23, 2019 at 10:47:07AM +0000, Marc Zyngier wrote:
> On 2019-12-23 10:26, John Garry wrote:
> > > > > > > I've also managed to trigger some of them now that I have
> > > > > > > access to a decent box with nvme storage.
> > > > > > 
> > > > > > I only have 2x NVMe SSDs when this occurs - I should not be
> > > > > > hitting this...
> > > > > > 
> > > > > > > Out of curiosity, have you tried with the SMMU disabled? I'm
> > > > > > > wondering whether we hit some livelock condition on unmapping
> > > > > > > buffers...
> > > > > > 
> > > > > > No, but I can give it a try. Doing that should lower the CPU
> > > > > > usage, though, so maybe masks the issue - probably not.
> > > > 
> > > > > A lot of CPU lockups can be a performance issue if there isn't
> > > > > an obvious bug.
> > > > > 
> > > > > I am wondering if you could explain a bit why enabling the SMMU
> > > > > may save a bit of CPU?
> > > > The other way around. Mapping/unmapping IOVAs doesn't come for
> > > > free.
> > > I'm trying to find out whether the NVMe map/unmap patterns trigger
> > > something unexpected in the SMMU driver, but that's a very long
> > > shot.
> > 
> > So I tested v5.5-rc3 with and without the SMMU enabled, and without
> > the SMMU enabled I don't get the lockup.
> 
> OK, so my hunch wasn't completely off... At least we have something
> to look into.
> 
> [...]
> 
> > Obviously this is not conclusive, especially with such limited
> > testing - 5 minute runs each. The CPU load goes up when disabling the
> > SMMU, but that could be attributed to extra throughput (1183K ->
> > 1539K) loading.
> > 
> > I do notice that since we complete the NVMe request in irq context,
> > we also do the DMA unmap, i.e. talk to the SMMU, in the same context,
> > which is less than ideal.
> 
> It depends on how much overhead invalidating the TLB adds to the
> equation, but we should be able to do some tracing and find out.
> 
> > I need to finish for the Christmas break today, so can't check this
> > much further ATM.
> 
> No worries. May I suggest creating a new thread in the new year, maybe
> involving Robin and Will as well?

Zhang Yi has observed the CPU lockup issue once when running heavy IO on
a single NVMe drive; please CC him if you have a new patch to try.

So it looks like the DMA unmap cost is too high on aarch64 when the SMMU
is involved.
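
For scale, here is a rough sketch of the throughput delta quoted above
(1183K -> 1539K). It assumes those figures are IOPS measured with and
without the SMMU enabled, which the thread does not state explicitly; the
function and variable names are illustrative only.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# Figures quoted in the thread; the unit (IOPS) is an assumption.
with_smmu = 1_183_000
without_smmu = 1_539_000

gain = relative_change(with_smmu, without_smmu)
print(f"throughput gain with the SMMU disabled: {gain:.1%}")
```

This comes to roughly a 30% gain, from two 5-minute runs, so it is only
indicative.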


Thanks,
Ming
