Message-ID: <20211129105629.5ddfb6cf.alex.williamson@redhat.com>
Date:   Mon, 29 Nov 2021 10:56:29 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Matthew Ruffell <matthew.ruffell@...onical.com>
Cc:     linux-pci@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>,
        kvm@...r.kernel.org, nathan.langford@...lesunifiedtechnologies.com
Subject: Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing
 through 2x GPUs that share same pci switch via vfio

On Wed, 24 Nov 2021 18:52:16 +1300
Matthew Ruffell <matthew.ruffell@...onical.com> wrote:

> Hi Alex,
> 
> I have forward ported your patch to 5.16-rc2 to account for the vfio module
> refactor that happened recently. Attached below.
> 
> Have you had an opportunity to research if it is possible to conditionalise
> clearing DisINTx by looking at the interrupt status and seeing if there is a
> pending interrupt but no handler set?

Sorry, I've not had any time to continue looking at this.  When I last
left it I had found that the interrupt bit in the status register was
not set prior to clearing INTxDisable in the command register, but the
interrupt bit was set immediately upon clearing INTxDisable.  That
suggests we could generalize re-masking INTx since we know there's not
a handler for it at this point, but it's not clear how this state gets
reported and cleared.  More generally, should the interrupt code leave
INTx unmasked for any case where there's no handler?  I'm not sure.
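
For anyone wanting to watch the same two bits from userspace, here is a
minimal sketch, not the vfio code path.  It assumes the standard PCI
config space layout (Command at offset 0x04 with INTx Disable in bit 10,
Status at offset 0x06 with Interrupt Status in bit 3).  The helper names
intx_state() and read_config() are made up for illustration, and the
sysfs config file is assumed to be readable for the device in question.

```python
import struct

# Standard PCI configuration space offsets and bit masks.
PCI_COMMAND = 0x04                 # 16-bit Command register
PCI_STATUS = 0x06                  # 16-bit Status register
PCI_COMMAND_INTX_DISABLE = 0x0400  # Command bit 10: INTx Disable
PCI_STATUS_INTERRUPT = 0x0008      # Status bit 3: Interrupt Status


def intx_state(config: bytes) -> dict:
    """Decode the INTx-related bits from raw config space bytes.

    Hypothetical helper: `config` must hold at least the first 8 bytes
    of a device's configuration space.
    """
    if len(config) < 8:
        raise ValueError("need at least 8 bytes of config space")
    # Command and Status are adjacent little-endian 16-bit registers.
    command, status = struct.unpack_from("<HH", config, PCI_COMMAND)
    disabled = bool(command & PCI_COMMAND_INTX_DISABLE)
    pending = bool(status & PCI_STATUS_INTERRUPT)
    return {
        "intx_disabled": disabled,
        "intx_pending": pending,
        # The device only drives INTx when it is pending and not disabled.
        "intx_asserted": pending and not disabled,
    }


def read_config(bdf: str) -> bytes:
    """Read the standard header via sysfs (hypothetical helper).

    `bdf` is a device address such as "0000:01:00.1".
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)
```

Calling intx_state(read_config(bdf)) before and after INTxDisable is
cleared would show whether the pending bit only appears once the disable
bit is gone, which is the behavior described above.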

> We are testing a 5.16-rc2 kernel with the patch applied on Nathan's server
> currently, and we are also trying out the pci=clearmsi command line parameter
> that was discussed on linux-pci a few years ago in [1][2][3][4] along with
> setting snd-hda-intel.enable_msi=1 to see if it helps the crashkernel not get
> stuck copying IR tables.
> 
> [1] https://marc.info/?l=linux-pci&m=153988799707413
> [2] https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com/
> [3] https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@canonical.com/
> [4] https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@canonical.com/
> 
> I will let you know how we get on.

Ok.  I've not had any luck reproducing audio INTx issues, and trying to
test it has led me on several tangent bug hunts :-\  Thanks,

Alex
