linux-kernel - Re: [PATCH v3 1/1] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window

Message-ID: <PSXP216MB043824FACB1E66E3F31890D2804E0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
Date:   Thu, 21 Nov 2019 14:52:41 +0000
From:   Nicholas Johnson <nicholas.johnson-opensource@...look.com.au>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "mika.westerberg@...ux.intel.com" <mika.westerberg@...ux.intel.com>,
        "corbet@....net" <corbet@....net>,
        "benh@...nel.crashing.org" <benh@...nel.crashing.org>,
        "logang@...tatee.com" <logang@...tatee.com>
Subject: Re: [PATCH v3 1/1] PCI: Fix bug resulting in double hpmemsize being
 assigned to MMIO window

On Tue, Nov 19, 2019 at 07:38:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2019 at 03:17:04AM +0000, Nicholas Johnson wrote:
> > I did just discover linux-next and I built it. Should I be doing this 
> > more often to help find regressions?
> 
> Yes, if you build and run linux-next, that's a great service because
> it helps find problems before they appear in mainline.

Funnily enough, I just built linux-next next-20191121 and it hits a NULL
pointer dereference on start-up, which renders the system unusable.

Can anybody else please confirm? I enabled most of the config options
that are new since the linux-next I built a few days earlier.

I compiled it on an i7-4770K, booting from my USB SSD. I suppose
there is a tiny chance that the CPU had an error and produced bad code.
It is not my machine, and it was pegged at 100 degrees Celsius the whole
time. I do find it hard to believe that I am the first to notice this,
though. I cannot find any bug reports on it.

If this turns out to be an actual bug, is there a preferred way to
report it? It is probably not from the PCI subsystem.

I can do a bisect, but bisects consume a lot of time on a slow system.

Here is a preliminary bug report (assuming you are meant to report 
linux-next bugs here):
https://bugzilla.kernel.org/show_bug.cgi?id=205621

Cheers!

Regards,
Nicholas Johnson

> 
> > I will now concentrate on fixing the problem where pci=nocrs does not 
> > ignore the bus resource. One motherboard I own gives 00-7e or similar, 
> > instead of 00-ff. The nocrs does not help, and I had to patch the kernel 
> > myself. Only acpi=off fixes the problem, while knocking out SMT (MADT), 
> > IOMMU (DMAR) and the ability to suspend without crashing.
> > 
> > If you disagree that nocrs should override bus resource, then let me 
> > know and I will not attempt this.
> 
> I guess the problem is that with "pci=nocrs", we ignore the MMIO and
> I/O port resources from _CRS, but we still pay attention to bus number
> resources in _CRS?  That does sound like it would be unexpected
> behavior.
> 
> I *would* like to see the complete dmesg log because these _CRS
> methods are pretty reliable because Windows relies on them as well, so
> problems are frequently a result of Linux defects.  If we can fix
> Linux or automatically work around issues so users don't have to use
> "pci=nocrs" explicitly, that's the best.
> 
> Bjorn