Message-ID: <31f66d70-95eb-12dd-1d01-0830d118f55a@redhat.com>
Date: Fri, 6 Nov 2020 21:41:37 +0100
From: David Hildenbrand <david@...hat.com>
To: Pavel Procopiuc <pavel.procopiuc@...il.com>
Cc: Vlastimil Babka <vbabka@...e.cz>,
Kalle Valo <kvalo@...eaurora.org>, ath11k@...ts.infradead.org,
linux-mm@...ck.org, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, linux-wireless@...r.kernel.org
Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to
tail in __free_pages_core()"
On 06.11.20 18:32, Pavel Procopiuc wrote:
> Op 05.11.2020 om 21:23 schreef David Hildenbrand:
>>> So just to make sure I understand you correctly, you'd like to see if the problem with ath11k driver on my hardware persists when I boot pristine 5.10-rc2 kernel (without reverting commit 7fef431be9c9ac255838a9578331567b9dba4477) and with page_alloc.shuffle=1, right?
>>>
>>
>> Right, but as lists are randomized then it might take a couple of tries to reproduce. I'll have a look at the driver code / failing path on Monday, when back to work.
>
> I have done 5 boots of pristine 5.10-rc2 with page_alloc.shuffle=1. Out of those: 1st, 2nd, 4th and 5th resulted in
> working ath11k driver, logs were the same as with the commit 7fef431be9c9ac255838a9578331567b9dba4477 reverted. The 3rd
> one failed, but in a different way, I just had no output from the driver after initialization lines:
>
> Nov 06 18:19:41 razor kernel: Linux version 5.10.0-rc2 (root@...or) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #8 SMP Fri Nov 6 18:14:36 CET 2020
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Power on setup success
>
> I had this before and usually it was fixed after rebooting into Windows and back. This time I just went and rebooted
> into Linux again and driver was working on that boot (4th).
I'm sorry, but "WARNING: ath11k PCI support is experimental!" and such
occasional issues don't give me the best feeling that everything is
operating as it should :)
>
> After that I removed page_alloc.shuffle=1 and did 2 additional boots, both of them resulted in a non-working driver with
> the error messages about not being able to talk to firmware like I had before on the clean 5.10-rc2:
>
> Nov 06 18:24:07 razor kernel: Linux version 5.10.0-rc2 (root@...or) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #9 SMP Fri Nov 6 18:22:43 CET 2020
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
> Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
> Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Requested to power ON
> Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Power on setup success
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
> Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> Nov 06 18:25:39 razor kernel: mhi 0000:05:00.0: Device failed to exit MHI Reset state
>
Okay, that means that you should also be able to reproduce the issue on
a pre-7fef431be9c9ac255838a9578331567b9dba4477 kernel booted with
page_alloc.shuffle=1; it just might take a lot of tries to get a
problematic page.
I could also imagine that deferring the driver load until after quite
some system/mm activity could result in the same issue.
It looks like either something cannot handle a specific address we
received via dma_alloc_coherent(), or something is reading out of
bounds and the content after our allocated page no longer has the
expected value (e.g., it used to be zero and is now non-zero).
What puzzles me is that "err: 0". That should have been properly set by
the HW, no?
--
Thanks,
David / dhildenb