Message-Id: <8ACA82DB-D2FE-4599-8A01-D42218FDE1E5@redhat.com>
Date:   Thu, 5 Nov 2020 12:13:43 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Kalle Valo <kvalo@...eaurora.org>,
        Pavel Procopiuc <pavel.procopiuc@...il.com>, david@...hat.com,
        ath11k@...ts.infradead.org, linux-mm@...ck.org,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-wireless@...r.kernel.org
Subject: Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"


> On 05.11.2020 at 11:42, Vlastimil Babka <vbabka@...e.cz> wrote:
> 
> On 11/5/20 10:04 AM, Kalle Valo wrote:
>> (changing the subject, adding more lists and people)
>> Pavel Procopiuc <pavel.procopiuc@...il.com> writes:
>>> On 04.11.2020 at 10:12, Kalle Valo wrote:
>>>> Yeah, it is unfortunately time-consuming, but it is the best way to get
>>>> to the bottom of this.
>>> 
>>> I have found the commit that breaks things for me, it's
>>> 7fef431be9c9ac255838a9578331567b9dba4477 mm/page_alloc: place pages to
>>> tail in __free_pages_core()
>>> 
>>> I've reverted it on top of the 5.10-rc2 and ath11k driver loads fine
>>> and I have wifi working.
>> Oh, very interesting. Thanks a lot for the bisection; otherwise we would
>> never have found out what's causing this.
>> David & mm folks: Pavel noticed that his QCA6390 Wi-Fi 6 device (driver
>> ath11k) failed on v5.10-rc1. After bisecting he found that the commit
>> below causes the regression. I have not been able to reproduce this and
>> for me QCA6390 works fine. I don't know if this needs a specific kernel
>> configuration or what the difference between our setups is.
>> Any ideas what might cause this and how to fix it?
>> Full discussion: http://lists.infradead.org/pipermail/ath11k/2020-November/000501.html
>> commit 7fef431be9c9ac255838a9578331567b9dba4477
>> Author:     David Hildenbrand <david@...hat.com>
>> AuthorDate: Thu Oct 15 20:09:35 2020 -0700
>> Commit:     Linus Torvalds <torvalds@...ux-foundation.org>
>> CommitDate: Fri Oct 16 11:11:18 2020 -0700
>>     mm/page_alloc: place pages to tail in __free_pages_core()
> 
> Let me paste from the ath11k discussion:
> 
>> * Relevant errors from the log:
>> # journalctl -b | grep -iP '05:00|ath11k'
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Requested to power ON
>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Power on setup success
>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
> 
> This seems to be ath11k_qmi_respond_fw_mem_request(). Why does it fail with error 0? No idea.
> 
> What would happen if all the GFP_KERNEL in the file were changed to GFP_DMA32?
> 
> I'm thinking the hardware perhaps doesn't like physical addresses that are too high. But if I'm thinking about this correctly, freeing to the tail should actually move them towards the head. So it's weird.

It depends on the order in which memory is exposed to MM, which might depend on other factors in some configurations.

This smells like it exposes an existing bug. Can you also reproduce it with zone shuffling enabled?
