Message-ID: <3c05bdbe-92d9-7969-4824-82a51e70a069@web.de>
Date:   Wed, 9 Jun 2021 19:07:58 +0200
From:   Ingmar Klein <ingmar_klein@....de>
To:     Pali Rohár <pali@...nel.org>
Cc:     bhelgaas@...gle.com, Bjorn Helgaas <helgaas@...nel.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: QCA6174 pcie wifi: Add pci quirks

Yes, would be really nice if you could do that.
Seems to work perfectly fine.

Thanks and have a nice rest of the day!
Best regards,
Ingmar
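
[Editorial sketch, not part of Ingmar's message.] To check whether a machine
carries a card with the device ID discussed in this thread, the PCI vendor and
device IDs can be read from sysfs. The sketch below assumes the usual Linux
layout (`/sys/bus/pci/devices/<address>/vendor` and `.../device`) and uses
0x168c as the Atheros vendor ID. The function name `find_devices` and its
`root` parameter are illustrative, not taken from any patch in this thread.

```python
from pathlib import Path

ATHEROS_VENDOR = 0x168C   # PCI vendor ID assigned to Atheros
DEVICE_IDS = {0x003E}     # the device ID discussed in this thread


def find_devices(root="/sys/bus/pci/devices", vendor=ATHEROS_VENDOR, ids=DEVICE_IDS):
    """Return sorted PCI addresses under `root` matching vendor and device ID."""
    matches = []
    base = Path(root)
    if not base.is_dir():
        return matches  # not a Linux sysfs tree; nothing to report
    for dev in sorted(base.iterdir()):
        try:
            # sysfs stores both IDs as hex strings such as "0x168c"
            ven = int((dev / "vendor").read_text().strip(), 16)
            did = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        if ven == vendor and did in ids:
            matches.append(dev.name)
    return matches
```

Calling `find_devices()` with no arguments lists matching PCI addresses on the
running system; an empty list only means no such card was found, not that a
quirk is or is not applied.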


On 08.06.2021 at 20:34, Pali Rohár wrote:
> Hello! So should I also add the 0x003e device ID in the next patch iteration?
>
> On Saturday 05 June 2021 16:46:36 Ingmar Klein wrote:
>> Hi Pali and Bjorn,
>>
>> finally found the time to test.
>> Pali's v3 patch seems to work like a charm for my card with "0x003e" id
>> as well.
>> Just finished compiling a pve-kernel v5.11.21 with Pali's patch,
>> slightly adjusted for my test card and the Ubuntu kernel source (no
>> functional differences, just minor adjustments to make it fit the
>> Proxmox pve-kernel).
>>
>> System works just fine with the patch, in contrast to without it. Of
>> course, no long-term tests yet. However, it is looking really good.
>> Thanks guys!
>>
>> Best regards,
>> Ingmar
>>
>>
>> On 28.05.2021 at 20:47, Ingmar Klein wrote:
>>> Hi Pali,
>>> sorry for not checking that detail!
>>> Of course no problem that you couldn't test that ID. Will be glad to
>>> do so.
>>>
>>> I'll let you know how this turns out.
>>>
>>> Best regards,
>>> Ingmar
>>>
>>>
>>>> On 28.05.2021 at 20:21, Pali Rohár wrote:
>>>> Hello Ingmar!
>>>>
>>>> Now I see that in your patch you have Atheros card with id 0x003e:
>>>> https://lore.kernel.org/linux-pci/08982e05-b6e8-5a8d-24ab-da1488ee50a8@web.de/
>>>>
>>>>
>>>> With my patch I have tested 5 different Atheros cards but none has id
>>>> 0x003e:
>>>> https://lore.kernel.org/linux-pci/20210505163357.16012-1-pali@kernel.org/
>>>>
>>>>
>>>> So my patch does not fix that issue for your 0x003e card. I just do not
>>>> have such a card for testing.
>>>>
>>>> Could you try to apply my patch and then add your ID 0x003e to the quirk
>>>> list, to see if it helps?
>>>>
>>>> On Friday 28 May 2021 20:08:52 Ingmar Klein wrote:
>>>>> Thanks to both of you, Bjorn and Pali!
>>>>> I had hoped that Pali would come up with an appropriate fix. Good to
>>>>> know that this is taken care of.
>>>>>
>>>>> Will test ASAP, but I am confident that it will work anyway.
>>>>> Should it unexpectedly not fix my issues, I'll let you know.
>>>>> Have a nice weekend!
>>>>> Best regards,
>>>>> Ingmar
>>>>>
>>>>>
>>>>> On 26.05.2021 at 00:12, Bjorn Helgaas wrote:
>>>>>> On Thu, Apr 15, 2021 at 09:53:38PM +0200, Pali Rohár wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
>>>>>>>> [cc +Pali]
>>>>>>>>
>>>>>>>> On Thu, 15 Apr 2021 20:02:23 +0200
>>>>>>>> Ingmar Klein <ingmar_klein@....de> wrote:
>>>>>>>>
>>>>>>>>> First thanks to you both, Alex and Bjorn!
>>>>>>>>> I am in no way an expert on this topic, so I have to fully rely on
>>>>>>>>> your feedback, concerning this issue.
>>>>>>>>>
>>>>>>>>> If you should have any other solution approach, in form of
>>>>>>>>> patch-set, I would be glad to test it out. Just let me know, what
>>>>>>>>> you think might make sense.
>>>>>>>>> I will wait for your further feedback on the issue. In the meantime
>>>>>>>>> I have my current workaround via quirk entry.
>>>>>>>>>
>>>>>>>>> By the way, my layman's question:
>>>>>>>>> Do you think, that the following topic might also apply for the
>>>>>>>>> QCA6174?
>>>>>>>>> https://www.spinics.net/lists/linux-pci/msg106395.html
>>>>>>> I have been testing more ath cards and I'm going to send a new
>>>>>>> version of this patch including more PCI IDs.
>>>>>> Dropping this patch in favor of Pali's new version.
>>>>>>
>>>>>>>>> Or in other words, should a similar approach be tried for the
>>>>>>>>> QCA6174 and if yes, would it bring any benefit at all?
>>>>>>>>> I hope you can excuse me, in case the questions should not make too
>>>>>>>>> much sense.
>>>>>>>> If you run lspci -vvv on your device, what do LnkCap and LnkSta
>>>>>>>> report under the express capability?  I wonder if your device even
>>>>>>>> supports speeds above Gen1, mine does not.
>>>>>>>> I would not expect that patch to be relevant to you based on your
>>>>>>>> report.  I understand it to resolve an issue during link retraining
>>>>>>>> to a higher speed on boot, not during a bus reset.  Pali can correct
>>>>>>>> me if I'm wrong.  Thanks,
>>>>>>> These two issues are related. Both operations (PCIe Hot Reset and
>>>>>>> PCIe Link Retraining) cause a reset of ath chips, and it seems that
>>>>>>> they cause a double reset. After a reset these chips read their
>>>>>>> configuration from internal EEPROM/OTP, and if another reset is
>>>>>>> triggered before the chip finishes that internal configuration read,
>>>>>>> it stops working. My testing showed that ath10k chips completely
>>>>>>> disappear from the PCIe bus, some ath9k chips work fine but start
>>>>>>> reporting an incorrect PCI ID (0xABCD), and some other ath9k chips
>>>>>>> report the correct PCI ID but do not work. I had a discussion with
>>>>>>> Adrian Chadd, who probably knows everything about ath9k, and he
>>>>>>> confirmed to me that this issue exists with ath9k and ath10k chips.
>>>>>>>
>>>>>>> He wrote to me that the workaround to bring a card back from this
>>>>>>> "broken" state is to do a PCIe Cold Reset of the card, which means
>>>>>>> turning the power supply off for the particular PCIe slot. This is
>>>>>>> not supported on many low-end boards, so the workaround cannot be
>>>>>>> applied there.
>>>>>>>
>>>>>>> I was able to recover my testing cards from this "broken" state by
>>>>>>> PCIe Warm Reset (= reset via PERST# pin).
>>>>>>>
>>>>>>> I have tried many other reset methods (PCIe PM reset, Link Down, PCIe
>>>>>>> Hot Reset with bigger interval, ...) but nothing worked. So it seems
>>>>>>> that the only workaround is to do a PCIe Cold Reset or PCIe Warm
>>>>>>> Reset.
>>>>>>>
>>>>>>> I will send V2 of my patch with details and explanation.
>>>>>>>
>>>>>>> As the kernel does not have an API for doing PCIe Warm Reset, I think
>>>>>>> this is another argument why the kernel really needs one.
>>>>>>>
>>>>>>> I do not have any QCA6174 card for testing, but based on the fact
>>>>>>> that I reproduced this issue with more ath9k and ath10k cards and
>>>>>>> Adrian confirmed that the above reset issue exists, I think that it
>>>>>>> affects all AR9xxx and QCAxxxx cards handled by the ath9k and ath10k
>>>>>>> drivers.
>>>>>>>
>>>>>>> I was told that AMI BIOS was patching their BIOSes found in notebooks
>>>>>>> to avoid triggering this issue on notebooks' ath9k cards.
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> On 15.04.2021 at 04:36, Alex Williamson wrote:
>>>>>>>>>> On Wed, 14 Apr 2021 16:03:50 -0500
>>>>>>>>>> Bjorn Helgaas <helgaas@...nel.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> [+cc Alex]
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
>>>>>>>>>>>> Edit: Retry, as I did not consider, that my mail-client would
>>>>>>>>>>>> make this partly HTML.
>>>>>>>>>>>>
>>>>>>>>>>>> Dear maintainers,
>>>>>>>>>>>> I recently encountered an issue on my Proxmox server system, that
>>>>>>>>>>>> includes a Qualcomm QCA6174 m.2 PCIe wifi module.
>>>>>>>>>>>> https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
>>>>>>>>>>>>
>>>>>>>>>>>> On system boot and subsequent virtual machine start (with
>>>>>>>>>>>> passed-through QCA6174), the VM would just freeze/hang, at the
>>>>>>>>>>>> point where the ath10k driver loads.
>>>>>>>>>>>> Quick search in the proxmox related topics, brought me to the
>>>>>>>>>>>> following discussion, which suggested a PCI quirk entry for the
>>>>>>>>>>>> QCA6174 in the kernel:
>>>>>>>>>>>> https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
>>>>>>>>>>>>
>>>>>>>>>>>> I then went ahead, got the Proxmox kernel source (v5.4.106) and
>>>>>>>>>>>> applied the attached patch.
>>>>>>>>>>>> Effect was as hoped, that the VM hangs are now gone. System boots
>>>>>>>>>>>> and runs as intended.
>>>>>>>>>>>>
>>>>>>>>>>>> Judging by the existing quirk entries for Atheros, I would think,
>>>>>>>>>>>> that my proposed "fix" could be included in the vanilla kernel.
>>>>>>>>>>>> As far as I saw, there is no entry yet, even in the latest kernel
>>>>>>>>>>>> sources.
>>>>>>>>>>> This would need a signed-off-by; see
>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is an old issue, and likely we'll end up just applying this
>>>>>>>>>>> as yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark
>>>>>>>>>>> Atheros AR93xx to avoid bus reset"), where it started, it seems
>>>>>>>>>>> to be connected to 425c1b223dac ("PCI: Add Virtual Channel to
>>>>>>>>>>> save/restore support").
>>>>>>>>>>>
>>>>>>>>>>> I'd like to dig into that a bit more to see if there are any
>>>>>>>>>>> clues.  AFAIK Linux itself still doesn't use VC at all, and
>>>>>>>>>>> 425c1b223dac added a fair bit of code.  I wonder if we're
>>>>>>>>>>> restoring something out of order or making some simple mistake in
>>>>>>>>>>> the way to restore VC config.
>>>>>>>>>> I don't really have any faith in that bisect report in commit
>>>>>>>>>> c3e59ee4e766.  To double check I dug out the card from that commit,
>>>>>>>>>> installed an old Fedora release so I could build kernel v3.13,
>>>>>>>>>> pre-dating 425c1b223dac and tested triggering a bus reset both via
>>>>>>>>>> setpci and by masking PM reset so that sysfs can trigger the bus
>>>>>>>>>> reset path with the kernel save/restore code.  Both result in the
>>>>>>>>>> system hanging when the device is accessed either restoring from
>>>>>>>>>> the kernel bus reset or reading from the device after the setpci
>>>>>>>>>> reset.  Thanks,
>>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
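
[Editorial sketch, not part of the thread.] Alex's question about what LnkCap
and LnkSta report can be answered by reading those two lines from `lspci -vvv`
output. The sketch below parses the speed and width fields from text that has
already been captured. The sample text is illustrative only and does not come
from any hardware mentioned in this thread, and the name `parse_link_info` is
hypothetical. A speed of 2.5GT/s corresponds to PCIe Gen1.

```python
import re

# Matches lines such as "LnkCap: Port #0, Speed 2.5GT/s, Width x1, ..."
# The colon directly after the name keeps LnkCap2/LnkSta2 lines out.
LINK_RE = re.compile(
    r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+GT/s).*?Width\s+x(\d+)", re.M
)


def parse_link_info(lspci_text):
    """Map 'LnkCap'/'LnkSta' to (speed, width) for the first match of each."""
    info = {}
    for name, speed, width in LINK_RE.findall(lspci_text):
        info.setdefault(name, (speed, int(width)))
    return info


# Illustrative sample, NOT output from a card in this thread.
SAMPLE = """\
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
"""

link_info = parse_link_info(SAMPLE)
```

Comparing the LnkCap speed with the LnkSta speed shows whether the link is
running below what the device advertises; if both read 2.5GT/s, the device
is a Gen1-only part and retraining to a higher speed does not apply.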
