lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 29 Sep 2020 22:42:02 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Hans de Goede <hdegoede@...hat.com>,
        Petr Tesarik <ptesarik@...e.cz>
Cc:     Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        netdev@...r.kernel.org, linux-clk@...r.kernel.org
Subject: Re: RTL8402 stops working after hibernate/resume

On 29.09.2020 22:35, Heiner Kallweit wrote:
> On 29.09.2020 22:08, Hans de Goede wrote:
>> Hi,
>>
>> On 9/29/20 9:07 PM, Petr Tesarik wrote:
>>> Hi Heiner (and now also Hans)!
>>>
>>> @Hans: I'm adding you to this conversation, because you're the author
>>> of commit b1e3454d39f99, which seems to break the r8169 driver on a
>>> laptop of mine.
>>
>> Erm, no, as you bi-sected yourself already commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>>
>> Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias
>> for Bay Trail / Cherry Trail") is about 18 months older.
>>
>>> On Fri, 25 Sep 2020 16:47:54 +0200
>>> Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>
>>>> On 25.09.2020 14:56, Petr Tesarik wrote:
>>>>> On Fri, 25 Sep 2020 11:52:41 +0200
>>>>> Petr Tesarik <ptesarik@...e.cz> wrote:
>>>>>   
>>>>>> On Fri, 25 Sep 2020 11:44:09 +0200
>>>>>> Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>>>  
>>>>>>> On 25.09.2020 10:54, Petr Tesarik wrote:
>>>>>> [...]
>>>>>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>>>>>        
>>>>>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>>>>>> If you can spend the effort to bisect the issue, this would be appreciated.
>>>>>>
>>>>>> OK, then I'm going to give it a try.
>>>>>
>>>>> Done. The system freezes when this commit is applied:
>>>>>
>>>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520
>>>>> Author: Heiner Kallweit <hkallweit1@...il.com>
>>>>> Date:   Wed Jun 17 22:55:40 2020 +0200
>>>>>
>>>>>      r8169: move switching optional clock on/off to pll power functions
>>>>>    
>>>> This sounds weird. On your system tp->clk should be NULL,
>>
>> Heiner, why do you say that tp->clk should be NULL on Petr's
>> system? Because it is an x86 based system?
>>
> This was a misunderstanding on my side, I was under the assumption
> that ether_clk applies to DT-configured systems only.
> 
>> Some X86 SoCs, specifically, the more tablet oriented Bay and Cherry
>> Trail SoCs, which are much more ARM SoC like then other X86 SoCs do
>> also use the clock framework and the SoC has a number of external clk
>> pins which are typically used by audio codecs and by ethernet chips.
>>
>>>> making
>>>> clk_prepare_enable() et al no-ops. Please check whether tp->clk
>>>> is NULL after the call to rtl_get_ether_clk().
>>>
>>> This might be part of the issue. On my system tp->clk is definitely not
>>> NULL:
>>>
>>> crash> *rtl8169_private.clk 0xffff9277aca58940
>>>    clk = 0xffff9277ac2c82a0
>>>
>>> crash> *clk 0xffff9277ac2c82a0
>>> struct clk {
>>>    core = 0xffff9277aef65d00,
>>>    dev = 0xffff9277aed000b0,
>>>    dev_id = 0xffff9277aec60c00 "0000:03:00.2",
>>>    con_id = 0xffff9277ad04b080 "ether_clk",
>>>    min_rate = 0,
>>>    max_rate = 18446744073709551615,
>>>    exclusive_count = 0,
>>>    clks_node = {
>>>      next = 0xffff9277ad2428d8,
>>>      pprev = 0xffff9277aef65dc8
>>>    }
>>> }
>>>
>>> The dev_id corresponds to the Ethernet controller:
>>>
>>> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)
>>>
>>> Looking at clk_find(), it matches this entry in clocks:
>>>
>>> struct clk_lookup {
>>>    node = {
>>>      next = 0xffffffffbc702f40,
>>>      prev = 0xffff9277bf7190c0
>>>    },
>>>    dev_id = 0x0,
>>>    con_id = 0xffff9277bf719524 "ether_clk",
>>>    clk = 0x0,
>>>    clk_hw = 0xffff9277ad2427f8
>>> }
>>>
>>> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
>>> at the platform initialization code, the "ether_clk" alias is created
>>> unconditionally. Hans already added.
>>
>> Petr, unconditionally is not really correct here, just as claiming
>> above that my commit broke things was not really correct either.
>>
>> I guess this is mostly semantics, but I don't appreciate
>> the accusatory tone here.
>>
>> The code in question binds to a clk-pmc-atom platform_device which
>> gets instantiated by drivers/platform/x86/pmc_atom.c. Which in turn
>> binds to a PCI device which is only present on Bay Trail and Cherry
>> Trail SoCs.
>>
>> IOW the commit operates as advertised in its Subject:
>> "clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail"
>>
>> So with that all clarified lets try to see if we can figure out
>> *why* this is actually happening.
>>
>> Petr, can you describe your hardware in some more detail,
>> in the bits quoted when you first Cc-ed me there is not that
>> much detail. What is the vendor and model of your laptop?
>>
>> Looking closer at commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>> I notice that the functions which now enable/disable the clock:
>> rtl_pll_power_up() and rtl_pll_power_down()
>>
>> Only run when the interface is up during suspend/resume.
>> Petr, I guess the laptop is not connected to ethernet when you
>> hibernate it?
>>
>> That means that on resume the clock will not be re-enabled.
>>
>> This is a subtle but important change and I believe that
>> this is what is breaking things. I guess that the PLL which
>> rtl_pll_power_up() / rtl_pll_power_down() controls is only
>> used for ethernet-timing.  But the external clock controlled
>> through pt->clk is a replacement for using an external
>> crystal with the r8169 chip. So with it disabled, the entire
>> chip does not have a clock and is essentially dead.
>> It can then e.g. not respond to any pci-e reads/writes done
>> to it.
>>
>> So I believe that the proper fix for this is to revert
>> commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>>
>> As that caused the whole chip's clock to be left off after
>> a suspend/resume while the interface is down.
>>
>> Also some remarks about this while I'm being a bit grumpy about
>> all this anyways (sorry):
>>
>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
>> to pll power functions") commit's message does not seem to really
>> explain why this change was made...
>>
>> 2. If a git blame would have been done to find the commit adding
>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
>> then you could have known that the clk in question is an external
>> clock for the entire chip, the commit message pretty clearly states
>> this (although "the entire" part is implied only) :
>>
>> "On some boards a platform clock is used as clock for the r8169 chip,
>> this commit adds support for getting and enabling this clock (assuming
>> it has an "ether_clk" alias set on it).
>>
> Even if the missing clock would stop the network chip completely,
> this shouldn't freeze the system as described by Petr.
> In some old RTL8169S spec an external 25MHz clock is mentioned,
> what matches the MII bus frequency. Therefore I'm not 100% convinced
> that the clock is needed for basic chip operation, but due to
> Realtek not releasing datasheets I can't verify this.
> 
> But yes, if reverting this change avoids the issue on Petr's system,
> then we should do it. A simple mechanical revert wouldn't work because
> source file structure has changed since then, so I'll prepare a patch
> that effectively reverts the change.
> 
Uups, the last part was written keeping the original introduction
of handling ether_clk in mind. My later change can be simply reverted.


>> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
>> enabled by the firmware") which is a previous attempt to fix this for some
>> x86 boards, but this causes all Cherry Trail SoC using boards to not reach
>> there lowest power states when suspending.
>>
>> This commit (together with an atom-pmc-clk driver commit adding the alias)
>> fixes things properly by making the r8169 get the clock and enable it when
>> it needs it."
>>
>> Regards,
>>
>> Hans
>>
> 
> Heiner
> 

Powered by blists - more mailing lists