lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2696c9c2-29f7-18cf-1893-bd2fc2f203f9@redhat.com>
Date:   Tue, 30 Oct 2018 12:05:03 +0100
From:   Hans de Goede <hdegoede@...hat.com>
To:     Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
        Dean Wallace <duffydack73@...il.com>,
        Andy Shevchenko <andy.shevchenko@...il.com>
Cc:     Stephen Boyd <sboyd@...nel.org>,
        Michael Turquette <mturquette@...libre.com>,
        linux-clk <linux-clk@...r.kernel.org>,
        Stable <stable@...r.kernel.org>,
        Johannes Stezenbach <js@...21.net>,
        Carlo Caione <carlo@...lessm.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mogens Jensen <mogens-jensen@...tonmail.com>
Subject: Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

Hi,

On 30-10-18 11:17, Hans de Goede wrote:
> Hi Pierre-Louis,
> 
> On 29-10-18 23:03, Pierre-Louis Bossart wrote:
>>
>> On 10/29/18 2:08 PM, Dean Wallace wrote:
>>> On 29-10-18, Andy Shevchenko wrote:
>>>> On Mon, Oct 29, 2018 at 7:52 PM Andy Shevchenko
>>>> <andy.shevchenko@...il.com> wrote:
>>>>> Cc: Pierre as well.
>>>>>
>>>>> On Mon, Oct 29, 2018 at 7:48 PM Stephen Boyd <sboyd@...nel.org> wrote:
>>>>>> Quoting Dean Wallace (2018-10-25 16:25:17)
>>>>>>> I have found a regression in 4.18.15 that means I lose sound on my old
>>>>>>> Toshiba Chromebook 2 (Swanky).  My system details are:-
>>>>>>>
>>>>>>> Toshiba Chromebook (Swanky)
>>>>>>> MrChromebox UEFI coreboot
>>>>>>> Arch Linux running latest alsa/pulseaudio
>>>>>>>
>>>>>>> Upgraded kernel from 4.18.14 to 4.18.15 and lost all sound output.  By
>>>>>>> output I mean, the card is still detected, the module loaded, all apps
>>>>>>> showing sound is being playing, but no actual audible sound comes
>>>>>>> through.  Upgraded to 4.18.16 same issue.
>>>>>>>
>>>>>>> Dug around and found Upstream commit 648e921888ad96ea3dc922739e96716ad3225d7f
>>>>>>> clk: x86: Stop marking clocks as CLK_IS_CRITICAL
>>>>>>> "This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
>>>>>>> devices not being able to reach S0i3 greatly decreasing their battery
>>>>>>> drain when suspended."
>>>>>>>
>>>>>>> I reverted it and compiled 4.18.16 and have sound back again.  Could
>>>>>>> this be looked into, with possibility of fix.
>>>>>>>
>>>>>> Thanks for the bug report. I'm adding some people involved in the commit
>>>>>> you mention is causing audio regressions. The best plan is to probably
>>>>>> revert the commit from the 4.18 linux stable tree. Or there may be
>>>>>> another patch missing that would be useful to make this backported patch
>>>>>> work. Hopefully Hans or Andy knows.
>>>>> Hans has been investigating S0ix issues on Baytrail and Cherrytrail machines.
>>>>> I have a feeling that the problem can be fixed by properly handling
>>>>> clock in ASoC driver(s). Perhaps Hans and Pierre can figure this out
>>>>> better than me.
>>>> Looking to sound/soc/intel/boards/cht_bsw_rt5672.c I see no
>>>> suspend-resume hooks. Perhaps, adding them like in the commit
>>>> ac8bd9e13be2 ("r8169: Disable clk during suspend / resume") would
>>>> help.
>>
>> I missed this change while i was away. It's indeed the expectation that the audio mclk is handled by the firmware,
> 
> I believe that the statement "It's indeed the expectation that the
> audio mclk is handled by the firmware" is not correct for BYT/CHT
> platforms (and the commit causing the regression only affects BYT/CHT
> platforms). Various machine drivers under sound/soc/intel/boards have
> code to deal with the mclk themselves, like this:
> 
>          drv->mclk = devm_clk_get(&pdev->dev, "pmc_plt_clk_3");
> ...
> 
>          if (SND_SOC_DAPM_EVENT_ON(event)) {
>                  ret = clk_prepare_enable(ctx->mclk);
>          ...
>          } else {
>                  clk_disable_unprepare(ctx->mclk);
>          }
> 
> The above code is from sound/soc/intel/boards/cht_bsw_max98090_ti.c
> which is the machine driver used on the machines for which problems
> are now being reported.
> 
> And my commit introducing the problem is in essence a revert of:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=d31fd43c0f9a41e2678a1e78c0f22f0384c6edd3
> 
> Which is from 2017-07-14 and the ASoC code for BYT/CHT platforms is
> much older then that.
> 
> Also I asked Carlo why he wrote that patch and it was to fix a problem
> with ethernet on some laptops.
> 
>> not sure I understand why removing the CLK_IS_CRITICAL was necessary or what it has to do with S0ix.
> 
> It is necessary, because if it is set the clock never gets disabled and
> on x86 platforms using suspend2idle we are responsible for *all* the hardware
> power-management as OS. If we do not disable the clocks then we can only
> reach S0i1 instead of S0i3 when suspended leading to increased battery
> drain during suspend.
> 
> Also note that this patch is only causing problems on CHT + max98090 codec
> using machines. I've tested it on many other BYT/CHT machines myself and
> we've not had any other bug reports related to this.
> 
> So this clearly points to a problem with the clock management in the
> cht_bsw_max98090_ti.c machine driver.
> 
> I've some ideas how to fix this and I will prepare some patches to test.

p.s.

It seems that a sub-thread of this thread where me and the reporter have
been working on debugging this has lost most of the people in the Cc.

So here is a quick status update. So far this issue has been seen on
Swanky and Clapper model chromebooks.

Reverting the patch causing this regression and then doing:

for i in /sys/kernel/debug/clk/pmc_plt_clk_?; do echo -n "$i: "; cat $i/clk_flags; echo; done

has shown that before my patch to not set CLK_IS_CRITICAL the
CLK_IS_CRITICAL was being set on pmc_plt_clk_0. Which shows
that that clock is somehow being used for sounds on these boards.

I think that pmc_plt_clk_0 is being used instead of pmc_plt_clk_3,
but it could also be the case that both are being used.

I'm currently preparing patches to test both scenarios.

Regards,

Hans

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ