[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1bda8e8f-10df-4a10-a217-26cf50ef3577@ideasonboard.com>
Date: Wed, 5 Jun 2024 14:16:49 +0300
From: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
To: Ulf Hansson <ulf.hansson@...aro.org>
Cc: Geert Uytterhoeven <geert+renesas@...der.be>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>, "Rafael J . Wysocki" <rafael@...nel.org>,
Rob Herring <robh@...nel.org>, Saravana Kannan <saravanak@...gle.com>,
Claudiu Beznea <claudiu.beznea.uj@...renesas.com>,
Peng Fan <peng.fan@....com>, linux-pm@...r.kernel.org,
linux-serial@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
Devarsh Thakkar <devarsht@...com>,
Laurent Pinchart <laurent.pinchart@...asonboard.com>
Subject: Re: [PATCH/RFC 0/3] pmdomain: renesas: rmobile-sysc: Remove serial
console handling
On 05/06/2024 13:53, Ulf Hansson wrote:
> On Wed, 5 Jun 2024 at 12:41, Tomi Valkeinen
> <tomi.valkeinen@...asonboard.com> wrote:
>>
>> Hi Ulf,
>>
>> On 05/06/2024 12:34, Ulf Hansson wrote:
>>> + Tomi
>>>
>>> On Mon, 27 May 2024 at 14:41, Geert Uytterhoeven
>>> <geert+renesas@...der.be> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Since commit a47cf07f60dcb02d ("serial: core: Call
>>>> device_set_awake_path() for console port"), the serial driver properly
>>>> handles the case where the serial console is part of the awake path, and
>>>> it looked like we could start removing special serial console handling
>>>> from PM Domain drivers like the R-Mobile SYSC PM Domain driver.
>>>> Unfortunately the devil is in the details, as usual...
>>>>
>>>> Earlycon relies on the serial port to be initialized by the firmware
>>>> and/or bootloader. Linux is not aware of any hardware dependencies that
>>>> must be met to keep the port working, and thus cannot guarantee they
>>>> stay met, until the full serial driver takes over.
>>>>
>>>> E.g. all unused clocks and unused PM Domains are disabled in a late
>>>> initcall. As this happens after the full serial driver has taken over,
>>>> the serial port's clock and/or PM Domain are no longer deemed unused,
>>>> and this is typically not a problem.
>>>>
>>>> However, if the serial port's clock or PM Domain is shared with another
>>>> device, and that other device is runtime-suspended before the full
>>>> serial driver has probed, the serial port's clock and/or PM Domain will
>>>> be disabled inadvertently. Any subsequent serial console output will
>>>> cause a crash or system lock-up. E.g. on R/SH-Mobile SoCs, the serial
>>>> ports share their PM Domain with several other I/O devices. After the
>>>> use of pwm (Armadillo-800-EVA) or i2c (KZM-A9-GT) during early boot,
>>>> before the full serial driver takes over, the PM Domain containing the
>>>> early serial port is powered down, causing a lock-up when booted with
>>>> "earlycon".
>>>
>>> Hi Geert,
>>>
>>> Thanks for the detailed description of the problem! As pointed out in
>>> regards to another similar recent patch [1], this is indeed a generic
>>> problem, not limited to the serial console handling.
>>>
>>> At Linaro Connect a few weeks ago I followed up with Saravana from the
>>> earlier discussions at LPC last fall. We now have a generic solution
>>> for genpd drafted on plain paper, based on fw_devlink and the
>>> ->sync_state() callback. I am currently working on the genpd series,
>>> while Saravana will re-spin the series (can't find the link to the
>>> last version) for the clock framework. Ideally, we want these things
>>> to work in a very similar way.
>>>
>>> That said, allow me to post the series for genpd in a week or two to
>>> see if it can solve your problem too, for the serial console.
>>
>> Both the genpd and the clock solutions will make suppliers depend on all
>> their consumers to be probed, right?
>>
>> I think it is a solution, and should be worked on, but it has the
>> drawback that suppliers that have consumers that will possibly never be
>> probed, will also never be able to turn off unused resources.
>>
>> This was specifically the case with the TI ti-sci pmdomain case I was
>> looking at: the genpd driver (ti_sci_pm_domains.c) provides a lot of
>> genpds for totally unrelated devices, and so if, e.g., you don't have or
>> don't want to load a driver for the GPU, all PDs are affected.
>>
>> Even here the solutions you mention will help: instead of things getting
>> broken because genpds get turned off while they are actually in use, the
>> genpds will be kept enabled, thus fixing the breakage. Unfortunately,
>> they'll be kept enabled forever.
>>
>> I've been ill for quite a while so I haven't had the chance to look at
>> this more, but before that I was hacking around a bit with something I
>> named .partial_sync_state(). .sync_state() gets called when all the
>> consumers have probed, but .partial_sync_state() gets called when _a_
>> consumer has been probed.
>>
>> For the .sync_state() things are easy for the driver, as it knows
>> everything related has been probed, but for .partial_sync_state() the
>> driver needs to track resources internally. .partial_sync_state() will
>> tell the driver that a consumer device has probed, the driver can then
>> find out which specific resources (genpds in my case) that consumer
>> refers to, and then... Well, that's how far I got with my hacks =).
>>
>> So, I don't know if this .partial_sync_state() can even work, but I
>> think we do need something more on top of the .sync_state().
>
> Thanks for the update!
>
> You certainly have a point, but rather than implementing some platform
> specific method, I think we should be able enforce the call to
> ->sync_state(), based upon some condition/timeout - and even if all
> consumers haven't been probed.
Hmm, I think that was already implemented in some of the serieses out
there (or even in mainline already?), as I remember doing some
experiments with it. I don't like it much, though.
With a simple timeout, it'll always be just a bit too early for some
user (nfs mount took a bit more time than expected -> board frozen).
The only condition I can see that would somewhat work is a manual
trigger from the userspace. The boot scripts could then signal the
kernel when all the modules have been loaded and probably a suitable,
platform/use case specific amount of time has passed to allow the
drivers to probe.
It just feels a bit too much of a "let's hope this work" approach.
That said, the timeout/condition is probably acceptable for many cases,
where turning off a resource forcefully will just result in, say, a
temporarily blanked display, or something else that gets fixed if and
when the proper driver is probed.
Unfortunately, here with the case I have, the whole board gets halted if
the display subsystem genpd is turned off and the display driver is
loaded after that.
Tomi
Powered by blists - more mailing lists