Message-ID: <3274bdec-e9a3-4c2d-ba8e-58caa033d451@ti.com>
Date: Tue, 20 Aug 2024 20:45:57 +0530
From: Beleswar Prasad Padhi <b-padhi@...com>
To: Jan Kiszka <jan.kiszka@...mens.com>,
 Bjorn Andersson <andersson@...nel.org>,
 Mathieu Poirier <mathieu.poirier@...aro.org>,
 <linux-remoteproc@...r.kernel.org>
CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 Apurva Nandan <a-nandan@...com>,
 "stable@...r.kernel.org" <stable@...r.kernel.org>,
 Nishanth Menon <nm@...com>
Subject: Re: [PATCH] remoteproc: k3-r5: Fix driver shutdown

On 20-08-2024 20:29, Beleswar Prasad Padhi wrote:
>
> On 20-08-2024 19:50, Jan Kiszka wrote:
>> On 20.08.24 11:48, Beleswar Prasad Padhi wrote:
>>> On 20-08-2024 15:09, Jan Kiszka wrote:
>>>> On 20.08.24 11:30, Beleswar Prasad Padhi wrote:
>>>>> Hi Jan,
>>>>>
>>>>> On 19-08-2024 22:17, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@...mens.com>
>>>>>>
>>>>>> When k3_r5_cluster_rproc_exit is run, core 1 is shut down and removed
>>>>>> first. When core 0 should then be stopped before its removal, it will
>>>>>> find core1->rproc as NULL already and crash. This happens on rmmod, e.g.
>>>>> Did you check this on top of the -next-20240820 tag? A series[0] that
>>>>> was merged recently fixed this condition. I don't see this issue when
>>>>> testing on top of the -next-20240820 tag.
>>>>> [0]:
>>>>> https://lore.kernel.org/all/20240808074127.2688131-1-b-padhi@ti.com/
>>>>>
>>>> I didn't try those yet, I was on 6.11-rcX. But from reading them
>>>> quickly, I'm not seeing the two issues I found directly addressed
>>>> there.
>>> Check the comment by Andrew Davis[0], which addresses the above issue.
>>>
>>> [0]:
>>> https://lore.kernel.org/all/0bba5293-a55d-4f13-887c-272a54d6e1ca@ti.com/
>>>
>>>
>> OK, then someone still needs to update his patch accordingly.
> That comment was addressed in the v4 revision of the series[1], which was
> merged to linux-next and is available with the -next-20240820 tag. Could
> you please check whether the issue persists with that tag? I checked
> myself and was not able to reproduce it.
> [1]: https://lore.kernel.org/all/Zr9nbWnADDB+ZOlg@p14s/
>>
>>>>>> Fixes: 3c8a9066d584 ("remoteproc: k3-r5: Do not allow core1 to power
>>>>>> up before core0 via sysfs")
>>>>>> CC: stable@...r.kernel.org
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@...mens.com>
>>>>>> ---
>>>>>>
>>>>>> There might be one more because I can still make this driver crash
>>>>>> after an operator error. Were error scenarios tested at all?
>>>>> Can you point out more specifically what this issue is? I can take
>>>>> it up then.
>>>> Try starting core1 before core0, and then again - system will hang or
>>> If you are trying to stop and then start the cores from sysfs, that is
>>> not yet supported. The hang is thus expected.
>> What? Then the driver is broken, even more. Why wasn't it fully
>> implemented?
Just wanted to point out that this "graceful shutdown" feature depends
mainly on the Device Manager Firmware (point 3) and needs only minimal
changes to the remoteproc driver (points 2 and 4). As soon as the
firmware is capable, we will send out the patches for this feature.
> The driver is capable of starting and stopping a core just fine. The
> problem is that when we stop a core from sysfs (without resetting the SoC
> itself), the remotecore is powered off, but its resources are not
> relinquished. So when we start it again, there could be memory
> corruption. This "graceful shutdown" feature for remotecores is
> almost implemented and will be posted for this driver soon. Please
> try it out after that.
>
> With the graceful shutdown feature, this will be the flow:
> 1. We issue a core stop operation from sysfs.
> 2. The remoteproc driver sends a special "SHUTDOWN" mailbox message to
> the remotecore.
> 3. The remotecore relinquishes all of its acquired resources through
> the Device Manager Firmware and sends an ACK back.
> 4. The remotecore enters the WFI state and is then reset by the host
> core.
> 5. Then, if we try the core start operation from sysfs, the core
> should come up as expected.
>
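The five-step flow quoted above can be modelled as a small state machine.
This is only a userspace sketch under stated assumptions, not driver code:
the type and function names (remote_core, host_stop_core, and so on), the
state names, and the resource counter are all hypothetical, and the mailbox
exchange is reduced to a direct function call.

```c
#include <assert.h>

/* Hypothetical model of the proposed flow; nothing here is taken from
 * the k3-r5 driver or from the Device Manager Firmware. */
enum core_state { CORE_RUNNING, CORE_WFI, CORE_OFFLINE };
enum mbox_msg { MSG_SHUTDOWN, MSG_SHUTDOWN_ACK };

struct remote_core {
    enum core_state state;
    int held_resources;     /* resources still owned by the remotecore */
};

/* Step 3: the remotecore releases its resources, parks in WFI, and ACKs. */
static enum mbox_msg remote_handle_msg(struct remote_core *core,
                                       enum mbox_msg msg)
{
    assert(msg == MSG_SHUTDOWN);    /* only message this model knows */
    core->held_resources = 0;
    core->state = CORE_WFI;
    return MSG_SHUTDOWN_ACK;
}

/* Steps 1, 2 and 4: host-side stop, as triggered from sysfs. */
static int host_stop_core(struct remote_core *core)
{
    if (core->state != CORE_RUNNING)
        return -1;

    /* Step 2: send SHUTDOWN and wait for the ACK (modelled as a call). */
    if (remote_handle_msg(core, MSG_SHUTDOWN) != MSG_SHUTDOWN_ACK)
        return -1;

    /* Step 4: reset the core only once it sits in WFI with nothing held. */
    if (core->state != CORE_WFI || core->held_resources != 0)
        return -1;
    core->state = CORE_OFFLINE;
    return 0;
}

/* Step 5: a later start succeeds because no stale resources remain. */
static int host_start_core(struct remote_core *core)
{
    if (core->state != CORE_OFFLINE || core->held_resources != 0)
        return -1;
    core->state = CORE_RUNNING;
    core->held_resources = 3;   /* arbitrary count for the model */
    return 0;
}
```

The point of the model is the ordering: the reset in step 4 happens only
after the ACK, so a subsequent start never sees resources left over from
the previous run.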
> Thanks,
> Beleswar
>>
>> Jan
>>
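For reference, the rmmod crash described in the patch at the start of this
thread (core 1 removed first, core 0 then finding core1->rproc as NULL) can
be illustrated with a small userspace model. This is a sketch under
assumptions, not the actual fix: fake_core, fake_rproc, stop_core and
cluster_exit are hypothetical names, and the rule that core 0 may only stop
once core 1 is no longer running is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 2

/* Hypothetical stand-ins; these are not the driver's real structures. */
struct fake_rproc {
    int running;
};

struct fake_core {
    struct fake_rproc *rproc;   /* NULL once the core has been removed */
};

/* Stop one core. In this model, core 0 refuses to stop while core 1 is
 * still running. */
static int stop_core(struct fake_core *cores, int idx)
{
    assert(idx >= 0 && idx < NUM_CORES);

    if (cores[idx].rproc == NULL)
        return -1;

    if (idx == 0) {
        struct fake_rproc *peer = cores[1].rproc;

        /* Guard: a core 1 that is already removed counts as stopped.
         * Reading peer->running without this check is the NULL
         * dereference described in the patch. */
        if (peer != NULL && peer->running)
            return -1;
    }

    cores[idx].rproc->running = 0;
    return 0;
}

/* Tear the cluster down in reverse order: core 1 first, then core 0. */
static int cluster_exit(struct fake_core *cores)
{
    for (int i = NUM_CORES - 1; i >= 0; i--) {
        if (stop_core(cores, i) != 0)
            return -1;
        cores[i].rproc = NULL;  /* core is now "removed" */
    }
    return 0;
}
```

With the guard in place, the reverse-order teardown completes even though
core 1's pointer is already NULL by the time core 0 is stopped.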