[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.21.1710201210580.10478@linux.fjfi.cvut.cz>
Date: Fri, 20 Oct 2017 16:59:44 +0200 (CEST)
From: David Kozub <zub@...ux.fjfi.cvut.cz>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] clockevents/drivers/cs5535: improve resilience to spurious
interrupts
On Fri, 20 Oct 2017, Daniel Lezcano wrote:
> On 20/10/2017 09:49, David Kozub wrote:
>> On Fri, 20 Oct 2017, Daniel Lezcano wrote:
>>
>>> On 20/10/2017 00:25, Thomas Gleixner wrote:
>>>> On Fri, 20 Oct 2017, Daniel Lezcano wrote:
>>>>
>>>>> On 19/10/2017 22:57, David Kozub wrote:
>>>>>> This solves a BUG on ALIX 2c3 where mfgpt_tick is called before
>>>>>> clockevents_config_and_register returns. This caused mfgpt_tick to
>>>>>> call a
>>>>>> null function pointer.
>>>>>>
>>>>>> Thanks to Daniel Lezcano and Thomas Gleixner for helping me analyze
>>>>>> this
>>>>>> and suggesting a solution.
>>>>>>
>>>>>> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
>>>>>> Signed-off-by: David Kozub <zub@...ux.fjfi.cvut.cz>
>>>>>> ---
>>>>>
>>>>> Thank for sending this fix.
>>>>>
>>>>> Can you check if the commit 8f9327cbb is the one introducing
the
>>>>> regression ? So we can add the proper tags and propagate the fix to
>>>>> stable.
>>>>
>>>> No it's not.
>>>>
>>>> - if (cs5535_tick_mode == CLOCK_EVT_MODE_SHUTDOWN)
>>>> + if (clockevent_state_shutdown(&cs5535_clockevent))
>>>>
>>>> This particular problem of the missing detached state check has been
>>>> there
>>>> forever and went unnoticed for whatever reason.
>>>
>>> The detached condition was artificially caught by the initialized
>>> variable:
>>>
>>> -static unsigned int cs5535_tick_mode = CLOCK_EVT_MODE_SHUTDOWN;
>>>
>>> The patch 8f9327cbb removes the variable, so very likely this is where
>>> the problem appeared.
>>
>> I will try to test that. But I won't have access to the device till
>> Sunday evening. I've had big trouble trying to run kernels > 4.1-rc5 on
>> the device and if I'm looking correctly the commit was introduced in
>> 4.3-rc1. But I'll try to figure something out.
>
> David,
>
> thanks again for taking the time to report and investigate this issue.
> Usually people just give up and drop the legacy hardware without letting
> us know the kernel is broken with it. So don't spend too much time with
> it, just check if the commit before works, if not, just add in the log
> the kernel version you noticed the breakage.
>
> In case you are interested in doing some debugging to narrow down the
> offending commit and you know the versions working and not working, you
> can try the command git-bisect. It will use a dichotomy approach to find
> out the culprit.
Hi Daniel,
I originally didn't see this issue simply because 4.1-rc7 would not output
anything on the serial on the device. For kernels >= 4.1-rc7 the kernel
either reboots immediatelly or freezes, in both cases without giving any
output in the serial. I tried to find the cause of this with git bisect
but in the end I was none the wiser. I also suspected my build
environment.
Eventually I found out the following:
* 4.1-rc6 boots OK
* 4.1-rc7 restarts immediatelly, but if I revert f18c34e48 it works but I
have no idea why would this commit break anything, the commit looks OK
* 4.1 - works if I revert f18c34e48
* e75c73ad6 feeezes even after reverting f18c34e48
* 4.2-rc1 - reboots, even after reverting f18c34e48
Only recently I tried a current kernel and I was more lucky: it gave me
some useful output on the serial.
To verify 8f9327cbb I was thinking I could take it and apply it onto
4.1-rc6 (if that was possible), but as Thomas suggested it might not be
worth it, then I forget about that.
Thanks for all the help to you and to Thomas.
Best regards,
David
Powered by blists - more mailing lists