Message-ID: <e2035ee3-8de6-465e-9b51-6b131313e822@linaro.org>
Date:   Fri, 20 Oct 2017 11:35:40 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     David Kozub <zub@...ux.fjfi.cvut.cz>
Cc:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] clockevents/drivers/cs5535: improve resilience to
 spurious interrupts

On 20/10/2017 09:49, David Kozub wrote:
> On Fri, 20 Oct 2017, Daniel Lezcano wrote:
> 
>> On 20/10/2017 00:25, Thomas Gleixner wrote:
>>> On Fri, 20 Oct 2017, Daniel Lezcano wrote:
>>>
>>>> On 19/10/2017 22:57, David Kozub wrote:
>>>>> This solves a BUG on ALIX 2c3 where mfgpt_tick is called before
>>>>> clockevents_config_and_register returns. This caused mfgpt_tick to
>>>>> call a
>>>>> null function pointer.
>>>>>
>>>>> Thanks to Daniel Lezcano and Thomas Gleixner for helping me analyze
>>>>> this
>>>>> and suggesting a solution.
>>>>>
>>>>> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
>>>>> Signed-off-by: David Kozub <zub@...ux.fjfi.cvut.cz>
>>>>> ---
>>>>
>>>> Thanks for sending this fix.
>>>>
>>>> Can you check if the commit 8f9327cbb is the one introducing the
>>>> regression ? So we can add the proper tags and propagate the fix to
>>>> stable.
>>>
>>> No it's not.
>>>
>>> -       if (cs5535_tick_mode == CLOCK_EVT_MODE_SHUTDOWN)
>>> +       if (clockevent_state_shutdown(&cs5535_clockevent))
>>>
>>> This particular problem of the missing detached state check has been
>>> there
>>> forever and went unnoticed for whatever reason.
>>
>> The detached condition was artificially caught by the initialized
>> variable:
>>
>> -static unsigned int cs5535_tick_mode = CLOCK_EVT_MODE_SHUTDOWN;
>>
>> The patch 8f9327cbb removes the variable, so very likely this is where
>> the problem appeared.
> 
> I will try to test that. But I won't have access to the device till
> Sunday evening. I've had big trouble trying to run kernels > 4.1-rc5 on
> the device and if I'm looking correctly the commit was introduced in
> 4.3-rc1. But I'll try to figure something out.

David,

thanks again for taking the time to report and investigate this issue.
Usually people just give up and drop the legacy hardware without letting
us know the kernel is broken on it. So don't spend too much time on
it: just check whether the commit before it works; if not, just add to
the commit log the kernel version in which you noticed the breakage.

In case you are interested in doing some debugging to narrow down the
offending commit, and you know which versions work and which do not, you
can try git bisect. It does a binary search over the history to find
the culprit.

For example:
Let's say v4.1-rc5 works and v4.1-rc6 does not, and there are 1024
changes in between. The binary search will find the patch introducing
the regression in about log2(1024) = 10 iterations.
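
To make that arithmetic concrete, here is a small Python sketch of the
halving. It is only an illustration, not how git itself picks commits: it
assumes a linear history where every commit from the culprit onwards is
bad, and the names first_bad and is_bad are made up for the example.

```python
def first_bad(n_commits, is_bad):
    """Return (first bad commit, number of tests run).

    Commit 0 is the known-good starting point and commit n_commits is
    the known-bad end point; is_bad(c) stands in for "build and test
    commit c" (a hypothetical callback for this sketch).
    """
    good, bad = 0, n_commits  # invariant: 'good' passes, 'bad' fails
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # pick the commit halfway in between
        tests += 1
        if is_bad(mid):
            bad = mid    # culprit is mid or earlier
        else:
            good = mid   # culprit is after mid
    return bad, tests


# 1024 changes, regression introduced by change number 700
print(first_bad(1024, lambda c: c >= 700))  # (700, 10)
```

Each test halves the remaining range, so 1024 changes always take 10
tests here, wherever the culprit sits.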

git bisect is used as follows:

git bisect start v4.1-rc6 v4.1-rc5 (bad always goes first)
make && test (the test is ok)
git bisect good
make && test (the test is ok)
git bisect good
make && test (the test fails)
git bisect bad
...

...

It ends up at the first bad commit (assuming no commit along the way
fails to compile).

Once it is finished, run git bisect reset to get back to where you
started.

  -- Daniel

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
