linux-kernel - Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 16 Oct 2017 22:41:26 +0200 (CEST)
From:   David Kozub <zub@...ux.fjfi.cvut.cz>
To:     Thomas Gleixner <tglx@...utronix.de>
cc:     Daniel Lezcano <daniel.lezcano@...aro.org>,
        linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX
 2c3 - null call

On Mon, 16 Oct 2017, Thomas Gleixner wrote:

> On Thu, 12 Oct 2017, David Kozub wrote:
>> On Thu, 12 Oct 2017, Thomas Gleixner wrote:
>>> On Thu, 12 Oct 2017, Daniel Lezcano wrote:
>>> The real question is why
>>>
>>>        /* Set the clock scale and enable the event mode for CMP2 */
>>>        val = MFGPT_SCALE | (3 << 8);
>>>
>>>        cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
>>>
>>> is in the setup code at all.
>>>
>>> The obvious place for this is in mfgpt_set_periodic() which gets called
>>> when the clock event and the handler is set up in the core code. Up to that
>>> point the interrupt handler is protected against shared interrupts via the
>>> is_shutdown() check.
>>
>> It is, but only after clockevents_config_and_register. But mfgpt_tick is
>> called before that (just after setup_irq).
>
> Right, but the protection against the spurious interrupt there is not
> sufficient. See patch below.
>
> Thanks,
>
> 	tglx
>
> 8<-------------------
>
> --- a/drivers/clocksource/cs5535-clockevt.c
> +++ b/drivers/clocksource/cs5535-clockevt.c
> @@ -117,7 +117,8 @@ static irqreturn_t mfgpt_tick(int irq, void *dev_id)
> 	/* Turn off the clock (and clear the event) */
> 	disable_timer(cs5535_event_clock);
>
> -	if (clockevent_state_shutdown(&cs5535_clockevent))
> +	if (clockevent_state_detached(&cs5535_clockevent) ||
> +	    clockevent_state_shutdown(&cs5535_clockevent))
> 		return IRQ_HANDLED;
>
> 	/* Clear the counter */

Hi Thomas,

Thanks for the patch. I can confirm that this resolves the issue.

I still think the fact that the handler is triggered at all is a bit 
strange but maybe ensuring the handler doesn't do anything bad is 
sufficient to declare this issue fixed. After all I'm probbly the last 
person in the known universe using this driver. :)

Would it not be nicer to also explicitly set cs5535_clockevent's 
state_use_accessors to CLOCK_EVT_STATE_DETACHED (in the initialization of 
cs5535_clockevent)? Maybe that field is not to be touched directly from 
outside, on the other hand in the current code the check in the handler 
relies on CLOCK_EVT_STATE_DETACHED being 0 (and so cs5535_clockevent's 
state_use_accessors being CLOCK_EVT_STATE_DETACHED by default).

I was also wondering how is it ensured that the handler can never see an 
inconsistent state of state_use_accessors but after going through the 
code I think the key is clockevents_register_device's 
raw_spin_lock_irqsave(&clockevents_lock, flags); (even though the set to 
DETACHED happens outside, but we're already starting in DETACHED state). I 
also wonder if this would still be safe if there were multiple CPUs - but 
this is not happening on the ALIX as it has just one CPU.

Best regards,
David