lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 10 Oct 2017 15:03:46 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     David Kozub <zub@...ux.fjfi.cvut.cz>
Cc:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX 2c3
 - null call

On 09/10/2017 21:33, David Kozub wrote:
> On Mon, 9 Oct 2017, Daniel Lezcano wrote:
> 
>> On 07/10/2017 23:26, David Kozub wrote:
>>> Hi all,
>>>
>>> booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
>>> (http://pcengines.ch/alix2c3.htm) dies with:
>>>
>>> [    2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
>>> registered
>>> [    2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
>>> [    2.355676] ledtrig-cpu: registered to indicate activity on CPUs
>>> [    2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
>>> [    2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
>>> event, using IRQ 7
>>> [    2.412687] BUG: unable to handle kernel NULL pointer dereference
>>> at   (null)
>>> [    2.412698] IP:   (null)
>>> [    2.412702] *pde = 00000000
>>> [    2.412713] Oops: 0000 [#1]
>>> [    2.412716] Modules linked in:
>>> [    2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
>>> 4.14.0-rc3-humel-test17 #36
>>> [    2.412739] task: c0010000 task.stack: c008e000
>>> [    2.412744] EIP:   (null)
>>> [    2.412749] EFLAGS: 00210093 CPU: 0
>>> [    2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX: 00000000 EDX: 0000620c
>>> [    2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
>>> [    2.412780]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
>>> [    2.412790] CR0: 80050033 CR2: 00000000 CR3: 0064f000 CR4: 00000090
>>> [    2.412793] Call Trace:
>>> [    2.412800]  <IRQ>
>>> [    2.412825]  ? mfgpt_tick+0x5d/0x81
>>> [    2.412845]  __handle_irq_event_percpu+0x56/0xb6
>>> [    2.412864]  ? handle_fasteoi_irq+0xf3/0xf3
>>> [    2.412878]  handle_irq_event_percpu+0x17/0x3f
>>> [    2.412892]  handle_irq_event+0x1d/0x29
>>> [    2.412905]  handle_level_irq+0x57/0xc6
>>> [    2.412924]  handle_irq+0x47/0x52
>>> [    2.412929]  </IRQ>
>>> [    2.412934]  <SOFTIRQ>
>>> [    2.412949]  do_IRQ+0x32/0x9b
>>> [    2.412963]  ? __irqentry_text_end+0x7/0x7
>>> [    2.412976]  common_interrupt+0x2e/0x34
>>> ...
>>>
>>> The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
>>> mfgpt_tick() on line 132:
>>> ...
>>>     cs5535_clockevent.event_handler(&cs5535_clockevent);
>>> ...
>>> cs5535_clockevent.event_handler is null.
>>>
>>> Adding some more traces I see mfgpt_tick() gets called before
>>> clockevents_config_and_register() finishes (invoked from
>>> cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
>>> event_handler, it's NULL. Wrapping the event_handler call on line 132 in
>>> a null pointer check results in a working system.
>>>
>>> The issue is present at least also in kernel 4.13.5. In kernel versions
>>> <= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
>>> booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
>>> 4.13.5 recently), so I don't know when exactly this issue appeared.
>>>
>>> My guess is that the timer interrupt is enabled too early. If I change
>>> the order of clockevents_config_and_register() and
>>> cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
>>> cs5535_mfgpt_init() like this:
>>> ...
>>>     /* Set up the clock event */
>>>     printk(KERN_INFO DRV_NAME
>>>         ": Registering MFGPT timer as a clock event, using IRQ %d\n",
>>>         timer_irq);
>>>     clockevents_config_and_register(&cs5535_clockevent, MFGPT_HZ,
>>>                     0xF, 0xFFFE);
>>>
>>>     /* Set the clock scale and enable the event mode for CMP2 */
>>>     val = MFGPT_SCALE | (3 << 8);
>>>     cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
>>> ...
>>> the system boots and seems to be OK.
>>
>> What happens if instead of inverting those two lines, you add
>> mfgpt_shutdown() early in the init function ?
> 
> Hi Daniel,
> 
> I can't call mfgpt_shutdown() before clockevents_config_and_register()
> as mfgpt_shutdown() wants a struct clock_event_device* which is only
> available after clockevents_config_and_register(). But I tried calling
> disable_timer():
> 
> @@ -152,6 +152,8 @@ static int __init cs5535_mfgpt_init(void)
>         }
>         cs5535_event_clock = timer;
> 
> +       disable_timer(cs5535_event_clock);
> +
>         /* Set up the IRQ on the MFGPT side */
>         if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, &timer_irq)) {
>                 printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",
> 
> which should do the same thing. Unfortunately I still get the same BUG.
> 
> I added some more traces and I think the IRQ comes just after:
>     ret = setup_irq(timer_irq, &mfgptirq);

Mmh, the interrupt flag tells it is a shared interrupt, bad :/

The interrupt handler checks if the interrupt was for the timer.

So here, there are two options:

 - the timer is in an inconsistent state, with an interrupt line up and
enabled. In this case, it should be reseted correctly before going
further in the init function (disable, ack irq if any, and then request
irq and register).

 .. or ..

 - the check at the beginning of the interrupt handler:
        /* See if the interrupt was for us */
        if (!(val & (MFGPT_SETUP_SETUP | MFGPT_SETUP_CMP2 |
		MFGPT_SETUP_CMP1)))
                return IRQ_NONE;

   does not work and we try to handle an interrupt which is not for us.


That could be any of these two (or both) and I suspect the commit
8f9327cbb brings to light this problem as before the interrupt handler
had the check:
	if (cs5535_tick_mode == CLOCK_EVT_MODE_SHUTDOWN)
		return IRQ_HANDLED;

with:

static unsigned int cs5535_tick_mode = CLOCK_EVT_MODE_SHUTDOWN;







-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ