lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.21.1710092047250.22810@linux.fjfi.cvut.cz>
Date:   Mon, 9 Oct 2017 21:33:37 +0200 (CEST)
From:   David Kozub <zub@...ux.fjfi.cvut.cz>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
cc:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: Kernel BUG in mfgpt_tick (cs5535-clockevt.c) on ALIX
 2c3 - null call

On Mon, 9 Oct 2017, Daniel Lezcano wrote:

> On 07/10/2017 23:26, David Kozub wrote:
>> Hi all,
>>
>> booting up kernel 4.14-rc3 with CS5535_CLOCK_EVENT_SRC on an ALIX 2c3
>> (http://pcengines.ch/alix2c3.htm) dies with:
>>
>> [    2.313086] cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0'
>> registered
>> [    2.338711] cs5535-mfgpt cs5535-mfgpt: registered timer 0
>> [    2.355676] ledtrig-cpu: registered to indicate activity on CPUs
>> [    2.373745] cs5535-mfgpt cs5535-mfgpt: registered timer 1
>> [    2.389976] cs5535-clockevt: Registering MFGPT timer as a clock
>> event, using IRQ 7
>> [    2.412687] BUG: unable to handle kernel NULL pointer dereference
>> at   (null)
>> [    2.412698] IP:   (null)
>> [    2.412702] *pde = 00000000
>> [    2.412713] Oops: 0000 [#1]
>> [    2.412716] Modules linked in:
>> [    2.412732] CPU: 0 PID: 1 Comm: swapper Not tainted
>> 4.14.0-rc3-humel-test17 #36
>> [    2.412739] task: c0010000 task.stack: c008e000
>> [    2.412744] EIP:   (null)
>> [    2.412749] EFLAGS: 00210093 CPU: 0
>> [    2.412758] EAX: c05fb8c0 EBX: c05fb880 ECX: 00000000 EDX: 0000620c
>> [    2.412769] ESI: c0009fd4 EDI: c014e14e EBP: c0009fac ESP: c0009fa8
>> [    2.412780]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
>> [    2.412790] CR0: 80050033 CR2: 00000000 CR3: 0064f000 CR4: 00000090
>> [    2.412793] Call Trace:
>> [    2.412800]  <IRQ>
>> [    2.412825]  ? mfgpt_tick+0x5d/0x81
>> [    2.412845]  __handle_irq_event_percpu+0x56/0xb6
>> [    2.412864]  ? handle_fasteoi_irq+0xf3/0xf3
>> [    2.412878]  handle_irq_event_percpu+0x17/0x3f
>> [    2.412892]  handle_irq_event+0x1d/0x29
>> [    2.412905]  handle_level_irq+0x57/0xc6
>> [    2.412924]  handle_irq+0x47/0x52
>> [    2.412929]  </IRQ>
>> [    2.412934]  <SOFTIRQ>
>> [    2.412949]  do_IRQ+0x32/0x9b
>> [    2.412963]  ? __irqentry_text_end+0x7/0x7
>> [    2.412976]  common_interrupt+0x2e/0x34
>> ...
>>
>> The problem seems to be in drivers/clocksource/cs5535-clockevt.c in
>> mfgpt_tick() on line 132:
>> ...
>>     cs5535_clockevent.event_handler(&cs5535_clockevent);
>> ...
>> cs5535_clockevent.event_handler is null.
>>
>> Adding some more traces I see mfgpt_tick() gets called before
>> clockevents_config_and_register() finishes (invoked from
>> cs5535_mfgpt_init() on line 178). So when mfgpt_tick() accessess the
>> event_handler, it's NULL. Wrapping the event_handler call on line 132 in
>> a null pointer check results in a working system.
>>
>> The issue is present at least also in kernel 4.13.5. In kernel versions
>> <= 4.1-rc6 the cs5535_clockevent worked OK. Kernels >= 4.1-rc7 never
>> booted at all on my ALIX 2c3 (last I tried 4.5, then gave up and tried
>> 4.13.5 recently), so I don't know when exactly this issue appeared.
>>
>> My guess is that the timer interrupt is enabled too early. If I change
>> the order of clockevents_config_and_register() and
>> cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val) in
>> cs5535_mfgpt_init() like this:
>> ...
>>     /* Set up the clock event */
>>     printk(KERN_INFO DRV_NAME
>>         ": Registering MFGPT timer as a clock event, using IRQ %d\n",
>>         timer_irq);
>>     clockevents_config_and_register(&cs5535_clockevent, MFGPT_HZ,
>>                     0xF, 0xFFFE);
>>
>>     /* Set the clock scale and enable the event mode for CMP2 */
>>     val = MFGPT_SCALE | (3 << 8);
>>     cs5535_mfgpt_write(cs5535_event_clock, MFGPT_REG_SETUP, val);
>> ...
>> the system boots and seems to be OK.
>
> What happens if instead of inverting those two lines, you add
> mfgpt_shutdown() early in the init function ?

Hi Daniel,

I can't call mfgpt_shutdown() before clockevents_config_and_register() as 
mfgpt_shutdown() wants a struct clock_event_device* which is only 
available after clockevents_config_and_register(). But I tried calling 
disable_timer():

@@ -152,6 +152,8 @@ static int __init cs5535_mfgpt_init(void)
         }
         cs5535_event_clock = timer;

+       disable_timer(cs5535_event_clock);
+
         /* Set up the IRQ on the MFGPT side */
         if (cs5535_mfgpt_setup_irq(timer, MFGPT_CMP2, &timer_irq)) {
                 printk(KERN_ERR DRV_NAME ": Could not set up IRQ %d\n",

which should do the same thing. Unfortunately I still get the same BUG.

I added some more traces and I think the IRQ comes just after:
 	ret = setup_irq(timer_irq, &mfgptirq);

(And my reordering of cs5535_mfgpt_write and 
clockevents_config_and_register was just lucky, because the confing and 
register occured sooner than the IRQ.)

Attached is a patch that shows where I added the traces. It produces the 
following output (I also added a bunch of sleeps in between to be sure 
which step triggers the mfgpt_tick call):

[    3.229612] cs5535-clockevt: enabling verbose handler
[    3.234699] cs5535-clockevt: sleeping(1)...
[    4.320051] cs5535-clockevt: done(1)
[    4.323641] cs5535-clockevt: calling cs5535_mfgpt_alloc_timer
[    4.329402] cs5535-mfgpt cs5535-mfgpt: registered timer 1
[    4.334835] cs5535-clockevt: sleeping(2)...
[    5.360051] cs5535-clockevt: done(2)
[    5.363632] cs5535-clockevt: calling disable_timer
[    5.368425] cs5535-clockevt: sleeping(3)...
[    6.400048] cs5535-clockevt: done(3)
[    6.403631] cs5535-clockevt: calling cs5535_mfgpt_setup_irq
[    6.409214] cs5535-clockevt: sleeping(4)...
[    7.440044] cs5535-clockevt: done(4)
[    7.443623] cs5535-clockevt: calling setup_irq
[    7.448088] cs5535-clockevt: mfgpt_tick called
[    7.452579] cs5535-clockevt: sleeping(5)...
[    8.480049] cs5535-clockevt: done(5)
[    8.483634] cs5535-clockevt: doing write
[    8.487559] cs5535-clockevt: sleeping(6)...
[    9.520051] cs5535-clockevt: done(6)
[    9.523635] cs5535-clockevt: Registering MFGPT timer as a clock event, 
using IRQ 7
[    9.531263] cs5535-clockevt: clockevents_config_and_register returned
[    9.537706] cs5535-clockevt: disabling verbose handler

(I observed the boot 3 times and the sequence is always the same.)

Best regards,
David
View attachment "cs5535-traces-hack.patch" of type "text/plain" (3762 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ