linux-kernel - Re: for_each_cpu() is buggy for UP kernel?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1805131517170.1582@nanos.tec.linutronix.de>
Date:   Sun, 13 May 2018 15:35:48 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Andrew Morton <akpm@...ux-foundation.org>
cc:     Dexuan Cui <decui@...rosoft.com>, Ingo Molnar <mingo@...nel.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Rakib Mullick <rakib.mullick@...il.com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: for_each_cpu() is buggy for UP kernel?

On Wed, 9 May 2018, Andrew Morton wrote:

> On Wed, 9 May 2018 06:24:16 +0000 Dexuan Cui <decui@...rosoft.com> wrote:
> 
> > In include/linux/cpumask.h, for_each_cpu is defined like this for UP kernel (CONFIG_NR_CPUS=1):
> > 
> > #define for_each_cpu(cpu, mask)                 \
> >         for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
> > 
> > Here 'mask' is ignored, but what if 'mask' contains 0 CPU? -- in this case, the for loop should not
> > run at all, but with the current code, we run the loop once with cpu==0.
> > 
> > I think I'm seeing a bug in my UP kernel that is caused by the buggy for_each_cpu():
> > 
> > in kernel/time/tick-broadcast.c: tick_handle_oneshot_broadcast(), tick_broadcast_oneshot_mask
> > contains 0 CPU, but due to the buggy for_each_cpu(), the variable 'next_event' is changed from
> > its default value KTIME_MAX to "next_event = td->evtdev->next_event"; as a result,
> > tick_handle_oneshot_broadcast () -> tick_broadcast_set_event() -> clockevents_program_event()
> > -> pit_next_event() is programming the PIT timer by accident, causing an interrupt storm of PIT
> > interrupts in some way: I'm seeing that the kernel is receiving ~8000 PIT interrupts per second for
> > 1~5 minutes when the UP kernel boots, and it looks the kernel hangs, but in 1~5 minutes, finally
> > somehow the kernel can recover and boot up fine. But, occasionally, the kernel just hangs there
> > forever, receiving ~8000 PIT timers per second. 
> > 
> > With the below change in kernel/time/tick-broadcast.c, the interrupt storm will go away:
> > 
> > +#undef for_each_cpu
> > +#define for_each_cpu(cpu, mask)                        \
> > +       for ((cpu) = 0; (((cpu) < 1) && ((mask)[0].bits[0] & 1)); (cpu)++, (void)mask)
> > 
> > Should we fix the for_each_cpu() in include/linux/cpumask.h for UP?
> 
> I think so, yes.  That might reveal new peculiarities, but such is life.
> 
> I guess we should use bitmap_empty() rather than open-coding it.

Agreed.

FWIW, this had been discussed before, but there was no real conclusion:

  https://lkml.kernel.org/r/alpine.DEB.2.20.1709161850010.2105@nanos

Thanks,

	tglx