Date:   Mon, 2 Jul 2018 16:44:33 -0700
From:   Kevin Hilman <khilman@...libre.com>
To:     Sudeep Holla <sudeep.holla@....com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>, fweisbec@...il.com,
        Arnd Bergmann <arnd@...db.de>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>
Subject: Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device

Hi Sudeep,

On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@....com> wrote:
>
> Checking the equality of cpumask for both new and old tick device doesn't
> ensure that it's CPU local device. This will cause issue if a low rating
> clockevent tick device is registered first followed by the registration
> of higher rating clockevent tick device.
>
> In such case, clockevents_released list will never get emptied as both
> the devices get selected as preferred one and we will loop forever in
> clockevents_notify_released.
>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Sudeep Holla <sudeep.holla@....com>

I've got an arm32 board (meson8b-odroidc1) that's been failing in
kernelCI.org since the merge window (boot log[1]), and I finally got
around to bisecting it[2].  Unfortunately, the bisect pointed at a
merge commit, but with some trial and error (and a suggestion by Arnd)
I was able to confirm that reverting the $SUBJECT commit[3] makes my
problem go away.

Another interesting data point is that disabling SMP (either by
"nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
away, without needing to revert this patch.

AFAICT, this platform is using a single timer as a clocksource
("amlogic,meson6-timer") which is not a per-CPU timer.

I ran out of time to keep digging on this issue, and I'm still not
sure exactly what's going on, but I wanted to report it in case anyone
else has any ideas, and so we can hopefully get it fixed during the
-rc cycle.

Kevin

[1] https://storage.kernelci.org/mainline/master/v4.18-rc2-357-gd3bc0e67f852/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-meson8b-odroidc1.html
[2] http://termbin.com/mk07
[3] in mainline as: 1332a9055801 ("tick: Prefer a lower rating device
only if it's CPU local device")

> ---
>  kernel/time/tick-common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Hi Thomas,
>
> I am seeing this issue on my Juno devboard, where system wide timers
> with rating 300 and 400 are registered in same order and we get stuck in
> a loop in clockevents_notify_released. Let me know if this looks sane or
> you have any suggestions that I can try out.
>
> Regards,
> Sudeep
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 49edc1c4f3e6..78e598334007 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -277,7 +277,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
>          */
>         return !curdev ||
>                 newdev->rating > curdev->rating ||
> -              !cpumask_equal(curdev->cpumask, newdev->cpumask);
> +              (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
> +               !tick_check_percpu(curdev, newdev, smp_processor_id()));
>  }
>
>  /*
> --
> 2.7.4
>
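
For illustration only (not part of the patch above): a minimal userspace
sketch of the two return expressions in the hunk, assuming a cpumask can be
modelled as a plain bitmask. The names struct model_dev, preferred_old(),
preferred_new() and mutually_preferred_old() are hypothetical, and the result
of tick_check_percpu() is passed in as a boolean because its body is not shown
in this thread. With the old expression, two devices with different ratings
and different cpumasks are each preferred over the other, which is the
situation the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct clock_event_device. */
struct model_dev {
	const char *name;
	int rating;
	uint32_t cpumask;	/* bit n set: device can serve CPU n */
};

/* Return expression of tick_check_preferred() before the patch. */
static bool preferred_old(const struct model_dev *curdev,
			  const struct model_dev *newdev)
{
	assert(newdev != NULL);
	return !curdev ||
		newdev->rating > curdev->rating ||
		curdev->cpumask != newdev->cpumask;
}

/*
 * Return expression after the patch. percpu_ok models the value that
 * tick_check_percpu(curdev, newdev, cpu) would return; how the kernel
 * computes it is deliberately left out of this sketch.
 */
static bool preferred_new(const struct model_dev *curdev,
			  const struct model_dev *newdev, bool percpu_ok)
{
	assert(newdev != NULL);
	return !curdev ||
		newdev->rating > curdev->rating ||
		(curdev->cpumask != newdev->cpumask && !percpu_ok);
}

/* True if, under the old expression, each device displaces the other. */
static bool mutually_preferred_old(const struct model_dev *a,
				   const struct model_dev *b)
{
	return preferred_old(a, b) && preferred_old(b, a);
}
```

Feeding it a rating-300 device with mask 0xF and a rating-400 device with
mask 0x3 makes mutually_preferred_old() return true, while preferred_new()
rejects the lower-rated device whenever percpu_ok is true. Whether the real
per-CPU check returns the intended value on a given platform is exactly the
part this sketch does not model.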
