Date:	Thu, 29 Jan 2015 14:08:38 +0530
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	Ethan Zhao <ethan.zhao@...cle.com>,
	santosh shilimkar <santosh.shilimkar@...cle.com>
Cc:	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ethan Zhao <ethan.kernel@...il.com>
Subject: Re: [PATCH] cpufreq: fix another race between PPC notification and vcpu_hotplug()

Looks like you just saved me some time here; Santosh has also reported a
similar race in a personal mail.

On 29 January 2015 at 12:12, Ethan Zhao <ethan.zhao@...cle.com> wrote:
> There is a race observed between the PPC change notification handler worker
> thread and vcpu_hotplug() called within the xenbus_thread() context.
> It shows up as the following WARNING:
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>  kobject_get+0x41/0x50()
>  Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>  lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>  ...
>  [   14.003548] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted
>  ...
>  [   14.003553] Workqueue: kacpi_notify acpi_os_execute_deferred
>  [   14.003554]  0000000000000000 000000008c76682c ffff88094c793af8
>  ffffffff81661b14
>  [   14.003556]  0000000000000000 0000000000000000 ffff88094c793b38
>  ffffffff81072b61
>  [   14.003558]  ffff88094c793bd8 ffff8812491f8800 0000000000000292
>  0000000000000000
>  [   14.003560] Call Trace:
>  [   14.003567]  [<ffffffff81661b14>] dump_stack+0x46/0x58
>  [   14.003571]  [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>  [   14.003572]  [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>  [   14.003574]  [<ffffffff812e16d1>] kobject_get+0x41/0x50
>  [   14.003579]  [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>  [   14.003581]  [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>  [   14.003586]  [<ffffffff810b8cb2>] ? up+0x32/0x50
>  [   14.003589]  [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>  [   14.003591]  [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>  [   14.003593]  [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>  [   14.003596]  [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>  [   14.003601]  [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>  [   14.003604]  [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>  [   14.003606]  [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>  [   14.003609]  [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>  [   14.003611]  [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>  [   14.003614]  [<ffffffff8108c910>] process_one_work+0x160/0x410
>  [   14.003616]  [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>  [   14.003617]  [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>  [   14.003621]  [<ffffffff81092421>] kthread+0xe1/0x100
>  [   14.003623]  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>  [   14.003628]  [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>  [   14.003630]  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>  [   14.003631] ---[ end trace 89e66eb9795efdf7 ]---
>
>  Thread A: Workqueue: kacpi_notify
>
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()
>              kobject_get()
>
>  Thread B: xenbus_thread()
>
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_DOWN_PREPARE..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_prepare()
>                  update_policy_cpu()
>                    kobject_move()

Where is the race? What makes you say this is racy?

I am not sure the problem is with kobject_move(); to me it looks like
the problem is with cpufreq_policy_put_kobj(): we tried to do kobject_get()
after the kobject had already been freed.
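
To make the window I have in mind explicit, here is the interleaving I
suspect (a hand-drawn sketch of my hypothesis, not actual driver code;
exact call sites may differ):

  Thread A: cpufreq_cpu_get()            Thread B: __cpufreq_remove_dev_finish()

  /* load the per-cpu policy pointer;
     it has not been cleared yet */
  policy = per_cpu(cpufreq_cpu_data, cpu);

                                         /* drop the last reference;
                                            the kobject is freed */
                                         cpufreq_policy_put_kobj(policy);

  /* take a reference on the freed
     kobject: kref is already 0, so we
     hit the WARN at include/linux/kref.h:47 */
  kobject_get(&policy->kobj);

                                         /* pointer cleared only now,
                                            too late for thread A */
                                         per_cpu(cpufreq_cpu_data, cpu) = NULL;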

I don't agree with the solution you gave, but let's first make sure what the
problem actually is, and then take action against it.

Please try this patch and let us know if it fixes it for you:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..5ced9cca4822 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1411,6 +1411,7 @@ static int __cpufreq_remove_dev_finish(struct device *dev,

        read_lock_irqsave(&cpufreq_driver_lock, flags);
        policy = per_cpu(cpufreq_cpu_data, cpu);
+       per_cpu(cpufreq_cpu_data, cpu) = NULL;
        read_unlock_irqrestore(&cpufreq_driver_lock, flags);

        if (!policy) {
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
                }
        }

-       per_cpu(cpufreq_cpu_data, cpu) = NULL;
        return 0;
 }
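
The idea is to clear the per-cpu pointer under cpufreq_driver_lock *before*
we start tearing the policy down, instead of at the very end. cpufreq_cpu_get()
reads that pointer under the same lock, so once it is NULL the reader bails
out instead of taking a reference on a dying kobject. Roughly (a simplified
sketch of the reader side from memory, not a verbatim copy of the code):

	read_lock_irqsave(&cpufreq_driver_lock, flags);
	if (cpufreq_driver) {
		/* NULL here once the hunk above has run */
		policy = per_cpu(cpufreq_cpu_data, cpu);
		if (policy)
			kobject_get(&policy->kobj);
	}
	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

	return policy;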