Message-ID: <801e569c-4f86-4bb5-a255-b861b86cb773@oracle.com>
Date: Tue, 24 Dec 2024 01:20:48 +1100
From: imran.f.khan@...cle.com
To: Hillf Danton <hdanton@...a.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Tejun Heo <tj@...nel.org>,
john.stultz@...aro.org, sboyd@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: Query about timer wheel API
Hello Hillf,
On 23/12/2024 11:51 pm, Hillf Danton wrote:
> On Mon, 23 Dec 2024 11:14:21 +1100 imran.f.khan@...cle.com
>>
>> Recently we have come across some bugs in the RDS code, where a delayed
>> work was being queued on an offlined CPU and as a result of that the
>
> Such a queue could not happen given irq disabled in queue_delayed_work_on().
> Did you see it upstream?
>
Do you mean upstream RDS or the upstream workqueue code? For RDS I need to check,
but with the upstream v6.6 kernel I was able to submit a delayed work to an
offlined CPU. The delayed work never runs, and I can see the corresponding timer
in the timer list of the offlined CPU (using crash).
Once the CPU is brought back online, depending on the workload the work handler
gets executed.
I used the following test module:
===============
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/cpumask.h>
#include <linux/preempt.h>

#define TIMEOUT 1 /* test timeout in secs */
#define NUM_WORK_ITEMS 1 /* number of work items to submit */

static DEFINE_MUTEX(mutex);
static DEFINE_MUTEX(dwork_func_mutex);

/* Work handler: report which CPU ran the work, then free the work item. */
static void delayed_work_func(struct work_struct *data)
{
	int cpu;

	mutex_lock(&dwork_func_mutex);
	cpu = get_cpu();
	pr_err("%s invoked for work: 0x%px on cpu#%d\n", __func__, data, cpu);
	put_cpu();
	mutex_unlock(&dwork_func_mutex);
	kfree(to_delayed_work(data));
}

/* Writing a CPU number to the parameter queues delayed work on that CPU. */
static int param_set_queue_work_on_cpu(const char *val, const struct kernel_param *kp)
{
	unsigned int cpu;
	int i, ret;
	struct delayed_work *dwork;

	if (!mutex_trylock(&mutex))
		return -EBUSY;

	ret = kstrtouint(val, 0, &cpu);
	if (ret)
		goto out;

	/* Reject impossible CPU numbers; offline CPUs are deliberately allowed. */
	if (cpu >= nr_cpu_ids || !cpu_possible(cpu)) {
		ret = -EINVAL;
		goto out;
	}

	for (i = 0; i < NUM_WORK_ITEMS; i++) {
		dwork = kzalloc(sizeof(*dwork), GFP_KERNEL);
		if (!dwork) {
			ret = -ENOMEM;
			break;
		}
		preempt_disable();
		INIT_DELAYED_WORK(dwork, delayed_work_func);
		queue_delayed_work_on(cpu, system_wq, dwork, msecs_to_jiffies(10000));
		pr_err("Submitted dwork 0x%px on %s cpu#%u\n", dwork,
		       cpu_online(cpu) ? "online" : "offline", cpu);
		preempt_enable();
	}
out:
	mutex_unlock(&mutex);
	return ret;
}
module_param_call(queue_work_on_cpu, param_set_queue_work_on_cpu, NULL, NULL, 0600);

static int __init workqueue_study_init(void)
{
	pr_err("module_init\n");
	return 0;
}

static void workqueue_study_exit(void)
{
	pr_err("module_exit\n");
}

MODULE_AUTHOR("Imran Khan <imran.eie.85@...il.com>");
MODULE_DESCRIPTION("Workqueue study");
MODULE_LICENSE("GPL");
module_init(workqueue_study_init);
module_exit(workqueue_study_exit);
===========
This module provides an interface at:
/sys/module/<module name>/parameters/queue_work_on_cpu
Writing X there submits a delayed_work (with a 10 second delay)
to CPU X.
If CPU X is online, the submitted work gets executed
after around 10 seconds. But if CPU X is offline, the submitted
work handler does not fire until the CPU has been brought
back online.
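
For anyone who wants to drive the test from a program rather than a shell, here is a minimal userspace sketch. It assumes the module is loaded under the hypothetical name "workqueue_study" (the module name is not stated above) and that module parameters appear under /sys/module/<module name>/parameters/. All function names here are illustrative only.

```c
#include <stdio.h>

/* Hypothetical module name; adjust to whatever the module was built as. */
#define MODULE_NAME "workqueue_study"

/* Build the sysfs path of a CPU's hotplug "online" file.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the sysfs path of the module's queue_work_on_cpu parameter. */
int param_path(char *buf, size_t len, const char *module)
{
	int n = snprintf(buf, len,
			 "/sys/module/%s/parameters/queue_work_on_cpu", module);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Interpret the contents of an "online" file: 1 online, 0 offline, -1 unknown. */
int parse_online_flag(const char *s)
{
	if (!s)
		return -1;
	if (s[0] == '1')
		return 1;
	if (s[0] == '0')
		return 0;
	return -1;
}

/* Read the CPU's state from sysfs. Returns -1 if the file is missing,
 * which is common for CPUs that cannot be hot-unplugged (often cpu0). */
int cpu_is_online(unsigned int cpu)
{
	char path[64], line[8];
	FILE *f;
	int state = -1;

	if (cpu_online_path(path, sizeof(path), cpu))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		state = parse_online_flag(line);
	fclose(f);
	return state;
}

/* Write the CPU number to the module parameter, which makes the module
 * queue its delayed work on that CPU. Returns 0 on success, -1 on error. */
int submit_delayed_work(const char *module, unsigned int cpu)
{
	char path[128];
	FILE *f;
	int ok;

	if (param_path(path, sizeof(path), module))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", cpu) > 0;
	/* The write is buffered, so a rejected value shows up at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A test run would offline the target CPU first (as root, by writing 0 to its online file), confirm that cpu_is_online() returns 0, call submit_delayed_work(MODULE_NAME, cpu), and then watch dmesg for the handler's message both before and after the CPU is brought back online.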
Thanks,
Imran
>> underlying timer was not firing, which in turn meant that the work was
>> never able to make it to the intended worker_pool.