Message-ID: <alpine.DEB.2.02.1409281232040.31609@jlaw-desktop.mno.stratus.com>
Date: Sun, 28 Sep 2014 12:39:34 -0400
From: Joe Lawrence <joe.lawrence@...atus.com>
To: Tejun Heo <tj@...nel.org>
CC: Joe Lawrence <joe.lawrence@...atus.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: schedule_delayed_work with a 0 jiffy delay

On Sun, 28 Sep 2014, Tejun Heo wrote:
> Hello, Joe.
>
> On Fri, Sep 26, 2014 at 10:52:27AM -0400, Joe Lawrence wrote:
> ...
> > I was wondering if such behavior was expected on !PREEMPT kernels,
> > especially after b22ce2785d97 "workqueue: cond_resched() after
> > processing each work item". In the ftraces I've observed from the RCU
> > stall, cond_resched() never seems to schedule. Under what conditions
> > would it be expected to do so?
>
> Hmmm... it should yield if there's a higher priority task scheduled.
> Maybe the workqueue in question is a highpri one?

I don't believe any high priority tasks or workqueues are involved here.
Consider a synthetic repro module like:

	#include <linux/module.h>
	#include <linux/sched.h>
	#include <linux/workqueue.h>

	struct delayed_work dwork;
	struct delayed_work dwork2;

	static void warn_thread(struct work_struct *work)
	{
		schedule_delayed_work_on(4, &dwork, 0);
	}

	static void warn_thread2(struct work_struct *work)
	{
		schedule_delayed_work_on(4, &dwork2, 0);
	}

	int __init init_module(void)
	{
		INIT_DELAYED_WORK(&dwork, warn_thread);
		schedule_delayed_work_on(4, &dwork, 1000);

		INIT_DELAYED_WORK(&dwork2, warn_thread2);
		schedule_delayed_work_on(4, &dwork2, 1000);

		pr_info("%s loaded.\n", THIS_MODULE->name);
		return 0;
	}

	void cleanup_module(void)
	{
		cancel_delayed_work_sync(&dwork);
		cancel_delayed_work_sync(&dwork2);

		pr_info("%s unloaded.\n", THIS_MODULE->name);
	}

When I load this, "kworker/4:1" goes to 100% CPU and the RCU grace
periods reported by rcusched freeze on CPU4 for minutes at a time.
> > Changing the worker function to reschedule with at least 1 jiffy avoids
> > this problem -- I was just curious if a fix at that end was appropriate.
>
> The fact that 0 delay means immediate execution is depended upon in
> multiple places and changing it will break other things. The
> workqueue user itself sounds buggy to me. Can't that be fixed?

Certainly modifying the workqueue user should be sufficient here. I was
only wondering if the cond_resched() in process_one_work() was expected to
give the CPU some breathing room in this scenario.
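
To illustrate the scenario above, here is a rough userspace toy model (not
kernel code; every `toy_` name is hypothetical). It assumes that a delay of 0
places the item directly on the worklist, that any nonzero delay arms a timer
instead, and that the worker only stops once it finds the worklist empty.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
struct toy_work {
	bool on_worklist;	/* queued for immediate execution by the worker */
	bool on_timer;		/* parked until a later timer tick */
};

/* Assumption: delay 0 queues right away, any other delay arms a timer. */
static void toy_queue_delayed(struct toy_work *w, unsigned long delay)
{
	if (delay == 0)
		w->on_worklist = true;
	else
		w->on_timer = true;
}

/*
 * Run the worker loop until the worklist drains or max_items is reached.
 * The handler requeues its own work item with requeue_delay, like
 * warn_thread() does. Returns how many items ran back to back, that is,
 * without the worker ever finding the worklist empty.
 */
static unsigned long toy_worker_run(unsigned long requeue_delay,
				    unsigned long max_items)
{
	struct toy_work w = { .on_worklist = true, .on_timer = false };
	unsigned long ran = 0;

	while (w.on_worklist && ran < max_items) {
		w.on_worklist = false;			/* dequeue the item */
		toy_queue_delayed(&w, requeue_delay);	/* handler requeues itself */
		ran++;
		/*
		 * Per the thread, cond_resched() yields only if a higher
		 * priority task is scheduled; this model has none.
		 */
	}
	return ran;
}
```

In this model `toy_worker_run(0, 1000)` returns 1000 because the worklist
never drains, while `toy_worker_run(1, 1000)` returns 1, which is consistent
with the observation that a delay of at least 1 jiffy avoids the problem.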

Thanks,

-- Joe