[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1279149996.32374.5.camel@w-sridhar.beaverton.ibm.com>
Date: Wed, 14 Jul 2010 16:26:36 -0700
From: Sridhar Samudrala <sri@...ibm.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...e.hu>,
netdev <netdev@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dmitri Vorobiev <dmitri.vorobiev@...ial.com>,
Jiri Kosina <jkosina@...e.cz>,
Thomas Gleixner <tglx@...utronix.de>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH repost] sched: export sched_set/getaffinity to modules
On Tue, 2010-07-13 at 14:09 +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 12, 2010 at 11:59:08PM -0700, Sridhar Samudrala wrote:
> > On 7/4/2010 2:00 AM, Michael S. Tsirkin wrote:
> > >On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> > >>On 07/02, Peter Zijlstra wrote:
> > >>>On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> > >>>> Does it (Tejun's kthread_clone() patch) also inherit the
> > >>>>cgroup of the caller?
> > >>>Of course, its a simple do_fork() which inherits everything just as you
> > >>>would expect from a similar sys_clone()/sys_fork() call.
> > >>Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> > >>from ioctl(), right?
> > >>
> > >>Then the new thread becomes the natural child of the caller, and it shares
> > >>->mm with the parent. And files, dup_fd() without CLONE_FS.
> > >>
> > >>Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> > >>TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> > >>just because the parent gets SIGQUIT or abother coredumpable signal.
> > >>Or the new thread can recieve SIGSTOP via ^Z.
> > >>
> > >>Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> > >>is merely clone(CLONE_VM).
> > >>
> > >>Oleg.
> > >
> > >Right. Doing this might break things like flush. The signal and exit
> > >behaviour needs to be examined carefully. I am also unsure whether
> > >using such threads might be more expensive than inheriting kthreadd.
> > >
> > Should we just leave it to the userspace to set the cgroup/cpumask
> > after qemu starts the guest and
> > the vhost threads?
> >
> > Thanks
> > Sridhar
>
> Yes but we can't trust userspace to do this. It's important
> to do it on thread creation: if we don't, malicious userspace
> can create large amount of work exceeding the cgroup limits.
>
> And the same applies so the affinity: if the qemu process
> is limited to a set of CPUs, it's important to make
> the kernel thread that does work our behalf limited to the same
> set of CPUs.
>
> This is not unique to vhost, it's just that virt scenarious are affected
> by this more: people seem to run untrusted applications and expect the
> damage to be contained.
OK. So we want to create a thread that is a child of kthreadd, but inherits the cgroup/cpumask
from the caller. How about an exported kthread function kthread_create_in_current_cg()
that does this?
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index aabc8a1..e0616f0 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -9,6 +9,9 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
const char namefmt[], ...)
__attribute__((format(printf, 3, 4)));
+struct task_struct *kthread_create_in_current_cg(int (*threadfn)(void *data),
+ void *data, char *name);
+
/**
* kthread_run - create and wake a thread.
* @threadfn: the function to run until signal_pending(current).
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 83911c7..ea4e737 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <trace/events/sched.h>
+#include <linux/cgroup.h>
static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);
@@ -149,6 +150,42 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
}
EXPORT_SYMBOL(kthread_create);
+struct task_struct *kthread_create_in_current_cg(int (*threadfn)(void *data),
+ void *data, char *name)
+{
+ struct task_struct *worker;
+ cpumask_var_t mask;
+ int ret = -ENOMEM;
+
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ goto out_free_mask;
+
+ worker = kthread_create(threadfn, data, "%s-%d", name, current->pid);
+ if (IS_ERR(worker))
+ goto out_free_mask;
+
+ ret = sched_getaffinity(current->pid, mask);
+ if (ret)
+ goto out_stop_worker;
+
+ ret = sched_setaffinity(worker->pid, mask);
+ if (ret)
+ goto out_stop_worker;
+
+ ret = cgroup_attach_task_current_cg(worker);
+ if (ret)
+ goto out_stop_worker;
+
+ return worker;
+
+out_stop_worker:
+ kthread_stop(worker);
+out_free_mask:
+ free_cpumask_var(mask);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(kthread_create_in_current_cg);
+
/**
* kthread_bind - bind a just-created kthread to a cpu.
* @p: thread created by kthread_create().
Thanks
Sridhar
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists