[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <319f1e21-6cf4-4502-ebc8-c808560fb48d@oracle.com>
Date: Mon, 11 Jul 2022 23:44:14 -0500
From: Mike Christie <michael.christie@...cle.com>
To: hch@...radead.org, stefanha@...hat.com, jasowang@...hat.com,
mst@...hat.com, sgarzare@...hat.com,
virtualization@...ts.linux-foundation.org, brauner@...nel.org,
ebiederm@...ssion.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v10 0/8] Use copy_process in vhost layer
Eric and Christian, Ping?
If you guys don't like these patches anymore what about something
simple like just exporting some helpers to update and check a task's
nproc limit. Something like this:
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 81cab4b01edc..71b5946be792 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -98,6 +98,10 @@ int kernel_wait(pid_t pid, int *stat);
extern void free_task(struct task_struct *tsk);
+extern bool task_is_over_nproc_limit(struct task_struct *tsk);
+extern void task_inc_nproc(struct task_struct *tsk);
+extern void task_dec_nproc(struct task_struct *tsk);
+
/* sched_exec is called by processes performing an exec */
#ifdef CONFIG_SMP
extern void sched_exec(void);
diff --git a/kernel/cred.c b/kernel/cred.c
index e10c15f51c1f..c15e7b926013 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -358,7 +358,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
kdebug("share_creds(%p{%d,%d})",
p->cred, atomic_read(&p->cred->usage),
read_cred_subscribers(p->cred));
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ task_inc_nproc(p);
return 0;
}
@@ -395,7 +395,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
#endif
p->cred = p->real_cred = get_cred(new);
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ task_inc_nproc(p);
alter_cred_subscribers(new, 2);
validate_creds(new);
return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..88dbe3458d7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1964,6 +1964,32 @@ static void copy_oom_score_adj(u64 clone_flags, struct task_struct *tsk)
mutex_unlock(&oom_adj_mutex);
}
+bool task_is_over_nproc_limit(struct task_struct *tsk)
+{
+ if (!is_ucounts_overlimit(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC,
+ task_rlimit(tsk, RLIMIT_NPROC)))
+ return false;
+
+ if (tsk->real_cred->user != INIT_USER && !capable(CAP_SYS_RESOURCE) &&
+ !capable(CAP_SYS_ADMIN))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(task_is_over_nproc_limit);
+
+void task_inc_nproc(struct task_struct *tsk)
+{
+ inc_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1);
+}
+EXPORT_SYMBOL_GPL(task_inc_nproc);
+
+void task_dec_nproc(struct task_struct *tsk)
+{
+ dec_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1);
+}
+EXPORT_SYMBOL_GPL(task_dec_nproc);
+
/*
* This creates a new process as a copy of the old one,
* but does not actually start it yet.
@@ -2102,11 +2128,8 @@ static __latent_entropy struct task_struct *copy_process(
goto bad_fork_free;
retval = -EAGAIN;
- if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
- if (p->real_cred->user != INIT_USER &&
- !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
- goto bad_fork_cleanup_count;
- }
+ if (task_is_over_nproc_limit(p))
+ goto bad_fork_cleanup_count;
current->flags &= ~PF_NPROC_EXCEEDED;
/*
@@ -2526,7 +2549,7 @@ static __latent_entropy struct task_struct *copy_process(
bad_fork_cleanup_delayacct:
delayacct_tsk_free(p);
bad_fork_cleanup_count:
- dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ task_dec_nproc(p);
exit_creds(p);
bad_fork_free:
WRITE_ONCE(p->__state, TASK_DEAD);
On 6/19/22 8:13 PM, Mike Christie wrote:
> The following patches were made over Linus's tree.
>
> Eric and Christian, the vhost maintainer, Michael Tsirkin has ACK'd the
> patches. I haven't got any more comments from you guys for a couple
> postings now (Jan 8 was the last reply). Are you guys ok to merge them?
>
> For everyone else that hasn't see this before, the patches allow the
> vhost layer to do a copy_process on the thread that does the
> VHOST_SET_OWNER ioctl like how io_uring does a copy_process against its
> userspace app. This allows the vhost layer's worker threads to inherit
> cgroups, namespaces, address space, etc and this worker thread will also
> be accounted for against that owner/parent process's RLIMIT_NPROC limit.
>
> If you are not familiar with qemu and vhost here is more detailed
> problem description:
>
> Qemu will create vhost devices in the kernel which perform network, SCSI,
> etc IO and management operations from worker threads created by the
> kthread API. Because the kthread API does a copy_process on the kthreadd
> thread, the vhost layer has to use kthread_use_mm to access the Qemu
> thread's memory and cgroup_attach_task_all to add itself to the Qemu
> thread's cgroups.
>
> The problem with this approach is that we then have to add new functions/
> args/functionality for every thing we want to inherit. I started doing
> that here:
>
> https://lkml.org/lkml/2021/6/23/1233
>
> for the RLIMIT_NPROC check, but it seems it might be easier to just
> inherit everything from the beginning, becuase I'd need to do something
> like that patch several times.
>
> V10:
> - Eric's cleanup patches my vhost flush cleanup patches are merged
> upstream, so rebase against Linus's tree which has everything.
> V9:
> - Rebase against Eric's kthread-cleanups-for-v5.19 branch. Drop patches
> no longer needed due to kernel clone arg and pf io worker patches in that
> branch.
> V8:
> - Fix kzalloc GFP use.
> - Fix email subject version number.
> V7:
> - Drop generic user_worker_* helpers and replace with vhost_task specific
> ones.
> - Drop autoreap patch. Use kernel_wait4 instead.
> - Fix issue where vhost.ko could be removed while the worker function is
> still running.
> V6:
> - Rename kernel_worker to user_worker and fix prefixes.
> - Add better patch descriptions.
> V5:
> - Handle kbuild errors by building patchset against current kernel that
> has all deps merged. Also add patch to remove create_io_thread code as
> it's not used anymore.
> - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER
> case added in 5.16-rc1.
> - Add PF_USER_WORKER flag so we can check it later after the initial
> thread creation for the wake up, vm and singal cses.
> - Added patch to auto reap the worker thread.
> V4:
> - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
> - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
> added the new kernel worker functions.
> - Fixed extra "i" issue.
> - Added PF_USER_WORKER flag and added check that kernel_worker_start users
> had that flag set. Also dropped patches that passed worker flags to
> copy_thread and replaced with PF_USER_WORKER check.
> V3:
> - Add parentheses in p->flag and work_flags check in copy_thread.
> - Fix check in arm/arm64 which was doing the reverse of other archs
> where it did likely(!flags) instead of unlikely(flags).
> V2:
> - Rename kernel_copy_process to kernel_worker.
> - Instead of exporting functions, make kernel_worker() a proper
> function/API that does common work for the caller.
> - Instead of adding new fields to kernel_clone_args for each option
> make it flag based similar to CLONE_*.
> - Drop unused completion struct in vhost.
> - Fix compile warnings by merging vhost cgroup cleanup patch and
> vhost conversion patch.
>
>
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@...ts.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Powered by blists - more mailing lists