Message-ID: <319f1e21-6cf4-4502-ebc8-c808560fb48d@oracle.com>
Date:   Mon, 11 Jul 2022 23:44:14 -0500
From:   Mike Christie <michael.christie@...cle.com>
To:     hch@...radead.org, stefanha@...hat.com, jasowang@...hat.com,
        mst@...hat.com, sgarzare@...hat.com,
        virtualization@...ts.linux-foundation.org, brauner@...nel.org,
        ebiederm@...ssion.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v10 0/8] Use copy_process in vhost layer

Eric and Christian, ping?

If you no longer like these patches, what about something simple like
just exporting some helpers to update and check a task's nproc limit?
Something like this:


diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 81cab4b01edc..71b5946be792 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -98,6 +98,10 @@ int kernel_wait(pid_t pid, int *stat);
 
 extern void free_task(struct task_struct *tsk);
 
+extern bool task_is_over_nproc_limit(struct task_struct *tsk);
+extern void task_inc_nproc(struct task_struct *tsk);
+extern void task_dec_nproc(struct task_struct *tsk);
+
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
 extern void sched_exec(void);
diff --git a/kernel/cred.c b/kernel/cred.c
index e10c15f51c1f..c15e7b926013 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -358,7 +358,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		kdebug("share_creds(%p{%d,%d})",
 		       p->cred, atomic_read(&p->cred->usage),
 		       read_cred_subscribers(p->cred));
-		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+		task_inc_nproc(p);
 		return 0;
 	}
 
@@ -395,7 +395,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 #endif
 
 	p->cred = p->real_cred = get_cred(new);
-	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+	task_inc_nproc(p);
 	alter_cred_subscribers(new, 2);
 	validate_creds(new);
 	return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..88dbe3458d7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1964,6 +1964,32 @@ static void copy_oom_score_adj(u64 clone_flags, struct task_struct *tsk)
 	mutex_unlock(&oom_adj_mutex);
 }
 
+bool task_is_over_nproc_limit(struct task_struct *tsk)
+{
+	if (!is_ucounts_overlimit(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC,
+	    task_rlimit(tsk, RLIMIT_NPROC)))
+		return false;
+
+	if (tsk->real_cred->user != INIT_USER && !capable(CAP_SYS_RESOURCE) &&
+	    !capable(CAP_SYS_ADMIN))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(task_is_over_nproc_limit);
+
+void task_inc_nproc(struct task_struct *tsk)
+{
+	inc_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1);
+}
+EXPORT_SYMBOL_GPL(task_inc_nproc);
+
+void task_dec_nproc(struct task_struct *tsk)
+{
+	dec_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1);
+}
+EXPORT_SYMBOL_GPL(task_dec_nproc);
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2102,11 +2128,8 @@ static __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_free;
 
 	retval = -EAGAIN;
-	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
-		if (p->real_cred->user != INIT_USER &&
-		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
-			goto bad_fork_cleanup_count;
-	}
+	if (task_is_over_nproc_limit(p))
+		goto bad_fork_cleanup_count;
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
 	/*
@@ -2526,7 +2549,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_delayacct:
 	delayacct_tsk_free(p);
 bad_fork_cleanup_count:
-	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+	task_dec_nproc(p);
 	exit_creds(p);
 bad_fork_free:
 	WRITE_ONCE(p->__state, TASK_DEAD);


On 6/19/22 8:13 PM, Mike Christie wrote:
> The following patches were made over Linus's tree.
> 
> Eric and Christian: the vhost maintainer, Michael Tsirkin, has ACK'd the
> patches. I haven't received any comments from you for a couple of
> postings now (Jan 8 was the last reply). Are you OK with merging them?
> 
> For everyone else who hasn't seen this before, the patches allow the
> vhost layer to do a copy_process on the thread that does the
> VHOST_SET_OWNER ioctl like how io_uring does a copy_process against its
> userspace app. This allows the vhost layer's worker threads to inherit
> cgroups, namespaces, address space, etc and this worker thread will also
> be accounted for against that owner/parent process's RLIMIT_NPROC limit.
> 
> If you are not familiar with qemu and vhost, here is a more detailed
> problem description:
> 
> Qemu will create vhost devices in the kernel which perform network, SCSI,
> etc IO and management operations from worker threads created by the
> kthread API. Because the kthread API does a copy_process on the kthreadd
> thread, the vhost layer has to use kthread_use_mm to access the Qemu
> thread's memory and cgroup_attach_task_all to add itself to the Qemu
> thread's cgroups.
> 
> The problem with this approach is that we then have to add new
> functions/args/functionality for everything we want to inherit. I
> started doing that here:
> 
> https://lkml.org/lkml/2021/6/23/1233
> 
> for the RLIMIT_NPROC check, but it seems it might be easier to just
> inherit everything from the beginning, because I'd need to do something
> like that patch several times.
> 
> V10:
> - Eric's cleanup patches and my vhost flush cleanup patches are merged
> upstream, so rebase against Linus's tree, which has everything.
> V9:
> - Rebase against Eric's kthread-cleanups-for-v5.19 branch. Drop patches
> no longer needed due to kernel clone arg and pf io worker patches in that
> branch.
> V8:
> - Fix kzalloc GFP use.
> - Fix email subject version number.
> V7:
> - Drop generic user_worker_* helpers and replace with vhost_task specific
>   ones.
> - Drop autoreap patch. Use kernel_wait4 instead.
> - Fix issue where vhost.ko could be removed while the worker function is
>   still running.
> V6:
> - Rename kernel_worker to user_worker and fix prefixes.
> - Add better patch descriptions.
> V5:
> - Handle kbuild errors by building patchset against current kernel that
>   has all deps merged. Also add patch to remove create_io_thread code as
>   it's not used anymore.
> - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER
>   case added in 5.16-rc1.
> - Add PF_USER_WORKER flag so we can check it later after the initial
>   thread creation for the wake-up, vm and signal cases.
> - Added patch to auto reap the worker thread.
> V4:
> - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
> - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
>   added the new kernel worker functions.
> - Fixed extra "i" issue.
> - Added PF_USER_WORKER flag and added check that kernel_worker_start users
>   had that flag set. Also dropped patches that passed worker flags to
>   copy_thread and replaced with PF_USER_WORKER check.
> V3:
> - Add parentheses in p->flag and work_flags check in copy_thread.
> - Fix check in arm/arm64 which was doing the reverse of other archs
>   where it did likely(!flags) instead of unlikely(flags).
> V2:
> - Rename kernel_copy_process to kernel_worker.
> - Instead of exporting functions, make kernel_worker() a proper
>   function/API that does common work for the caller.
> - Instead of adding new fields to kernel_clone_args for each option
>   make it flag based similar to CLONE_*.
> - Drop unused completion struct in vhost.
> - Fix compile warnings by merging vhost cgroup cleanup patch and
>   vhost conversion patch.
> 
> 
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@...ts.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
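The kthread-based pattern described in the quoted problem statement can be sketched roughly as below. This is a non-runnable illustration, not code from the patchset: `vhost_worker_fn`, the `dev` context, and its `mm` field are hypothetical names standing in for the real vhost worker setup; the kernel APIs shown (`kthread_use_mm`, `cgroup_attach_task_all`, `kthread_create`) are the ones the quoted text names.

```
/* Sketch (assumption, not from the patchset) of the pre-copy_process
 * pattern: a kthread-based vhost worker must manually borrow the
 * owner's mm and join its cgroups, one inherited property at a time. */
static int vhost_worker_fn(void *data)
{
	struct vhost_dev *dev = data;	/* hypothetical worker context */

	/* Borrow the Qemu thread's address space for the worker's IO. */
	kthread_use_mm(dev->mm);

	while (!kthread_should_stop()) {
		/* ... process virtqueue work items ... */
	}

	kthread_unuse_mm(dev->mm);
	return 0;
}

/* At VHOST_SET_OWNER time: */
worker = kthread_create(vhost_worker_fn, dev, "vhost-%d", current->pid);
cgroup_attach_task_all(current, worker);	/* join owner's cgroups */
wake_up_process(worker);
```

With copy_process against the owner, none of this per-property plumbing is needed: the worker starts out sharing the owner's mm, cgroups, namespaces, and rlimit accounting.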
