Date: Fri, 18 Dec 2020 16:06:15 -0800
From: Stephen Brennan <stephen.s.brennan@...cle.com>
To: Alexey Dobriyan <adobriyan@...il.com>
Cc: Stephen Brennan <stephen.s.brennan@...cle.com>,
	James Morris <jmorris@...ei.org>,
	"Serge E. Hallyn" <serge@...lyn.com>,
	linux-security-module@...r.kernel.org,
	Paul Moore <paul@...l-moore.com>,
	Stephen Smalley <stephen.smalley.work@...il.com>,
	Eric Paris <eparis@...isplace.org>,
	selinux@...r.kernel.org,
	Casey Schaufler <casey@...aufler-ca.com>,
	Eric Biederman <ebiederm@...ssion.com>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Matthew Wilcox <willy@...radead.org>
Subject: [PATCH v3 1/2] proc: Allow pid_revalidate() during LOOKUP_RCU

The pid_revalidate() function requires dropping from RCU into REF lookup
mode. When many threads are resolving paths within /proc in parallel,
this can result in heavy spinlock contention on d_lockref as each thread
tries to grab a reference to the /proc dentry (and drop it shortly
thereafter).

Allow the pid_revalidate() function to execute under LOOKUP_RCU. When
updates must be made to the inode, drop out of RCU and into REF mode.

Signed-off-by: Stephen Brennan <stephen.s.brennan@...cle.com>
---
When running ~100 parallel instances of "TZ=/etc/localtime ps -fe
>/dev/null" on a 100-CPU machine, the %sys utilization reaches 90%, and
perf shows the following code path as being responsible for heavy
contention on the d_lockref spinlock:

      walk_component()
        lookup_fast()
          unlazy_child()
            lockref_get_not_dead(&nd->path.dentry->d_lockref)

By applying this patch, %sys utilization falls to around 60% under the
same workload. Although this particular workload is a bit contrived, we
have seen some monitoring scripts which produced similarly high %sys
time due to this contention. (A sketch of the RCU-walk revalidate
pattern and a rough standalone reproducer for this workload follow
after the patch.)

Changes in v3:
- Rather than call pid_update_inode() with flags, create
  pid_inode_needs_update() to determine whether the call can be skipped.
- Restore the call to the security hook (see next patch).

Changes in v2:
- Remove get_pid_task_rcu_user() and get_proc_task_rcu(), since they
  were unnecessary.
- Remove the call to security_task_to_inode().

 fs/proc/base.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..4b246e9bd5df 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1968,6 +1968,20 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
 	security_task_to_inode(task, inode);
 }
 
+/* See if we can avoid the above call. Assumes RCU lock held */
+static bool pid_inode_needs_update(struct task_struct *task, struct inode *inode)
+{
+	kuid_t uid;
+	kgid_t gid;
+
+	if (inode->i_mode & (S_ISUID | S_ISGID))
+		return true;
+	task_dump_owner(task, inode->i_mode, &uid, &gid);
+	if (!uid_eq(uid, inode->i_uid) || !gid_eq(gid, inode->i_gid))
+		return true;
+	return false;
+}
+
 /*
  * Rewrite the inode's ownerships here because the owning task may have
  * performed a setuid(), etc.
@@ -1977,19 +1991,20 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
+	int rv = 0;
 
-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = d_inode(dentry);
-	task = get_proc_task(inode);
-
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 	if (task) {
-		pid_update_inode(task, inode);
-		put_task_struct(task);
-		return 1;
+		rv = 1;
+		if ((flags & LOOKUP_RCU) && pid_inode_needs_update(task, inode))
+			rv = -ECHILD;
+		else if (!(flags & LOOKUP_RCU))
+			pid_update_inode(task, inode);
 	}
-	return 0;
+	rcu_read_unlock();
+	return rv;
 }
 
 static inline bool proc_inode_is_dead(struct inode *inode)
-- 
2.25.1