linux-kernel - Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns

Message-ID: <20160803142850.GA27072@fieldses.org>
Date:	Wed, 3 Aug 2016 10:28:50 -0400
From:	"J. Bruce Fields" <bfields@...ldses.org>
To:	Nikolay Borisov <kernel@...p.com>
Cc:	Jeff Layton <jlayton@...chiereds.net>, viro@...iv.linux.org.uk,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	ebiederm@...ssion.com, containers@...ts.linux-foundation.org,
	Andrey Vagin <avagin@...nvz.org>, xemul@...tuozzo.com
Subject: Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns

On Wed, Aug 03, 2016 at 05:17:09PM +0300, Nikolay Borisov wrote:
> 
> 
> On 08/03/2016 04:46 PM, Jeff Layton wrote:
> > On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
> >> On busy container servers reading /proc/locks shows all the locks
> >> created by all clients. This can cause large latency spikes. In my
> >> case I observed lsof taking up to 5-10 seconds while processing around
> >> 50k locks. Fix this by limiting the locks shown to those created
> >> in the same pidns as the one proc was mounted in. When reading
> >> /proc/locks from the init_pid_ns, show everything.
> >>
> >> Signed-off-by: Nikolay Borisov <kernel@...p.com>
> >> ---
> >>  fs/locks.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index ee1b15f6fc13..751673d7f7fc 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
> >>  {
> >>  	struct locks_iterator *iter = f->private;
> >>  	struct file_lock *fl, *bfl;
> >> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> >> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
> >>  
> >>  	fl = hlist_entry(v, struct file_lock, fl_link);
> >>  
> >> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
> > 
> > Ok, so when you read from a process that's in the init_pid_ns
> > namespace, then you'll get the whole pile of locks, even when reading
> > this from a filesystem that was mounted in a different pid_ns?
> > 
> > That seems odd to me if so. Any reason not to just uniformly use the
> > proc_pidns here?
> 
> [CCing some people from openvz/CRIU]
> 
> My train of thought was: "we should have a means that would be the one
> universal truth about everything, and this would be a process in the
> init_pid_ns".

OK, but why not make that means be "mount proc from the init_pid_ns and
read /proc/locks there".  So just replace current_pidns with proc_pidns
in the above.  I think that's all Jeff was suggesting.
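To make the difference between the two policies concrete, here is a minimal userspace C sketch of the filter decision only. The types `struct pid_ns` and `struct lock_rec`, the object `init_ns`, and the functions `show_lock_v2()` and `show_lock_proc_ns()` are hypothetical stand-ins for `struct pid_namespace`, `struct file_lock`, `init_pid_ns` and the check in `locks_show()`; the field `owner_ns` plays the role of `ns_of_pid(fl->fl_nspid)`, with NULL meaning no pid was recorded. The first function mirrors the v2 condition; the second substitutes `proc_pidns` for `current_pidns` as suggested above. Like the patch, it compares namespaces by pointer identity and does not model nesting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace; the name is only a label. */
struct pid_ns {
    const char *name;
};

/* Hypothetical stand-in for struct file_lock. owner_ns models
 * ns_of_pid(fl->fl_nspid); NULL models a lock with no fl_nspid. */
struct lock_rec {
    const struct pid_ns *owner_ns;
};

/* Hypothetical stand-in for &init_pid_ns. */
static const struct pid_ns init_ns = { "init" };

/* v2 patch: the *reader's* namespace decides whether filtering applies,
 * so a reader in init_ns sees every lock through any proc mount. */
static bool show_lock_v2(const struct lock_rec *fl,
                         const struct pid_ns *proc_ns,
                         const struct pid_ns *reader_ns)
{
    assert(fl != NULL && proc_ns != NULL && reader_ns != NULL);
    if (reader_ns != &init_ns && fl->owner_ns != NULL &&
        proc_ns != fl->owner_ns)
        return false; /* skip: lock belongs to another namespace */
    return true;
}

/* Suggested variant: only the namespace the proc instance was mounted
 * in matters; the identity of the reader is ignored. */
static bool show_lock_proc_ns(const struct lock_rec *fl,
                              const struct pid_ns *proc_ns)
{
    assert(fl != NULL && proc_ns != NULL);
    if (proc_ns != &init_ns && fl->owner_ns != NULL &&
        proc_ns != fl->owner_ns)
        return false; /* skip: lock belongs to another namespace */
    return true;
}
```

With the second variant, a process in init_pid_ns that reads /proc/locks through a container's proc mount sees only that container's locks, while a proc mounted from init_pid_ns still shows everything; the output depends on the mount rather than on who reads it.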

--b.

> I don't have a strong preference as long as I'm not breaking
> userspace. As I said before, I think the CRIU guys might be using that
> interface.
> 
> > 
> >> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> >> +		return 0;
> >> +
> >>  	lock_get_status(f, fl, iter->li_pos, "");
> >>  
> >>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> >