lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 26 Feb 2014 22:44:32 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	paulmck@...ux.vnet.ibm.com
Cc:	akpm@...ux-foundation.org, joe@...ches.com, keescook@...omium.org,
	geert@...ux-m68k.org, jkosina@...e.cz, viro@...iv.linux.org.uk,
	davem@...emloft.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Change task_struct->comm to use RCU.

Thank you for reviewing, Paul.

Paul E. McKenney wrote:
> > +/**
> > + * set_task_comm - Change task_struct->comm with tracer and perf hooks called.
> > + *
> > + * @tsk: Pointer to "struct task_struct".
> > + * @buf: New comm name.
> > + *
> > + * Returns nothing.
> > + */
> > +void set_task_comm(struct task_struct *tsk, char *buf)
> > +{
> > +	/*
> > +	 * Question: Do we need to use task_lock()/task_unlock() ?
> 
> The have-memory path through do_set_task_comm() does tolerate multiple
> CPUs concurrently setting the comm of a given task, but the no-memory
> path does not.  I advise keeping the locking, at least unless/until
> some workload is having lots of CPUs concurrently changing a given
> task's comm.  For some good reason, I hasten to add!  ;-)
> 
That question meant whether trace_task_rename(tsk, buf) needs to be serialized
by tsk->alloc_lock. If trace_task_rename(tsk, buf) doesn't need to be
serialized by tsk->alloc_lock, we can remove task_lock()/task_unlock().

> Another reason to get rid of the lock is to allow do_set_task_comm()
> to sleep waiting for memory to become available, but not sure that
> this is a good approach.

I used GFP_ATOMIC because I don't know whether there are callers that depend on
"changing task->comm does not sleep", for until today "changing task->comm did
not sleep".

> > +void do_set_task_comm(struct task_struct *tsk, const char *buf)
> > +{
> > +	struct rcu_comm *comm;
> > +
> > +	comm = kmem_cache_zalloc(commname_cachep, GFP_ATOMIC | __GFP_NOWARN);
> > +	if (likely(comm)) {
> > +		atomic_set(&comm->usage, 1);
> > +		strncpy(comm->name, buf, TASK_COMM_LEN - 1);
> > +		smp_wmb(); /* Behave like rcu_assign_pointer(). */
> > +		comm = xchg(&tsk->rcu_comm, comm);
> 
> The value-returning xchg() implies a full memory barrier, so the
> smp_wmb() is not needed.
> 
I see.

> > +		put_commname(comm);
> > +	} else {
> > +		/*
> > +		 * Memory allocation failed. We have to tolerate loss of
> > +		 * consistency.
> > +		 *
> > +		 * Question 1: Do we want to reserve some amount of static
> > +		 * "struct rcu_comm" arrays for use upon allocation failures?
> > +		 *
> > +		 * Question 2: Do we perfer unchanged comm name over
> > +		 * overwritten comm name because comm name is copy-on-write ?
> > +		 */
> > +		WARN_ONCE(1, "Failed to allocate memory for comm name.\n");
> > +		rcu_read_lock();
> > +		do {
> > +			comm = rcu_dereference(tsk->rcu_comm);
> > +		} while (!atomic_read(&comm->usage));
> 
> So the idea here is to avoid races with someone who might be freeing
> this same ->rcu_comm structure, right?  If we see a non-zero reference,
> they might still free it, but that would be OK because we are in an
> RCU read-side critical section, so it won't actually be freed until
> we get done.  And our change is being overwritten by someone else's
> in that case, but that is OK because it could happen anyway.
> 
Right.

> So the loop shouldn't go through many cycles, especially if memory
> remains low.
> 
> If I am correct, might be worth a comment.  Doubly so if I am wrong.  ;-)
> 
You are correct.

What about Q1 and Q2, like below ?

  /* Usage is initialized with ATOMIC_INIT(-1). */
  static struct rcu_comm static_rcu_comm[CONFIG_RESERVED_COMMBUFFER];

  static void free_commname(struct rcu_head *rcu)
  {
  	struct rcu_comm *comm = container_of(rcu, struct rcu_comm, rcu);
  	if (comm < &static_rcu_comm[0] ||
  	    comm >= &static_rcu_comm[CONFIG_RESERVED_COMMBUFFER])
  		kmem_cache_free(commname_cachep, comm);
  	else
  		atomic_set(&comm->usage, -1);
  }

  void do_set_task_comm(struct task_struct *tsk, const char *buf)
  {
  	struct rcu_comm *comm;
  
  	comm = kmem_cache_zalloc(commname_cachep, GFP_ATOMIC | __GFP_NOWARN);
  	if (likely(comm))
  		atomic_set(&comm->usage, 1);
  	else {
  		int i;
  
  		/* Memory allocation failed. Try static comm name buffer. */
  		for (i = 0; i < CONFIG_RESERVED_COMMBUFFER; i++) {
  			comm = &static_rcu_comm[i];
  			if (atomic_read(&comm->usage) != -1)
  				continue;
  			if (atomic_add_return(&comm->usage, 2) == 1)
  				goto found;
  		  	put_commname(comm);
  		  	put_commname(comm);
  		}
  		/* No comm name buffer available. Keep current comm name. */
  		WARN_ONCE(1, "Failed to allocate memory for comm name.\n");
  		return;
  	}
  found:
  	strncpy(comm->name, buf, TASK_COMM_LEN - 1);
  	comm = xchg(&tsk->rcu_comm, comm);
  	put_commname(comm);
  }

Since we can preallocate rcu_comm using GFP_KERNEL and return -ENOMEM (e.g.
prepare_bprm_creds() for execve() case, copy_from_user() for prctl(PR_SET_NAME)
and comm_write() (i.e. /proc/$pid/comm ) cases), there will be little users of
static_rcu_comm[] (e.g. kthreadd()).


> > @@ -1191,6 +1195,11 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >  	p = dup_task_struct(current);
> >  	if (!p)
> >  		goto fork_out;
> > +	rcu_read_lock();
> > +	do {
> > +		p->rcu_comm = rcu_dereference(current->rcu_comm);
> > +	} while (!atomic_inc_not_zero(&p->rcu_comm->usage));
> 
> OK, loop until we get the sole reference on the comm...  But
> I suggest an rcu_read_unlock() followed by an rcu_read_lock() as the
> first thing in the loop.  Just to prevent adding an RCU CPU stall to
> our woes if we cannot get a reference.
> 
I see. I will use

  p->rcu_comm = NULL;
  do {
  	struct rcu_comm *comm;
  	rcu_read_lock();
  	comm = rcu_dereference(current->rcu_comm);
  	if (atomic_inc_not_zero(&comm->usage))
  		p->rcu_comm = comm;
  	rcu_read_unlock();
  } while (!p->rcu_comm);

in the next version (if we use RCU approach).

> And here the two tasks (parent and child) share the name until one
> or the other either changes its name or exits.  OK.
> 
Yes, this is copy on write approach.

Another approach could use heuristic atomicity; change from "char comm[16]"
to "long comm[16/sizeof(long)]" and read/write using "long". By using "long",
we can reduce the possibility of getting the mixture of current comm name and
comm name to update to, for reading/writing of "long" is atomic.

The horizontal axis of below map is the strlen() of current comm name and the
vertical axis is the strlen() of comm name to update to. If sizeof(long) == 8,
193 out of 256 patterns can be atomically updated without any locks. Moreover,
since strlen() of at least one of current comm name and new comm name tends to
be shorter than sizeof(long) (e.g. "bash" -> "ls", "rc.sysinit" -> "cat",
"bash" -> "avahi-daemon" ), update of comm name will more likely be atomic
without any locks.

    | 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  --+-----------------------------------------------
   0| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   1| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   2| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   3| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   4| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   5| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   6| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   7| o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
   8| o  o  o  o  o  o  o  o  o  x  x  x  x  x  x  x
   9| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  10| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  11| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  12| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  13| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  14| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x
  15| o  o  o  o  o  o  o  o  x  x  x  x  x  x  x  x

> > +	rcu_read_unlock();
> > 
> >  	ftrace_graph_init_task(p);
> >  	get_seccomp_filter(p);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ