linux-kernel - Proper locking for find_task_by_pid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <4FA79E81.4050306@localdomain.pl>
Date:	Mon, 07 May 2012 12:05:53 +0200
From:	Grzegorz Nosek <root@...aldomain.pl>
To:	linux-kernel@...r.kernel.org
Subject: Proper locking for find_task_by_pid_ns?

Hi,

TL;DR: my question is whether this:

read_lock(&tasklist_lock);
rcu_read_lock();
p = find_task_by_pid_ns(pid, &init_pid_ns);
rcu_read_unlock();
read_unlock(&tasklist_lock);

is a valid sequence, or is it deadlockable. My Google-fu says _either_ rcu_read_lock or tasklist_lock is required to traverse the task list[1] but I'm not sure about both at the same time.

The whole story:

I encountered a machine crash with the following output on the console (that's all I got):

http://imgur.com/DzksH

The kernel is 3.2.15 plus Linux-VServer (patch-3.2.11-vs2.3.2.8 [2])

I'm guessing (I admit, I didn't disassemble copy_process) the write lock in question is tasklist_lock (the only clearly visible write_lock_irq() call in this function). So to my untrained eye that would look like some process stuck with tasklist_lock held.

The crash happened while shutting down a vserver, which basically looks like this:

long vs_reboot(unsigned int cmd, void __user *arg)
{
       struct vx_info *vxi = current_vx_info();
       long ret = 0;

       vxdprintk(VXD_CBIT(misc, 5),
               "vs_reboot(%p[#%d],%u)",
               vxi, vxi ? vxi->vx_id : 0, cmd);

       ret = vs_reboot_helper(vxi, cmd, arg);
       if (ret)
               return ret;

       vxi->reboot_cmd = cmd;
       if (vx_info_flags(vxi, VXF_REBOOT_KILL, 0)) {
               switch (cmd) {
               case LINUX_REBOOT_CMD_RESTART:
               case LINUX_REBOOT_CMD_HALT:
               case LINUX_REBOOT_CMD_POWER_OFF:
                       vx_info_kill(vxi, 0, SIGKILL);
                       vx_info_kill(vxi, 1, SIGKILL);
               default:
                       break;
               }
       }
       return 0;
}

/* ... */

int vx_info_kill(struct vx_info *vxi, int pid, int sig)
{
       int retval, count = 0;
       struct task_struct *p;
       struct siginfo *sip = SEND_SIG_PRIV;

       retval = -ESRCH;
       vxdprintk(VXD_CBIT(misc, 4),
               "vx_info_kill(%p[#%d],%d,%d)*",
               vxi, vxi->vx_id, pid, sig);
       read_lock(&tasklist_lock);
       switch (pid) {
       case  0:
       case -1:
               for_each_process(p) {
                       int err = 0;

                       if (vx_task_xid(p) != vxi->vx_id || p->pid <= 1 ||
                               (pid && vxi->vx_initpid == p->pid))
                               continue;

                       err = group_send_sig_info(sig, sip, p);
                       ++count;
                       if (err != -EPERM)
                               retval = err;
               }
               break;
       case 1:
               if (vxi->vx_initpid) {
                       pid = vxi->vx_initpid;
                       /* for now, only SIGINT to private init ... */
                       if (!vx_info_flags(vxi, VXF_STATE_ADMIN, 0) &&
                               /* ... as long as there are tasks left */
                               (atomic_read(&vxi->vx_tasks) > 1))
                               sig = SIGINT;
               }
               /* fallthrough */
       default:
               rcu_read_lock();
               p = find_task_by_real_pid(pid);
               rcu_read_unlock();
               if (p) {
                       if (vx_task_xid(p) == vxi->vx_id)
                               retval = group_send_sig_info(sig, sip, p);
               }
               break;
       }
       read_unlock(&tasklist_lock);
       vxdprintk(VXD_CBIT(misc, 4),
               "vx_info_kill(%p[#%d],%d,%d,%ld) = %d",
               vxi, vxi->vx_id, pid, sig, (long)sip, retval);
       return retval;
}

vx_info_kill() is the only place that touches tasklist_lock that does not exist in mainline kernel. find_task_by_real_pid is just find_task_by_pid_ns(pid, &init_pid_ns).

The crash only happened once so far and AFAIK is not easily reproducible. Both the host and the guest were basically completely idle at that time.

So, am I on the right track or should I look elsewhere?

Best regards,
 Grzegorz Nosek

1. http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg57795.html
2. http://vserver.13thfloor.at/Experimental/patch-3.2.11-vs2.3.2.8.diff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/