Message-ID: <3423a470-c152-0dbf-c7a7-2775a9679194@i-love.sakura.ne.jp>
Date:   Fri, 26 Oct 2018 21:23:54 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     serge@...lyn.com,
        syzbot <syzbot+a9ac39bf55329e206219@...kaller.appspotmail.com>,
        jmorris@...ei.org, keescook@...omium.org,
        linux-kernel@...r.kernel.org,
        linux-security-module@...r.kernel.org,
        syzkaller-bugs@...glegroups.com
Subject: Re: KASAN: use-after-free Read in task_is_descendant

On 2018/10/26 0:55, Oleg Nesterov wrote:
> On 10/25, Tetsuo Handa wrote:
>>
>> On 2018/10/25 21:17, Oleg Nesterov wrote:
>>>>> And yes, task_is_descendant() can hit the dead child, if nothing else it can
>>>>> be killed. This can explain the kasan report.
>>>>
>>>> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
>>>> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
>>>> isn't it?
>>>
>>> Yes. and you know, I am all confused. I no longer can understand you :/
>>
>> Why don't we need to check every time like shown below?
>> Why checking only once is sufficient?
> 
> Why do you think it is not sufficient?
> 
> Again, I can be easily wrong, rcu is not simple, but so far I think we need
> a single check at the start.
> 

Hmm, it is difficult to guess from this report what happened.

Since the "child" passed to task_is_descendant() has at least one reference
count taken by find_get_task_by_vpid(), rcu_dereference(walker->real_parent)
in the first iteration

  while (child->pid > 0) {
    if (!thread_group_leader(child))
      walker = rcu_dereference(child->group_leader);
    if (walker == parent) {
      rc = 1;
      break;
    }
    walker = rcu_dereference(walker->real_parent);
  }

cannot trigger a use-after-free bug. Thus, when this use-after-free was
detected at rcu_dereference(walker->real_parent), the memory pointed to by
"walker" must have been released between

  while (walker->pid > 0) {
    if (!thread_group_leader(walker))
      walker = rcu_dereference(walker->group_leader);

and

    walker = rcu_dereference(walker->real_parent);
  }

because otherwise the use-after-free would have been reported at walker->pid
or thread_group_leader(walker) or rcu_dereference(walker->group_leader).

Is my understanding correct?



Then, what is pid_alive(child) testing? It is not the memory pointed to by "child"
but the memory pointed to by "walker" (i.e. the parent of "child", or the parent of
the parent of "child", or ...) which is triggering the use-after-free.

Suppose p1 == p2->real_parent and p2 == p3->real_parent, and p1 exited
when p2 tried to attach to p1, so that p2->real_parent was pointing to an
already (or about to be) freed p1.

Even if the pid_alive(p2) test can guarantee that p1 won't be released,
how can the pid_alive(p3) test guarantee that p1 won't be released?
p1 can be released at any moment because it has already waited for an RCU
grace period, can't it?
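
To make the question concrete, here is a small userspace sketch of the p3 -> p2 -> p1 chain. It is only a toy model: struct toy_task, its "hashed" and "released" fields, and first_stale_ancestor() are hypothetical names invented for illustration, and nothing here models RCU grace periods or reparenting. Whether the state it constructs is actually reachable under rcu_read_lock() is exactly what is being asked above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct task_struct; every field here is hypothetical. */
struct toy_task {
    int pid;
    bool hashed;                   /* stands in for pid_alive(task) */
    bool released;                 /* stands in for "memory already freed" */
    struct toy_task *real_parent;
};

/*
 * Walk the real_parent chain with a single liveness check on the starting
 * task, as in the "check once at the start" proposal. Returns the first
 * ancestor the walk would dereference after it was marked released, or
 * NULL if the walk is refused or reaches pid 0 cleanly.
 */
static const struct toy_task *first_stale_ancestor(const struct toy_task *start)
{
    const struct toy_task *walker = start;

    assert(start != NULL);
    if (!walker->hashed)           /* the only liveness check */
        return NULL;
    while (walker->pid > 0) {
        walker = walker->real_parent;
        if (walker->released)      /* real code would read walker->pid here */
            return walker;
    }
    return NULL;
}
```

With p1 marked released and p2, p3 still hashed, starting the walk at p3 passes the single check and still arrives at p1 two hops later. In this model the check on p3 says nothing about p1; the model cannot say whether the kernel rules that state out.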


ptrace(PTRACE_ATTACH, vpid_of_p2) {
  p2 = find_get_task_by_vpid(vpid_of_p2);
  ptrace_attach(p2, PTRACE_ATTACH, addr, data) {
    mutex_lock_interruptible(&p2->signal->cred_guard_mutex);
    // p1 starts exit()ing here.
    task_lock(p2);
    __ptrace_may_access(p2) {
      // p2->real_parent starts pointing to already freed p1.
      security_ptrace_access_check(p2, PTRACE_MODE_ATTACH) {
        yama_ptrace_access_check() {
           task_is_descendant(current, p2) {
             walker = p2;
             rcu_read_lock();
             if (pid_alive(p2)) { // If true
               if (p2->pid > 0) { // will be true
                 p1 = rcu_dereference(p2->real_parent); // might be OK due to pid_alive(p2) == true?
               }
             }
             rcu_read_unlock();
           }
        }
      }
    }
    task_unlock(p2);
    mutex_unlock(&p2->signal->cred_guard_mutex);
  }
  put_task_struct(p2);
}

ptrace(PTRACE_ATTACH, vpid_of_p3) {
  p3 = find_get_task_by_vpid(vpid_of_p3);
  ptrace_attach(p3, PTRACE_ATTACH, addr, data) {
    mutex_lock_interruptible(&p3->signal->cred_guard_mutex);
    // p1 starts exit()ing here.
    task_lock(p3);
    __ptrace_may_access(p3) {
      // p2->real_parent starts pointing to already freed p1.
      security_ptrace_access_check(p3, PTRACE_MODE_ATTACH) {
        yama_ptrace_access_check() {
           task_is_descendant(current, p3) {
             walker = p3;
             rcu_read_lock();
             if (pid_alive(p3)) { // If true
               if (p3->pid > 0) { // will be true
                 p2 = rcu_dereference(p3->real_parent); // will be OK if above assumption is OK.
                 if (p2->pid > 0) { // will be true
                   p1 = rcu_dereference(p2->real_parent); // will read already (or about to be) freed p1 address
                   if (p1->pid > 0) { // Oops here or
                     if (!thread_group_leader(p1)) // oops here or
                       p1 = rcu_dereference(p1->group_leader); // oops here or
                     p0 = rcu_dereference(p1->real_parent); // oops here, or not oops because releasing after this
                   }
                 }
               }
             }
             rcu_read_unlock();
           }
        }
      }
    }
    task_unlock(p3);
    mutex_unlock(&p3->signal->cred_guard_mutex);
  }
  put_task_struct(p3);
}
