linux-kernel - Re: For review: seccomp_user

Message-ID: <9f9b8b86-6e49-17ef-e414-82e489b0b99a@gmail.com>
Date:   Sat, 31 Oct 2020 09:31:48 +0100
From:   "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To:     Jann Horn <jannh@...gle.com>
Cc:     mtk.manpages@...il.com, Kees Cook <keescook@...omium.org>,
        Tycho Andersen <tycho@...ho.pizza>,
        Sargun Dhillon <sargun@...gun.me>,
        Christian Brauner <christian@...uner.io>,
        Daniel Borkmann <daniel@...earbox.net>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        Song Liu <songliubraving@...com>,
        Robert Sesek <rsesek@...gle.com>,
        Containers <containers@...ts.linux-foundation.org>,
        linux-man <linux-man@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Aleksa Sarai <cyphar@...har.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Will Drewry <wad@...omium.org>, bpf <bpf@...r.kernel.org>,
        Andy Lutomirski <luto@...capital.net>
Subject: Re: For review: seccomp_user_notif(2) manual page [v2]

On 10/30/20 8:14 PM, Jann Horn wrote:
> On Thu, Oct 29, 2020 at 3:19 PM Michael Kerrisk (man-pages)
> <mtk.manpages@...il.com> wrote:
>> On 10/29/20 2:42 AM, Jann Horn wrote:
>>> On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages)
>>> <mtk.manpages@...il.com> wrote:
>>>>        static bool
>>>>        getTargetPathname(struct seccomp_notif *req, int notifyFd,
>>>>                          char *path, size_t len)
>>>>        {
>>>>            char procMemPath[PATH_MAX];
>>>>
>>>>            snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid);
>>>>
>>>>            int procMemFd = open(procMemPath, O_RDONLY);
>>>>            if (procMemFd == -1)
>>>>                errExit("\tS: open");
>>>>
>>>>            /* Check that the process whose info we are accessing is still alive.
>>>>               If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed
>>>>               in checkNotificationIdIsValid()) succeeds, we know that the
>>>>               /proc/PID/mem file descriptor that we opened corresponds to the
>>>>               process for which we received a notification. If that process
>>>>               subsequently terminates, then read() on that file descriptor
>>>>               will return 0 (EOF). */
>>>>
>>>>            checkNotificationIdIsValid(notifyFd, req->id);
>>>>
>>>>            /* Read bytes at the location containing the pathname argument
>>>>               (i.e., the first argument) of the mkdir(2) call */
>>>>
>>>>            ssize_t nread = pread(procMemFd, path, len, req->data.args[0]);
>>>>            if (nread == -1)
>>>>                errExit("pread");
>>>
>>> As discussed at
>>> <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
>>> we need to re-check checkNotificationIdIsValid() after reading remote
>>> memory but before using the read value in any way. Otherwise, the
>>> syscall could in the meantime get interrupted by a signal handler, the
>>> signal handler could return, and then the function that performed the
>>> syscall could free() allocations or return (thereby freeing buffers on
>>> the stack).
>>>
>>> In essence, this pread() is (unavoidably) a potential use-after-free
>>> read; and to make that not have any security impact, we need to check
>>> whether UAF read occurred before using the read value. This should
>>> probably be called out elsewhere in the manpage, too...
>>
>> Thanks very much for pointing me at this!
>>
>> So, I want to confirm that the fix to the code is as simple as
>> adding a check following the pread() call, something like:
>>
>> [[
>>      ssize_t nread = pread(procMemFd, path, len, req->data.args[argNum]);
>>      if (nread == -1)
>>         errExit("Supervisor: pread");
>>
>>      if (nread == 0) {
>>         fprintf(stderr, "\tS: pread() of /proc/PID/mem "
>>                 "returned 0 (EOF)\n");
>>         exit(EXIT_FAILURE);
>>      }
>>
>>      if (close(procMemFd) == -1)
>>         errExit("Supervisor: close-/proc/PID/mem");
>>
>> +    /* Once again check that the notification ID is still valid. The
>> +       case we are particularly concerned about here is that just
>> +       before we fetched the pathname, the target's blocked system
>> +       call was interrupted by a signal handler, and after the handler
>> +       returned, the target carried on execution (past the interrupted
>> +       system call). In that case, we have no guarantees about what we
>> +       are reading, since the target's memory may have been arbitrarily
>> +       changed by subsequent operations. */
>> +
>> +    if (!notificationIdIsValid(notifyFd, req->id, "post-open"))
>> +        return false;
>> +
>>      /* We have no guarantees about what was in the memory of the target
>>         process. We therefore treat the buffer returned by pread() as
>>         untrusted input. The buffer should be terminated by a null byte;
>>         if not, then we will trigger an error for the target process. */
>>
>>      if (strnlen(path, nread) < nread)
>>          return true;
>> ]]
> 
> Yeah, that should do the job. 

Thanks.

> With the caveat that a cancelled syscall
> could've also led to the memory being munmap()ed, so the nread==0 case
> could also happen legitimately - so you might want to move this check
> up above the nread==0 (mm went away) and nread==-1 (mm still exists,
> but read from address failed, errno EIO) checks if the error message
> shouldn't appear spuriously.

In any case, I've been refactoring (simplifying) that code a little.
I haven't so far rearranged the order of the checks, but I have already
removed the log message for the nread==0 case. (Instead, there will
eventually be an error when the response is sent.)
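
A minimal sketch of the ordering Jann suggests, where the notification
ID is rechecked before the pread() result is interpreted, might look
like the following. The enum and the function classifyFetch() are
hypothetical names used only for illustration. The idStillValid
argument is assumed to be the result of a SECCOMP_IOCTL_NOTIF_ID_VALID
check performed after the pread() call. memchr() is used here in place
of the strnlen() test in the code above; the two checks are equivalent.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical outcome codes for one attempt to fetch the pathname */
enum fetchOutcome {
    FETCH_OK,               /* Buffer holds a null-terminated string */
    FETCH_STALE,            /* Notification ID is no longer valid */
    FETCH_EOF,              /* pread() returned 0 */
    FETCH_READ_ERROR,       /* pread() returned -1 */
    FETCH_UNTERMINATED      /* No null byte in the bytes that were read */
};

/* 'idStillValid' is assumed to come from a notification ID check made
   AFTER the pread() that produced 'nread' and filled 'buf' */
static enum fetchOutcome
classifyFetch(bool idStillValid, ssize_t nread, const char *buf)
{
    /* Check validity first: if the system call was cancelled, then
       neither 'nread' nor 'buf' tells us anything reliable, and no
       error should be reported for them */
    if (!idStillValid)
        return FETCH_STALE;

    if (nread == -1)
        return FETCH_READ_ERROR;
    if (nread == 0)
        return FETCH_EOF;

    /* Treat the buffer as untrusted input: it must contain a null
       byte within the bytes actually read */
    if (memchr(buf, '\0', (size_t) nread) != NULL)
        return FETCH_OK;
    return FETCH_UNTERMINATED;
}
```

With this ordering, a stale notification is reported as FETCH_STALE
regardless of what pread() returned, so an error message for the
nread==0 or nread==-1 case cannot appear spuriously.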

I also haven't tested exactly the scenario you describe in the
seccomp user-notification setting, but I think the above is not
correct. Here are two scenarios I did test, simply with mmap() and
/proc/PID/mem (no seccomp involved):

Scenario 1:
A creates a mapping at address X
B opens /proc/A/mem and lseeks on resulting FD to offset X
A terminates
B reads from FD ==> read() returns 0 (EOF)

Scenario 2:
A creates a mapping at address X
B opens /proc/A/mem and lseeks on resulting FD to offset X
A unmaps mapping at address X
B reads from FD ==> read() returns -1 / EIO.

That last scenario seems to contradict what you say, since I
think you meant that read() should return 0 in that case.
Have I misunderstood you?
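
For reference, a minimal sketch of how the two scenarios above could be
reproduced is shown below. This is not the test program that was
actually used. The names probeProcMem(), runProcessA() and
PROBE_SETUP_FAILED are hypothetical, pipes are assumed as the
synchronization mechanism, and pread() at offset X stands in for the
lseek() plus read() steps. It also assumes that the parent is permitted
to open /proc/PID/mem for its child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sentinel: the experiment could not be set up
   (e.g., /proc/PID/mem could not be opened) */
#define PROBE_SETUP_FAILED (-100000L)

/* Process A: create a mapping at X, tell B the address, wait until B
   has opened /proc/A/mem, then terminate (Scenario 1) or unmap X and
   stay alive (Scenario 2) */
static void
runProcessA(bool unmapOnly, int addrFd, int goFd, int doneFd)
{
    char ch;
    char *x = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED)
        _exit(EXIT_FAILURE);
    x[0] = 'A';

    uintptr_t addr = (uintptr_t) x;
    if (write(addrFd, &addr, sizeof(addr)) != (ssize_t) sizeof(addr))
        _exit(EXIT_FAILURE);
    if (read(goFd, &ch, 1) != 1)
        _exit(EXIT_FAILURE);

    if (!unmapOnly)
        _exit(EXIT_SUCCESS);            /* Scenario 1: A terminates */

    munmap(x, 4096);                    /* Scenario 2: A unmaps X */
    if (write(doneFd, "u", 1) != 1)
        _exit(EXIT_FAILURE);
    for (;;)
        pause();                        /* Stay alive until B kills us */
}

/* Process B: returns the byte count from pread() on /proc/A/mem at
   offset X, or -errno if pread() failed, or PROBE_SETUP_FAILED */
static long
probeProcMem(bool unmapOnly)
{
    int addrPipe[2], goPipe[2], donePipe[2];
    char memPath[64], ch;
    uintptr_t addr;
    long result = PROBE_SETUP_FAILED;
    bool reaped = false, ready;
    int memFd = -1;

    if (pipe(addrPipe) == -1 || pipe(goPipe) == -1 || pipe(donePipe) == -1)
        return PROBE_SETUP_FAILED;

    pid_t pidA = fork();
    if (pidA == -1)
        return PROBE_SETUP_FAILED;
    if (pidA == 0) {
        close(addrPipe[0]); close(goPipe[1]); close(donePipe[0]);
        runProcessA(unmapOnly, addrPipe[1], goPipe[0], donePipe[1]);
    }
    close(addrPipe[1]); close(goPipe[0]); close(donePipe[1]);

    snprintf(memPath, sizeof(memPath), "/proc/%ld/mem", (long) pidA);
    if (read(addrPipe[0], &addr, sizeof(addr)) == (ssize_t) sizeof(addr))
        memFd = open(memPath, O_RDONLY);

    if (memFd != -1 && write(goPipe[1], "g", 1) == 1) {
        if (unmapOnly) {                /* Wait for A's munmap() */
            ready = (read(donePipe[0], &ch, 1) == 1);
        } else {                        /* Wait for A to terminate */
            reaped = (waitpid(pidA, NULL, 0) == pidA);
            ready = reaped;
        }
        if (ready) {
            ssize_t nread = pread(memFd, &ch, 1, (off_t) addr);
            result = (nread == -1) ? -errno : nread;
        }
    }

    if (!reaped) {
        kill(pidA, SIGKILL);
        waitpid(pidA, NULL, 0);
    }
    if (memFd != -1)
        close(memFd);
    close(addrPipe[0]); close(goPipe[1]); close(donePipe[0]);
    return result;
}
```

Calling probeProcMem(false) corresponds to Scenario 1 and
probeProcMem(true) to Scenario 2. Going by the results reported above,
the first call would be expected to return 0 and the second -EIO.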

Thanks,

Michael





-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/