linux-kernel - Re: fs: NULL deref in atime_needs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+YtSpsFJSg393wjF8e5Ytv6CSH7b=X16UDSZsuniiV3fQ@mail.gmail.com>
Date:	Mon, 22 Feb 2016 12:20:30 +0100
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	Al Viro <viro@...iv.linux.org.uk>
Cc:	Mickaël Salaün <mic@...ikod.net>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	syzkaller <syzkaller@...glegroups.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: fs: NULL deref in atime_needs_update

On Sat, Feb 20, 2016 at 6:10 PM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Sat, Feb 20, 2016 at 02:25:40PM +0100, Mickaël Salaün wrote:
>
>> I think the bug may be somewhere in the nd->depth handling (when its value is 0) in fs/namei.c:get_link(): struct saved *last = nd->stack + nd->depth - 1
>
> Getting there with nd->depth == 0 would certainly be a bug - it would mean
> that we got there without should_follow_link() having returned 1.
>
> In case of open() it would be "do_last() has returned positive without
> should_follow_link() having returned 1".
>
> <looks>
>
> OK, there are several places where we rely on not getting bogus return values
> - inode_permission() should not return positives, neither should vfs_open(),
> security_path_truncate() and notify_change().
>
> Other similar "handle the last component" functions are guaranteed to
> never return positives other than directly from should_follow_link(), so
> they are OK.
>
> IIRC, you used LSM to inject a positive value to inode_permission(), right?
>
> Another way to trigger that would've been ->open() returning positive -
> a bug on *anything* since ->open() had been introduced in 0.95.  Amount of
> harm would vary - e.g. 0.95 would simply have that positive number returned
> to userland, looking like successful open(2).  With no new descriptor, of
> course...
>
> Short-term we probably want just
>         if (unlikely(error > 0)) {
>                 WARN_ON(1);
>                 error = -EINVAL;
>         }
> added right after out: in do_last(), try to trigger Dmitry's reproducers
> on it and then work back to the source of that thing *if* that's what's
> happening in his case.  Yours almost certainly is just that.
>
> Longer-term... I'm not sure.  Having a method that is supposed to return 0
> or -E<something> actually return positive is going to be a bad thing, no
> matter what, but "that bogus value gets passed to userland" is a lot
> more tolerable than "kernel memory corruption".  do_last() calling conventions
> make it vulnerable to the latter, and as far as nd->stack underruns that's
> it, but I'm not sure we don't have other places where such bug in driver,
> etc. would translate into mess ;-/
>
> OK, in any case, let's start with checking if Dmitry is seeing that and not
> something else.  I still don't understand his stack traces - the fault
> address quoted in his first posting doesn't match the register values in
> the same trace, and there's also a possibility that it's an RCU-related
> crap.  This should give a warning and prevent an oops if we are hitting
> a stack underrun on bogus positive from do_last().  Dmitry, could you try
> to build with delta below and run your reproducer(s)?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index f624d13..e30deef 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3273,6 +3273,10 @@ opened:
>                         goto exit_fput;
>         }
>  out:
> +       if (unlikely(error > 0)) {
> +               WARN_ON(1);
> +               error = -EINVAL;
> +       }
>         if (got_write)
>                 mnt_drop_write(nd->path.mnt);
>         path_put(&save_parent);


I've reproduced the second report (the one originating in openat) with
this patch and the WARNING did _not_ fire:


kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 2 PID: 17525 Comm: syz-executor Not tainted 4.5.0-rc5+ #331
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88002c6ddf00 ti: ffff88002c740000 task.ti: ffff88002c740000
RIP: 0010:[<ffffffff81821ded>]  [<ffffffff81821ded>]
atime_needs_update+0x2d/0x460
RSP: 0018:ffff88002c747a48  EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88002c747d88
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000c
RBP: ffff88002c747a70 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88002c747d98
R13: 0000000000000000 R14: ffff88002c747d98 R15: ffff88002c747d78
FS:  00007f24da3d9700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000002003f000 CR3: 000000002e2d0000 CR4: 00000000000006e0
Stack:
 ffff88002c747d40 ffff88002c747e08 0000000000000000 ffff88002c747d98
 ffff88002c747d78 ffff88002c747ab8 ffffffff817eeda2 ffff88002c747d78
 ffff880030fa88e8 ffff88002c747c98 0000000000000000 ffff88002c747d40
Call Trace:
 [<     inline     >] get_link fs/namei.c:1006
 [<ffffffff817eeda2>] trailing_symlink+0x142/0x760 fs/namei.c:2094
 [<ffffffff817f5cec>] path_openat+0xb4c/0x5760 fs/namei.c:3393
 [<ffffffff817fe13e>] do_filp_open+0x18e/0x250 fs/namei.c:3425
 [<ffffffff817c2dbc>] do_sys_open+0x1fc/0x420 fs/open.c:1022
 [<     inline     >] SYSC_openat fs/open.c:1049
 [<ffffffff817c3050>] SyS_openat+0x30/0x40 fs/open.c:1043
 [<ffffffff8669f6b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
Code: 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 98 17 d5
ff 48 8d 7b 0c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f>
b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP  [<ffffffff81821ded>] atime_needs_update+0x2d/0x460 fs/inode.c:1617
 RSP <ffff88002c747a48>
---[ end trace 872348222bfe81b0 ]---


This is _not_ on a tmpfs mount.

Regarding the registers, here is disassembly. The crash happens on a
KASAN check of the

ffffffff81821dc0 <atime_needs_update>:
ffffffff81821dc0:       55                      push   %rbp
ffffffff81821dc1:       48 89 e5                mov    %rsp,%rbp
ffffffff81821dc4:       41 57                   push   %r15
ffffffff81821dc6:       41 56                   push   %r14
ffffffff81821dc8:       41 55                   push   %r13
ffffffff81821dca:       41 54                   push   %r12
ffffffff81821dcc:       49 89 fc                mov    %rdi,%r12
ffffffff81821dcf:       53                      push   %rbx
ffffffff81821dd0:       48 89 f3                mov    %rsi,%rbx
ffffffff81821dd3:       e8 98 17 d5 ff          callq
ffffffff81573570 <__sanitizer_cov_trace_pc>
ffffffff81821dd8:       48 8d 7b 0c             lea    0xc(%rbx),%rdi
ffffffff81821ddc:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff81821de3:       fc ff df
ffffffff81821de6:       48 89 fa                mov    %rdi,%rdx
ffffffff81821de9:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff81821ded:       0f b6 14 02             movzbl (%rdx,%rax,1),%edx
ffffffff81821df1:       48 89 f8                mov    %rdi,%rax
ffffffff81821df4:       83 e0 07                and    $0x7,%eax
ffffffff81821df7:       83 c0 03                add    $0x3,%eax
ffffffff81821dfa:       38 d0                   cmp    %dl,%al
ffffffff81821dfc:       7c 08                   jl
ffffffff81821e06 <atime_needs_update+0x46>
ffffffff81821dfe:       84 d2                   test   %dl,%dl
ffffffff81821e00:       0f 85 03 03 00 00       jne
ffffffff81822109 <atime_needs_update+0x349>
ffffffff81821e06:       f6 43 0c 02             testb  $0x2,0xc(%rbx)
ffffffff81821e0a:       0f 85 1a 02 00 00       jne
ffffffff8182202a <atime_needs_update+0x26a>
ffffffff81821e10:       e8 5b 17 d5 ff          callq
ffffffff81573570 <__sanitizer_cov_trace_pc>
ffffffff81821e15:       48 8d 7b 28             lea    0x28(%rbx),%rdi
ffffffff81821e19:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff81821e20:       fc ff df
ffffffff81821e23:       48 89 fa                mov    %rdi,%rdx
ffffffff81821e26:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff81821e2a:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
ffffffff81821e2e:       0f 85 c4 03 00 00       jne
ffffffff818221f8 <atime_needs_update+0x438>

It means that inode is NULL here:

bool atime_needs_update(const struct path *path, struct inode *inode)
{
  struct vfsmount *mnt = path->mnt;
  struct timespec now;

  if (inode->i_flags & S_NOATIME)
    return false;