lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFt=ROOnU-rVoLoP=bgR3Y3WpJmhTerAQZqFd-v=eZ306Vrobg@mail.gmail.com>
Date:   Mon, 3 May 2021 23:31:48 +0800
From:   haosdent <haosdent@...il.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        zhengyu.duan@...pee.com, Haosong Huang <huangh@....com>
Subject: Re: NULL pointer dereference when access /proc/net

Hi, Alexander, thanks a lot for your detailed explanations. I take a
look at the code again and the thread and I realize there are some
incorrect representations in my previous word.

>>   struct inode *d_inode = 0x0         <======= d_inode is NULL and cause Oops!

Actually it is Oops at `inode->i_flags` directly instead of `d_inode`,
so the code would not further to change the access time.

```
bool __atime_needs_update(const struct path *path, struct inode *inode,
  bool rcu)
{
struct vfsmount *mnt = path->mnt;
struct timespec now;

if (inode->i_flags & S_NOATIME)    <======= Oops at here according to
`[19521409.372016] IP: __atime_needs_update+0x5/0x190`.
return false;
```

But it looks impossible if we take a look at "walk_component > lookup_fast".

Let me introduce why it goes through "walk_component > lookup_fast"
instead of "walk_component > lookup_slow" first,

in "walk_component > step_into > pick_link", the code would set
`nameidata->stack->seq` to `seq` which is comes from the passed in
parameters.
If the code goes through "walk_component > lookup_slow", `seq` would
be 0 and then `nameidata->stack->seq` would be 0.
If the code goes through "walk_component > lookup_slow", `seq` would
be `dentry->d_seq->sequence`.
According to the contents in attachment files "nameidata.txt" and
"dentry.txt", `dentry->d_seq->sequence` is 4, and
`nameidata->stack->seq` is 4 as well.
So looks like the code goes through "walk_component > lookup_fast" and
"walk_component > step_into > pick_link".

The `inode` parameter in `__atime_needs_update` comes from
`nameidata->link_inode`. But in attachment file "nameidata.txt", we
could found `nameidata->link_inode` is NULL already.
Because the code goes through "walk_component > lookup_fast" and
"walk_component > step_into > pick_link", the `inode` assign to
`nameidata->link_inode` must comes from `lookup_fast`.

So it looks like something wrong in `lookup_fast`. Let me continue to
explain why this looks impossible.

In `walk_component`, `lookup_fast` have to return 1 (> 0), otherwise
it would fallback to `lookup_slow`.

```
err = lookup_fast(nd, &path, &inode, &seq);
if (unlikely(err <= 0)) {
  if (err < 0)
    return err;
  path.dentry = lookup_slow(&nd->last, nd->path.dentry,
        nd->flags);
}

return step_into
```

Because for our case, it looks like the code goes through
"walk_component > lookup_fast" and "walk_component > step_into >
pick_link",
This infers `lookup_fast` return 1 in this Oops.

Because `lookup_fast` return 1, it looks like the code goes through
the following path.

```
  if (nd->flags & LOOKUP_RCU) {
    ...

    *inode = d_backing_inode(dentry);
    negative = d_is_negative(dentry);

    ...
      ...
      if (negative)
        return -ENOENT;
      path->mnt = mnt;
      path->dentry = dentry;
      if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
        return 1;
```

As we see, if `*inode` is NULL, it should return `-ENOENT` because `if
(negative)` would be true, which is conflict with "`lookup_fast`
return 1".

And in `__d_clear_type_and_inode`, it always sets the dentry to
negative first and then sets d_inode to NULL.

```
static inline void __d_clear_type_and_inode(struct dentry *dentry)
{`
  unsigned flags = READ_ONCE(dentry->d_flags);

  flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);   // Set dentry to
negative first.
  WRITE_ONCE(dentry->d_flags, flags);
      // memory barrier
  dentry->d_inode = NULL;
               // Then set d_inode to NULL.
}
```

So looks like `inode` in `lookup_fast` should not be NULL if it could
skip `if (negative)` check even in the RCU case. Unless

```
# in lookup_fast method
  *inode = d_backing_inode(dentry);
  negative = d_is_negative(dentry);
```

is reorder to

```
# in lookup_fast method
  negative = d_is_negative(dentry);
  *inode = d_backing_inode(dentry);
```

when CPU executing the code. But is this possible in RCU?

I diff my local ubuntu's code with upstream tag v4.15.18, it looks
like no different in `fs/namei.c`, `fs/dcache.c`, `fs/proc`.
So possible the problem may happen to upstream tag v4.15.18 as well,
sadly my script still could not reproduce the issue on the server so
far,
would like to see if any insights from you then I could continue to
check what's wrong in this Oops, thank you in advance!

On Tue, Apr 27, 2021 at 1:30 AM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Tue, Apr 27, 2021 at 01:16:44AM +0800, haosdent wrote:
> > > really should not assume ->d_inode stable
> >
> > Hi, Alexander, sorry to disturb you again. Today I try to check what
> > `dentry->d_inode` and `nd->link_inode` looks like when `dentry` is
> > already been killed in `__dentry_kill`.
> >
> > ```
> > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > ```
> >
> > It looks like `dentry->d_inode` could be NULL while `nd->link_inode`
> > is always has value.
> > But this make me confuse, by right `nd->link_inode` is get from
> > `dentry->d_inode`, right?
>
> It's sampled from there, yes.  And in RCU mode there's nothing to
> prevent a previously positive dentry from getting negative and/or
> killed.  ->link_inode (used to - it's gone these days) go with
> ->seq, which had been sampled from dentry->d_seq before fetching
> ->d_inode and then verified to have ->d_seq remain unchanged.
> That gives you "dentry used to have this inode at the time it
> had this d_seq", and that's what gets used to validate the sucker
> when we switch to non-RCU mode (look at legitimize_links()).
>
> IOW, we know that
>         * at some point during the pathwalk that sucker had this inode
>         * the inode won't get freed until we drop out of RCU mode
>         * if we need to go to non-RCU (and thus grab dentry references)
> while we still need that inode, we will verify that nothing has happened
> to that link (same ->d_seq, so it still refers to the same inode) and
> grab dentry reference, making sure it won't go away or become negative
> under us.  Or we'll fail (in case something _has_ happened to dentry)
> and repeat the entire thing in non-RCU mode.



-- 
Best Regards,
Haosdent Huang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ