lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAFt=ROMzGwFssZA4Z35UeP2JPwEZtf62spm1Q+mN+mN+08bk8Q@mail.gmail.com>
Date:   Thu, 6 May 2021 18:21:37 +0800
From:   haosdent <haosdent@...il.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        zhengyu.duan@...pee.com, Haosong Huang <huangh@....com>
Subject: Re: NULL pointer dereference when access /proc/net

Oh, Al, I saw you mentioned the reoder issue at
https://groups.google.com/g/syzkaller/c/0SW33jMcrXQ/m/lHJUsWHVBwAJ
and with a follow-up patch
https://gist.githubusercontent.com/dvyukov/67fe363d5ce2e2b06c71/raw/4d1b6c23f8dff7e0f8e2e3cab7e50208fddb0570/gistfile1.txt
However, it looks it would not work in some previous version
https://github.com/torvalds/linux/blob/v4.16/fs/dcache.c#L496

Because we set `d_hahs.prev` to `NULL` at

```
void __d_drop(struct dentry *dentry)
{
  ___d_drop(dentry);
  dentry->d_hash.pprev = NULL;
}
```

then in `dentry_unlink_inode`, `raw_write_seqcount_begin` would be
skipped due to `if (hashed)` condition is false.

```
static void dentry_unlink_inode(struct dentry * dentry)
    __releases(dentry->d_lock)
    __releases(dentry->d_inode->i_lock)
{
    struct inode *inode = dentry->d_inode;
    bool hashed = !d_unhashed(dentry);

    if (hashed)
        raw_write_seqcount_begin(&dentry->d_seq);  <--- Looks would
skip because hashed is false.
    __d_clear_type_and_inode(dentry);
    hlist_del_init(&dentry->d_u.d_alias);
    if (hashed)
        raw_write_seqcount_end(&dentry->d_seq);     <--- Looks would
skip because hashed is false.
...
```

Should we backport
https://github.com/torvalds/linux/commit/4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab
and https://github.com/torvalds/linux/commit/0632a9ac7bc0a32f8251a53b3925775f0a7c4da6
to previous versions?

On Mon, May 3, 2021 at 11:31 PM haosdent <haosdent@...il.com> wrote:
>
> Hi, Alexander, thanks a lot for your detailed explanations. I take a
> look at the code again and the thread and I realize there are some
> incorrect representations in my previous word.
>
> >>   struct inode *d_inode = 0x0         <======= d_inode is NULL and cause Oops!
>
> Actually it is Oops at `inode->i_flags` directly instead of `d_inode`,
> so the code would not further to change the access time.
>
> ```
> bool __atime_needs_update(const struct path *path, struct inode *inode,
>   bool rcu)
> {
> struct vfsmount *mnt = path->mnt;
> struct timespec now;
>
> if (inode->i_flags & S_NOATIME)    <======= Oops at here according to
> `[19521409.372016] IP: __atime_needs_update+0x5/0x190`.
> return false;
> ```
>
> But it looks impossible if we take a look at "walk_component > lookup_fast".
>
> Let me introduce why it goes through "walk_component > lookup_fast"
> instead of "walk_component > lookup_slow" first,
>
> in "walk_component > step_into > pick_link", the code would set
> `nameidata->stack->seq` to `seq` which is comes from the passed in
> parameters.
> If the code goes through "walk_component > lookup_slow", `seq` would
> be 0 and then `nameidata->stack->seq` would be 0.
> If the code goes through "walk_component > lookup_slow", `seq` would
> be `dentry->d_seq->sequence`.
> According to the contents in attachment files "nameidata.txt" and
> "dentry.txt", `dentry->d_seq->sequence` is 4, and
> `nameidata->stack->seq` is 4 as well.
> So looks like the code goes through "walk_component > lookup_fast" and
> "walk_component > step_into > pick_link".
>
> The `inode` parameter in `__atime_needs_update` comes from
> `nameidata->link_inode`. But in attachment file "nameidata.txt", we
> could found `nameidata->link_inode` is NULL already.
> Because the code goes through "walk_component > lookup_fast" and
> "walk_component > step_into > pick_link", the `inode` assign to
> `nameidata->link_inode` must comes from `lookup_fast`.
>
> So it looks like something wrong in `lookup_fast`. Let me continue to
> explain why this looks impossible.
>
> In `walk_component`, `lookup_fast` have to return 1 (> 0), otherwise
> it would fallback to `lookup_slow`.
>
> ```
> err = lookup_fast(nd, &path, &inode, &seq);
> if (unlikely(err <= 0)) {
>   if (err < 0)
>     return err;
>   path.dentry = lookup_slow(&nd->last, nd->path.dentry,
>         nd->flags);
> }
>
> return step_into
> ```
>
> Because for our case, it looks like the code goes through
> "walk_component > lookup_fast" and "walk_component > step_into >
> pick_link",
> This infers `lookup_fast` return 1 in this Oops.
>
> Because `lookup_fast` return 1, it looks like the code goes through
> the following path.
>
> ```
>   if (nd->flags & LOOKUP_RCU) {
>     ...
>
>     *inode = d_backing_inode(dentry);
>     negative = d_is_negative(dentry);
>
>     ...
>       ...
>       if (negative)
>         return -ENOENT;
>       path->mnt = mnt;
>       path->dentry = dentry;
>       if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
>         return 1;
> ```
>
> As we see, if `*inode` is NULL, it should return `-ENOENT` because `if
> (negative)` would be true, which is conflict with "`lookup_fast`
> return 1".
>
> And in `__d_clear_type_and_inode`, it always sets the dentry to
> negative first and then sets d_inode to NULL.
>
> ```
> static inline void __d_clear_type_and_inode(struct dentry *dentry)
> {`
>   unsigned flags = READ_ONCE(dentry->d_flags);
>
>   flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);   // Set dentry to
> negative first.
>   WRITE_ONCE(dentry->d_flags, flags);
>       // memory barrier
>   dentry->d_inode = NULL;
>                // Then set d_inode to NULL.
> }
> ```
>
> So looks like `inode` in `lookup_fast` should not be NULL if it could
> skip `if (negative)` check even in the RCU case. Unless
>
> ```
> # in lookup_fast method
>   *inode = d_backing_inode(dentry);
>   negative = d_is_negative(dentry);
> ```
>
> is reorder to
>
> ```
> # in lookup_fast method
>   negative = d_is_negative(dentry);
>   *inode = d_backing_inode(dentry);
> ```
>
> when CPU executing the code. But is this possible in RCU?
>
> I diff my local ubuntu's code with upstream tag v4.15.18, it looks
> like no different in `fs/namei.c`, `fs/dcache.c`, `fs/proc`.
> So possible the problem may happen to upstream tag v4.15.18 as well,
> sadly my script still could not reproduce the issue on the server so
> far,
> would like to see if any insights from you then I could continue to
> check what's wrong in this Oops, thank you in advance!
>
> On Tue, Apr 27, 2021 at 1:30 AM Al Viro <viro@...iv.linux.org.uk> wrote:
> >
> > On Tue, Apr 27, 2021 at 01:16:44AM +0800, haosdent wrote:
> > > > really should not assume ->d_inode stable
> > >
> > > Hi, Alexander, sorry to disturb you again. Today I try to check what
> > > `dentry->d_inode` and `nd->link_inode` looks like when `dentry` is
> > > already been killed in `__dentry_kill`.
> > >
> > > ```
> > > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > > nd->last.name: net/sockstat, dentry->d_lockref.count: -128,
> > > dentry->d_inode: (nil), nd->link_inode: 0xffffffffab299966
> > > ```
> > >
> > > It looks like `dentry->d_inode` could be NULL while `nd->link_inode`
> > > is always has value.
> > > But this make me confuse, by right `nd->link_inode` is get from
> > > `dentry->d_inode`, right?
> >
> > It's sampled from there, yes.  And in RCU mode there's nothing to
> > prevent a previously positive dentry from getting negative and/or
> > killed.  ->link_inode (used to - it's gone these days) go with
> > ->seq, which had been sampled from dentry->d_seq before fetching
> > ->d_inode and then verified to have ->d_seq remain unchanged.
> > That gives you "dentry used to have this inode at the time it
> > had this d_seq", and that's what gets used to validate the sucker
> > when we switch to non-RCU mode (look at legitimize_links()).
> >
> > IOW, we know that
> >         * at some point during the pathwalk that sucker had this inode
> >         * the inode won't get freed until we drop out of RCU mode
> >         * if we need to go to non-RCU (and thus grab dentry references)
> > while we still need that inode, we will verify that nothing has happened
> > to that link (same ->d_seq, so it still refers to the same inode) and
> > grab dentry reference, making sure it won't go away or become negative
> > under us.  Or we'll fail (in case something _has_ happened to dentry)
> > and repeat the entire thing in non-RCU mode.
>
>
>
> --
> Best Regards,
> Haosdent Huang



-- 
Best Regards,
Haosdent Huang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ