linux-kernel - Re: [PATCH v2] writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZFLZw33qCKIYSDMJ@P9FQF9L96D.corp.robot.car>
Date:   Wed, 3 May 2023 15:01:39 -0700
From:   Roman Gushchin <roman.gushchin@...ux.dev>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Baokun Li <libaokun1@...wei.com>, linux-fsdevel@...r.kernel.org,
        viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
        tj@...nel.org, dennis@...nel.org, adilger.kernel@...ger.ca,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com, houtao1@...wei.com, stable@...r.kernel.org
Subject: Re: [PATCH v2] writeback, cgroup: fix null-ptr-deref write in
 bdi_split_work_to_wbs

On Tue, May 02, 2023 at 05:17:01PM -0700, Andrew Morton wrote:
> On Mon, 10 Apr 2023 21:08:26 +0800 Baokun Li <libaokun1@...wei.com> wrote:
> 
> > KASAN report null-ptr-deref:
> > ==================================================================
> > BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
> > Write of size 8 at addr 0000000000000000 by task sync/943
> > CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x7f/0xc0
> >  print_report+0x2ba/0x340
> >  kasan_report+0xc4/0x120
> >  kasan_check_range+0x1b7/0x2e0
> >  __kasan_check_write+0x24/0x40
> >  bdi_split_work_to_wbs+0x5c5/0x7b0
> >  sync_inodes_sb+0x195/0x630
> >  sync_inodes_one_sb+0x3a/0x50
> >  iterate_supers+0x106/0x1b0
> >  ksys_sync+0x98/0x160
> > [...]
> > ==================================================================
> > 
> > The race that causes the above issue is as follows:
> > 
> >            cpu1                     cpu2
> > -------------------------|-------------------------
> > inode_switch_wbs
> >  INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
> >  queue_rcu_work(isw_wq, &isw->work)
> >  // queue_work async
> >   inode_switch_wbs_work_fn
> >    wb_put_many(old_wb, nr_switched)
> >     percpu_ref_put_many
> >      ref->data->release(ref)
> >      cgwb_release
> >       queue_work(cgwb_release_wq, &wb->release_work)
> >       // queue_work async
> >        &wb->release_work
> >        cgwb_release_workfn
> >                             ksys_sync
> >                              iterate_supers
> >                               sync_inodes_one_sb
> >                                sync_inodes_sb
> >                                 bdi_split_work_to_wbs
> >                                  kmalloc(sizeof(*work), GFP_ATOMIC)
> >                                  // alloc memory failed
> >         percpu_ref_exit
> >          ref->data = NULL
> >          kfree(data)
> >                                  wb_get(wb)
> >                                   percpu_ref_get(&wb->refcnt)
> >                                    percpu_ref_get_many(ref, 1)
> >                                     atomic_long_add(nr, &ref->data->count)
> >                                      atomic64_add(i, v)
> >                                      // trigger null-ptr-deref
> > 
> > bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs.
> > If the allocation of new work fails, the on-stack fallback will be used and
> > the reference count of the current wb is increased afterwards. If cgroup
> > writeback membership switches occur before getting the reference count and
> > the current wb is released as old_wd, then calling wb_get() or wb_put()
> > will trigger the null pointer dereference above.
> > 
> > This issue was introduced in v4.3-rc7 (see fix tag1). Both sync_inodes_sb()
> > and __writeback_inodes_sb_nr() calls to bdi_split_work_to_wbs() can trigger
> > this issue. For scenarios called via sync_inodes_sb(), originally commit
> > 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback
> > membership switches") reduced the possibility of the issue by adding
> > wb_switch_rwsem, but in v5.14-rc1 (see fix tag2) removed the
> > "inode_io_list_del_locked(inode, old_wb)" from inode_switch_wbs_work_fn()
> > so that wb->state contains WB_has_dirty_io, thus old_wb is not skipped
> > when traversing wbs in bdi_split_work_to_wbs(), and the issue becomes
> > easily reproducible again.
> > 
> > To solve this problem, percpu_ref_exit() is called under RCU protection
> > to avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
> > Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
> > and skip the current wb if wb_tryget() fails because the wb has already
> > been shutdown.
> > 
> > Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
> > Fixes: f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
> 
> Cc Roman for this second commit.

Thanks for the heads up!

The patch looks good to me.
Acked-by: Roman Gushchin <roman.gushchin@...ux.dev>

Thanks!