linux-kernel - Re: KASAN: use-after-free Read in get_mem_cgroup_from

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190306182944.GE23850@redhat.com>
Date:   Wed, 6 Mar 2019 13:29:44 -0500
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     zhong jiang <zhongjiang@...wei.com>
Cc:     Peter Xu <peterx@...hat.com>, Mike Rapoport <rppt@...ux.ibm.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+cbb52e396df3e565ab02@...kaller.appspotmail.com>,
        Michal Hocko <mhocko@...nel.org>, cgroups@...r.kernel.org,
        Johannes Weiner <hannes@...xchg.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        David Rientjes <rientjes@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>
Subject: Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

Hello Zhong,

On Wed, Mar 06, 2019 at 09:07:00PM +0800, zhong jiang wrote:
> The patch use call_rcu to delay free the task_struct, but It is possible to free the task_struct
> ahead of get_mem_cgroup_from_mm. is it right?

Yes it is possible to free before get_mem_cgroup_from_mm, but if it's
freed before get_mem_cgroup_from_mm rcu_read_lock,
rcu_dereference(mm->owner) will return NULL in such case and there
will be no problem.

The simple fix also clears the mm->owner of the failed-fork-mm before
doing the call_rcu. The call_rcu delays the freeing after no other CPU
runs in between rcu_read_lock/unlock anymore. That guarantees that
those critical section will see mm->owner == NULL if the freeing of
the task strut already happened.

The solution Mike suggested for this and that we were wondering as
ideal in the past for the signal issue too, is to move the uffd
delivery at a point where fork is guaranteed to succeed. We should
probably try that too to see how it looks like and if it can be done
in a not intrusive way, but the simple fix that uses RCU should work
too.

Rolling back in case of errors inside fork itself isn't easily doable:
the moment we push the uffd ctx to the other side of the uffd pipe
there's no coming back as that information can reach the userland of
the uffd monitor/reader thread immediately after. The rolling back is
really the other thread failing at mmget_not_zero eventually. It's the
userland that has to rollback in such case when it gets a -ESRCH
retval.

Note that this fork feature is only ever needed in the non-cooperative
case, these things never need to happen when userfaultfd is used by an
app (or a lib) that is aware that it is using userfaultfd.

Thanks,
Andrea