linux-kernel - Re: possible deadlock in shmem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpFz9kEfTPxcausVj63mUvU7i6Dvv6=KNePVX2qR+-Ci2A@mail.gmail.com>
Date:   Tue, 14 Jul 2020 10:32:20 -0700
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Todd Kjos <tkjos@...gle.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Hridya Valsaraju <hridya@...gle.com>,
        Hillf Danton <hdanton@...a.com>,
        Eric Biggers <ebiggers@...nel.org>,
        syzbot <syzbot+7a0d9d0b26efefe61780@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arve Hjønnevåg <arve@...roid.com>,
        Christian Brauner <christian@...uner.io>,
        "open list:ANDROID DRIVERS" <devel@...verdev.osuosl.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Martijn Coenen <maco@...roid.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Todd Kjos <tkjos@...roid.com>,
        Markus Elfring <Markus.Elfring@....de>
Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue, Jul 14, 2020 at 9:41 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Tue, Jul 14, 2020 at 8:47 AM Todd Kjos <tkjos@...gle.com> wrote:
> >
> > +Suren Baghdasaryan +Hridya Valsaraju who support the ashmem driver.
>
> Thanks for looping me in.
>
> >
> >
> > On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko <mhocko@...nel.org> wrote:
> > >
> > > On Tue 14-07-20 22:08:59, Hillf Danton wrote:
> > > >
> > > > On Tue, 14 Jul 2020 10:26:29 +0200 Michal Hocko wrote:
> > > > > On Tue 14-07-20 13:32:05, Hillf Danton wrote:
> > > > > >
> > > > > > On Mon, 13 Jul 2020 20:41:11 -0700 Eric Biggers wrote:
> > > > > > > On Tue, Jul 14, 2020 at 11:32:52AM +0800, Hillf Danton wrote:
> > > > > > > >
> > > > > > > > Add FALLOC_FL_NOBLOCK and on the shmem side try to lock inode upon the
> > > > > > > > new flag. And the overall upside is to keep the current gfp either in
> > > > > > > > the khugepaged context or not.
> > > > > > > >
> > > > > > > > --- a/include/uapi/linux/falloc.h
> > > > > > > > +++ b/include/uapi/linux/falloc.h
> > > > > > > > @@ -77,4 +77,6 @@
> > > > > > > >   */
> > > > > > > >  #define FALLOC_FL_UNSHARE_RANGE              0x40
> > > > > > > >
> > > > > > > > +#define FALLOC_FL_NOBLOCK            0x80
> > > > > > > > +
> > > > > > >
> > > > > > > You can't add a new UAPI flag to fix a kernel-internal problem like this.
> > > > > >
> > > > > > Sounds fair, see below.
> > > > > >
> > > > > > What the report indicates is a missing PF_MEMALLOC_NOFS and it's
> > > > > > checked on the ashmem side and added as an exception before going
> > > > > > to filesystem. On shmem side, no more than a best effort is paid
> > > > > > on the inteded exception.
> > > > > >
> > > > > > --- a/drivers/staging/android/ashmem.c
> > > > > > +++ b/drivers/staging/android/ashmem.c
> > > > > > @@ -437,6 +437,7 @@ static unsigned long
> > > > > >  ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >  {
> > > > > >   unsigned long freed = 0;
> > > > > > + bool nofs;
> > > > > >
> > > > > >   /* We might recurse into filesystem code, so bail out if necessary */
> > > > > >   if (!(sc->gfp_mask & __GFP_FS))
> > > > > > @@ -445,6 +446,11 @@ ashmem_shrink_scan(struct shrinker *shri
> > > > > >   if (!mutex_trylock(&ashmem_mutex))
> > > > > >           return -1;
> > > > > >
> > > > > > + /* enter filesystem with caution: nonblock on locking */
> > > > > > + nofs = current->flags & PF_MEMALLOC_NOFS;
> > > > > > + if (!nofs)
> > > > > > +         current->flags |= PF_MEMALLOC_NOFS;
> > > > > > +
> > > > > >   while (!list_empty(&ashmem_lru_list)) {
> > > > > >           struct ashmem_range *range =
> > > > > >                   list_first_entry(&ashmem_lru_list, typeof(*range), lru);
> > > > >
> > > > > I do not think this is an appropriate fix. First of all is this a real
> > > > > deadlock or a lockdep false positive? Is it possible that ashmem just
> > > >
> > > > The warning matters and we can do something to quiesce it.
> > >
> > > The underlying issue should be fixed rather than _something_ done to
> > > silence it.
> > >
> > > > > needs to properly annotate its shmem inodes? Or is it possible that
> > > > > the internal backing shmem file is visible to the userspace so the write
> > > > > path would be possible?
> > > > >
> > > > > If this a real problem then the proper fix would be to set internal
> > > > > shmem mapping's gfp_mask to drop __GFP_FS.
> > > >
> > > > Thanks for the tip, see below.
> > > >
> > > > Can you expand a bit on how it helps direct reclaimers like khugepaged
> > > > in the syzbot report wrt deadlock?
> > >
> > > I do not understand your question.
> > >
> > > > TBH I have difficult time following
> > > > up after staring at the chart below for quite a while.
> > >
> > > Yes, lockdep reports are quite hard to follow and they tend to confuse
> > > one hell out of me. But this one says that there is a reclaim dependency
> > > between the shmem inode lock and the reclaim context.
> > >
> > > > Possible unsafe locking scenario:
> > > >
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(fs_reclaim);
> > > >                                lock(&sb->s_type->i_mutex_key#15);
> > > >                                lock(fs_reclaim);
> > > >
> > > >   lock(&sb->s_type->i_mutex_key#15);
> > >
> > > Please refrain from proposing fixes until the actual problem is
> > > understood. I suspect that this might be just false positive because the
> > > lockdep cannot tell the backing shmem which is internal to ashmem(?)
> > > with any general shmem.

Actually looking some more into this, I think you are right. Ashmem
currently does not redirect writes into the backing shmem and
fallocate call from ashmem_shrink_scan is always performed against
asma->file, which is the backing shmem. IOW writes into the backing
shmem are not supported, therefore this concurrent locking can't
happen.

I'm not sure how we can annotate the fact that the inode_lock in
generic_file_write_iter and in shmem_fallocate always operate on
different inodes. Ideas?

> > >  But somebody really familiar with ashmem code
> > > should have a look I believe.
>
> I believe the deadlock is possible if a write to ashmem fd coincides
> with shrinking of ashmem caches. I just developed a possible fix here
> https://android-review.googlesource.com/c/kernel/common/+/1361205 but
> wanted to test it before posting upstream. The idea is to detect such
> a race between write and cache shrinking operations and let
> ashmem_shrink_scan bail out if the race is detected instead of taking
> inode_lock. AFAIK writing ashmem files is not a usual usage for ashmem
> (standard usage is to mmap it and use as shared memory), therefore
> this bailing out early should not affect ashmem cache maintenance
> much. Besides ashmem_shrink_scan already bails out early if a
> contention on ashmem_mutex is detected, which is a much more probable
> case (see: https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/staging/android/ashmem.c#L497).
>
> I'll test and post the patch here in a day or so if there are no early
> objections to it.
> Thanks!
>
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs