linux-kernel - Re: [PATCH 2/3] xfs: add kmem_alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190822120725.GA1119@dread.disaster.area>
Date:   Thu, 22 Aug 2019 22:07:25 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Christoph Hellwig <hch@...radead.org>,
        linux-xfs@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, penguin-kernel@...ove.SAKURA.ne.jp
Subject: Re: [PATCH 2/3] xfs: add kmem_alloc_io()

On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> On 8/22/19 12:14 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 11:10:57AM +0200, Peter Zijlstra wrote:
> >> 
> >> Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS
> >> into the GFP flags.
> >> 
> >> So are we sure it is broken and needs mending?
> > 
> > Well, that's what we are trying to work out. The problem is that we
> > have code that takes locks and does allocations that is called both
> > above and below the reclaim "lock" context. Once it's been seen
> > below the reclaim lock context, calling it with GFP_KERNEL context
> > above the reclaim lock context throws a deadlock warning.
> > 
> > The only way around that was to mark these allocation sites as
> > GFP_NOFS so lockdep is never allowed to see that recursion through
> > reclaim occur. Even though it isn't a deadlock vector.
> > 
> > What we're looking at is whether PF_MEMALLOC_NOFS changes this - I
> > don't think it does solve this problem. i.e. if we define the
> > allocation as GFP_KERNEL and then use PF_MEMALLOC_NOFS where reclaim
> > is not allowed, we still have GFP_KERNEL allocations in code above
> > reclaim that has also been seen below relcaim. And so we'll get
> > false positive warnings again.
> 
> If I understand both you and the code directly, the code sites won't call
> __fs_reclaim_acquire when called with current->flags including PF_MEMALLOC_NOFS.
> So that would mean they "won't be seen below the reclaim" and all would be fine,
> right?

No, the problem is this (using kmalloc as a general term for
allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)

   some random kernel code
    kmalloc(GFP_KERNEL)
     reclaim
     PF_MEMALLOC
     shrink_slab
      xfs_inode_shrink
       XFS_ILOCK
        xfs_buf_allocate_memory()
         kmalloc(GFP_KERNEL)

And so locks on inodes in reclaim are seen below reclaim. Then
somewhere else we have:

   some high level read-only xfs code like readdir
    XFS_ILOCK
     xfs_buf_allocate_memory()
      kmalloc(GFP_KERNEL)
       reclaim

And this one throws false positive lockdep warnings because we
called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
context. So the only solution we had at the tiem to shut it up was:

   some high level read-only xfs code like readdir
    XFS_ILOCK
     xfs_buf_allocate_memory()
      kmalloc(GFP_NOFS)

So that lockdep sees it's not going to recurse into reclaim and
doesn't throw a warning...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com