[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190815201323.GU21596@ziepe.ca>
Date: Thu, 15 Aug 2019 17:13:23 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Michal Hocko <mhocko@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
DRI Development <dri-devel@...ts.freedesktop.org>,
Intel Graphics Development <intel-gfx@...ts.freedesktop.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Christian König <christian.koenig@....com>,
Jérôme Glisse <jglisse@...hat.com>,
Masahiro Yamada <yamada.masahiro@...ionext.com>,
Wei Wang <wvw@...gle.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Jann Horn <jannh@...gle.com>, Feng Tang <feng.tang@...el.com>,
Kees Cook <keescook@...omium.org>,
Randy Dunlap <rdunlap@...radead.org>,
Daniel Vetter <daniel.vetter@...el.com>
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
On Thu, Aug 15, 2019 at 09:35:26PM +0200, Michal Hocko wrote:
> > The last detail is I'm still unclear what a GFP flags a blockable
> > invalidate_range_start() should use. Is GFP_KERNEL OK?
>
> I hope I will not make this muddy again ;)
> invalidate_range_start in the blockable mode can use/depend on any sleepable
> allocation allowed in the context it is called from.
'in the context is is called from' is the magic phrase, as
invalidate_range_start is called while holding several different mm
related locks. I know at least write mmap_sem and i_mmap_rwsem
(write?)
Can GFP_KERNEL be called while holding those locks?
This is the question of indirect dependency on reclaim via locks you
raised earlier.
> So in other words it is no different from any other function in the
> kernel that calls into allocator. As the API is missing gfp context
> then I hope it is not called from any restricted contexts (except
> from the oom which we have !blockable for).
Yes, the callers are exactly my concern.
> > Lockdep has
> > complained on that in past due to fs_reclaim - how do you know if it
> > is a false positive?
>
> I would have to see the specific lockdep splat.
See below. I found it when trying to understand why the registration
of the mmu notififer was so oddly coded.
The situation was:
down_write(&mm->mmap_sem);
mm_take_all_locks(mm);
kmalloc(GFP_KERNEL); <--- lockdep warning
I understood Daniel said he saw this directly on a recent kernel when
working with his lockdep patch?
Checking myself, on todays kernel I see a call chain:
shrink_all_memory
fs_reclaim_acquire(sc.gfp_mask);
[..]
do_try_to_free_pages
shrink_zones
shrink_node
shrink_node_memcg
shrink_list
shrink_active_list
page_referenced
rmap_walk
rmap_walk_file
i_mmap_lock_read
down_read(i_mmap_rwsem)
So it is possible that the down_read() above will block on
i_mmap_rwsem being held in the caller of invalidate_range_start which
is doing kmalloc(GPF_KERNEL).
Is this OK? The lockdep annotation says no..
Jason
commit 35cfa2b0b491c37e23527822bf365610dbb188e5
Author: Gavin Shan <shangw@...ux.vnet.ibm.com>
Date: Thu Oct 25 13:38:01 2012 -0700
mm/mmu_notifier: allocate mmu_notifier in advance
While allocating mmu_notifier with parameter GFP_KERNEL, swap would start
to work in case of tight available memory. Eventually, that would lead to
a deadlock while the swap deamon swaps anonymous pages. It was caused by
commit e0f3c3f78da29b ("mm/mmu_notifier: init notifier if necessary").
=================================
[ INFO: inconsistent lock state ]
3.7.0-rc1+ #518 Not tainted
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/35 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&mapping->i_mmap_mutex){+.+.?.}, at: page_referenced+0x9c/0x2e0
{RECLAIM_FS-ON-W} state was registered at:
mark_held_locks+0x86/0x150
lockdep_trace_alloc+0x67/0xc0
kmem_cache_alloc_trace+0x33/0x230
do_mmu_notifier_register+0x87/0x180
mmu_notifier_register+0x13/0x20
kvm_dev_ioctl+0x428/0x510
do_vfs_ioctl+0x98/0x570
sys_ioctl+0x91/0xb0
system_call_fastpath+0x16/0x1b
Powered by blists - more mailing lists