Message-ID: <20180321172932.GE4780@bombadil.infradead.org>
Date: Wed, 21 Mar 2018 10:29:32 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Yang Shi <yang.shi@...ux.alibaba.com>
Cc: Michal Hocko <mhocko@...nel.org>, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
On Wed, Mar 21, 2018 at 09:31:22AM -0700, Yang Shi wrote:
> On 3/21/18 6:08 AM, Michal Hocko wrote:
> > Yes, this definitely sucks. One way to work that around is to split the
> > unmap to two phases. One to drop all the pages. That would only need
> > mmap_sem for read and then tear down the mapping with the mmap_sem for
> > write. This wouldn't help for parallel mmap_sem writers but those really
> > need a different approach (e.g. the range locking).
>
> page fault might sneak in to map a page which has been unmapped before?
>
> range locking should help a lot on manipulating small sections of a large
> mapping in parallel or multiple small mappings. It may not achieve too much
> for single large mapping.

I don't think we need range locking.  What if we do munmap this way:

Take the mmap_sem for write
Find the VMA
  If the VMA is large(*)
    Mark the VMA as deleted
    Drop the mmap_sem
    zap all of the entries
    Take the mmap_sem
  Else
    zap all of the entries
Continue finding VMAs
Drop the mmap_sem

Now we need to change everywhere which looks up a VMA to see if it needs
to care that the VMA is deleted (page faults, e.g., will need to SIGBUS;
mmap does not care; munmap will need to wait for the existing munmap
operation to complete), but it gives us the atomicity, at least on a
per-VMA basis.
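
To make the per-caller rule concrete, here is a rough userspace sketch of
what each VMA lookup site would decide on finding a deleted VMA.  All of
the names (vma_lookup_op, vma_lookup_action, deleted_vma_policy) are made
up for illustration; nothing like this exists in the kernel today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which kind of caller is looking up the VMA. */
enum vma_lookup_op { VMA_OP_PAGE_FAULT, VMA_OP_MMAP, VMA_OP_MUNMAP };

/* Hypothetical: what that caller should do with the VMA it found. */
enum vma_lookup_action { VMA_ACT_PROCEED, VMA_ACT_SIGBUS, VMA_ACT_WAIT };

static enum vma_lookup_action deleted_vma_policy(bool vma_deleted,
						 enum vma_lookup_op op)
{
	/* A live VMA is handled exactly as it is today. */
	if (!vma_deleted)
		return VMA_ACT_PROCEED;

	switch (op) {
	case VMA_OP_PAGE_FAULT:
		/* The mapping is going away, so the fault cannot succeed. */
		return VMA_ACT_SIGBUS;
	case VMA_OP_MUNMAP:
		/* Wait for the munmap already in flight to finish. */
		return VMA_ACT_WAIT;
	case VMA_OP_MMAP:
		/* mmap does not care about the deleted VMA. */
		return VMA_ACT_PROCEED;
	}

	assert(0 && "unhandled vma_lookup_op");
	return VMA_ACT_PROCEED;
}
```

This only encodes the three cases named above; any other lookup site
would need its own audit.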

We could also do:

Take the mmap_sem for write
Mark all VMAs in the range as deleted & modify any partial VMAs
Drop mmap_sem
zap pages from deleted VMAs

That would give us the same atomicity that we have today.
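
A single-threaded toy model of that second variant, just to show the two
phases.  Everything here (toy_vma, toy_mm, toy_split, toy_munmap, the
write_locked flag standing in for mmap_sem) is an assumption made for the
sketch, not kernel code; freeing the VMAs and waking waiters is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 16

struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	bool deleted;			/* marked under the write lock */
	bool zapped;			/* pages dropped, lock not held */
};

struct toy_mm {
	struct toy_vma vmas[MAX_VMAS];	/* unsorted, non-overlapping */
	size_t nr;
	bool write_locked;		/* stand-in for mmap_sem held for write */
};

/* Split vmas[i] at addr; the tail [addr, end) is appended to the array. */
static void toy_split(struct toy_mm *mm, size_t i, unsigned long addr)
{
	struct toy_vma tail = mm->vmas[i];

	tail.start = addr;
	mm->vmas[i].end = addr;
	mm->vmas[mm->nr++] = tail;
}

static int toy_munmap(struct toy_mm *mm, unsigned long start, unsigned long end)
{
	size_t i;

	/* At most two splits can happen: one at start, one at end. */
	if (mm->nr + 2 > MAX_VMAS)
		return -1;

	/* Phase 1: under the write lock, trim partial VMAs and mark. */
	mm->write_locked = true;
	for (i = 0; i < mm->nr; i++) {
		struct toy_vma *v = &mm->vmas[i];

		if (v->deleted || v->end <= start || v->start >= end)
			continue;
		if (v->start < start) {
			/* Keep the head; the appended tail is visited later. */
			toy_split(mm, i, start);
			continue;
		}
		if (v->end > end)
			toy_split(mm, i, end);	/* keep the part past end */
		v->deleted = true;
	}
	mm->write_locked = false;

	/* Phase 2: zap the pages of deleted VMAs without the lock. */
	for (i = 0; i < mm->nr; i++) {
		if (mm->vmas[i].deleted && !mm->vmas[i].zapped) {
			assert(!mm->write_locked);
			mm->vmas[i].zapped = true;
		}
	}
	return 0;
}
```

Unmapping [20, 60) out of a single VMA [0, 100) leaves three entries:
[0, 20) and [60, 100) untouched, and [20, 60) marked deleted and zapped.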
Deleted VMAs would need a pointer to a completion, so operations that
need to wait can queue themselves up. I'd recommend we use the low bit
of vm_file and treat it as a pointer to a struct completion if set.
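
Roughly what I have in mind for the low-bit trick, as a userspace sketch.
The struct and helper names are invented for illustration, and the field
is modelled as a uintptr_t here, whereas the real vm_file is a
struct file *.  It relies on both structs being at least 2-byte aligned
so that bit 0 of a valid pointer is always clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file { int dummy; };		/* stand-in for the kernel's struct file */
struct completion { int done; };	/* stand-in for struct completion */

/* Hypothetical cut-down VMA: only the field the trick touches. */
struct toy_vm_area {
	uintptr_t vm_file;	/* struct file *, or completion pointer | 1 */
};

#define VMA_DELETED_BIT ((uintptr_t)1)

static int vma_is_deleted(const struct toy_vm_area *vma)
{
	return (vma->vm_file & VMA_DELETED_BIT) != 0;
}

/* Replace the file pointer with a tagged pointer to the completion. */
static void vma_mark_deleted(struct toy_vm_area *vma, struct completion *c)
{
	/* The tag only works if the pointer's low bit is free. */
	assert(((uintptr_t)c & VMA_DELETED_BIT) == 0);
	vma->vm_file = (uintptr_t)c | VMA_DELETED_BIT;
}

/* Completion to wait on, or NULL if the VMA is still live. */
static struct completion *vma_completion(const struct toy_vm_area *vma)
{
	if (!vma_is_deleted(vma))
		return NULL;
	return (struct completion *)(vma->vm_file & ~VMA_DELETED_BIT);
}

/* Backing file, or NULL once the VMA has been marked deleted. */
static struct file *vma_file(const struct toy_vm_area *vma)
{
	if (vma_is_deleted(vma))
		return NULL;
	return (struct file *)vma->vm_file;
}
```

One consequence of overwriting the field: whoever marks the VMA deleted
has to stash the original file pointer somewhere else if it is still
needed for the teardown.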