Message-ID: <20160301191906-mutt-send-email-mst@redhat.com>
Date: Tue, 1 Mar 2016 19:20:24 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Vladimir Davydov <vdavydov@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work
On Tue, Mar 01, 2016 at 06:17:58PM +0100, Michal Hocko wrote:
> On Tue 01-03-16 18:46:38, Michael S. Tsirkin wrote:
> > On Tue, Mar 01, 2016 at 05:35:37PM +0100, Michal Hocko wrote:
> > > On Tue 01-03-16 18:22:32, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 01, 2016 at 05:08:13PM +0100, Michal Hocko wrote:
> > > > > On Tue 01-03-16 17:57:04, Michael S. Tsirkin wrote:
> > > > > > On Tue, Mar 01, 2016 at 04:52:12PM +0100, Michal Hocko wrote:
> > > > > > > [CCing vhost-net maintainer]
> > > > > > >
> > > > > > > On Mon 29-02-16 20:02:09, Vladimir Davydov wrote:
> > > > > > > > An mm_struct may be pinned by a file. An example is vhost-net device
> > > > > > > > created by a qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
> > > > > > > > vhost_dev_set_owner).
> > > > > > >
> > > > > > > The more I think about that the more I am wondering whether this is
> > > > > > > actually OK and correct. Why does the driver have to pin the address
> > > > > > > > space? Nothing really prevents parallel teardown of the address
> > > > > > > > space anyway, so the code cannot expect all the vmas to stay. Would it be
> > > > > > > enough to pin the mm_struct only?
> > > > > >
> > > > > > I'll need to research this. It's a fact that as long as the
> > > > > > device is not stopped, vhost can attempt to access
> > > > > > the address space.
> > > > >
> > > > > But does it expect any specific parts of the address space to be mapped?
> > > > > E.g. proc needs to keep the mm allocated as well for some files but it
> > > > > doesn't pin the address space (mm_users) but rather mm_count (see
> > > > > proc_mem_open).
> > > >
> > > > At a quick glance, it seems that it's needed: it calls
> > > > get_user_pages(mm) and that looks like it will not DTRT (or even fail
> > > > gracefully) if mm->mm_users == 0 and exit_mmap/etc was already called
> > > > (or is in progress).
> > >
> > > yes it will fail gracefully
> >
> >
> > What makes get_user_pages fail gracefully in this case,
> > if it races with task exiting?
>
> Sorry, I could have been more verbose... The code would have to make sure
> that the mm is still alive before calling g-u-p by
> atomic_inc_not_zero(&mm->mm_users) and fail if the user count dropped to
> 0 in the meantime. See how fs/proc/task_mmu.c does that (proc_mem_open
> + m_start + m_stop).
>
> The biggest advantage would be that the mm address space pin would be
> only for the particular operation. Not sure whether that is possible in
> the driver though. Anyway pinning the mm for a potentially unbounded
> amount of time doesn't sound too nice.
> --
> Michal Hocko
> SUSE Labs
Hmm, that would be another atomic operation on the data path ...
I'd have to explore that.
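
For concreteness, the per-operation pin described above could be modelled
roughly as below. This is only a userspace sketch using C11 atomics, not
vhost or kernel code: struct mm_model, inc_not_zero(), mm_users_put() and
access_user_memory() are made-up names, and the get_user_pages() work is
left as a comment. It assumes the struct itself stays allocated through a
separate mm_count reference, as in the proc example.

```c
/* Userspace model of "pin the address space only for one operation".
 * All names here are hypothetical; this is not kernel or vhost code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mm_model {
    atomic_int mm_users;  /* address-space users; 0 means torn down */
    atomic_int mm_count;  /* references keeping the struct allocated */
};

/* Models atomic_inc_not_zero(): take a reference unless the count is 0. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Models dropping one mm_users reference. */
static void mm_users_put(struct mm_model *mm)
{
    int prev = atomic_fetch_sub(&mm->mm_users, 1);

    assert(prev > 0);  /* dropping a reference that was never taken */
}

/* One data-path access: pin, do the work, unpin.
 * Returns false if the address space is already gone. */
static bool access_user_memory(struct mm_model *mm)
{
    if (!inc_not_zero(&mm->mm_users))
        return false;  /* owner already exited; fail gracefully */

    /* ... get_user_pages()-style work would go here ... */

    mm_users_put(mm);
    return true;
}
```

The extra cost is visible here: every access pays for one
compare-and-swap loop plus one atomic decrement.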
--
MST