Message-ID: <20170915001939.GA20096@jaegeuk-macbookpro.roam.corp.google.com>
Date: Thu, 14 Sep 2017 17:19:39 -0700
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Al Viro <viro@...IV.linux.org.uk>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: [PATCH] vfs: introduce UMOUNT_WAIT which waits for umount
completion
On 09/14, Jaegeuk Kim wrote:
> On 09/14, Al Viro wrote:
> > On Thu, Sep 14, 2017 at 02:30:17AM +0100, Al Viro wrote:
> > > On Wed, Sep 13, 2017 at 06:10:48PM -0700, Jaegeuk Kim wrote:
> > >
> > > > > Android triggers umount(2) from the init process, which is definitely not a kernel
> > > > > thread. But we've seen some kernel panics where umount(2) had succeeded,
> > > > > yet ext4 triggered a kernel panic due to EIO afterwards, as shown below. I'm also
> > > > > not sure whether task_work_run() would be safe enough. May I ask where
> > > > > sys_umount() calls task_work_run()?
> > >
> > > ret_{fast,slow}_syscall ->
> > > slow_work_pending ->
> > > do_work_pending() ->
> > > tracehook_notify_resume() ->
> > > task_work_run()
> > >
> > > It's not sys_umount() (or any other sys_...()) - it's syscall dispatcher after
> > > having called one of those and before returning to userland. What is guaranteed
> > > is that after successful task_work_add() the damn thing will be run in context
> > > of originating process before it returns from syscall. So any subsequent
> > > syscalls from that process are guaranteed to happen after the work has run.
> > > The same happens if the process exits rather than returns to userland (do_exit() ->
> > > exit_task_work() -> task_work_run()), but for that you would need it to die in
> > > umount(2) (e.g. get kill -9 delivered on the way out).
> > >
> > > Please, check if you are seeing task_work_add() failure in there and if you do,
> > > I would like to see a stack trace. IOW, slap WARN_ON(1); right after
> > > > 	if (!task_work_add(task, &mnt->mnt_rcu, true))
> > > > 		return;
> > > and see what (if anything) gets printed.
> >
> > AFAICS, for task_work_add() to fail here we need a final mntput() to be run
> > in context of a thread that already had exit_signals() run *and* subsequent
> > task_work_run() run to completion (with all pending callbacks executed, along
> > with all callbacks added by those, etc.)
> >
> > For that to have happened during umount(2) we would've needed
> > 	* killing signal delivered while going through the syscall
> > 	* final mntput() to have been done *NOT* from sys_umount() (otherwise
> > 	  the work would've been added before we got to exit_signals())
> > 	* final mntput() to have been done *NOT* from any task_work callbacks
> > 	  (otherwise it would've been added before we'd observed a combination of empty
> > 	  list of pending work with PF_EXITING)
> >
> > I really want to see the stack trace of that failing task_work_add(), if that's
> > what actually happens there. What kind of a reproducer do you have for that?
>
> I got this error report from an Android user, so unfortunately there's no reproducer.
> I wrote a script that runs every minute after reboot to capture the WARN_ON, but
> it hasn't caught the error since yesterday.

Instead, I added more traces to the reboot procedure and found a clue that
points to the flow below.

delayed_fput()                      init
                                     - umount
 - mntput()
   - mntput_no_expire()                - mntput_no_expire()
                                         - mnt_add_count(-1);
                                         - mnt_get_count() return;
                                     - return 0;
     - mnt_add_count(-1);
     - delayed_mntput_work
                                     - device_shutdown
       - ext4_put_super()
         - EIO
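
The ordering in the flow above can be replayed in a small user-space model.
This is only a rough sketch, not kernel code: every model_* name and MODEL_EIO
is hypothetical. It assumes the mount starts with two references, that a final
put from a user task is cleaned up before that task returns to userland (the
task_work behaviour described earlier in the thread), and that a final put from
a kernel thread only queues delayed work.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_EIO 5

enum model_ctx { MODEL_USER_TASK, MODEL_KTHREAD };

struct model_mount {
	int  count;         /* stands in for the mount reference count */
	bool work_pending;  /* cleanup has been queued as delayed work */
	bool device_alive;  /* false once the modelled device_shutdown ran */
	int  put_super_err; /* result of the modelled ext4_put_super() */
};

/* Modelled superblock teardown: fails once the device is gone. */
static void model_put_super(struct model_mount *m)
{
	m->put_super_err = m->device_alive ? 0 : -MODEL_EIO;
}

/*
 * Modelled reference drop. Only the final drop triggers cleanup:
 * a user task finishes it before returning to userland, while a
 * kernel thread only queues it for later (assumption of this model).
 */
static void model_mntput(struct model_mount *m, enum model_ctx ctx)
{
	if (--m->count > 0)
		return;
	if (ctx == MODEL_USER_TASK)
		model_put_super(m);
	else
		m->work_pending = true;
}

/* Modelled delayed work: runs whenever the workqueue gets to it. */
static void model_run_delayed_work(struct model_mount *m)
{
	if (m->work_pending) {
		m->work_pending = false;
		model_put_super(m);
	}
}

/* Replays the suspected ordering; returns the modelled put_super result. */
int model_replay_suspected_flow(void)
{
	struct model_mount m = { .count = 2, .device_alive = true };

	model_mntput(&m, MODEL_USER_TASK); /* init: umount, 2 -> 1, returns 0 */
	model_mntput(&m, MODEL_KTHREAD);   /* delayed_fput: 1 -> 0, work queued */
	m.device_alive = false;            /* init: device_shutdown */
	model_run_delayed_work(&m);        /* cleanup runs after shutdown */
	return m.put_super_err;
}

/* Same references, but the kernel thread drops its reference first. */
int model_replay_safe_flow(void)
{
	struct model_mount m = { .count = 2, .device_alive = true };

	model_mntput(&m, MODEL_KTHREAD);   /* 2 -> 1, nothing queued */
	model_mntput(&m, MODEL_USER_TASK); /* 1 -> 0, cleaned up before return */
	m.device_alive = false;
	model_run_delayed_work(&m);        /* nothing pending */
	return m.put_super_err;
}
```

In this model, model_replay_suspected_flow() ends with -MODEL_EIO because the
queued cleanup only runs after the device is gone, while
model_replay_safe_flow() ends with 0. The model says nothing about how likely
the interleaving is on a real system.
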
Does this make any sense?
Thanks,