Message-ID: <20171031131333.pr2ophwd2bsvxc3l@dhcp22.suse.cz>
Date: Tue, 31 Oct 2017 14:13:33 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Byungchul Park <byungchul.park@....com>,
Dmitry Vyukov <dvyukov@...gle.com>,
syzbot
<bot+e7353c7141ff7cbb718e4c888a14fa92de41ebaa@...kaller.appspotmail.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Johannes Weiner <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
jglisse@...hat.com, LKML <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org, shli@...com, syzkaller-bugs@...glegroups.com,
Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, ying.huang@...el.com,
kernel-team@....com
Subject: Re: possible deadlock in lru_add_drain_all
On Mon 30-10-17 16:10:09, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 07:09:21PM +0900, Byungchul Park wrote:
> > On Mon, Oct 30, 2017 at 09:22:03AM +0100, Michal Hocko wrote:
> > > [Cc Byungchul. The original full report is
> > > http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com]
> > >
> > > Could you have a look please? This smells like a false positive to me.
> >
> > +cc peterz@...radead.org
> >
> > Hello,
> >
> > IMHO, the false positive was caused by the lockdep_map of 'cpuhp_state'
> > which couldn't distinguish between cpu-up and cpu-down.
> >
> > And it was solved with the following commit by Peter and Thomas:
> >
> > 5f4b55e10645b7371322c800a5ec745cab487a6c
> > smp/hotplug: Differentiate the AP-work lockdep class between up and down
> >
> > Therefore, kernels that include this commit should no longer hit the false positive.
> >
> > Peter and Thomas, could you confirm it?
>
> I can indeed confirm it's running old code; cpuhp_state is no more.

Does this mean the chain below is no longer possible with the current
linux-next (tip)?

> However, that splat translates like:
>
> __cpuhp_setup_state()
> #0 cpus_read_lock()
> __cpuhp_setup_state_cpuslocked()
> #1 mutex_lock(&cpuhp_state_mutex)
>
>
>
> __cpuhp_state_add_instance()
> #2 mutex_lock(&cpuhp_state_mutex)
This should be #1, right?
> cpuhp_issue_call()
> cpuhp_invoke_ap_callback()
> #3 wait_for_completion()
>
> msr_device_create()
> ...
> #4 filename_create()
> #3 complete()
>
>
>
> do_splice()
> #4 file_start_write()
> do_splice_from()
> iter_file_splice_write()
> #5 pipe_lock()
> vfs_iter_write()
> ...
> #6 inode_lock()
>
>
>
> sys_fcntl()
> do_fcntl()
> shmem_fcntl()
> #5 inode_lock()
> shmem_wait_for_pins()
> if (!scan)
> lru_add_drain_all()
> #0 cpus_read_lock()
>
>
>
> Which is an actual real deadlock, there is no mixing of up and down.

Thanks a lot, this made it clearer to me. It took me a while to
actually see the 0 -> 1 -> 3 -> 4 -> 5 -> 0 cycle. I had only focused
on lru_add_drain_all while it was holding the cpus lock.
--
Michal Hocko
SUSE Labs