lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 15 Jun 2013 23:20:40 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Jiang Liu <liuj97@...il.com>
Cc:	Bjorn Helgaas <bhelgaas@...gle.com>,
	Yinghai Lu <yinghai@...nel.org>,
	"Alexander E . Patrakov" <patrakov@...il.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Yijing Wang <wangyijing@...wei.com>,
	linux-acpi@...r.kernel.org, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org, Len Brown <lenb@...nel.org>,
	stable@...r.kernel.org, Jiang Liu <jiang.liu@...wei.com>
Subject: Re: [BUGFIX v2 2/4] ACPI, DOCK: resolve possible deadlock scenarios

On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote:
> On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote:
> > On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote:
> > > On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote:
> > >> This is a preparation for next patch to avoid breaking bisecting.
> > >> If next patch is applied without this one, it will cause deadlock
> > >> as below:
> > >>
> > >> Case 1:
> > >> [   31.015593]  Possible unsafe locking scenario:
> > >>
> > >> [   31.018350]        CPU0                    CPU1
> > >> [   31.019691]        ----                    ----
> > >> [   31.021002]   lock(&dock_station->hp_lock);
> > >> [   31.022327]                                lock(&slot->crit_sect);
> > >> [   31.023650]                                lock(&dock_station->hp_lock);
> > >> [   31.025010]   lock(&slot->crit_sect);
> > >> [   31.026342]
> > >>
> > >> Case 2:
> > >>         hotplug_dock_devices()
> > >>                 mutex_lock(&ds->hp_lock)
> > >>                         dd->ops->handler()
> > >>                                 register_hotplug_dock_device()
> > >>                                         mutex_lock(&ds->hp_lock)
> > >> [   34.316570] [ INFO: possible recursive locking detected ]
> > >> [   34.316573] 3.10.0-rc4 #6 Tainted: G         C
> > >> [   34.316575] ---------------------------------------------
> > >> [   34.316577] kworker/0:0/4 is trying to acquire lock:
> > >> [   34.316579]  (&dock_station->hp_lock){+.+.+.}, at:
> > >> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> > >> [   34.316588]
> > >> but task is already holding lock:
> > >> [   34.316590]  (&dock_station->hp_lock){+.+.+.}, at:
> > >> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> > >> [   34.316595]
> > >> other info that might help us debug this:
> > >> [   34.316597]  Possible unsafe locking scenario:
> > >>
> > >> [   34.316599]        CPU0
> > >> [   34.316601]        ----
> > >> [   34.316602]   lock(&dock_station->hp_lock);
> > >> [   34.316605]   lock(&dock_station->hp_lock);
> > >> [   34.316608]
> > >>  *** DEADLOCK ***
> > >>
> > >> So fix this deadlock by not taking ds->hp_lock in function
> > >> register_hotplug_dock_device(). This patch also fixes a possible
> > >> race conditions in function dock_event() because previously it
> > >> accesses ds->hotplug_devices list without holding ds->hp_lock.
> > >>
> > >> Signed-off-by: Jiang Liu <jiang.liu@...wei.com>
> > >> Cc: Len Brown <lenb@...nel.org>
> > >> Cc: "Rafael J. Wysocki" <rjw@...k.pl>
> > >> Cc: Bjorn Helgaas <bhelgaas@...gle.com>
> > >> Cc: Yinghai Lu <yinghai@...nel.org>
> > >> Cc: Yijing Wang <wangyijing@...wei.com>
> > >> Cc: linux-acpi@...r.kernel.org
> > >> Cc: linux-kernel@...r.kernel.org
> > >> Cc: linux-pci@...r.kernel.org
> > >> Cc: <stable@...r.kernel.org> # 3.8+
> > >> ---
> > >>  drivers/acpi/dock.c                | 109 ++++++++++++++++++++++---------------
> > >>  drivers/pci/hotplug/acpiphp_glue.c |  15 +++++
> > >>  include/acpi/acpi_drivers.h        |   2 +
> > >>  3 files changed, 82 insertions(+), 44 deletions(-)
> > >>
> > >> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> > >> index 02b0563..602bce5 100644
> > >> --- a/drivers/acpi/dock.c
> > >> +++ b/drivers/acpi/dock.c
> > >> @@ -66,7 +66,7 @@ struct dock_station {
> > >>  	spinlock_t dd_lock;
> > >>  	struct mutex hp_lock;
> > >>  	struct list_head dependent_devices;
> > >> -	struct list_head hotplug_devices;
> > >> +	struct klist hotplug_devices;
> > >>
> > >>  	struct list_head sibling;
> > >>  	struct platform_device *dock_device;
> > >> @@ -76,12 +76,18 @@ static int dock_station_count;
> > >>
> > >>  struct dock_dependent_device {
> > >>  	struct list_head list;
> > >> -	struct list_head hotplug_list;
> > >> +	acpi_handle handle;
> > >> +};
> > >> +
> > >> +struct dock_hotplug_info {
> > >> +	struct klist_node node;
> > >>  	acpi_handle handle;
> > >>  	const struct acpi_dock_ops *ops;
> > >>  	void *context;
> > >>  };
> > >
> > > Can we please relax a bit and possibly take a step back?
> > >
> > > So since your last reply to me wasn't particularly helpful, I went through the
> > > code in dock.c and acpiphp_glue.c and I simply think that the whole
> > > hotplug_list thing is simply redundant.
> > >
> > > It looks like instead of using it (or the klist in this patch), we can add a
> > > "hotlpug_device" flag to dock_dependent_device and set that flag instead of
> > > adding dd to hotplug_devices or clear it instead of removing dd from that list.
> > >
> > > That would allow us to avoid the deadlock, because we wouldn't need the hp_lock
> > > any more and perhaps we could make the code simpler instead of making it more
> > > complex.
> > >
> > > How does that sound?
> > >
> > > Rafael
> > Hi Rafael,
> >        Thanks for comments! It would be great if we could kill the 
> > hotplug_devices
> > list so thing gets simple. But there are still some special cases:(
> > 
> > As you have mentioned,  ds->hp_lock is used to make both addition and 
> > removal
> > of hotplug devices wait for us to complete walking ds->hotplug_devices.
> > So it acts as two roles:
> > 1) protect the hotplug_devices list,
> > 2) serialize unregister_hotplug_dock_device() and 
> > hotplug_dock_devices() so
> > the dock driver doesn't access registered handler and associated data 
> > structure
> > once returing from unregister_hotplug_dock_device().
> 
> When it returns from unregister_hotplug_dock_device(), nothing prevents it
> from accessing whatever it wants, because ds->hp_lock is not used outside
> of the add/del and hotplug_dock_devices().  So, the actual role of
> ds->hp_lock (not the one that it is supposed to play, but the real one)
> is to prevent addition/deletion from happening when hotplug_dock_devices()
> is running.  [Yes, it does protect the list, but since the list is in fact
> unnecessary, that doesn't matter.]
> 
> > If we simply use a flag to mark presence of registered callback, we 
> > can't achieve the second goal.
> 
> I don't mean using the flag *alone*.
> 
> > Take the sony laptop as an example. It has several PCI 
> > hotplug
> > slot associated with the dock station:
> > [   28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB
> > [   30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM0
> > [   30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM1
> > [   30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2
> > [   30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA
> > [   30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA
> > [   30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN
> > [   30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD
> > [   30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check 
> > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB
> > 
> > So it still has some race windows if we undock the station while 
> > repeatedly rescanning/removing
> > the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces.

Which sysfs interfaces do you mean, by the way?

If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices()
should always be run under acpi_scan_lock too.  It isn't at the moment,
because write_undock() doesn't take acpi_scan_lock(), but this is an obvious
bug (so I'm going to send a patch to fix it in a while).

With that bug fixed, the possible race between acpi_eject_store() and
hotplug_dock_devices() should be prevented from happening, so perhaps we're
worrying about something that cannot happen?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ