lists.openwall.net
Open Source and information security mailing list archives
 
Message-Id: <300a59beb03d140424acdc9a0b1a9ecd2fc402e2.1496221776.git.lv.zheng@intel.com>
Date:   Wed, 31 May 2017 17:41:58 +0800
From:   Lv Zheng <lv.zheng@...el.com>
To:     "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Len Brown <len.brown@...el.com>
Cc:     Lv Zheng <lv.zheng@...el.com>, Lv Zheng <zetalog@...il.com>,
        linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
        systemd-devel@...ts.freedesktop.org,
        Benjamin Tissoires <benjamin.tissoires@...hat.com>,
        Peter Hutterer <peter.hutterer@...-t.net>
Subject: [RFC PATCH v4 2/5] ACPI: button: Extends complement switch event support for all modes

The Surface Pro 3 is a typical platform on which the suspend/resume loop
problem can be seen.

The problem is due to a bug in systemd 229. Under each lid_init_state
mode:
1. "ignore": an endless suspend/resume loop can always be triggered;
2. "open": the suspend/resume loop can sometimes be stopped;
3. "method": an endless suspend/resume loop can always be triggered.
The buggy systemd unexpectedly waits for an explicit "open" event after
boot/resume, and suspends the system if it does not receive one. Even
when the kernel sends it a faked "open", its state machine is still
wrong: systemd may not respond to "close" events arriving after the
"open", or may suddenly suspend without having received any event.

The more recent systemd 233 has fixed this bug:
1. "ignore": everything works fine;
2. "open": no suspend/resume cycle, but the platform sometimes cannot be
           suspended again after the first resume;
3. "method": no suspend/resume cycle, but the platform can never be
             suspended again after the first resume.
The conclusion is that, for the suspend/resume cycle issue, "ignore" mode
fixes everything, but the current "method" mode is still buggy.
The difference exists because the button driver only implements
complement switch events for "ignore" mode. Without complement switch
events, a firmware-triggered "close" cannot be delivered to user space
(confirmed with evemu-record).
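The input core forwards an EV_SW event only when it changes the switch's cached value, so a second "close" in a row never reaches user space. The following is a minimal user-space model of that behaviour, not kernel code: the `sw_model` type and the `report_switch()`/`firmware_close()` helpers are hypothetical, and a cached value of `true` stands for SW_LID = 1 (lid closed).

```c
#include <stdbool.h>

/* Hypothetical model of the input core's switch deduplication. */
struct sw_model {
	bool closed;     /* cached SW_LID value: true = lid closed */
	int delivered;   /* number of events that reached user space */
};

/* Pass an event on only if it changes the cached value. */
static void report_switch(struct sw_model *sw, bool closed)
{
	if (sw->closed == closed)
		return;          /* redundant switch event: dropped */
	sw->closed = closed;
	sw->delivered++;
}

/* Firmware reports "close"; optionally precede it with a complement "open". */
static void firmware_close(struct sw_model *sw, bool complement)
{
	if (complement)
		report_switch(sw, false);   /* complement "open" */
	report_switch(sw, true);            /* the reliable "close" */
}
```

If the cached state is already "closed" (for example because the firmware never reported the intervening "open"), the firmware "close" alone produces no event at all; with the complement, user space sees an "open" followed by the "close".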

The root cause of the lid state issues is the variation among platform
firmware implementations:
1. Some platforms send "open" events to the OS, and the events arrive
   before the button driver is resumed;
2. Some platforms send "open" events to the OS, but the events arrive
   after the button driver is resumed, e.g., Samsung N210+;
3. Some platforms never send "open" events to the OS, but send "open"
   events that only update the cached _LID return value, and the update
   arrives before the button driver is resumed;
4. Some platforms never send "open" events to the OS, but send "open"
   events that only update the cached _LID return value, and the update
   arrives after the button driver is resumed, e.g., Surface Pro 3;
5. Some platforms never send "open" events at all, and the _LID return
   value sticks at "close", e.g., Surface Pro 1.

Now consider the external display docking issues (see the links below):
1. For case 1, both "method" and "ignore" modes work correctly;
2. For cases 2, 4 and 5, neither "method" nor "ignore" mode works correctly;
3. For case 3, "method" mode works correctly while "ignore" mode does not.
The conclusion is that, for the external display docking issue, although
the graphics layer (graphics drivers or desktop managers) still needs to
be improved to avoid breakage on case 2/4/5 platforms, there is one case
(case 3) where "method" mode behaves better.
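The case/mode table above can be restated as data plus two rules. This is a hypothetical sketch: `struct fw_case` and the two predicates are not driver code, and the rules themselves are assumptions chosen to reproduce the table ("method" re-evaluates _LID on resume, so it depends on what _LID returns at that moment; "ignore" reports nothing on resume, so it depends on an "open" notification having arrived in time). In particular, that _LID already reads "open" in case 1 is assumed, not stated in the text.

```c
#include <stdbool.h>

/* Hypothetical encoding of the five firmware behaviours; both fields
 * describe the situation at the moment the button driver resumes. */
struct fw_case {
	int id;
	bool lid_reads_open;   /* assumed: _LID already returns "open" */
	bool open_event_seen;  /* assumed: an "open" notification reached the OS */
};

static const struct fw_case fw_cases[] = {
	{ 1, true,  true  },  /* "open" sent to OS before resume */
	{ 2, false, false },  /* "open" sent to OS after resume (Samsung N210+) */
	{ 3, true,  false },  /* _LID cache updated before resume */
	{ 4, false, false },  /* _LID cache updated after resume (Surface Pro 3) */
	{ 5, false, false },  /* _LID stuck at "close" (Surface Pro 1) */
};

/* Assumed rule: "method" is only as good as _LID at resume time. */
static bool method_mode_works(const struct fw_case *c)
{
	return c->lid_reads_open;
}

/* Assumed rule: "ignore" needs an "open" notification before resume. */
static bool ignore_mode_works(const struct fw_case *c)
{
	return c->open_event_seen;
}
```

Under these assumptions, case 3 is the only one where the choice of mode matters, which is the case the regression reports point at.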

Thus the ACPI subsystem has been pushed to revert to "method" mode
because of the no-regressions rule and case 3 (the platforms reported in
the links should all be case 3 platforms), and the libinput developers
have volunteered to provide workarounds where the graphics layer is not
fixed or systemd is not updated.

This patch therefore extends complement switch event support to the
other modes using a new criterion: a complement switch event is
generated for a BIOS-notified "close" that repeats the last reported
state. As a result, when the button driver is reverted to "method" mode,
it will not behave worse than "ignore" mode with a fixed systemd.
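The new policy can be summarised as a pure function. This is a hypothetical user-space sketch (`lid_report_sequence()` is not part of the driver) of which SW_LID values `acpi_lid_notify_state()` reports after this patch for a notification that has already passed the replay check; it leaves out the input core's deduplication, the warning and the wakeup handling. SW_LID = 1 means closed, so the driver reports `!open`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: writes the SW_LID values reported for one
 * notification into out[] (room for two) and returns how many. */
static size_t lid_report_sequence(bool last_open, bool open,
				  bool is_bios_event, int out[2])
{
	size_t n = 0;

	/* Complement event: only for a BIOS-notified "close" that repeats
	 * the cached state, regardless of lid_init_state. */
	if (last_open == open && is_bios_event && !open)
		out[n++] = 0;          /* fake "open" */

	out[n++] = !open;              /* the reported event itself */
	return n;
}
```

A repeated "open" or a faked (non-BIOS) "close" never gets a complement, because faking a "close" can trigger unexpected suspends.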

Tested with systemd 233: after applying this patch, all modes work fine
(no suspend/resume loop, and the platform can be suspended any number of
times).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
      https://bugzilla.redhat.com/show_bug.cgi?id=1430259

Cc: <systemd-devel@...ts.freedesktop.org>
Cc: Benjamin Tissoires <benjamin.tissoires@...hat.com>
Cc: Peter Hutterer <peter.hutterer@...-t.net>
Signed-off-by: Lv Zheng <lv.zheng@...el.com>
---
 drivers/acpi/button.c | 116 +++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 725a15a..36485cf 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -108,6 +108,7 @@ struct acpi_button {
 	unsigned long pushed;
 	int last_state;
 	ktime_t last_time;
+	bool last_is_bios;
 	bool suspended;
 };
 
@@ -144,78 +145,71 @@ static int acpi_lid_notify_state(struct acpi_device *device,
 	struct acpi_button *button = acpi_driver_data(device);
 	int ret;
 	ktime_t next_report;
-	bool do_update;
 
 	/*
-	 * In lid_init_state=ignore mode, if user opens/closes lid
-	 * frequently with "open" missing, and "last_time" is also updated
-	 * frequently, "close" cannot be delivered to the userspace.
-	 * So "last_time" is only updated after a timeout or an actual
-	 * switch.
+	 * Ignore frequently replayed switch events.
+	 *
+	 * AML tables can put Notify(LID, xxx) in a notification method,
+	 * and handling the hardware events by executing the entry methods
+	 * (e.g., _Qxx) may cause the notification method to be invoked
+	 * several times.
+	 * This check doesn't apply to the faked events because if a BIOS
+	 * notification comes after a faked event, it must pass this check
+	 * in order to be reliably delivered to user space.
 	 */
-	if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
-	    button->last_state != !!state)
-		do_update = true;
-	else
-		do_update = false;
-
 	next_report = ktime_add(button->last_time,
 				ms_to_ktime(lid_report_interval));
-	if (button->last_state == !!state &&
-	    ktime_after(ktime_get(), next_report)) {
+	if (button->last_is_bios && button->last_state == !!state &&
+	    !ktime_after(ktime_get(), next_report))
+		return 0;
+
+	/*
+	 * Send the unreliable complement switch event:
+	 *
+	 * On most platforms, the lid device is reliable. However there are
+	 * exceptions:
+	 * 1. Platforms returning initial lid state as "close" by default
+	 *    after booting/resuming:
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=89211
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=106151
+	 * 2. Platforms never reporting "open" events:
+	 *     https://bugzilla.kernel.org/show_bug.cgi?id=106941
+	 * On these buggy platforms, the usage model of the ACPI lid device
+	 * actually is:
+	 * 1. The initial returning value of _LID may not be reliable.
+	 * 2. The open event may not be reliable.
+	 * 3. The close event is reliable.
+	 *
+	 * But SW_LID is typed as input switch event, the input layer
+	 * checks if the event is redundant. Hence if the state is not
+	 * switched, the userspace cannot see this platform triggered
+	 * reliable event. By inserting a complement switch event, it then
+	 * is guaranteed that the platform triggered reliable one can
+	 * always be seen by the userspace.
+	 */
+	if (button->last_state == !!state && is_bios_event) {
 		/* Complain the buggy firmware */
 		pr_warn_once("The lid device is not compliant to SW_LID.\n");
 
 		/*
-		 * Send the unreliable complement switch event:
-		 *
-		 * On most platforms, the lid device is reliable. However
-		 * there are exceptions:
-		 * 1. Platforms returning initial lid state as "close" by
-		 *    default after booting/resuming:
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=89211
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=106151
-		 * 2. Platforms never reporting "open" events:
-		 *     https://bugzilla.kernel.org/show_bug.cgi?id=106941
-		 * On these buggy platforms, the usage model of the ACPI
-		 * lid device actually is:
-		 * 1. The initial returning value of _LID may not be
-		 *    reliable.
-		 * 2. The open event may not be reliable.
-		 * 3. The close event is reliable.
-		 *
-		 * But SW_LID is typed as input switch event, the input
-		 * layer checks if the event is redundant. Hence if the
-		 * state is not switched, the userspace cannot see this
-		 * platform triggered reliable event. By inserting a
-		 * complement switch event, it then is guaranteed that the
-		 * platform triggered reliable one can always be seen by
-		 * the userspace.
+		 * Do not generate complement switch event for "open"
+		 * events - faking "close" events can trigger unexpected
+		 * behaviors.
+		 * Thus only generate complement switch event for BIOS
+		 * notified "close".
 		 */
-		if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
-			do_update = true;
-			/*
-			 * Do generate complement switch event for "close"
-			 * as "close" is reliable and wrong "open" won't
-			 * trigger unexpected behaviors.
-			 * Do not generate complement switch event for
-			 * "open" as "open" is not reliable and wrong
-			 * "close" will trigger unexpected behaviors.
-			 */
-			if (!state) {
-				input_report_switch(button->input,
-						    SW_LID, state);
-				input_sync(button->input);
-			}
+		if (!state) {
+			input_report_switch(button->input, SW_LID, state);
+			input_sync(button->input);
 		}
 	}
+
 	/* Send the platform triggered reliable event */
-	if (do_update) {
-		input_report_switch(button->input, SW_LID, !state);
-		input_sync(button->input);
-		button->last_state = !!state;
-		button->last_time = ktime_get();
-	}
+	input_report_switch(button->input, SW_LID, !state);
+	input_sync(button->input);
+	button->last_state = !!state;
+	button->last_time = ktime_get();
+	button->last_is_bios = is_bios_event;
 
 	if (state)
 		pm_wakeup_hard_event(&device->dev);
@@ -444,6 +438,8 @@ static int acpi_button_resume(struct device *dev)
 	struct acpi_button *button = acpi_driver_data(device);
 
 	button->suspended = false;
+	/* ignore replay frequency check between suspend/resume */
+	button->last_is_bios = false;
 	if (button->type == ACPI_BUTTON_TYPE_LID)
 		acpi_lid_initialize_state(device);
 	return 0;
@@ -492,6 +488,8 @@ static int acpi_button_add(struct acpi_device *device)
 			ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
 		button->last_state = !!acpi_lid_evaluate_state(device);
 		button->last_time = ktime_get();
+		/* ignore replay frequency check after boot */
+		button->last_is_bios = false;
 	} else {
 		printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
 		error = -ENODEV;
-- 
2.7.4
