lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 1 Jun 2016 01:17:12 +0000
From:	"Zheng, Lv" <lv.zheng@...el.com>
To:	Benjamin Tissoires <benjamin.tissoires@...il.com>
CC:	"Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	"Brown, Len" <len.brown@...el.com>, Lv Zheng <zetalog@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
	"Bastien Nocera:" <hadess@...ess.net>,
	"Valdis.Kletnieks@...edu" <Valdis.Kletnieks@...edu>
Subject: RE: [PATCH v2 3/3] ACPI / button: Send "open" state after
 boot/resume

Hi,

> From: Benjamin Tissoires [mailto:benjamin.tissoires@...il.com]
> Subject: Re: [PATCH v2 3/3] ACPI / button: Send "open" state after
> boot/resume
> 
> Hi Lv,
> 
> On Tue, May 31, 2016 at 4:55 AM, Zheng, Lv <lv.zheng@...el.com> wrote:
> > Hi,
> >
> >> From: Benjamin Tissoires [mailto:benjamin.tissoires@...il.com]
> >> Subject: Re: [PATCH v2 3/3] ACPI / button: Send "open" state after
> >> boot/resume
> >>
> >> On Fri, May 27, 2016 at 9:16 AM, Lv Zheng <lv.zheng@...el.com>
> wrote:
> [snipped
> ]>> As Valdis replied on 0/3, I don't think this is a good solution (even
> >> temporary). Linux should not assume the current state of a input
> >> device, and sending unconditionally 1 here is wrong. If the device is
> >> on a docking station, you will wake up the wrong monitor and screw
> the
> >> user session (and this will be a regression).
> > [Lv Zheng]
> > We are doing the test to see how this behaves on several different
> platforms.
> >
> >>
> >> How about we simply send the current LID state stored in the ACPI?
> >> something like calling acpi_lid_send_state() directly?
> > [Lv Zheng]
> > This is what we are going to eliminate in [PATCH 01].
> > We have several real bugs related to sending a wrong state to the
> userspace.
> > Userspace will suspend right after resume because of the 'close' state.
> 
> On the other hand, you are trying to remove 23de5d9ef2a4bbc4f733f, a
> patch that has been around for 9 years and we only start seeing
> devices where this logic is not working...
> 
> I am not saying your approach is wrong, I am just saying that instead
> of a plain revert, we should probably be more conservative and add a
> quirk for those buggy machines. Ideally, we should try to understand
> why there is such an issue that Windows doesn't have (the solution
> might just be that given Windows doesn't care, we are screwed).
> 
> BTW, on the Surface 3, there is a WMI
> (f7cc25ec-d20b-404c-8903-0ed4359c18ae -> WQHE) which returns the
> actual value of the LID, without using PNP0C0D at all. I have a
> feeling that Windows might use it when it is in trouble or in an
> unsure state. I couldn't find this WMI on the 2 other systems so that
> may also be just a one shot for the Surface 3.
[Lv Zheng] 
Thanks for the information.
But it seems Surface 3 is not a good example for this.
It is a runtime idle platform.
And the root cause of the Surface 3 issue should be in the freeze code.
After waking the system up via LID irq, the irq is dropped.
But I guess it is risky to invoke irq handler right there, doing so could break may existing drivers.


> 
> [snipped]
> > [Lv Zheng]
> > The understanding here is incorrect.
> > We have 3 bogus devices.
> > 1 of them is surface 3 which is a hardware reduced platform.
> > The others are all traditional platforms.
> >
> > =====
> > The facts are:
> >
> > Both the platforms return cached lid state from _LID.
> > The cached value will be updated by lid irq (via GPIO IRQ, GPE, or EC
> event).
> > AML tables will send lid notification in the irq handler.
> >
> > Some AML tables will update the cached value in _WAK (I'll describe why
> it is necessary below).
> > But updating the cached value in _WAK is not guaranteed by all AML
> tables.
> >
> > For the 'close' state irq, all tables will send lid close notification.
> > For the 'open' state irq, it seems there are tables never sending lid open
> notification (sounds like Windows do not care about lid open).
> > =====
> >
> > Surface 3 is entirely a different case.
> > It is a runtime idle system and hardware reduced.
> > On that kind of system, lid open is handled by OS not by BIOS.
> > Surface 3 is exactly the platform that doesn't send lid open notification.
> > I guess the AML is intentionally written in this way to be compliant to
> the traditional platforms.
> >
> > While on the traditional platforms:
> > When lid is opened, BIOS handles the lid irq and wakes the system from
> the FACS waking vector.
> > So it is likely that there is no lid open irq after the system is resumed.
> > BIOS may forget to update the cached lid value in the _WAK or some
> other control methods that could be executed after resuming.
> > Then if we send _LID result to the user space, the cached value could
> apparently be 'close'.
> >
> > That explains why there is no "lid open" configuration in the "Windows
> Device Manager".
> >
> >>
> >> I propose as a workaround to enable a kthread that will monitor the
> >> lid state and update the correct value to userspace (5 sec of polling
> >> time should be enough given that systemd checks every 20 sec).
> >> We should probably have this workaround only for a set of known
> >> devices, as it might just be temporary for those until the actual
> >> underlying problem is fixed (wrong DSDT in the Surface 3 case that
> >> doesn't notify at all, issue in the EC for the Surface Pro 1 and the
> >> Samsung N210).
> > [Lv Zheng]
> > That cannot help to solve the issue/gap.
> >
> > The problem is Linux userspace has a facility re-checking lid state when
> lid state is "close".
> > If it failed to update the lid state to "open" for a timeout period, it would
> suspend the platform.
> >
> > We actually are recommending Linus userspace to introduce a new lid
> handling mode to only handle "lid close" event and ignore "lid open" event.
> > During the period this is not changed from the userspace, we have to
> send "open" after resume/boot in order not to trigger the timeout
> mechanism.
> >
> 
> I hear your points and I couldn't agree more that the only solution
> for us is (sadly) to mimic Windows.
> 
> I am just disagreeing on the method used here: I'd rather have a
> fallback mechanism where the lid is not sent to "one" at each boot
> resume because this will mess us people using docking stations. The
> problem is that Red Hat has a lot of them, and I foresee a lot of
> complains to me if this is merged in its current state.
> 
> We could have a parameter (say "send_lid_open_on_resume" or
> "use_bios_lid_value") in acpi/button.c which would allow people to
> choose if they want the "new" behavior or the old one. And we could
> also add some DMI matching for the messed up laptops/tablets where we
> force one or the other quirk. When we agree that user space now
> behaves gently with us, we could then remove entirely the quirk and
> the dmi matching.
> 
> How does that sound?
[Lv Zheng] 
The choices are in my first revision.
I could restore it back and re-send this series.
Also I need to update PATCH 02 to eliminate wrong IS_ERR_VALUE().

> 
> Also, on the topic of the .notify not being sent by some BIOS, I have
> seen something similar on the surface 3.
> 
> The ASL for the LID status change is the following:
> ---
>             Method (_E4C, 0, Serialized)  // _Exx: Edge-Triggered GPE
>             {
>                 If ((HELD == One))
>                 {
>                     ^^LID.LIDB = One
>                 }
>                 Else
>                 {
>                     ^^LID.LIDB = Zero
>                     Notify (LID, 0x80) // Status Change
>                 }
> 
>                 Notify (^^PCI0.SPI1.NTRG, One) // Device Check
>             }
> ---
> 
> I noticed I received the hotplug event on the touchscreen when the LID
> state changed, and tried to initiate the notify on the LID acpi device
> from within the touchscreen driver.
> This works well, except that from time to time, on resume, I don't
> even receive the hotplug notification. It is as if the notification is
> lost while I receive it 80% of the time.
[Lv Zheng] 
Probably this is because of the same reason as mentioned before.
I still think Surface 3 is a different example.

On those traditional platforms, we can see that:
1. There are platforms returning hardware state directly from _LID
2. There are platforms returning cached state from _LID and update it via lid irq, EC._REG, _WAK
3. There are platforms returning cached state from _LID and update it via lid irq

The fact is there are simply so many type 3 platforms.
You might be thinking there are only several such kind of bogus devices.
But actually our team has been working on customizing those platforms, enhancing their EC._REG/_WAK for years in order to make the AML tables working on Linux.
This looks like an endless work to me...

> 
> This might be completely unrelated, but I think it is worth mentioning
> because there might be a common root at the state not being updated
> correctly.
> 
> Cheers, and sorry for being annoying :)
[Lv Zheng] 
I didn't see anything annoying. :)

Cheers
-Lv

> Benjamin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ