Message-ID: <486b1f53-afd3-d789-0a2e-aa50daad71a2@ti.com>
Date:   Mon, 19 Dec 2016 14:15:47 +0200
From:   Peter Ujfalusi <peter.ujfalusi@...com>
To:     Jani Nikula <jani.nikula@...ux.intel.com>,
        Daniel Vetter <daniel@...ll.ch>
CC:     <daniel.vetter@...ll.ch>, <linux-kernel@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <tomi.valkeinen@...com>,
        "Tahvanainen, Jari" <jari.tahvanainen@...el.com>
Subject: Re: [PATCH v2] drm: drm_probe_helper: Fix output_poll_work scheduling

On 12/19/2016 11:54 AM, Jani Nikula wrote:
> On Wed, 31 Aug 2016, Daniel Vetter <daniel@...ll.ch> wrote:
>> On Wed, Aug 31, 2016 at 02:09:05PM +0300, Peter Ujfalusi wrote:
>>> drm_kms_helper_poll_enable_locked() should check if we have delayed event
>>> pending and if we have, schedule the work to run without delay.
>>>
>>> Currently the output_poll_work is only scheduled if any of the connectors
>>> have DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT with
>>> DRM_OUTPUT_POLL_PERIOD delay. It does not matter if we have delayed event
>>> already registered to be handled. The detection will be delayed by
>>> DRM_OUTPUT_POLL_PERIOD in any case.
>>> Furthermore if none of the connectors are marked as POLL_CONNECT or
>>> POLL_DISCONNECT because all connectors are either POLL_HPD or they are
>>> always connected: the output_poll_work will not run at all even if we
>>> have delayed event marked.
>>>
>>> When none of the connectors require polling, their initial status change
>>> from unknown to connected/disconnected is not going to be handled until
>>> the first kms application starts or if we have fb console enabled.
>>>
>>> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@...com>
>>> ---
>>> Hi,
>>>
>>> Changes since v1:
>>> - dropped the last paragraph from the commit message.
>>
>> I added a few more words to the commit message to explain when exactly
>> this is a problem and applied your patch to drm-misc.
> 
> Hi Peter, sadly looks like this regresses users out there [1]. Seems to
> be a reliable bisect. We need to have this fixed or reverted.

When I sent the patch I booted my laptop (with Intel Corporation Haswell-ULT
Integrated Graphics Controller (rev 0b)) and my old desktop with some nVidia Quadro
using the nouveau stack, and saw no issues.

But with 4.9 now on my Dell laptop I see lots of warnings during boot:
[    0.573119] ------------[ cut here ]------------
[    0.573127] WARNING: CPU: 3 PID: 874 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210
[    0.573132] WARN_ON(!msg->buffer != !msg->size)
[    0.573134] Modules linked in:
[    0.573140] CPU: 3 PID: 874 Comm: kworker/u8:3 Not tainted 4.9.0-gentoo #1
[    0.573143] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015
[    0.573150] Workqueue: events_unbound async_run_entry_fn
[    0.573154]  ffffc900021afaf0 ffffffff813c3bad ffffc900021afb40 0000000000000000
[    0.573160]  ffffc900021afb30 ffffffff8106a596 0000042614b82908 ffffc900021afc00
[    0.573166]  ffff8802144040e0 0000000000000003 0000000000000000 ffff880214404158
[    0.573172] Call Trace:
[    0.573178]  [<ffffffff813c3bad>] dump_stack+0x4f/0x72
[    0.573182]  [<ffffffff8106a596>] __warn+0xc6/0xe0
[    0.573185]  [<ffffffff8106a5fa>] warn_slowpath_fmt+0x4a/0x50
[    0.573189]  [<ffffffff815926e9>] ? intel_dp_aux_transfer+0xc9/0x210
[    0.573193]  [<ffffffff815927f5>] intel_dp_aux_transfer+0x1d5/0x210
[    0.573198]  [<ffffffff817d3e53>] ? _raw_write_unlock_irqrestore+0x13/0x30
[    0.573202]  [<ffffffff817d3e79>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[    0.573207]  [<ffffffff814b9e48>] drm_dp_dpcd_access+0x58/0xf0
[    0.573210]  [<ffffffff814b9ef6>] drm_dp_dpcd_write+0x16/0x20
[    0.573214]  [<ffffffff8158d8c3>] intel_dp_start_link_train+0x2b3/0x4a0
[    0.573218]  [<ffffffff8158ecd2>] intel_dp_check_link_status+0xb2/0xf0
[    0.573222]  [<ffffffff81593586>] intel_dp_detect+0x7d6/0xb40
[    0.573226]  [<ffffffff814bb1db>] drm_helper_probe_single_connector_modes+0x41b/0x4e0
[    0.573233]  [<ffffffff814c8d7c>] drm_fb_helper_initial_config+0x7c/0x3f0
[    0.573237]  [<ffffffff817d3e39>] ? _raw_spin_unlock_irq+0x9/0x10
[    0.573242]  [<ffffffff815858b3>] intel_fbdev_initial_config+0x13/0x30
[    0.573245]  [<ffffffff8108b832>] async_run_entry_fn+0x32/0xe0
[    0.573249]  [<ffffffff81083208>] process_one_work+0x148/0x4c0
[    0.573253]  [<ffffffff810835c3>] worker_thread+0x43/0x4e0
[    0.573257]  [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[    0.573260]  [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[    0.573264]  [<ffffffff8107f3c7>] ? call_usermodehelper_exec_async+0x137/0x140
[    0.573269]  [<ffffffff81088a45>] kthread+0xc5/0xe0
[    0.573273]  [<ffffffff81088980>] ? kthread_park+0x60/0x60
[    0.573277]  [<ffffffff8107f290>] ? umh_complete+0x40/0x40
[    0.573280]  [<ffffffff817d44f2>] ret_from_fork+0x22/0x30
[    0.573285] ---[ end trace a544f5d689389b41 ]---

If I revert the patch in question I have tons of:

[    0.569127] ------------[ cut here ]------------
[    0.569136] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210
[    0.569141] WARN_ON(!msg->buffer != !msg->size)
[    0.569143] Modules linked in:
[    0.569149] CPU: 3 PID: 564 Comm: kworker/3:2 Not tainted 4.9.0-gentoo #2
[    0.569152] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015
[    0.569159] Workqueue: events i915_hotplug_work_func
[    0.569164]  ffffc90001a93ba0 ffffffff813c3bad ffffc90001a93bf0 0000000000000000
[    0.569174]  ffffc90001a93be0 ffffffff8106a596 0000042614b12908 ffffc90001a93cb0
[    0.569184]  ffff88021482b0e0 0000000000000003 0000000000000000 ffff88021482b158
[    0.569191] Call Trace:
[    0.569198]  [<ffffffff813c3bad>] dump_stack+0x4f/0x72
[    0.569205]  [<ffffffff8106a596>] __warn+0xc6/0xe0
[    0.569210]  [<ffffffff8106a5fa>] warn_slowpath_fmt+0x4a/0x50
[    0.569214]  [<ffffffff815926c9>] ? intel_dp_aux_transfer+0xc9/0x210
[    0.569218]  [<ffffffff815927d5>] intel_dp_aux_transfer+0x1d5/0x210
[    0.569223]  [<ffffffff817d3e53>] ? _raw_write_unlock_irqrestore+0x13/0x30
[    0.569224] loop: module loaded
[    0.569233]  [<ffffffff817d3e79>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[    0.569241]  [<ffffffff814b9e48>] drm_dp_dpcd_access+0x58/0xf0
[    0.569248]  [<ffffffff814b9ef6>] drm_dp_dpcd_write+0x16/0x20
[    0.569254]  [<ffffffff8158d8a3>] intel_dp_start_link_train+0x2b3/0x4a0
[    0.569261]  [<ffffffff8158ecb2>] intel_dp_check_link_status+0xb2/0xf0
[    0.569268]  [<ffffffff81593566>] intel_dp_detect+0x7d6/0xb40
[    0.569275]  [<ffffffff8157c739>] i915_hotplug_work_func+0x1d9/0x2a0
[    0.569283]  [<ffffffff81083208>] process_one_work+0x148/0x4c0
[    0.569290]  [<ffffffff810835c3>] worker_thread+0x43/0x4e0
[    0.569296]  [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[    0.569302]  [<ffffffff81083580>] ? process_one_work+0x4c0/0x4c0
[    0.569309]  [<ffffffff81088a45>] kthread+0xc5/0xe0
[    0.569316]  [<ffffffff81088980>] ? kthread_park+0x60/0x60
[    0.569322]  [<ffffffff817d44f2>] ret_from_fork+0x22/0x30
[    0.569330] ---[ end trace bc0abba135d2cb5d ]---

So the behaviour has changed, no question about it. But I still believe that
the patch itself fixes a valid bug (or shortcoming). At the moment I don't see
how it can lock up the kernel.
The only thing the patch does is schedule the output_poll_work without delay
if delayed_event is already set, to reduce the time it takes to handle pending
event(s) when polling is enabled for the first time.
From the log it is not clear to me why Xorg would lock up, but both the
intel and nouveau drivers are probed; might it be because we have two GPUs?
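
To make the scheduling decision concrete, here is a minimal userspace sketch
of the logic in the quoted diff. It is a model, not the kernel code: the
names poll_decision and poll_enable_decision are hypothetical, the flag
macros are stand-ins for the DRM_CONNECTOR_POLL_* flags, and
OUTPUT_POLL_PERIOD is an assumed placeholder value rather than the kernel's
definition of DRM_OUTPUT_POLL_PERIOD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRM_CONNECTOR_POLL_* flags. */
#define POLL_HPD        (1u << 0)
#define POLL_CONNECT    (1u << 1)
#define POLL_DISCONNECT (1u << 2)

/* Assumed placeholder; the kernel defines its own DRM_OUTPUT_POLL_PERIOD. */
#define OUTPUT_POLL_PERIOD 10000UL

/* Result of the decision: whether to schedule the work, and with what delay. */
struct poll_decision {
	bool schedule;
	unsigned long delay;
};

/*
 * Model of the patched drm_kms_helper_poll_enable_locked() decision.
 * polled[] holds one flag word per connector; delayed_event models
 * dev->mode_config.delayed_event.
 */
struct poll_decision poll_enable_decision(const unsigned int *polled,
					  size_t n, bool delayed_event)
{
	struct poll_decision d = { false, OUTPUT_POLL_PERIOD };
	size_t i;

	assert(polled != NULL || n == 0);

	/* Periodic polling is needed if any connector asks for it. */
	for (i = 0; i < n; i++) {
		if (polled[i] & (POLL_CONNECT | POLL_DISCONNECT))
			d.schedule = true;
	}

	/* A pending delayed event forces an immediate run. */
	if (delayed_event) {
		d.schedule = true;
		d.delay = 0;
	}

	return d;
}
```

With only POLL_HPD connectors and no pending event, the model schedules
nothing, which matches the pre-patch behaviour for that case. With a pending
event it schedules the work with zero delay regardless of the connector
flags, and that is the only behavioural change the diff introduces.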

> 
> BR,
> Jani.
> 
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=98690
> 
> 
> 
>>
>> Thanks, Daniel
>>>
>>> Regards,
>>> Peter
>>>
>>>  drivers/gpu/drm/drm_probe_helper.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
>>> index a0df377d7d1c..f6b64d7d3528 100644
>>> --- a/drivers/gpu/drm/drm_probe_helper.c
>>> +++ b/drivers/gpu/drm/drm_probe_helper.c
>>> @@ -129,6 +129,7 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev)
>>>  {
>>>  	bool poll = false;
>>>  	struct drm_connector *connector;
>>> +	unsigned long delay = DRM_OUTPUT_POLL_PERIOD;
>>>
>>>  	WARN_ON(!mutex_is_locked(&dev->mode_config.mutex));
>>>
>>> @@ -141,8 +142,13 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev)
>>>  			poll = true;
>>>  	}
>>>
>>> +	if (dev->mode_config.delayed_event) {
>>> +		poll = true;
>>> +		delay = 0;
>>> +	}
>>> +
>>>  	if (poll)
>>> -		schedule_delayed_work(&dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD);
>>> +		schedule_delayed_work(&dev->mode_config.output_poll_work, delay);
>>>  }
>>>  EXPORT_SYMBOL(drm_kms_helper_poll_enable_locked);
>>>
>>> --
>>> 2.9.3
>>>
> 


-- 
Péter
