lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5230B673.6090800@tuebingen.mpg.de>
Date:	Wed, 11 Sep 2013 20:29:07 +0200
From:	Mario Kleiner <mario.kleiner@...bingen.mpg.de>
To:	Peter Hurley <peter@...leysoftware.com>
CC:	Steven Rostedt <rostedt@...dmis.org>,
	"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Clark Williams <williams@...hat.com>,
	Dave Airlie <airlied@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Mario Kleiner <mario.kleiner@...bingen.mpg.de>
Subject: Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

On 11.09.13 17:16, Peter Hurley wrote:
> On 09/11/2013 09:26 AM, Steven Rostedt wrote:
>> On Wed, 11 Sep 2013 07:28:09 -0300
>> "Luis Claudio R. Goncalves" <lclaudio@...g.org> wrote:
>>
>>> Hello,
>>>
>>> I saw two different occurrences of "BUG: sleeping function called from
>>> invalid context" happening on 3.10.10-rt7. The first one happened on
>>> drm_vblank_get() -> i915_get_vblank_timestamp() and was flooding dmesg
>>> after Xorg started. I silenced that with a hackish patch, just for
>>> fun, and
>>> found a second problem, this time on tty_ldisc_reinit().
>>>
>>> The logs:
>>>
>>> [   23.973991] BUG: sleeping function called from invalid context at
>>> /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
>>> [   23.973992] in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg
>>> [   23.973993] 2 locks held by Xorg/517:
>>> [   23.973993]  #0:
>>> [   23.973994]  (
>>> [   23.973994] &dev->vbl_lock
>>> [   23.973995] ){......}
>>> [   23.973995] , at:
>>> [   23.974007] [<ffffffffa0024c60>] drm_vblank_get+0x30/0x2b0 [drm]
>>> [   23.974008]  #1:
>>> [   23.974008]  (
>>> [   23.974008] &dev->vblank_time_lock
>>> [   23.974008] ){......}
>>> [   23.974009] , at:
>>> [   23.974013] [<ffffffffa0024ce1>] drm_vblank_get+0xb1/0x2b0 [drm]
>>> [   23.974013] Preemption disabled at:
>>> [   23.974021] [<ffffffffa008bc95>]
>>> i915_get_vblank_timestamp+0x45/0xa0 [i915]
>>> [   23.974023] CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5
>>> [   23.974024] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
>>> 02/05/2013
>>> [   23.974024]  0000000000070008
>>> [   23.974025]  ffff88017a08bae0
>>> [   23.974025]  ffffffff8164b790
>>> [   23.974025]  ffff88017a08baf8
>>> [   23.974026]  ffffffff8107e62f
>>> [   23.974026]  ffff880179840040
>>> [   23.974026]  ffff88017a08bb18
>>> [   23.974027]  ffffffff81651ac4
>>> [   23.974027]  0000000000000002
>>> [   23.974027]  ffff880179840000
>>> [   23.974028]  ffff88017a08bb48
>>> [   23.974028]  ffffffffa0084e67
>>> [   23.974028] Call Trace:
>>> [   23.974033]  [<ffffffff8164b790>] dump_stack+0x19/0x1b
>>> [   23.974035]  [<ffffffff8107e62f>] __might_sleep+0xff/0x170
>>> [   23.974037]  [<ffffffff81651ac4>] rt_spin_lock+0x24/0x60
>>> [   23.974043]  [<ffffffffa0084e67>] i915_read32+0x27/0x170 [i915]
>>> [   23.974048]  [<ffffffffa008a591>] i915_pipe_enabled+0x31/0x40 [i915]
>>> [   23.974054]  [<ffffffffa008a6be>]
>>> i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915]
>>> [   23.974056]  [<ffffffff81087025>] ? sched_clock_cpu+0xb5/0x100
>>> [   23.974062]  [<ffffffffa00245d4>]
>>> drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm]
>>> [   23.974068]  [<ffffffffa008bc95>]
>>> i915_get_vblank_timestamp+0x45/0xa0 [i915]
>>> [   23.974073]  [<ffffffffa0024998>]
>>> drm_get_last_vbltimestamp+0x48/0x70 [drm]
>>> [   23.974078]  [<ffffffffa0024db5>] drm_vblank_get+0x185/0x2b0 [drm]
>>> [   23.974079]  [<ffffffff81087025>] ? sched_clock_cpu+0xb5/0x100
>>> [   23.974085]  [<ffffffffa0025d03>] drm_wait_vblank+0x83/0x5d0 [drm]
>>> [   23.974086]  [<ffffffff81087025>] ? sched_clock_cpu+0xb5/0x100
>>> [   23.974088]  [<ffffffff810af143>] ?
>>> __lock_acquire.isra.31+0x273/0xa70
>>> [   23.974093]  [<ffffffffa00212a2>] drm_ioctl+0x552/0x6a0 [drm]
>>> [   23.974096]  [<ffffffff8128d886>] ? avc_has_perm_flags+0x126/0x1c0
>>> [   23.974098]  [<ffffffff8128d788>] ? avc_has_perm_flags+0x28/0x1c0
>>> [   23.974099]  [<ffffffff810ad49e>] ? put_lock_stats.isra.23+0xe/0x40
>>> [   23.974101]  [<ffffffff811a0095>] do_vfs_ioctl+0x325/0x5b0
>>> [   23.974103]  [<ffffffff8128fc3e>] ? file_has_perm+0x8e/0xa0
>>> [   23.974105]  [<ffffffff811a03a1>] SyS_ioctl+0x81/0xa0
>>> [   23.974107]  [<ffffffff8165a342>] tracesys+0xdd/0xe2
>>>
>>>
>>> and
>>>
>>>
>>> [   25.423675] BUG: sleeping function called from invalid context at
>>> /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
>>> [   25.423676] in_atomic(): 1, irqs_disabled(): 1, pid: 188, name:
>>> plymouthd
>>> [   25.423676] 3 locks held by plymouthd/188:
>>> [   25.423682]  #0:  (&tty->legacy_mutex){......}, at:
>>> [<ffffffff816528d0>] tty_lock_nested+0x40/0x90
>>> [   25.423686]  #1:  (&tty->ldisc_mutex){......}, at:
>>> [<ffffffff8139b482>] tty_ldisc_hangup+0x152/0x300
>>> [   25.423688]  #2:  (tty_ldisc_lock){......}, at:
>>> [<ffffffff8139a9c2>] tty_ldisc_reinit+0x72/0x130
>>> [   25.423689] Preemption disabled at:[<ffffffff8139a9c2>]
>>> tty_ldisc_reinit+0x72/0x130
>>> [   25.423691] CPU: 2 PID: 188 Comm: plymouthd Not tainted
>>> 3.10.10-rt7+ #6
>>> [   25.423692] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
>>> 02/05/2013
>>> [   25.423694]  005ff00000000000 ffff8801788ada68 ffffffff8164b790
>>> ffff8801788ada80
>>> [   25.423695]  ffffffff8107e62f ffff8801991ce1a0 ffff8801788adaa0
>>> ffffffff81651ac4
>>> [   25.423696]  0000000000000000 ffffea0005e26680 ffff8801788adaf8
>>> ffffffff81130984
>>> [   25.423696] Call Trace:
>>> [   25.423700]  [<ffffffff8164b790>] dump_stack+0x19/0x1b
>>> [   25.423702]  [<ffffffff8107e62f>] __might_sleep+0xff/0x170
>>> [   25.423704]  [<ffffffff81651ac4>] rt_spin_lock+0x24/0x60
>>> [   25.423707]  [<ffffffff81130984>] free_hot_cold_page+0xb4/0x3c0
>>> [   25.423710]  [<ffffffff81178209>]
>>> ?unfreeze_partials.isra.42+0x229/0x2b0
>>> [   25.423711]  [<ffffffff81130dc7>] __free_pages+0x47/0x70
>>> [   25.423713]  [<ffffffff81130fb2>] __free_memcg_kmem_pages+0x22/0x50
>>> [   25.423714]  [<ffffffff81177528>] __free_slab+0xe8/0x1e0
>>> [   25.423716]  [<ffffffff81177654>] free_delayed+0x34/0x50
>>> [   25.423717]  [<ffffffff81649633>] __slab_free+0x273/0x36b
>>> [   25.423719]  [<ffffffff810acde9>] ? get_lock_stats+0x19/0x60
>>> [   25.423721]  [<ffffffff810ad49e>] ? put_lock_stats.isra.23+0xe/0x40
>>> [   25.423722]  [<ffffffff81178794>] kfree+0x1c4/0x210
>>> [   25.423724]  [<ffffffff8139a9f5>] ? tty_ldisc_reinit+0xa5/0x130
>>> [   25.423725]  [<ffffffff8139a9f5>] tty_ldisc_reinit+0xa5/0x130
>>> [   25.423726]  [<ffffffff8139b49f>] tty_ldisc_hangup+0x16f/0x300
>>> [   25.423728]  [<ffffffff81082e1d>] ? get_parent_ip+0xd/0x50
>>> [   25.423731]  [<ffffffff8104d736>] ? unpin_current_cpu+0x16/0x70
>>> [   25.423732]  [<ffffffff81392136>] __tty_hangup+0x346/0x460
>>> [   25.423733]  [<ffffffff81392260>] tty_vhangup+0x10/0x20
>>> [   25.423735]  [<ffffffff8139d6e1>] pty_close+0x131/0x180
>>> [   25.423736]  [<ffffffff813936ad>] tty_release+0x11d/0x5f0
>>> [   25.423737]  [<ffffffff810acde9>] ? get_lock_stats+0x19/0x60
>>> [   25.423747]  [<ffffffff81009d84>] ? native_sched_clock+0x24/0x80
>>> [   25.423749]  [<ffffffff81087025>] ? sched_clock_cpu+0xb5/0x100
>>> [   25.423750]  [<ffffffff810acde9>] ? get_lock_stats+0x19/0x60
>>> [   25.423752]  [<ffffffff8118eb45>] __fput+0xf5/0x250
>>> [   25.423762]  [<ffffffff8118ed6e>] ____fput+0xe/0x10
>>> [   25.423764]  [<ffffffff8106f1cc>] task_work_run+0xac/0xe0
>>> [   25.423765]  [<ffffffff81002a11>] do_notify_resume+0x91/0xf0
>>> [   25.423767]  [<ffffffff8165a41c>] int_signal+0x12/0x17
>>>
>>>
>>> About the hackish patch I used to silence the constant i915 complaints,
>>> even though the comments on drm_calc_vbltimestamp_from_scanoutpos()
>>> state
>>> the preempt_disable is there for a reason, I removed it for the RT
>>> case. I
>>> really wanted to confirm if there was anything else after that
>>> complaint.
>>
>> The funny part is, there's a comment there that shows that this was
>> done even for "PREEMPT_RT". Unfortunately, the call to
>> "get_scanout_position()" can call functions that use the rt-mutex
>> "sleeping spin locks" and it breaks there.
>>
>> I guess we need to ask the authors of the mainline patch exactly why
>> that preempt_disable() is needed?
>
> The drm core associates a timestamp with each vertical blank frame #.
> Drm drivers can optionally support a 'high resolution' hw timestamp.
> The vblank frame #/timestamp tuple is user-space visible.
>
> The i915 drm driver supports a hw timestamp via this drm helper function
> which computes the timestamp from the crtc scan position (based on the
> pixel clock).
>
> For mainline, the preempt_disable/_enable() isn't actually necessary
> because every call tree that leads here already has preemption disabled.
>
> For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?
>

Hi

Just to give some context why i wrote that routine the way it is written:

The preempt_disable() / _enable() there is intended to try to make sure 
that the etime = ktime_get() query and get_crtc_scanoutpos() query 
happen as closely together in time as possible, because time delays 
between those queries directly translate into noise/uncertainty in the 
calculated vblank timestamps.

Those timestamps are now used by userspace apps like desktop compositors 
or media players for smooth animation and audio-video sync, but the main 
reason i implemented those patches was to support scientific 
neuro-science research applications which need very precise ( << 100 
usecs, ideally < 20 usecs ) vblank timestamps (see 
<https://github.com/Psychtoolbox-3/Psychtoolbox-3/blob/master/Psychtoolbox/PsychDocumentation/ECVP2010Poster_VisualTimingPrecision.pdf?raw=true> 
for more background). The code checks how big the uncertainty is and 
retries the query up to three times if the delay is greater than 20 
microseconds by default.

The retry is meant as a last resort measure for isolated unavoidable 
outliers, e.g., interruption by NMI irq's or maybe system management 
interrupts. We can only retry a few times, because the code is also 
called from the vblank irq handler every video refresh cycle 
(drm_handle_vblank()), and we can't report failure to userspace due to a 
client api (OML_sync_control and INTEL_swap_events GLX extension specs 
at www.opengl.org) which didn't anticipate error cases -- essentially 
reporting failure was not an option.

As far as the typical timing sensitive scientific applications go, those 
users who need this vblank timestamping to be the most precise and 
robust are also the same users who will likely need to install a 
lowlatency or even realtime kernel, so for this to be useful, that 
routine should be really well protected against preemption on a rt kernel.

That said, maybe preempt_disable is no longer the optimal choice there 
and there's some better way to achieve good protection against 
interruptions of that bit of code? My knowledge here is a bit rusty, and 
the intel kms drivers and rt stuff has changed quite a bit.

thanks,
-mario


> Regards,
> Peter Hurley
>
>
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index f92da0a..bd2e8e2 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -631,7 +631,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct
>>> drm_device *dev, int crtc,
>>>           /* Disable preemption to make it very likely to
>>>            * succeed in the first iteration even on PREEMPT_RT kernel.
>>>            */
>>> -        preempt_disable();
>>> +        preempt_disable_nort();
>>>
>>>           /* Get system timestamp before query. */
>>>           stime = ktime_get();
>>> @@ -644,7 +644,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct
>>> drm_device *dev, int crtc,
>>>           if (!drm_timestamp_monotonic)
>>>               mono_time_offset = ktime_get_monotonic_offset();
>>>
>>> -        preempt_enable();
>>> +        preempt_enable_nort();
>>>
>>>           /* Return as no-op if scanout query unsupported or failed. */
>>>           if (!(vbl_status & DRM_SCANOUTPOS_VALID)) {
>>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ