lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56A677C6.4050909@gmail.com>
Date:	Mon, 25 Jan 2016 20:30:14 +0100
From:	Mario Kleiner <mario.kleiner.de@...il.com>
To:	Ville Syrjälä <ville.syrjala@...ux.intel.com>,
	Michel Dänzer <michel@...nzer.net>,
	Alex Deucher <alexander.deucher@....com>,
	Vlastimil Babka <vbabka@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>,
	dri-devel@...ts.freedesktop.org,
	Christian König <christian.koenig@....com>
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with
 radeon



On 01/25/2016 07:51 PM, Daniel Vetter wrote:
> On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
>> Readding Daniel, which somehow got dropped from the cc.
>>
>> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
>>> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>>>
>>>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>>>
>>>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>>>
>>>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>>>>>> See
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>>>> just a few days.
>>>>>>>>>>
>>>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>>>
>>>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>>>> to happen when turning off the CRTC:
>>>>>>>>>
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>>>
>>>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>>>> clearly jumps backwards.
>>>>>>>>
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>>>
>>>>>>>> Same here.
>>>>>>>
>>>>>>> At least one of the jumps is expected, because this is around turning
>>>>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>>>>>> drm_vblank_on() are always called around the times when the hw counter
>>>>>>>> might get reset. Or at least that's how it should be.
>>>>>>>
>>>>>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>>>>>> the above with) or Mario's patch as well, but clearly something's still
>>>>>>> wrong. It's certainly possible that it's something in the driver, but
>>>>>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>>>>>> work fine (ignoring the regression discussed in this thread)... Do
>>>>>>> drm_vblank_on/off require something else to handle this correctly?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I suspect it is because vblank_disable_and_save calls
>>>>>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>>>>>> already off.
>>>>>>
>>>>>> So on a manual display disable -> reenable you get something like
>>>>>>
>>>>>> At disable:
>>>>>>
>>>>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>>>>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>>>>>> final count.
>>>>>>
>>>>>> Then the crtc is shut down and its hw counter resets to zero.
>>>>>>
>>>>>> At reenable:
>>>>>>
>>>>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>>>>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>>>>>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>>>>>> drm_update_vblank_count() while the hw counter is already reset to zero
>>>>>> --> Unwanted counter jump.
>>>>>>
>>>>>>
>>>>>> The problem doesn't happen on a pure modeset to a different video
>>>>>> resolution/refresh rate, as then we only have one call into
>>>>>> atombios_crtc_dpms(DPMS_OFF).
>>>>>>
>>>>>> I think the fix is to fix vblank_disable_and_save() to only call
>>>>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>>>>>> no-op calls. I will try that now.
>>>>>
>>>>> It does that on purpose. Otherwise the vblank counter would appear to
>>>>> have stalled while the interrupt was off.
>>>>>
>>>>
>>>> Ok, that's what the comments there say, although i don't see atm. why
>>>> that perceived stall would be a big problem. I checked all callers of
>>>> vblank_disable_and_save(). They are all careful to not call that
>>>> function if vblanks are already disabled. The only exception is
>>>> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
>>>> drivers which have resetting hw counters or other problematic behaviour
>>>> during modesets etc. then this will break. E.g., calling the vblank
>>>> timestamping stuff is also not safe/well-defined during modesets when
>>>> the timestamping constants are not (yet) updated to reflect the new mode
>>>> timing of the modeset in progress.
>>>
>>> The idea is to maintain the appearance that the counter ticks all the
>>> time as long as the crtc is active. While that may not be really
>>> required in case if no one is currently interested in the vblank
>>> counter, I think it's a nice thing to have just to make the behaviour
>>> of the counter consistent.
>>>
>>> As far as calling drm_vblank_off() after the hw counter got reset, well,
>>> that not correct. It should be called before the reset.
>>
>> What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
>> first call to DMPS_OFF will call drm_vblank_off() and really disable
>> vblank-irqs if they were running, updating the counts/ts a last time. But
>> then the dpms off will reset the hw counter to zero. When one reenables the
>> display, a second call to DPMS_OFF will now call drm_vblank_off again when
>> it apparently shouldn't.
>>
>> I just tested this patch, which fixes the counter jumps on radeon-kms with
>> my or Daniel's drm_vblank_off patches to radeon:
>
> This might be due to the legacy helpers, which just love to redundantly
> disable stuff that's off already. The problem I see with no-oping these
> out is that for atomic drivers (which really should get this right) this
> might paper over bugs: E.g. when you forget to call _off() when disabling
> the crtc, then calling _on() twice in a row is indeed a serious bug.
> Similar when you forget to call _on() and have multiple _off() calls in a
> row.
>
> So not sure what to do here.
> -Daniel
>

Yes, the legacy helpers cause two calls to dpms off if one disables a 
display. First during display disable as intended. Then when one 
reenables the display during modesetting as part of 
crtc_funcs->prepare() - at least on radeon.

Maybe the minimum thing that would help is to just check for 
vblank->inmodeset in drm_vblank_off(). If that would be the case we'd 
know it is a redundant call and could no-op it and do a 
WARN_ON(vblank->inmodeset)?

drm_vblank_on() i don't know how to treat, but that one calls 
drm_reset_vblank_timestamp() which should be less problematic if called 
redundantly.

Now the patch i want to try next to fix the drm_vblank_pre/post_modeset 
regression in Linux 4.4/4.5 is to add a ...

if ((diff > 1) && vblank->inmodeset) diff = 1;

... to the bottom of drm_update_vblank_count(). That should hopefully 
restore the pre/post_modeset behavior as close to the original behavior 
as possible. As a side effect it would also prevent the counter jump 
caused by redundant calls to drm_vblank_off().

-mario

>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 607f493..d739d93 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned
>> int pipe)
>>          spin_lock_irqsave(&dev->event_lock, irqflags);
>>
>>          spin_lock(&dev->vbl_lock);
>> -       vblank_disable_and_save(dev, pipe);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
>> vblank->enabled);
>> +
>> +       if (vblank->enabled)
>> +               vblank_disable_and_save(dev, pipe);
>>          wake_up(&vblank->queue);
>>
>>          /*
>> @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned
>> int pipe)
>>                  return;
>>
>>          spin_lock_irqsave(&dev->vbl_lock, irqflags);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
>> vblank->enabled);
>> +
>>          /* Drop our private "prevent drm_vblank_get" refcount */
>>          if (vblank->inmodeset) {
>>                  atomic_dec(&vblank->refcount);
>>
>>
>>
>> Another, maybe better, approach might be to no-op redundant calls to
>> drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to
>> drm_vblank_on() iff !vblank->inmodeset.
>>
>> -mario
>>
>>
>>>
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> Otherwise kms drivers would have to be careful to never call
>>>>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help
>>>>>> text to drm_vblank_on() claims that unbalanced calls to these functions
>>>>>> are perfectly fine.
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ