lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160125132310.GS23290@intel.com>
Date:	Mon, 25 Jan 2016 15:23:10 +0200
From:	Ville Syrjälä <ville.syrjala@...ux.intel.com>
To:	Mario Kleiner <mario.kleiner.de@...il.com>
Cc:	Michel Dänzer <michel@...nzer.net>,
	Alex Deucher <alexander.deucher@....com>,
	Vlastimil Babka <vbabka@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>,
	dri-devel@...ts.freedesktop.org,
	Christian König <christian.koenig@....com>
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with
 radeon

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
> 
> 
> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> > On 23.01.2016 00:18, Ville Syrjälä wrote:
> >> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>
> >>> [ Trimming KDE folks from Cc ]
> >>>
> >>> On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>
> >>>>>> Can you please point me at the vblank on/off jump bug please?
> >>>>>
> >>>>> AFAIR I originally reported it in response to
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>> , but I can't find that in the archives, so maybe that was just on IRC.
> >>>>> See
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>> . Basically, I ran into the bug fixed by your patch because the counter
> >>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>> just a few days.
> >>>>
> >>>> Ok, so just uncovered the overflow bug.
> >>>
> >>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>> counter jumping bug (similar to the bug this thread is about), which
> >>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>> to happen when turning off the CRTC:
> >>>
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>
> >> Not sure what bug we're talking about here, but here the hw counter
> >> clearly jumps backwards.
> >>
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>
> >> Same here.
> >
> > At least one of the jumps is expected, because this is around turning
> > off the CRTC for DPMS off. Don't know yet why there are two jumps back
> > though.
> >
> >
> >> These things just don't happen on i915 because drm_vblank_off() and
> >> drm_vblank_on() are always called around the times when the hw counter
> >> might get reset. Or at least that's how it should be.
> >
> > Which is of course the idea of Daniel's patch (which is what I'm getting
> > the above with) or Mario's patch as well, but clearly something's still
> > wrong. It's certainly possible that it's something in the driver, but
> > since calling drm_vblank_pre/post_modeset from the same places seems to
> > work fine (ignoring the regression discussed in this thread)... Do
> > drm_vblank_on/off require something else to handle this correctly?
> >
> >
> 
> I suspect it is because vblank_disable_and_save calls 
> drm_update_vblank_count() unconditionally, even if vblank irqs are 
> already off.
> 
> So on a manual display disable -> reenable you get something like
> 
> At disable:
> 
> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> 
> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes 
> final count.
> 
> Then the crtc is shut down and its hw counter resets to zero.
> 
> At reenable:
> 
> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> 
> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> 
> drm_vblank_off -> vblank_disable_and_save -> A pointless 
> drm_update_vblank_count() while the hw counter is already reset to zero 
> --> Unwanted counter jump.
> 
> 
> The problem doesn't happen on a pure modeset to a different video 
> resolution/refresh rate, as then we only have one call into 
> atombios_crtc_dpms(DPMS_OFF).
> 
> I think the fix is to fix vblank_disable_and_save() to only call 
> drm_update_vblank_count() if vblank irqs get actually disabled, not on 
> no-op calls. I will try that now.

It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.

> 
> Otherwise kms drivers would have to be careful to never call 
> drm_vblank_off multiple times before calling drm_vblank_on, but the help 
> text to drm_vblank_on() claims that unbalanced calls to these functions 
> are perfectly fine.
> 
> -mario
> 
> 
> 
> 
> 
> 
> 

-- 
Ville Syrjälä
Intel OTC

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ