lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <569FEEDE.4060409@gmail.com>
Date:	Wed, 20 Jan 2016 21:32:30 +0100
From:	Mario Kleiner <mario.kleiner.de@...il.com>
To:	Vlastimil Babka <vbabka@...e.cz>,
	Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc:	Alex Deucher <alexander.deucher@....com>,
	Christian König <christian.koenig@....com>,
	Daniel Vetter <daniel.vetter@...ll.ch>, mgraesslin@....org,
	David Airlie <airlied@...ux.ie>,
	dri-devel@...ts.freedesktop.org,
	LKML <linux-kernel@...r.kernel.org>, kwin@....org
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with
 radeon

On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>>
>>
>> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I'm currently running...
>>
>> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>>
>> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
>> can't trigger a hang after hundreds of runs.
>>
>> Does this also hang for you?
>
> No, test mode seems to be fine.
>
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
>
> Attached. Captured by having kdm running, switching to console, running
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
> frozen splashscreen, switch back, terminate dmesg. So somewhere around
> the middle there should be where ksplashscreen starts...
>
>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
>
> No such warnings there.
>
>> thanks,
>> -mario
>>
>>
>>>> Thanks,
>>>> Vlastimil
>>>
>

Thanks. So the problem is that AMDs hardware frame counters reset to 
zero during a modeset. The old DRM code dealt with drivers doing that by 
keeping vblank irqs enabled during modesets and incrementing vblank 
count by one during each vblank irq, i think that's what 
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

The new code in drm_update_vblank_count() breaks this. The reset of the 
counter to zero is treated as counter wraparound, so our software vblank 
counter jumps forward by up to 2^24 counts in response (in case of AMD's 
24 bit hw counters), and then the vblank event handling code in 
drm_handle_vblank_events() and other places detects the counter being 
more than 2^23 counts ahead of queued vblank events and as part of its 
own wraparound handling for the 32-Bit software counter doesn't deliver 
these queued events for a long time -> no vblank swap trigger event -> 
no swap -> client hangs waiting for swap completion.

I think i remember seeing the ksplash progress screen occasionally 
blanking half way through login, i guess that's when kwin triggers a 
modeset in parallel to ksplash doing its OpenGL animations. So depending 
on the hw vblank count at the time of login ksplash would or wouldn't 
hang, apparently i got "lucky" with my counts at login.

-mario

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ