Message-ID: <20111104164808.GA2015@homer.localdomain>
Date:	Fri, 4 Nov 2011 12:48:08 -0400
From:	Jerome Glisse <j.glisse@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Andrew Watts <akwatts@...il.com>,
	linux-pm@...ts.linux-foundation.org,
	Dmitry Torokhov <dmitry.torokhov@...il.com>,
	linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org
Subject: Re: [REGRESSION]: hibernate/sleep regression w/ bisection

On Fri, Nov 04, 2011 at 09:14:31AM -0700, Tejun Heo wrote:
> (cc'ing David Airlie and dri-devel)
> 
> Hello, the original thread can be read from
> 
>   http://thread.gmane.org/gmane.linux.kernel/1209587
> 
> Full sysrq-t output at
> 
>   http://article.gmane.org/gmane.linux.kernel/1211256
> 
> So, the problem is that after a seemingly unrelated update to the input
> serio driver (convert to use workqueue), X seems to lock up
> sporadically across suspend/resume cycles.
> 
> I went through the full sysrq-t output but couldn't spot anything
> suspicious w/ anything else.  No worker is stuck and nobody is waiting
> for flush to finish.
> 
> Stack trace for X follows.
> 
> > X               S f499b944  5800  1652   1651 0x00400080
> >  f499b9a8 00003086 00000000 f499b944 c100d4a4 00000000 00000000 f499b958
> >  00000000 f499b9a8 f5173140 d7857c56 00000057 f5173140 d8b69880 00000057
> >  00000001 00000000 f499b9b4 c104dd89 000f4240 00000000 00000000 f499ba68
> > Call Trace:
> >  [<c1291301>] ttm_bo_wait_unreserved+0x5f/0x106
> >  [<c129145f>] ttm_bo_reserve_locked+0xb7/0xe1
> >  [<c1292c27>] ttm_bo_reserve+0x26/0x95
> >  [<c12c3c97>] radeon_crtc_do_set_base+0xbd/0x6d2
> >  [<c12c42e7>] radeon_crtc_set_base+0x1b/0x1d
> >  [<c12c430d>] radeon_crtc_mode_set+0x24/0xdd7
> >  [<c1279c57>] drm_crtc_helper_set_mode+0x32c/0x48b
> >  [<c1279e2f>] drm_helper_resume_force_mode+0x79/0x23e
> >  [<c12ace10>] radeon_gpu_reset+0x84/0x98
> >  [<c12c0838>] radeon_fence_wait+0x2d1/0x311
> >  [<c12c0e37>] radeon_sync_obj_wait+0xc/0xe
> >  [<c12908be>] ttm_bo_wait+0xa1/0x108
> >  [<c12d6e7b>] radeon_gem_wait_idle_ioctl+0x76/0xc4
> >  [<c127e62e>] drm_ioctl+0x1c2/0x42c
> >  [<c10e288e>] do_vfs_ioctl+0x79/0x54b
> >  [<c10e2dcb>] sys_ioctl+0x6b/0x70
> >  [<c1593813>] sysenter_do_call+0x12/0x22
> 
> Do you guys have any ideas what's going on?  It seems to be waiting
> for bo->reserved to go zero.  Is it possible that someone there is
> forgetting to properly kick a work item after resume causing the wait
> to stall?
> 
> Andrew, can you please kill the X server after the hang and see
> whether that brings the system back?  I think sshd should still work
> and if not you can write a script to kill the X server after 30secs
> after resume (and kill that script if resume succeeds).
> 
> Thank you.
> 

OK, so the issue is funny: it should happen even without the serio
change; I guess that change just makes it more likely. Here is my theory:
radeon_gem_wait_idle_ioctl is called on the scanout buffer and reserves
it. It then waits for the buffer to go idle, but for some reason the GPU
is either locked up, not yet fully resumed, or in some other state (see
below for more speculation).

At that point the GPU reset is called, which resets the GPU and then
restores it. To restore it, the reset path needs to reserve the scanout
buffer, and bang, you are stuck, because the scanout buffer is already
reserved by the wait ioctl.
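
To make the theory above concrete, here is a minimal user-space sketch
of the re-entrant reservation. It is not the radeon/TTM code: every
name (model_bo, model_bo_try_reserve, model_restore_after_reset,
model_wait_idle) is hypothetical, and the reserve is modelled as a
non-blocking try so the stuck case shows up as a false return instead
of a hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer object; only the reservation is modelled. */
struct model_bo {
    bool reserved;
};

/* Non-blocking reserve: true on success, false if someone already holds it.
 * Assumption: the real reserve call would sleep here instead of returning. */
static bool model_bo_try_reserve(struct model_bo *bo)
{
    if (bo->reserved)
        return false;
    bo->reserved = true;
    return true;
}

static void model_bo_unreserve(struct model_bo *bo)
{
    assert(bo->reserved);
    bo->reserved = false;
}

/* Restore path after a GPU reset: it needs the scanout buffer reserved. */
static bool model_restore_after_reset(struct model_bo *scanout)
{
    if (!model_bo_try_reserve(scanout))
        return false; /* a blocking reserve would wait forever here */
    /* ... the CRTC base would be reprogrammed here ... */
    model_bo_unreserve(scanout);
    return true;
}

/* Wait-idle path: reserves the buffer, then enters reset while still holding it. */
static bool model_wait_idle(struct model_bo *scanout, bool gpu_hung)
{
    bool ok = true;

    if (!model_bo_try_reserve(scanout))
        return false;
    if (gpu_hung)
        ok = model_restore_after_reset(scanout);
    model_bo_unreserve(scanout);
    return ok;
}
```

Calling model_wait_idle(&bo, true) on an unreserved buffer returns
false in this model, which corresponds to X sleeping in
ttm_bo_wait_unreserved in the trace quoted above.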

The thing is, I don't know what a good solution to this would be. We
could set a flag to say that we are in the reset phase and, if the
scanout buffers are already reserved, not try to reserve them again in
the restore-after-GPU-reset path.
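
A rough user-space sketch of the flag idea above, under loud
assumptions: the in_reset field and all sketch_* names are made up for
illustration and are not existing radeon or TTM symbols. The sketch
also assumes that a buffer found reserved during reset is held by the
caller that triggered the reset; real code would have to verify that.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_bo {
    bool reserved;
};

struct sketch_dev {
    bool in_reset;            /* hypothetical "reset in progress" flag */
    struct sketch_bo scanout; /* the scanout buffer */
};

/* Non-blocking reserve: true on success, false if already reserved. */
static bool sketch_try_reserve(struct sketch_bo *bo)
{
    if (bo->reserved)
        return false;
    bo->reserved = true;
    return true;
}

static void sketch_unreserve(struct sketch_bo *bo)
{
    assert(bo->reserved);
    bo->reserved = false;
}

/* Restore path: during reset, reuse an existing reservation instead of
 * trying to take it a second time. */
static bool sketch_restore(struct sketch_dev *dev)
{
    bool took = sketch_try_reserve(&dev->scanout);

    if (!took && !dev->in_reset)
        return false; /* outside reset, a blocking reserve would wait here */
    /* ... the scanout base would be reprogrammed here ... */
    if (took)
        sketch_unreserve(&dev->scanout); /* only drop what we took */
    return true;
}

static bool sketch_gpu_reset(struct sketch_dev *dev)
{
    bool ok;

    dev->in_reset = true;
    ok = sketch_restore(dev);
    dev->in_reset = false;
    return ok;
}

/* Wait-idle path: holds the reservation while the reset runs. */
static bool sketch_wait_idle(struct sketch_dev *dev, bool gpu_hung)
{
    bool ok = true;

    if (!sketch_try_reserve(&dev->scanout))
        return false;
    if (gpu_hung)
        ok = sketch_gpu_reset(dev);
    sketch_unreserve(&dev->scanout);
    return ok;
}
```

With the flag set, sketch_wait_idle(&dev, true) completes and leaves
the buffer unreserved, while sketch_restore() outside of a reset still
refuses to touch a buffer that someone else holds.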


The GPU lockup is weird. Can we get a dmesg from a resume where the
lockup happens? I am really not sure what happens here.

Cheers,
Jerome