linux-kernel - Re: [PATCH] DRM / radeon / PM: Do not evict VRAM during freeze phase of hibernation

Message-ID: <20100621075741.GB2350@barney.localdomain>
Date:	Mon, 21 Jun 2010 09:57:41 +0200
From:	Jerome Glisse <glisse@...edesktop.org>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Dave Airlie <airlied@...hat.com>, linux-kernel@...r.kernel.org,
	dri-devel <dri-devel@...ts.freedesktop.org>,
	linux-pm@...ts.linux-foundation.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] DRM / radeon / PM: Do not evict VRAM during freeze phase
 of hibernation

On Sun, Jun 20, 2010 at 01:43:05AM +0200, Rafael J. Wysocki wrote:
> On Saturday, June 19, 2010, Rafael J. Wysocki wrote:
> > On Saturday, June 19, 2010, Dave Airlie wrote:
> > > On Sat, 2010-06-19 at 01:23 +0200, Rafael J. Wysocki wrote:
> > > > On Friday, June 18, 2010, Dave Airlie wrote:
> > > > > On Fri, 2010-06-18 at 22:21 +0200, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rjw@...k.pl>
> > > > > > 
> > > > > > I have recently noticed a 55 sec. delay during the "device freeze"
> > > > > > phase of hibernation on my test-bed HP nx6325.  Due to the 100%
> > > > > > reproducibility of it I was able to narrow it down to
> > > > > > > radeon_suspend_kms() and then it turned out that the delay occurred
> > > > > > somewhere in radeon_bo_evict_vram().  However, it doesn't seem really
> > > > > > necessary or even very useful to me to evict VRAM at this particular
> > > > > > point, because we're going to create an image and bring the device
> > > > > > back to the fully functional state in a little while.  Thus, I think
> > > > > > > the VRAM eviction can be skipped for state.event == PM_EVENT_FREEZE,
> > > > > > which makes the delay go away.
> > > > > 
> > > > > > I'm not 100% sure of the hibernate sequencing and it's early in the
> > > > > morning, but we want to evict VRAM before image building so we can have
> > > > > the contents of VRAM in the image so we can restore them on resume. Does
> > > > > this just avoid evicting them a second time after we created the image?
> > > > 
> > > > No, it's the first time, before creating the image, but I didn't see any
> > > > difference on resume with and without the patch, so I thought it was a good
> > > > idea. :-)
> > > 
> > > On the machine you have it's most likely not going to show up unless you
> > > are running a 3D app or something across suspend, since currently X
> > > re-exposes most apps on VT switch, so they just redraw.
> > 
> > Yes.  Moreover, hibernation is always done after a VT switch.  That's why
> > I said I thought the eviction wasn't necessary in the changelog.
> > 
> > BTW, I have three different test boxes with radeon hardware and the
> > $subject patch is not a problem on any of them.
> > 
> > > Was it always this slow?
> > 
> > Nope.  It definitely is a regression, although I'm not sure what's the last
> > good kernel.
> > 
> > > you can see how many objects are in vram using
> > > debugfs (/sys/kernel/debug/dri/0/radeon_vram_mm), it sounds like the TTM
> > > eviction process is blocking on something,
> 
> I did some more debug work (the _total_ lack of comments inside of the
> relevant radeon and ttm code makes this a next-to-impossible task, though)
> and found that all of the delays (up to 5 seconds) happen inside of
> ttm_bo_move_accel_cleanup() called from radeon_move_blit(), where the "new"
> memory type is TTM_PL_TT and the "old" one is TTM_PL_VRAM.  The preceding
> radeon_copy() always returns 0.
> 
> Please let me know if you need more information.
> 
> Thanks,
> Rafael

Can you confirm that this is triggered by the first radeon_bo_evict_vram()
call in radeon_suspend_kms()? Also, can you check whether IRQs are still
enabled at that point (put some debug output in the IRQ handler of your
GPU)? My guess is that IRQs are stopped (likely before the radeon suspend
callback runs) and that we end up waiting for the fence timeout to expire
in radeon_fence_wait().
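
To make the guess concrete, here is a minimal user-space sketch in C of a
fence wait that only completes early when an interrupt marks the fence as
signaled. It is not radeon or TTM code: every name in it (model_gpu,
model_fence, model_tick, model_fence_wait) is hypothetical, and time is
modelled as abstract ticks rather than jiffies. It only illustrates why each
buffer move would cost a full timeout if IRQs are not delivered, even though
the copy itself has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the radeon driver's data structures. */
struct model_fence {
    bool signaled;            /* set only by the modelled IRQ handler */
};

struct model_gpu {
    bool irq_enabled;         /* are interrupts being delivered? */
    unsigned copy_done_at;    /* tick at which the hardware copy finishes */
};

/* Modelled IRQ handler: it can signal the fence only if IRQs are delivered. */
static void model_tick(const struct model_gpu *gpu, struct model_fence *f,
                       unsigned now)
{
    if (gpu->irq_enabled && now >= gpu->copy_done_at)
        f->signaled = true;
}

/*
 * Wait until the fence is signaled or timeout_ticks have elapsed.
 * Returns the number of ticks spent waiting.
 */
static unsigned model_fence_wait(const struct model_gpu *gpu,
                                 struct model_fence *f,
                                 unsigned timeout_ticks)
{
    unsigned now;

    assert(gpu != NULL && f != NULL);

    for (now = 0; now < timeout_ticks; now++) {
        model_tick(gpu, f, now);
        if (f->signaled)
            break;            /* normal case: the IRQ woke the waiter */
    }
    return now;               /* equals timeout_ticks if never signaled */
}
```

With irq_enabled set, the wait returns as soon as the copy completes; with
it cleared, the same call always runs for the whole timeout. Under this
assumption the delays would add up per evicted buffer, which is the pattern
to look for in the debug output.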

Cheers,
Jerome