Message-Id: <201006212244.09466.rjw@sisk.pl>
Date:	Mon, 21 Jun 2010 22:44:09 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Jerome Glisse <glisse@...edesktop.org>
Cc:	Dave Airlie <airlied@...hat.com>, linux-kernel@...r.kernel.org,
	"dri-devel" <dri-devel@...ts.freedesktop.org>,
	linux-pm@...ts.linux-foundation.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] DRM / radeon / PM: Do not evict VRAM during freeze phase of hibernation

On Monday, June 21, 2010, Jerome Glisse wrote:
> On Sun, Jun 20, 2010 at 01:43:05AM +0200, Rafael J. Wysocki wrote:
> > On Saturday, June 19, 2010, Rafael J. Wysocki wrote:
> > > On Saturday, June 19, 2010, Dave Airlie wrote:
> > > > On Sat, 2010-06-19 at 01:23 +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, June 18, 2010, Dave Airlie wrote:
> > > > > > On Fri, 2010-06-18 at 22:21 +0200, Rafael J. Wysocki wrote:
> > > > > > > From: Rafael J. Wysocki <rjw@...k.pl>
> > > > > > > 
> > > > > > > I have recently noticed a 55 sec. delay during the "device freeze"
> > > > > > > phase of hibernation on my test-bed HP nx6325.  Due to the 100%
> > > > > > > reproducibility of it I was able to narrow it down to
> > > > > > > > radeon_suspend_kms() and then it turned out that the delay occurred
> > > > > > > somewhere in radeon_bo_evict_vram().  However, it doesn't seem really
> > > > > > > necessary or even very useful to me to evict VRAM at this particular
> > > > > > > point, because we're going to create an image and bring the device
> > > > > > > back to the fully functional state in a little while.  Thus, I think
> > > > > > > > the VRAM eviction can be skipped for state.event == PM_EVENT_FREEZE,
> > > > > > > which makes the delay go away.
> > > > > > 
> > > > > > > I'm not 100% sure of the hibernate sequencing and it's early in the
> > > > > > morning, but we want to evict VRAM before image building so we can have
> > > > > > the contents of VRAM in the image so we can restore them on resume. Does
> > > > > > this just avoid evicting them a second time after we created the image?
> > > > > 
> > > > > > No, it's the first time, before creating the image, but I didn't see any
> > > > > difference on resume with and without the patch, so I thought it was a good
> > > > > idea. :-)
> > > > 
> > > > > On the machine you have it's most likely not going to show up unless you
> > > > are running a 3D app or something across suspend, since currently X
> > > > re-exposes most apps on VT switch, so they just redraw.
> > > 
> > > Yes.  Moreover, hibernation is always done after a VT switch.  That's why
> > > I said I thought the eviction wasn't necessary in the changelog.
> > > 
> > > BTW, I have three different test boxes with radeon hardware and the
> > > $subject patch is not a problem on any of them.
> > > 
> > > > Was it always this slow?
> > > 
> > > Nope.  It definitely is a regression, although I'm not sure what's the last
> > > good kernel.
> > > 
> > > > you can see how many objects are in vram using
> > > > debugfs (/sys/kernel/debug/dri/0/radeon_vram_mm), it sounds like the TTM
> > > > eviction process is blocking on something,
> > 
> > I did some more debug work (the _total_ lack of comments inside of the
> > relevant radeon and ttm code makes this a next-to-impossible task, though)
> > and found that all of the delays (up to 5 seconds) happen inside of
> > ttm_bo_move_accel_cleanup() called from radeon_move_blit(), where the "new"
> > memory type is TTM_PL_TT and the "old" one is TTM_PL_VRAM.  The preceding
> > radeon_copy() always returns 0.
> > 
> > Please let me know if you need more information.
> > 
> > Thanks,
> > Rafael
> 
> Can you confirm that this is triggered by the first radeon_bo_evict_vram()
> in radeon_suspend_kms() ?

Not really.

I used the attached debug patch and I got the attached dmesg output from
a "core" hibernate test.

It looks like the first one is relatively sane (71 usecs), but things get worse
going forward.
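
The debug patch itself is only attached, not inlined.  Purely as an
illustration of how per-call timings such as the 71 usecs figure can be
collected, here is a minimal user-space sketch.  It assumes POSIX
clock_gettime(); the names elapsed_usecs(), timed_call() and sample_step()
are hypothetical and are not taken from the patch, which presumably relies
on in-kernel timekeeping instead.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature of the operation being timed (hypothetical, for illustration). */
typedef int (*step_fn)(void *arg);

/* Elapsed time between two monotonic timestamps, in microseconds. */
static int64_t elapsed_usecs(const struct timespec *start,
                             const struct timespec *end)
{
    int64_t nsecs = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
                  + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

    return nsecs / 1000;
}

/*
 * Run fn(arg), store how long it took in *usecs_out and pass the
 * return value of fn through unchanged.
 */
static int timed_call(step_fn fn, void *arg, int64_t *usecs_out)
{
    struct timespec t0, t1;
    int ret;

    assert(usecs_out != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ret = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *usecs_out = elapsed_usecs(&t0, &t1);
    return ret;
}

/* Stand-in for the real work (e.g. one buffer move); always succeeds. */
static int sample_step(void *arg)
{
    (void)arg;
    return 0;
}
```

Calling timed_call(sample_step, NULL, &usecs) around each step and logging
usecs gives one line per call, which is enough to see whether the first
call is fast and the later ones degrade.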

> Also can you check whether IRQs are enabled (put some
> debug in the irq handler of your gpu). My guess is that IRQs are stopped
> (likely stopped before the radeon suspend callback)

No, interrupts are not switched off at this point yet.  At least not
permanently.

> and that we end up waiting for the fence timeout to expire in radeon_fence_wait().

I guess something like this happens, although I'm not sure about the root
cause.

It looks like it interferes with something happening in parallel with it.
I wonder, however, why it is a problem for hibernation and it's not a problem
for suspend to RAM and why the other machines are not affected.

Rafael

View attachment "drm-ttm-debug-list-cleaning.patch" of type "text/x-patch" (1585 bytes)

View attachment "nx6325-dmesg.log" of type "text/x-log" (224187 bytes)
