Message-ID: <s5hshrrytot.wl-tiwai@suse.de>
Date: Thu, 20 Oct 2016 16:35:30 +0200
From: Takashi Iwai <tiwai@...e.de>
To: Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc: dri-devel@...ts.freedesktop.org,
Daniel Vetter <daniel.vetter@...ll.ch>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
On Thu, 20 Oct 2016 16:17:25 +0200,
Ville Syrjälä wrote:
>
> On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> > On Thu, 20 Oct 2016 15:28:14 +0200,
> > Ville Syrjälä wrote:
> > >
> > > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > > Since the 4.7 kernel, we've seen error messages like
> > > >
> > > > kernel: [TTM] Buffer eviction failed
> > > > kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > > > kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > > >
> > > > on QXL when switching to and accessing a VT. The culprit was the
> > > > generic deferred_io code (the qxl driver switched to it in 4.7).
> > > > There is a race between the dirty clip update and the invocation
> > > > of the callback.
> > > >
> > > > In drm_fb_helper_dirty(), the dirty clip is updated under the
> > > > spinlock, while the update worker is kicked off outside the
> > > > spinlock. Meanwhile the update worker also clears the dirty clip
> > > > under the spinlock. Thus, when drm_fb_helper_dirty() is called
> > > > concurrently, schedule_work() can be called after the clip has
> > > > already been cleared by the first worker invocation.
> > >
> > > Why does that matter? The first worker should have done all the
> > > necessary work already, no?
> >
> > In the first worker run, the clip is copied and cleared before the
> > copy is passed to the callback. The second run then gets the cleared,
> > untouched clip, i.e. with x1=~0, and this confuses
> > qxl_framebuffer_dirty().
> >
> > Of course, we could filter this out on the callback side by checking
> > the clip; that was actually my first version. But it's fundamentally
> > a race and should be handled better on the caller side.
>
> The race is still there AFAICS. The worker may already be executing but
> not yet in the critical section, at which point drm_fb_helper_dirty()
> will expand the dirty rectangle and schedule the work again. So the
> first worker will already see the expanded rectangle, and the second
> worker will get zilch.
Hrm, right, there's a slight race window there.
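
Just to spell out the interleaving for myself, here is a simplified
userspace model of the two paths (for illustration only, not the actual
drm_fb_helper.c code; a pthread mutex stands in for the spinlock and
all the names are made up):

#include <pthread.h>
#include <stdio.h>

/* stand-in for the helper's pending dirty clip and its lock */
struct dirty_state {
	pthread_mutex_t lock;		/* models dirty_lock */
	unsigned int x1, y1, x2, y2;	/* models the pending clip */
};

/* models drm_fb_helper_dirty(): merge the clip, then kick the worker */
void helper_dirty(struct dirty_state *s, unsigned int x1, unsigned int y1,
		  unsigned int x2, unsigned int y2)
{
	pthread_mutex_lock(&s->lock);
	if (x1 < s->x1) s->x1 = x1;
	if (y1 < s->y1) s->y1 = y1;
	if (x2 > s->x2) s->x2 = x2;
	if (y2 > s->y2) s->y2 = y2;
	pthread_mutex_unlock(&s->lock);
	/* schedule_work() happens here, outside the lock */
}

/* models the deferred worker: copy and reset the clip, then flush */
void dirty_worker(struct dirty_state *s)
{
	unsigned int x1, y1, x2, y2;

	/*
	 * Race window: the worker may already be running here while
	 * helper_dirty() merges a new clip and requeues the work; the
	 * first pass below then consumes the whole expanded clip ...
	 */
	pthread_mutex_lock(&s->lock);
	x1 = s->x1; y1 = s->y1; x2 = s->x2; y2 = s->y2;
	s->x1 = s->y1 = ~0U;	/* reset to "empty" */
	s->x2 = s->y2 = 0;
	pthread_mutex_unlock(&s->lock);

	/*
	 * ... and the requeued pass arrives here with the reset clip
	 * (x1 == ~0), which is what confuses the driver's dirty()
	 * callback.
	 */
	printf("flush clip (%u,%u)-(%u,%u)\n", x1, y1, x2, y2);
}

int main(void)
{
	struct dirty_state s = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.x1 = ~0U, .y1 = ~0U, .x2 = 0, .y2 = 0,
	};

	helper_dirty(&s, 0, 0, 640, 480);
	dirty_worker(&s);	/* flushes (0,0)-(640,480) */
	dirty_worker(&s);	/* models the requeued run: reset clip */
	return 0;
}

In the real code the second run is of course a second workqueue
execution rather than a direct call, but the net effect on the clip is
the same.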
> I think the only good fix is to have the worker validate the dirty
> rectangle before calling the driver.
OK, let me cook it up quickly. (It was actually the second version of
the patch I wrote, but I sent the third one :)
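
For the record, what I have in mind is just a check in the worker
before invoking the dirty callback, along these lines (untested,
written from memory, so the exact field names may differ):

static void drm_fb_helper_dirty_work(struct work_struct *work)
{
	struct drm_fb_helper *helper = container_of(work, struct drm_fb_helper,
						    dirty_work);
	struct drm_clip_rect *clip = &helper->dirty_clip;
	struct drm_clip_rect clip_copy;
	unsigned long flags;

	spin_lock_irqsave(&helper->dirty_lock, flags);
	clip_copy = *clip;
	clip->x1 = clip->y1 = ~0;
	clip->x2 = clip->y2 = 0;
	spin_unlock_irqrestore(&helper->dirty_lock, flags);

	/* call the dirty callback only if the clip has really been touched */
	if (clip_copy.x1 < clip_copy.x2 && clip_copy.y1 < clip_copy.y2)
		helper->fb->funcs->dirty(helper->fb, NULL, 0, 0,
					 &clip_copy, 1);
}

That way a worker run that finds the clip already reset simply becomes
a no-op.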
thanks,
Takashi