Message-ID: <s5hshrrytot.wl-tiwai@suse.de>
Date: Thu, 20 Oct 2016 16:35:30 +0200
From: Takashi Iwai <tiwai@...e.de>
To: Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc: dri-devel@...ts.freedesktop.org,
Daniel Vetter <daniel.vetter@...ll.ch>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
On Thu, 20 Oct 2016 16:17:25 +0200,
Ville Syrjälä wrote:
>
> On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> > On Thu, 20 Oct 2016 15:28:14 +0200,
> > Ville Syrjälä wrote:
> > >
> > > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > > Since the 4.7 kernel, we've seen error messages like
> > > >
> > > > kernel: [TTM] Buffer eviction failed
> > > > kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > > > kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > > >
> > > > on QXL when switching to and accessing a VT. The culprit was the
> > > > generic deferred_io code (the qxl driver switched to it in 4.7).
> > > > There is a race between the dirty clip update and the invocation
> > > > of the callback.
> > > >
> > > > In drm_fb_helper_dirty(), the dirty clip is updated under the
> > > > spinlock, while the update worker is kicked off outside the
> > > > spinlock. Meanwhile the update worker also clears the dirty clip
> > > > under the spinlock. Thus, when drm_fb_helper_dirty() is called
> > > > concurrently, schedule_work() can be called after the clip has
> > > > already been cleared by the first worker invocation.
> > >
> > > Why does that matter? The first worker should have done all the
> > > necessary work already, no?
> >
> > In the first worker run, the clip is copied and cleared before the
> > copy is passed to the callback. The second run then gets the cleared,
> > untouched clip, i.e. with x1=~0, and this confuses
> > qxl_framebuffer_dirty().
> >
> > Of course, we could filter this out on the callback side by checking
> > the clip; that was actually my first version. But it's fundamentally
> > a race and should be handled better on the caller side.
>
> The race is still there AFAICS. The worker may already be executing but
> not yet in the critical section, at which point drm_fb_helper_dirty()
> will expand the dirty rectangle and schedule the work again. So the
> first worker will already see the expanded rectangle, and the second
> worker will get zilch.
Hrm, right, there's a slight race window there.
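
Just to spell out the interleaving for myself, here is a simplified
userspace model of the two paths (for illustration only, not the actual
drm_fb_helper.c code; a pthread mutex stands in for the spinlock and
all the names are made up):

#include <pthread.h>
#include <stdio.h>

/* stand-in for the helper's pending dirty clip and its lock */
struct dirty_state {
	pthread_mutex_t lock;		/* models dirty_lock */
	unsigned int x1, y1, x2, y2;	/* models the pending clip */
};

/* models drm_fb_helper_dirty(): merge the clip, then kick the worker */
void helper_dirty(struct dirty_state *s, unsigned int x1, unsigned int y1,
		  unsigned int x2, unsigned int y2)
{
	pthread_mutex_lock(&s->lock);
	if (x1 < s->x1) s->x1 = x1;
	if (y1 < s->y1) s->y1 = y1;
	if (x2 > s->x2) s->x2 = x2;
	if (y2 > s->y2) s->y2 = y2;
	pthread_mutex_unlock(&s->lock);
	/* schedule_work() happens here, outside the lock */
}

/* models the deferred worker: copy and reset the clip, then flush */
void dirty_worker(struct dirty_state *s)
{
	unsigned int x1, y1, x2, y2;

	/*
	 * Race window: the worker may already be running here while
	 * helper_dirty() merges a new clip and requeues the work; the
	 * first pass below then consumes the whole expanded clip ...
	 */
	pthread_mutex_lock(&s->lock);
	x1 = s->x1; y1 = s->y1; x2 = s->x2; y2 = s->y2;
	s->x1 = s->y1 = ~0U;	/* reset to "empty" */
	s->x2 = s->y2 = 0;
	pthread_mutex_unlock(&s->lock);

	/*
	 * ... and the requeued pass arrives here with the reset clip
	 * (x1 == ~0), which is what confuses the driver's dirty()
	 * callback.
	 */
	printf("flush clip (%u,%u)-(%u,%u)\n", x1, y1, x2, y2);
}

int main(void)
{
	struct dirty_state s = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.x1 = ~0U, .y1 = ~0U, .x2 = 0, .y2 = 0,
	};

	helper_dirty(&s, 0, 0, 640, 480);
	dirty_worker(&s);	/* flushes (0,0)-(640,480) */
	dirty_worker(&s);	/* models the requeued run: reset clip */
	return 0;
}

In the real code the second run is of course a second workqueue
execution rather than a direct call, but the net effect on the clip is
the same.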
> I think the only good fix is to have the worker validate the dirty
> rectangle before calling the driver.
OK, let me cook it up quickly. (It was actually the second version of
the patch I wrote, but I sent the third one :)
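
For the record, what I have in mind is just a check in the worker
before invoking the dirty callback, along these lines (untested,
written from memory, so the exact field names may differ):

static void drm_fb_helper_dirty_work(struct work_struct *work)
{
	struct drm_fb_helper *helper = container_of(work, struct drm_fb_helper,
						    dirty_work);
	struct drm_clip_rect *clip = &helper->dirty_clip;
	struct drm_clip_rect clip_copy;
	unsigned long flags;

	spin_lock_irqsave(&helper->dirty_lock, flags);
	clip_copy = *clip;
	clip->x1 = clip->y1 = ~0;
	clip->x2 = clip->y2 = 0;
	spin_unlock_irqrestore(&helper->dirty_lock, flags);

	/* call the dirty callback only if the clip has really been touched */
	if (clip_copy.x1 < clip_copy.x2 && clip_copy.y1 < clip_copy.y2)
		helper->fb->funcs->dirty(helper->fb, NULL, 0, 0,
					 &clip_copy, 1);
}

That way a worker run that finds the clip already reset simply becomes
a no-op.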
thanks,
Takashi