Message-ID: <CAPM=9tw8LVWsuA6m_nkUDgm00iz2txYRNZY0b0WWZbyiUVzLEw@mail.gmail.com>
Date:   Wed, 19 Aug 2020 11:12:50 +1000
From:   Dave Airlie <airlied@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>
Cc:     Pavel Machek <pavel@....cz>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Matthew Auld <matthew.auld@...el.com>,
        intel-gfx <intel-gfx@...ts.freedesktop.org>,
        kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Wed, 19 Aug 2020 at 10:38, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Ping on this?
>
> The code disassembles to
>
>   24: 8b 85 d0 fd ff ff    mov    -0x230(%ebp),%eax
>   2a:* c7 03 01 00 40 10    movl   $0x10400001,(%ebx) <-- trapping instruction
>   30: 89 43 04              mov    %eax,0x4(%ebx)
>   33: 8b 85 b4 fd ff ff    mov    -0x24c(%ebp),%eax
>   39: 89 43 08              mov    %eax,0x8(%ebx)
>   3c: e9                    jmp ...
>
> which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
> it's this one:
>
>         } else if (gen >= 3 &&
>                    !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
>                 *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
>                 *batch++ = addr;
>                 *batch++ = target_addr;
>
> where that "batch" pointer is 0xf8601000, so it looks like it just
> overflowed into the next page that isn't there.
>
> The cleaned-up call trace is
>
>   drm_ioctl+0x1f4/0x38b ->
>     drm_ioctl_kernel+0x87/0xd0 ->
>       i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
>         i915_gem_do_execbuffer+0xaab/0x2780 ->
>           eb_relocate_vma
>
> but there's a lot of inlining going on, so..
>
> The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
> relocations only") but that's going purely by "that seems to be the
> main relocation change this merge window".

I think there's been some discussion about reverting that change for
other reasons, but it's quite likely the culprit.

Maybe we can push for a revert sooner (cc'ing more of the i915 team).

Dave.
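Linus's mapping from the trapping instruction back to __reloc_entry_gpu() can be double-checked with a little arithmetic. The C sketch below assumes the i915 command macros of this era are defined as shown (copied by hand, not included from the kernel headers), and the helper names store_dword_cmd() and batch_page_offset() are hypothetical. It checks two things: that MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL encodes to the trapping immediate 0x10400001, and that the faulting batch pointer 0xf8601000 sits at offset 0 of its 4 KiB page, consistent with the first store landing just past the end of the previous page.

```c
#include <stdint.h>

/* Assumption: hand-copied equivalents of the i915 command macros
 * (intel_gpu_commands.h around 5.9-rc1), not the real header. */
#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
#define MI_STORE_DWORD_IMM      MI_INSTR(0x20, 1)
#define MI_MEM_VIRTUAL          (1 << 22)

/* Assumption: x86-32 with 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: the first dword that the gen >= 3
 * (non-915G/GM) branch writes into the batch. */
static uint32_t store_dword_cmd(void)
{
	return (uint32_t)(MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL);
}

/* Hypothetical helper: byte offset of a pointer value within its
 * 4 KiB page; 0 means it is the first byte of a page. */
static uint32_t batch_page_offset(uint32_t batch)
{
	return batch & (SKETCH_PAGE_SIZE - 1);
}
```

With these definitions, store_dword_cmd() returns 0x10400001, the same immediate as the trapping movl, and batch_page_offset(0xf8601000) returns 0, so the very first dword of the three-dword command was being written to the start of a page that was not mapped.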