Date:   Sat, 3 Dec 2016 08:57:00 +0000
From:   Chris Wilson <chris@...is-wilson.co.uk>
To:     Matt Turner <mattst88@...il.com>
Cc:     intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        Kenneth Graunke <kenneth@...tecape.org>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Mika Kuoppala <mika.kuoppala@...el.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove instructions to file a bug report.

On Fri, Dec 02, 2016 at 05:03:05PM -0800, Matt Turner wrote:
> From these instructions, users assume that /sys/class/drm/card0/error
> contains all the information a developer needs to diagnose and fix a GPU
> hang.
> 
> In fact it doesn't, and we have no tools for solving them (other than
> stabbing in the dark). Most of the time the error state itself isn't
> even useful because it just shows a hang on a PIPE_CONTROL or similar.
> 
> Until a time when the error state contains enough information to
> actually solve a hang, stop telling users to file unsolvable bugs, and
> instead rely on users who know where and how to file a good bug report
> to find their own way there.
> 
> Signed-off-by: Matt Turner <mattst88@...il.com>

Nak. Though having stale bug reports is a pain (we've recently adopted
the policy of stopping the request after a certain period), those bug
reports are still vital. They don't just represent bugs in mesa.

> ---
> Maybe now's a good time to discuss what *would* be useful to put in the
> error state for debugging hangs. The currently executing shader program
> would be a great place to start.

Now? That is the conversation we've been trying to have for several
years. The contents of the error state are currently about sufficient to
spot kernel bugs, triage the culprit and the general class of bug.

Capturing all state for a request is unfeasible (because we can't copy
the gigabytes of memory required). Copying a selected set of aux bo is
one option. And since those bo are under user control and do not have to
be executed, you can even store aub data in them or whatnot.

Even if you make attaching the debug information conditional, I would
still keep the error message unconditional.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
