Message-ID: <CAPM=9tz9Q7gi4BSLP9msifkK1AX=GzguDHth=Mhw9wCpn-EvHg@mail.gmail.com>
Date:	Thu, 30 Jul 2015 21:16:17 +1000
From:	Dave Airlie <airlied@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	"Theodore Ts'o" <tytso@....edu>,
	intel-gfx <intel-gfx@...ts.freedesktop.org>,
	DRI <dri-devel@...ts.freedesktop.org>,
	Daniel Vetter <daniel.vetter@...el.com>,
	Jani Nikula <jani.nikula@...ux.intel.com>,
	Ander Conselvan de Oliveira 
	<ander.conselvan.de.oliveira@...el.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [REGRESSION] Re: i915 driver crashes on T540p if docking station attached

On 30 July 2015 at 15:18, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Wed, Jul 29, 2015 at 6:39 PM, Theodore Ts'o <tytso@....edu> wrote:
>>
>> It's here:  https://goo.gl/photos/xHjn2Z97JQEw6k2C9
>
> You didn't catch enough of the code line to decode the code, but it's
> early enough in drm_crtc_index() (just five bytes in) that it's almost
> certainly the very first dereference, so it's almost guaranteed to be
> that
>
>    crtc->dev
>
> access as part of list_for_each_entry(), with crtc being NULL. And
> yes, "->dev" is the very first field, so the offset is zero too (while
> the "->mode_config" list access would not be at offset zero).
>
> And it looks like it is called from drm_atomic_helper_check_modeset():
> the reason it has a question mark in the backtrace is because the
> fault happens before the stack frame has even been set up.
>
> There are multiple calls to "drm_crtc_index()" from that function, I
> can't tell which one it is. Looking at the code generation I get, I
> think it's because update_connector_routing() gets inlined, and that
> one does several calls. Most of them look like this:
>
>                 if (connector->state->crtc) {
>                         idx = drm_crtc_index(connector->state->crtc);
>
> ie they check that the crtc is non-NULL, but that last one does not:
>
>         connector_state->best_encoder = new_encoder;
>         idx = drm_crtc_index(connector_state->crtc);
>
>         crtc_state = state->crtc_states[idx];
>         crtc_state->mode_changed = true;
>
> and I suspect the fix might be something like the attached. Totally
> untested. Ted?
>
> This whole "atomic modeset" series has been one royal fuck-up, guys.
> We've had too many of these kinds of crap issues.

It hasn't been that bad. On a scale of 1 to "MD eats my RAID array",
I'd say we are barely at a 5.

There have been a lot of small and seemingly easily fixed teething
problems. Essentially rewriting the DRM API to provide a new userspace
API and internal interface, and porting some drivers partly to the new
interface, while trying to seamlessly maintain the old ABI/API on top,
was always going to be an impossible task. It was never going to
magically all just work in -next and land in your tree fully formed,
smelling of lavender and elderberries. This is a massive undertaking,
and doing it over a few kernels was the only possible way it could
ever land.

I think the biggest problem we've had is the QA team at Intel got
reorganised or something right when they really needed to be doing
testing on this stuff, so what was sitting in -next never got as much
testing as it had previously, and you can see that in the types of
cases that are getting through. I think the other thing we can learn
is that when Android forks the kernel we should just say this shit is
too hard, let Google go and create a new API and a complete set of
graphics drivers and deal with it in 10 years, because that was
seriously the only other option.

So yes, it's a pity other kernel developers are seeing our fallout, but
I've experienced lots of other kernel developers' fallout over the
years, and generally the idea is to get this stuff fixed to a
reasonable state before you release a final kernel.

Note I'm not personally involved in the development of atomic
modesetting at all; I'm running the kernels with it where and when I
can, and I trust the developers who work on it are doing as much as
they can to make it work.

That said, hopefully Daniel can find a bag of fucks to debug and write
a proper patch, instead of rage quitting the universe, and just git
reset --hard v4.0 drivers/gpu/drm/i915..

Dave.