Message-ID: <154468977288.4945.12937975200892746470@jlahtine-desk.ger.corp.intel.com>
Date: Thu, 13 Dec 2018 10:29:33 +0200
From: Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>
To: Pavel Machek <pavel@....cz>
Cc: bp@...en8.de, hpa@...or.com,
kernel list <linux-kernel@...r.kernel.org>, mingo@...hat.com,
tglx@...utronix.de, x86@...nel.org, jani.nikula@...ux.intel.com,
rodrigo.vivi@...el.com, intel-gfx@...ts.freedesktop.org,
chris@...is-wilson.co.uk
Subject: Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220,
graphics related?
Quoting Pavel Machek (2018-12-12 20:29:02)
> Hi!
>
> > > > > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > > > > > memory corruption (1 bit flipped):
> > > > > > > >
> > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > > > >
> > > > > > > > Any extra information would be of use :)
> > > > > > > >
> > > > > > > > Regards, Joonas
> > > > > > > >
> > > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > > > information in one consolidated place:
> > > > > > > >
> > > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > > > >
> > > > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > > > >
> > > > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > > > who is experiencing a similar issue. Internet points are not deducted
> > > > > > for submitting bugs in good faith, even if they get closed as
> > > > > > NOTABUG.
> > > >
> > > > > Well, your documentation suggests you'll deduct my internet points:
> > > >
> > > > Before filing the bug, please try to reproduce your issue with the
> > > > latest kernel. Use the latest drm-tip branch from
> > > > http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > > > Build Guide.
> > > >
> > > > :-)
> > >
> > > I'd prefer not to run drm-tip. I'll update to 2.6.20-rc5+ and see if
> > > it re-appears (but it takes long time to reproduce :-().
> >
> > > Whether or not we can reproduce the issue with drm-tip is a very useful
> > > datapoint for us. If we cannot reproduce it, it'll be possible to bisect
> > which commit fixed it, and backport that. On the other hand, if it's
> > still reproducible, we know we're not spending time on something we
> > already fixed, and the priority gets a bump.
>
> bisect ... is not practical on something that takes 2 days to reproduce.
>
> > > If you think it is useful, I can try to update my machine to
> > > linux-next.
> >
> > linux-next is closer to drm-tip, so it's better. Do you have some
> > specific reason for not wanting to run drm-tip (but linux-next is still
> > ok)?
>
> I already have build/update scripts for -next, and I trust -next not
> to store screenshots of my desktop in my master boot record :-).
>
> Anyway, it does happen with -next. This time, chromiums were running,
> and crash happened minute? after I exited flightgear. It can be seen
> in the logs.
>
> Oh and I might want to mention -- machine was rather deep in swap this
> time, as in "mouse jumping when starting fgfs" and "could feel the
> chromium being swapped back in". I might have had this situation
> before, and just powercycled the machine "because it is so deep in
> swap that it will not recover".
>
> top says:
>
> top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45, 3.21
> Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie
> %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9 si, 0.0 st
> KiB Mem: 5967968 total, 663244 used, 5304724 free, 48876 buffers
> KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280 cached Mem
> ....but of course that memory is free once everything died.
>
> Any ideas? Should I go back to v4.19 to see if it happens there, too?

linux-next includes very much the same code as drm-tip. Nobody
magically reviews the code any more thoroughly when it is fed into
linux-next than it was reviewed for inclusion into drm-tip. So thinking
linux-next would be somehow safer is an illusion.

It sounds like memory pressure expedites the corruption, which
should make it easier to reproduce and thus fix.

So could you please try reproducing with drm-tip AND open a bug in Bugzilla?
If you are unwilling to do that, it is very difficult to help you more.

Regards, Joonas

>
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html