[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrVGSQ6CTXAFkPTueb0FMJ8oBd2_R4zB5X9juDoun_kJvQ@mail.gmail.com>
Date: Fri, 21 Nov 2014 15:03:54 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Tejun Heo <tj@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Don Zickus <dzickus@...hat.com>, Dave Jones <davej@...hat.com>,
"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4
On Fri, Nov 21, 2014 at 2:55 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Fri, Nov 21, 2014 at 1:11 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
>>
>> I'm fine with that. I just think it's not horrid enough, but that can
>> be fixed easily :)
>
> Oh, I think it's plenty horrid.
>
> Anyway, here's an actual patch. As usual, it has seen absolutely no
> actual testing, but I did try to make sure it compiles and seems to do
> the right thing on:
> - x86-32 no-PAE
> - x86-32 no-PAE with PARAVIRT
> - x86-32 PAE
> - x86-64
>
> also, I just removed the noise that is "vmalloc_sync_all()", since
> it's just all garbage and nothing actually uses it. Yeah, it's used by
> "register_die_notifier()", which makes no sense what-so-ever.
> Whatever. It's gone.
>
> Can somebody actually *test* this? In particular, in any kind of real
> paravirt environment? Or, any comments even without testing?
>
> I *really* am not proud of the mess wrt the whole
>
> #ifdef CONFIG_PARAVIRT
> #ifdef CONFIG_X86_32
> ...
>
> but I think that from a long-term perspective, we're actually better
> off with this kind of really ugly - but very explcit - hack that very
> clearly shows what is going on.
>
> The old code that actually "walked" the page tables was more
> "portable", but was somewhat misleading about what was actually going
> on.
At the risk of going deeper down the rabbit hole, I grepped for
pgd_list. I found:
__set_pmd_pte in pageattr.c. It appears to be completely incorrect.
Unless I've misunderstood, other than the very first line, it will
either do nothing at all or crash when it falls off the end of the
page tables that it's pointlessly trying to update.
sync_global_pgds: OK, I guess -- this is for hot-add of memory, right?
But if we teach the context switch code to check that the kernel
stack is okay, that can be removed, I think. (We absolutely MUST keep
the static per-cpu stuff populated everywhere before running user
code, but that's never in hot-added memory.)
xen_mm_pin_all and xen_mm_unpin_all: I have no clue. I wonder how
that works with SHARED_KERNEL_PMD.
Anyone want to attack these? It would be kind of nice to remove
pgd_list entirely. (I realize that doing so precludes the use of
bloody enormous 512GB kernel pages, but any attempt to use *those* is
so completely screwed without a major reworking of all of this (or
perhaps stop_machine) that keeping pgd_list around just for that is
probably a mistake.)
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists