[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUu56pLObcDxrNXO5uN_KV5ffDW3EkpLgZL2mh4-aofSQ@mail.gmail.com>
Date: Thu, 20 Nov 2014 15:08:03 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Tejun Heo <tj@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Frederic Weisbecker <fweisbec@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>, Don Zickus <dzickus@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: Re: frequent lockups in 3.18rc4
On Thu, Nov 20, 2014 at 3:05 PM, Tejun Heo <tj@...nel.org> wrote:
> Hello,
>
> On Thu, Nov 20, 2014 at 11:42:42PM +0100, Thomas Gleixner wrote:
>> On Thu, 20 Nov 2014, Tejun Heo wrote:
>> > On Thu, Nov 20, 2014 at 10:58:26PM +0100, Thomas Gleixner wrote:
>> > > It's completely undocumented behaviour, whether it has been that way
>> > > for ever or not. And I agree with Fredric, that it is insane. Actuallu
>> > > it's beyond insane, really.
>> >
>> > This is exactly the same for any address in the vmalloc space.
>>
>> I know, but I really was not aware of the fact that dynamically
>> allocated percpu stuff is vmalloc based and therefor exposed to the
>> same issues.
>>
>> The normal vmalloc space simply does not have the problems which are
>> generated by percpu allocations which have no documented access
>> restrictions.
>>
>> You created a special case and that special case is clever but not
>> very well thought out considering the use cases of percpu variables
>> and the completely undocumented limitations you introduced silently.
>>
>> Just admit it and dont try to educate me about trivial vmalloc
>> properties.
>
> Why are you always so overly dramatic? How is this productive? Sure,
> this could have been better but I missed it at the beginning and this
> is the first time I hear about this issue. Shits happen and we fix
> them.
>
>> > That isn't enough tho. What if the percpu allocated pointer gets
>> > passed to another CPU without task switching? You'd at least need to
>> > send IPIs to all CPUs so that all the active PGDs get updated
>> > synchronously.
>>
>> You obviously did not even take the time to carefully read what I
>> wrote:
>>
>> "Now after that increment the allocation side needs to wait for a
>> scheduling cycle on all cpus (we have mechanisms for that)"
>>
>> That's exactly stating what you claim to be 'not enough'.
>
> Missed that. Sorry.
>
>> > For the time being, we can make percpu accessors complain when
>> > called from nmi handlers so that the problematic ones can be easily
>> > identified.
>>
>> You should have done that in the very first place instead of letting
>> other people run into issues which you should have thought of from the
>> very beginning.
>
> Sure, it would have been better if I noticed that from the get-go, but
> I couldn't think of the NMI case that time and neither did anybody who
> reviewed the code. It'd be awesome if we could have avoided it but it
> didn't go that way, so let's fix it. Can we please stay technical?
>
> So, for now, all we need is adding nmi check in percpu accessors,
> right?
>
What's the issue with nmi? Page faults are supposed to nest correctly
inside nmi, right?
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists