Message-ID: <CANaxB-zT41eCjraCWRZkMXtk68pSpmmhyt__aNuYpPHvtTy-bA@mail.gmail.com>
Date: Tue, 13 Dec 2016 14:18:15 -0800
From: Andrey Vagin <avagin@...nvz.org>
To: Nikolay Borisov <n.borisov.lkml@...il.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Linux Containers <containers@...ts.linux-foundation.org>,
Jan Kara <jack@...e.cz>, LKML <linux-kernel@...r.kernel.org>,
Serge Hallyn <serge@...lyn.com>
Subject: Re: [inotify] fee1df54b6: BUG_kmalloc-#(Not_tainted):Freepointer_corrupt
On Tue, Dec 13, 2016 at 11:34 AM, Nikolay Borisov
<n.borisov.lkml@...il.com> wrote:
>
>
> On 13.12.2016 20:51, Eric W. Biederman wrote:
>> Nikolay Borisov <n.borisov.lkml@...il.com> writes:
>>
>>> So this thing resurfaced again and I took a hard look into the code but
>>> couldn't find anything suspicious. So the allocating and freeing
>>> contexts leads me to believe it's the 'tbl' pointer that is being
>>> corrupted. The only thing which I do with it is to increase it by two.
>>>
>>> Perhaps some liveness issues.
>>
>> To me it feels like a double free somewhere. Like we call dec_ucount
>> and thus put_ucount multiple times in a way that goes to 0.
>>
>> Perhaps there is a peculiarity in the existing code which allows the
>> count to go to zero which we don't notice because we don't free anything
>> when the count goes to zero today.
>>
>> Perhaps there is some subtle semantic mismatch between your conversion
>> and the inotify code.
>>
>> I don't know if you made a subtle misreading of the code, or if
>> there is an existing bug that your changes took from harmless to
>> problematic, but the evidence is overwhelming that something
>> is going wrong and it is your patch that brings it out.
>>
>> If it helps the openvz folks apparently reproduced this with the criu
>> regression tests and the appropriate kernel debug options, and confirmed
>> the failure was your patch.
>
> Great but I think I missed this conversation, care to send relevant
> threads? I'd like to get to the bottom of this and have it merged?
>
> @openvz guys - if you care to shout with more details I'd love to work
> on getting this fixed!
Hi Nikolay,
We run the CRIU tests on linux-next, and a few days ago they triggered
a kernel bug:
http://www.spinics.net/lists/linux-mm/msg118204.html
To run these tests and reproduce the bug, follow these steps:
$ apt-get install gcc make protobuf-c-compiler libprotobuf-c0-dev libaio-dev \
libprotobuf-dev protobuf-compiler python-ipaddr libcap-dev \
libnl-3-dev gdb bash python-protobuf
$ git clone https://github.com/xemul/criu.git
$ cd criu
$ make
$ python test/zdtm.py run -a -p 4
Here is the config file we use to compile the kernel:
https://github.com/avagin/criu-jenkins-digitalocean/blob/master/jenkins-scripts/config
I recommend booting the kernel with slub_debug=FZ.
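(In slub_debug=FZ, F turns on the SLUB sanity checks and Z turns on red
zoning.) To confirm that the option actually took effect after a reboot,
something like the rough sketch below should work. It only assumes that
/proc/cmdline is readable; has_slub_debug is a hypothetical helper name
used for illustration and is not part of CRIU or our Jenkins scripts.

```shell
# Return success if the given kernel command line contains a slub_debug
# option. has_slub_debug is a hypothetical name used only for this sketch.
has_slub_debug() {
    # Pad with spaces so the option is matched as a whole word.
    case " $1 " in
        *" slub_debug="*|*" slub_debug "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the command line the running kernel was booted with.
if [ -r /proc/cmdline ] && has_slub_debug "$(cat /proc/cmdline)"; then
    echo "slub_debug is enabled"
else
    echo "slub_debug is not set; add slub_debug=FZ to the kernel command line"
fi
```

How the option gets onto the command line depends on your boot loader,
so I have left that part out.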
Don't hesitate to ask if you have any questions.
Thanks,
Andrei
>
>>
>> The current state of play is that I would love to merge this if we can
>> track down this issue. I dropped this from my tree before I sent my pull
>> request to Linus so there is no emergency to get this fixed.
>>
>> Eric
>>
>>