Message-ID: <5654EBE8.9030705@seti.kr.ua>
Date: Wed, 25 Nov 2015 00:59:52 +0200
From: Andrew <nitr0@...i.kr.ua>
To: Alexander Duyck <alexander.duyck@...il.com>, netdev@...r.kernel.org
Subject: Re: Kernel 4.1.12 crash
Hi.
I tried to reproduce the errors in a virtual environment (some VMs on my
notebook).
I've tried to create 1000 client PPPoE sessions from this box via a script:

for i in `seq 1 1000`; do
    pppd plugin rp-pppoe.so user test password test \
        nodefaultroute maxfail 0 persist holdoff 1 noauth eth0
done
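
To reset everything between runs, something like this should work (just a
sketch; it assumes no other pppd instances on the box matter):

killall pppd
# wait until every pppN interface is gone before starting the next run
while ip -o link show | grep -q ' ppp[0-9]'; do sleep 1; done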
And on the VM that is used as a client I've got strange random crashes
(they show up only when the server is online - so they're network-related):
http://postimg.org/image/ohr2mu3rj/ - the crash is here:
(gdb) list *process_one_work+0x32
0xc10607b2 is in process_one_work
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/workqueue.c:1952).
1947 __releases(&pool->lock)
1948 __acquires(&pool->lock)
1949 {
1950 struct pool_workqueue *pwq = get_work_pwq(work);
1951 struct worker_pool *pool = worker->pool;
1952 bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
1953 int work_color;
1954 struct worker *collision;
1955 #ifdef CONFIG_LOCKDEP
1956 /*
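Line 1952 dereferences pwq, which get_work_pwq() extracted from work->data,
so a bogus or already-freed work item is enough to fault here. One way to
check (a sketch, assuming the vmlinux from the same build plus the register
dump visible in the oops photo):

(gdb) disassemble /m process_one_work
# find the faulting instruction at process_one_work+0x32 and note which
# register was being dereferenced; if the oops shows that register holding
# 0x6b6b6b6b (SLUB POISON_FREE - needs slub_debug poisoning enabled), the
# work item was freed before the worker ran it, i.e. a use-after-free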
http://postimg.org/image/x9mychssx/ - the crash is here (seen twice):
0xc10658bf is in kthread_data
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:136).
131 * The caller is responsible for ensuring the validity of @task when
132 * calling this function.
133 */
134 void *kthread_data(struct task_struct *task)
135 {
136 return to_kthread(task)->data;
137 }
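kthread_data() is a single dereference through to_kthread(task), which in
4.1 goes through task->vfork_done, so a fault here means the task_struct
itself (or that pointer) is already garbage. Subtracting the member offset
from the faulting address recovers the bogus struct kthread pointer (a
sketch, assuming a vmlinux with debug info):

(gdb) print/x &((struct kthread *)0)->data
# faulting address minus this offset = the (bogus) struct kthread pointer;
# a poison-like value here would again point at freed memory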
It is reached from a strange place in the trace:
(gdb) list *kthread_create_on_node+0x120
0xc1065340 is in kthread
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:176).
171 {
172 __kthread_parkme(to_kthread(current));
173 }
174
175 static int kthread(void *_create)
176 {
177 /* Copy data: it's on kthread's stack */
178 struct kthread_create_info *create = _create;
179 int (*threadfn)(void *data) = create->threadfn;
180 void *data = create->data;
And earlier:
(gdb) list *ret_from_kernel_thread+0x21
0xc13bb181 is at
/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/arch/x86/kernel/entry_32.S:312.
307 popl_cfi %eax
308 pushl_cfi $0x0202 # Reset kernel eflags
309 popfl_cfi
310 movl PT_EBP(%esp),%eax
311 call *PT_EBX(%esp)
312 movl $0,PT_EAX(%esp)
313 jmp syscall_exit
314 CFI_ENDPROC
315 ENDPROC(ret_from_kernel_thread)
316
Stack corruption?
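
Before moving to real hardware it may be worth rebuilding with the
allocator and work-item debugging helpers enabled - they usually catch
use-after-free and corrupted work structs much earlier (a sketch; these
are standard Kconfig options in 4.1):

CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_FREE=y
# and boot with: slub_debug=FZP
# (F = sanity checks, Z = red zoning, P = poisoning)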
I'll try to set up a test environment on real hardware, and I'll also test
with older kernels.
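
If one of the older kernels turns out to be stable, bisecting between it
and 4.1 should pin down the offending commit - roughly like this (v3.18
below is only a placeholder for the last known-good version):

git bisect start
git bisect bad v4.1
git bisect good v3.18    # placeholder: last kernel known to be stable
# build and boot each candidate, run the pppd loop above, then mark it:
#   git bisect good      (no crash)
#   git bisect bad       (crash)
git bisect reset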
On 22.11.2015 07:17, Alexander Duyck wrote:
> On 11/21/2015 12:16 AM, Andrew wrote:
>> Memory corruption, if it happens, IMHO shouldn't be hardware-related -
>> almost all of these boxes, except the H61M-based box from the 1st log,
>> have worked for a long time with uptimes of more than a year, and only
>> the software on them was changed; the H61M-based box ran memtest86 for
>> tens of hours without any errors. If it were caused by hardware, they
>> should have crashed even earlier.
>
> I wasn't saying it was hardware-related. My thought is that it could
> be some sort of use-after-free or double-free type issue. Basically
> what you end up with is the memory getting corrupted by software that
> is accessing regions it shouldn't be.
>
>> Rarely, on different servers, I saw 'zram decompression error'
>> messages (in this case I got such a message on the H61M-based box).
>>
>> Also, other people who use accel-ppp as BRAS software have various
>> kernel panics/bugs/oopses on fresh kernels.
>>
>> I'll try to apply these patches, and I'll try to switch back to
>> kernels that were stable on some boxes.
>
> If you could bisect this it would be useful. Basically we just need
> to determine where in the git history these issues started popping up
> so that we can narrow down the root cause.
>
> - Alex