Message-ID: <CAKZGPAPS+x2V3Ajj323EUStpWGmZ6taOuovjqA3_kaYXA3T_Ag@mail.gmail.com>
Date:	Mon, 29 Sep 2014 21:40:50 +0530
From:	Arun KS <arunks.linux@...il.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc:	laijs@...fujitsu.com, tj@...nel.org,
	Silesh C V <saileshcv@...il.com>,
	Arun KS <getarunks@...il.com>
Subject: [Workqueue] crash in process_one_work

Hello Tejun/Lai,

I am seeing the following crash in 3.10.49 kernel.

[ 1133.893817] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 1133.893821] pgd = c0004000
[ 1133.893827] [00000004] *pgd=00000000
[ 1133.893834] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 1133.893841] Modules linked in:
[ 1133.893849] CPU: 2 PID: 5359 Comm: kworker/u8:20 Not tainted 3.10.28-g99b6153-00006-gc32dab7 #1
[ 1133.893859] task: d8c2aa00 ti: e79a4000 task.ti: e79a4000
[ 1133.893873] PC is at process_one_work+0x18/0x448
[ 1133.893878] LR is at process_one_work+0x14/0x448
[ 1133.893887] pc : [<c0135218>]    lr : [<c0135214>]    psr: 400f0093
               sp : e79a5ef8  ip : daf7f100  fp : 00000089
[ 1133.893891] r10: daf7f118  r9 : ee80e820  r8 : ee80e800
[ 1133.893897] r7 : c111872e  r6 : ee80e800  r5 : ed7cf150  r4 : daf7f100
[ 1133.893902] r3 : ffffffe0  r2 : 00000081  r1 : ed7cf150  r0 : 00000000
[ 1133.893908] Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 1133.893914] Control: 10c5383d  Table: a7dbc06a  DAC: 00000015

Here is the snippet of process_one_work() where the crash happens:

struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;

get_work_pwq() returned NULL because the WORK_STRUCT_PWQ flag was not
set in work_struct->data, and the crash happened when that NULL pointer
was dereferenced. There is no NULL check here, which implies that this
condition is never expected to occur.

The corresponding work_struct looks like this:

crash> struct work_struct ed7cf150
struct work_struct {
  data = {
    counter = 0xffffffe0
  },
  entry = {
    next = 0xed7cf154,
    prev = 0xed7cf154
  },
  func = 0xc0140ac4 <async_run_entry_fn>
}

The value of data is 0xffffffe0, which is the value left by INIT_WORK()
or WORK_DATA_INIT(). This can happen if a driver calls INIT_WORK() on
the same work_struct again after queuing it.

The work_struct details above show that the work was queued from
kernel/async.c. async_schedule() dynamically allocates the work_struct
and queues it to system_unbound_wq, so there is no possibility of
INIT_WORK() being called again on the same work.

Inspecting the ramdump for the async_entry structure (kernel/async.c) shows:

crash> struct async_entry ed7cf140
struct async_entry {
  domain_list = {
    next = 0xed7cf140,
    prev = 0xed7cf140
  },
  global_list = {
    next = 0xed7cf148,
    prev = 0xed7cf148
  },
  work = {
    data = {
      counter = 0xffffffe0
    },
    entry = {
      next = 0xed7cf154,
      prev = 0xed7cf154
    },
    func = 0xc0140ac4 <async_run_entry_fn>
  },
  cookie = 0x263e5,
  func = 0xc074dda0 <dapm_post_sequence_async>,
  data = 0xed48432c,
  domain = 0xe5457dec
}

The func points to dapm_post_sequence_async, and you can see that
domain_list and global_list are both empty, which shows that the work
has finished executing and nothing is pending in async.

But how was this work_struct still reachable from the workqueue data
structures? Is there any corner case in the workqueue code that can miss
unlinking the work_struct from the pool_workqueue after executing it?

I would really appreciate your inputs/pointers.
Please let me know if you want any more information from the crashed system.

Thanks,
Arun