linux-kernel - Re: [Workqueue] crash in process_one

Message-ID: <CABOM9ZpEhh7PnJT0hRa=Jx1CfYTU0QTG_71Qz-iB3VxygsYgzw@mail.gmail.com>
Date:	Wed, 8 Oct 2014 17:30:20 +0530
From:	Arun KS <getarunks@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Arun KS <arunks.linux@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	laijs@...fujitsu.com, Silesh C V <saileshcv@...il.com>
Subject: Re: [Workqueue] crash in process_one_work

Hello Tejun,

On Mon, Oct 6, 2014 at 9:02 PM, Tejun Heo <tj@...nel.org> wrote:
> Hello, Arun.
>
> On Mon, Sep 29, 2014 at 09:40:50PM +0530, Arun KS wrote:
> ...
>> The value of data is 0xffffffe0, which is basically the value after an
>> INIT_WORK() or WORK_DATA_INIT().
>> This can happen if a driver calls INIT_WORK on the same work_struct
>> again after queuing it.
>>
>> The work_struct details above show that the work was queued from
>> kernel/async.c. async_schedule() dynamically allocates the
>> work_struct and queues it to system_unbound_wq, so there is no
>> possibility of INIT_WORK being called on the same work again.
>>
>> After inspecting ramdump for async_entry structure in kernel/async.c
>>
>> crash> struct async_entry ed7cf140
>> struct async_entry {
>>   domain_list = {
>>     next = 0xed7cf140,
>>     prev = 0xed7cf140
>>   },
>>   global_list = {
>>     next = 0xed7cf148,
>>     prev = 0xed7cf148
>>   },
>>   work = {
>>     data = {
>>       counter = 0xffffffe0
>>     },
>>     entry = {
>>       next = 0xed7cf154,
>>       prev = 0xed7cf154
>>     },
>>     func = 0xc0140ac4 <async_run_entry_fn>
>>   },
>>   cookie = 0x263e5,
>>   func = 0xc074dda0 <dapm_post_sequence_async>,
>>   data = 0xed48432c,
>>   domain = 0xe5457dec
>> }
>>
>> func points to dapm_post_sequence_async, and domain_list and
>> global_list are both empty, which shows that the work has finished
>> executing and nothing is pending in async.
>>
>> But why was this work_struct still linked into the workqueue data
>> structures? Is there any corner case in the workqueue code that can
>> fail to unlink a work_struct from the pool_workqueue after executing it?
>
> I sure hope not.  How reproducible is the issue?  Can you try w/
> CONFIG_DEBUG_OBJECTS_WORK enabled?

Thanks for replying.
That was a problem with one of our drivers. It was freeing the
memory (the work_struct) without flushing the workqueue.
We caught the faulty driver by adding a BUG_ON() in INIT_WORK and
looking at the func pointer in the work_struct (which pointed to the
faulty driver's work function).

1) The faulty driver calls queue_work() on system_unbound_wq.
2) It frees the work_struct memory while the work is still queued.
3) Another driver requests memory from the slab allocator, gets the
same memory, and calls INIT_WORK on it.
4) process_one_work() tries to execute the work queued by the faulty
driver, resulting in a crash.
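
The four steps above can be reproduced in a small userspace model. This
is only a sketch, not kernel code: struct work, init_work(),
queue_work(), cancel_work() and the single reusable "slot" are simplified
stand-ins I made up for illustration. The real fix in a driver is to call
cancel_work_sync() or flush_work() before freeing the work_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: names mimic the kernel's but are simplified stand-ins. */
struct list_head { struct list_head *next, *prev; };

struct work {
    unsigned long data;              /* bit 0 models the PENDING flag */
    struct list_head entry;          /* link into the pool's worklist */
    void (*func)(struct work *);
};

#define WORK_PENDING   0x1UL
#define WORK_DATA_INIT 0xffffffe0UL  /* value reported in the crash dump */

static struct list_head worklist;    /* models the worker pool's pending list */
static struct work slot;             /* models one slab object that gets reused */

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static void noop(struct work *w) { (void)w; }

/* Models INIT_WORK(): resets data and makes entry point at itself. */
static void init_work(struct work *w, void (*fn)(struct work *))
{
    w->data = WORK_DATA_INIT;
    list_init(&w->entry);
    w->func = fn;
}

static void queue_work(struct work *w)
{
    assert(w->func != NULL);
    w->data |= WORK_PENDING;
    list_add_tail(&w->entry, &worklist);
}

/* Models cancelling before free: unlink the item if it is still pending. */
static void cancel_work(struct work *w)
{
    if (w->data & WORK_PENDING) {
        list_del_init(&w->entry);
        w->data &= ~WORK_PENDING;
    }
}

/* True if the pool still references an item that looks freshly initialised. */
static bool pool_sees_reinitialised_work(void)
{
    if (worklist.next == &worklist)
        return false;                /* nothing queued */
    struct work *w = (struct work *)((char *)worklist.next -
                                     offsetof(struct work, entry));
    return w->data == WORK_DATA_INIT && w->entry.next == &w->entry;
}

/* Returns true when the second owner's init_work() corrupts the queue. */
bool reuse_corrupts_queue(bool cancel_before_free)
{
    list_init(&worklist);
    init_work(&slot, noop);          /* faulty driver sets up its work */
    queue_work(&slot);               /* step 1: work is queued */
    if (cancel_before_free)
        cancel_work(&slot);          /* the step the faulty driver skipped */
    /* step 2: object is "freed"; step 3: next owner reuses and re-inits it */
    init_work(&slot, noop);
    return pool_sees_reinitialised_work();   /* step 4: what the worker finds */
}
```

Without the cancel, the pool's list still points at an entry whose data is
0xffffffe0 and whose next/prev point back at itself, which matches what
the ramdump showed. With the cancel, the list is empty before the memory
is reused.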


Thanks,
Arun

>
> Thanks.
>
> --
> tejun