linux-kernel - Re: [RFC] nvmet: Always remove processed AER elements from list

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <82e496f61183ebdf7924d660d3b8c90e@suse.de>
Date:   Thu, 31 Oct 2019 08:06:10 +0100
From:   Johannes Thumshirn <jthumshirn@...e.de>
To:     Chaitanya Kulkarni <Chaitanya.Kulkarni@....com>
Cc:     Daniel Wagner <dwagner@...e.de>, linux-nvme@...ts.infradead.org,
        Sagi Grimberg <sagi@...mberg.me>, linux-kernel@...r.kernel.org,
        Christoph Hellwig <hch@....de>
Subject: Re: [RFC] nvmet: Always remove processed AER elements from list

On 2019-10-30 20:58, Chaitanya Kulkarni wrote:
> On 10/30/2019 08:24 AM, Daniel Wagner wrote:
>> Hi,
>> 
>> I've got the following oops:
>> 
>> PID: 79413  TASK: ffff92f03a814ec0  CPU: 19  COMMAND: "kworker/19:2"
>> #0 [ffffa5308b8c3c58] machine_kexec at ffffffff8e05dd02
>> #1 [ffffa5308b8c3ca8] __crash_kexec at ffffffff8e12102a
>> #2 [ffffa5308b8c3d68] crash_kexec at ffffffff8e122019
>> #3 [ffffa5308b8c3d80] oops_end at ffffffff8e02e091
>> #4 [ffffa5308b8c3da0] general_protection at ffffffff8e8015c5
>>      [exception RIP: nvmet_async_event_work+94]
>>      RIP: ffffffffc0d9a80e  RSP: ffffa5308b8c3e58  RFLAGS: 00010202
>>      RAX: dead000000000100  RBX: ffff92dcbc7464b0  RCX: 0000000000000002
>>      RDX: 0000000000040002  RSI: 38ffff92dc9814cf  RDI: ffff92f217722f20
>>      RBP: ffff92dcbc746418   R8: 0000000000000000   R9: 0000000000000000
>>      R10: 000000000000035b  R11: ffff92efb8dd2091  R12: ffff92dcbc7464a0
>>      R13: ffff92dbe03a5f29  R14: 0000000000000000  R15: 0ffff92f92f26864
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> #5 [ffffa5308b8c3e78] process_one_work at ffffffff8e0a3b0c
>> #6 [ffffa5308b8c3eb8] worker_thread at ffffffff8e0a41e7
>> #7 [ffffa5308b8c3f10] kthread at ffffffff8e0a93af
>> #8 [ffffa5308b8c3f50] ret_from_fork at ffffffff8e800235
>> 
>> This maps to nvmet_async_event_result(). So it looks like this function
>> accesses a stale aen pointer:
>> 
>> static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
>> {
>>          return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
>> }
> Can you please explain the test setup? Is that coming from the tests
> present in blktests? If so, can you please provide the test number?

No, unfortunately this is coming from a customer bug report. We _think_
we're having a race between AEN processing and nvmet_sq_destroy(), but
we're not 100% sure. Hence this RFC.