Message-ID: <24166154-8b08-bab6-ac8e-618eeba30674@gmail.com>
Date: Fri, 2 Mar 2018 19:39:25 +0300
From: Dmitry Osipenko <digetx@...il.com>
To: Andreas Dilger <adilger@...ger.ca>, Theodore Ts'o <tytso@....edu>
Cc: Adrian Hunter <adrian.hunter@...el.com>,
Ulf Hansson <ulf.hansson@...aro.org>,
linux-mmc <linux-mmc@...r.kernel.org>,
linux-block <linux-block@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Bough Chen <haibo.chen@....com>,
Alex Lemberg <alex.lemberg@...disk.com>,
Mateusz Nowak <mateusz.nowak@...el.com>,
Yuliy Izrailov <Yuliy.Izrailov@...disk.com>,
Jaehoon Chung <jh80.chung@...sung.com>,
Dong Aisheng <dongas86@...il.com>,
Das Asutosh <asutoshd@...eaurora.org>,
Zhangfei Gao <zhangfei.gao@...il.com>,
Sahitya Tummala <stummala@...eaurora.org>,
Harjani Ritesh <riteshh@...eaurora.org>,
Venu Byravarasu <vbyravarasu@...dia.com>,
Linus Walleij <linus.walleij@...aro.org>,
Shawn Lin <shawn.lin@...k-chips.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
Christoph Hellwig <hch@....de>,
Thierry Reding <treding@...dia.com>,
Krishna Reddy <vdumpa@...dia.com>,
linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: EXT4 Oops (Re: [PATCH V15 06/22] mmc: block: Add blk-mq support)
On 01.03.2018 23:20, Andreas Dilger wrote:
>
> On Mar 1, 2018, at 9:04 AM, Theodore Ts'o <tytso@....edu> wrote:
>> This doesn't seem to make sense; the PC is where we are currently
>> executing, and LR is the "Link Register" where the flow of control
>> will be returning after the current function returns, right? Well,
>> dx_probe should *not* be returning to __wait_on_bit(). So this just
>> seems.... weird.
>>
>> Ignoring the LR register, this stack trace looks sane... I can't see
>> which pointer could be NULL and getting dereferenced, though. How
>> easily can you reproduce the problem? Can you either (a) translate
>> the PC into a line number, or better yet, if you can reproduce, add a
>> series of BUG_ON's so we can see what's going on?
Ted, thank you for the suggestion. I don't have a reproducer; the bug shows up
only under some IO load, and quite randomly. I've applied the BUG_ON()s, but it
may take some time to catch the bug again.
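On the PC/LR puzzle: under the ARM procedure call standard (AAPCS), LR is not preserved across calls. A non-leaf function such as dx_probe() saves its own return address on entry, and every later call it makes overwrites LR. So at the time of an oops, LR may hold a return address left behind by some earlier callee rather than dx_probe()'s caller. The sketch below is a userspace illustration only, assuming GCC or Clang: __builtin_return_address(0) reports where the current function will return to, which on ARM is the value LR held at the moment of the call. The function names and the last_ra variable are hypothetical.

```c
#include <stddef.h>

/* Hypothetical: records the return address seen by the callee. */
static void *last_ra;

/*
 * noinline keeps this a real call, so it has a return address.
 * __builtin_return_address(0) (GCC/Clang builtin) returns the address
 * the current function will return to; on ARM that is LR at call time.
 */
__attribute__((noinline)) static void *my_return_address(void)
{
	return __builtin_return_address(0);
}

/*
 * Hypothetical caller. The store after the call means it cannot be
 * turned into a tail call, so the return address points back into
 * this function rather than into its caller.
 */
__attribute__((noinline)) static int caller_sketch(void)
{
	last_ra = my_return_address();
	return 1;
}
```

Once caller_sketch() has made another call, the LR value from the first call is gone; only the copy saved on the stack (or, here, in last_ra) survives, which is why the unwinder's view of the stack is more trustworthy than the raw LR in an oops.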
>> + BUG_ON(frame);
>
> I think you mean:
> BUG_ON(frame == NULL);
> or
> BUG_ON(!frame);
>
>
>> memset(frame_in, 0, EXT4_HTREE_LEVEL * sizeof(frame_in[0]));
>> frame->bh = ext4_read_dirblock(dir, 0, INDEX);
>> if (IS_ERR(frame->bh))
>> return (struct dx_frame *) frame->bh;
>>
>> + BUG_ON(frame->bh);
>> + BUG_ON(frame->bh->b_data);
>
> Same here.
>
> BUG_ON(frame->bh == NULL);
> BUG_ON(frame->bh->b_data == NULL);
>
> This is why I don't like implicit "is NULL" or "is non-zero" usage. Lustre
> used to require "== NULL" or "!= NULL" to avoid bugs like this, but had to
> abandon that because of upstream code style.
Well spotted, thanks Andreas.
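To make Andreas' point concrete, here is a minimal userspace sketch. SKETCH_BUG_ON() is a hypothetical stand-in for the kernel's BUG_ON(), which oopses when its condition is true; this version only counts how often it would have fired. struct frame_sketch is likewise hypothetical. It shows that BUG_ON(frame) fires for every valid frame (and dx_probe() always passes a valid one), while BUG_ON(frame == NULL) or BUG_ON(!frame) fires only for a NULL one.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's BUG_ON(cond): the real macro
 * oopses when cond is true; this one only counts how often it would fire.
 */
static int bug_hits;
#define SKETCH_BUG_ON(cond) do { if (cond) bug_hits++; } while (0)

/* Hypothetical minimal frame: only the field the checks look at. */
struct frame_sketch {
	void *bh;
};

/* The check as first posted: fires whenever frame is valid (non-NULL). */
static int fires_inverted(const struct frame_sketch *frame)
{
	bug_hits = 0;
	SKETCH_BUG_ON(frame);
	return bug_hits;
}

/*
 * The check as intended: fires only for a NULL frame.
 * SKETCH_BUG_ON(!frame) would behave identically.
 */
static int fires_intended(const struct frame_sketch *frame)
{
	bug_hits = 0;
	SKETCH_BUG_ON(frame == NULL);
	return bug_hits;
}
```

The same inversion applies to the frame->bh and frame->bh->b_data checks in the quoted patch.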
>> root = (struct dx_root *) frame->bh->b_data;
>> if (root->info.hash_version != DX_HASH_TEA &&
>> root->info.hash_version != DX_HASH_HALF_MD4 &&
>> root->info.hash_version != DX_HASH_LEGACY) {
>>
>> These are "could never" happen scenarios from looking at the code, but
>> that will help explain what is going on.
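In the quoted excerpt, reading root->info.hash_version is the first dereference of frame->bh->b_data, so a bad b_data pointer would fault right at this test. The check itself only accepts the three on-disk hash versions. A minimal sketch of it, assuming the DX_HASH_* values below match those in fs/ext4/ext4.h and using a hypothetical one-field view of the root info (not the real struct dx_root layout):

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the DX_HASH_* constants in fs/ext4/ext4.h. */
#define DX_HASH_LEGACY   0
#define DX_HASH_HALF_MD4 1
#define DX_HASH_TEA      2

/* Hypothetical minimal view of the htree root's info block. */
struct dx_root_info_sketch {
	uint8_t hash_version;
};

/* Mirrors the quoted test: only the three known versions are accepted. */
static bool root_hash_version_ok(const struct dx_root_info_sketch *info)
{
	return info->hash_version == DX_HASH_TEA ||
	       info->hash_version == DX_HASH_HALF_MD4 ||
	       info->hash_version == DX_HASH_LEGACY;
}
```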
>>
> >> If this is reliably only happening with mq, the only way I could see
> >> that is if something is returning an error when it previously wasn't.
>> This isn't a problem we're seeing with any of our testing, though.
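Regarding an error return that previously didn't happen: ext4_read_dirblock() reports failure as an error pointer rather than NULL, which is why the excerpt tests IS_ERR(frame->bh) and not frame->bh == NULL. The sketch below is a simplified userspace copy of the helpers in include/linux/err.h (a negative errno is stored in the top 4095 values of the address space), plus a hypothetical read_block_sketch() standing in for the real read path. It shows that IS_ERR() catches an encoded -EIO but does not flag NULL, so a NULL buffer_head would slip past that check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified copies of the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the last MAX_ERRNO addresses. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical buffer, standing in for struct buffer_head. */
struct buffer_sketch {
	char b_data[16];
};

/* Hypothetical read path: returns a valid buffer or an encoded errno. */
static struct buffer_sketch *read_block_sketch(int fail)
{
	static struct buffer_sketch buf;

	if (fail)
		return ERR_PTR(-EIO);
	return &buf;
}
```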