Message-ID: <CAKwO_5EgwVn+d1LGoms+6+u_76cBG0NLqF8tmeo1u=32AKGqVA@mail.gmail.com>
Date: Mon, 19 Mar 2012 17:24:41 -0700
From: Paul Taysom <taysom@...gle.com>
To: Mandeep Singh Baines <msb@...omium.org>
Cc: Alan Stern <stern@...land.harvard.edu>, "Ted Ts'o" <tytso@....edu>,
Theodore Tso <tytso@...gle.com>, Greg KH <greg@...ah.com>,
Paul Taysom <taysom@...omium.org>,
Jens Axboe <axboe@...nel.dk>, Andrew Morton <akpm@...gle.com>,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, stable@...nel.org
Subject: Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
On Sun, Mar 18, 2012 at 3:25 PM, Mandeep Singh Baines <msb@...omium.org> wrote:
> On Sun, Mar 18, 2012 at 1:23 PM, Alan Stern <stern@...land.harvard.edu> wrote:
>> On Sat, 17 Mar 2012, Ted Ts'o wrote:
>>
>>> I can't help thinking that the fact that we're constantly playing
>>> whack-a-mole trying to fix various random crashes when devices
>>> disappear that perhaps we should consider if there's a better way to
>>> do things.
>>
>> Indeed, as Jens's patch mentions, proper reference counting for the BDI
>> stuff hasn't been implemented yet. Obviously it will require somebody
>> who really does know the code (i.e., not me).
>>
>> For example, when Paul's patch assigns &default_backing_dev_info, is
>> the assignment synchronized by any sort of lock? I can't tell -- but
>> if it isn't then the possibility of a race will still exist.
>>
>
> I think it's safe without a lock (assuming the assignment is atomic), but it
> wouldn't hurt to add an i_lock. That would also give you a barrier, which
> is needed to propagate the assignment to other CPUs.
>
> This is not a perfect fix, but it's pretty safe and is nice in that it works
> independently of filesystem or bus type.
>
> Regards,
> Mandeep
>
>>> The fact that at the file system layer I have **no** idea that a
>>> device has disappeared, and just blindly going on trying to write to a
>>> device which is gone just seems a little crazy to me... why shouldn't
>>> block layer inform the upper layers about something as fundamental as,
>>> "the device is gone and is never coming back"?
>>
>> Playing devil's advocate... What would you do differently if you did
>> know the device was gone? All I/O operations will fail regardless, and
>> presumably with an error code like -ENODEV. Pretty much all you could
>> do would be to fail them a little earlier.
>>
>>> > I suspect Paul's patch is the right thing to do. It might even make
>>> > the ext4 fix unnecessary, although I don't understand the details well
>>> > enough to verify it. Maybe Paul can check -- the commit I'm referring
>>> > to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
>>> > kludge to avoid an oops after the disk disappears).
>>>
>>> I have no idea either, because it's not obvious to me what data
>>> structures can be relied upon, and what can't, and when things are
>>> supposed to get freed on sudden device disconnects. The fact that
>>> none of us are sure is part of what makes me think that the current
>>> scheme is, perhaps, non-optimal...
>>
>> That's why someone like Jens or Al needs to take a close look at this
>> (hint, hint).
>>
>> Alan Stern
>>
I have rerun my tests without my change on the 3.2.7 kernel and I was
not able to get it to crash. I even put some code in to do the early
detection so I didn't have to wait for another thread to stumble
across the corruption. The way I test is with several flash drives
with ext2, ext3, ext4, FAT, and HPFS file systems: I just repeatedly
plug and unplug them. When a flash drive with a file system is plugged
in, it is automatically mounted.
Paul Taysom