[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1006010722160.3637@i5.linux-foundation.org>
Date: Tue, 1 Jun 2010 07:58:31 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Rusty Russell <rusty@...tcorp.com.au>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Brandon Philips <brandon@...p.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
LKML <linux-kernel@...r.kernel.org>,
Jon Masters <jonathan@...masters.org>,
Tejun Heo <htejun@...il.com>,
Masami Hiramatsu <mhiramat@...hat.com>,
Kay Sievers <kay.sievers@...y.org>
Subject: Re: [PATCH 2/2] module: fix bne2 "gave up waiting for init of module
libcrc32c"
On Tue, 1 Jun 2010, Rusty Russell wrote:
>
> > So explain why I should be more polite, or more terse?
>
> How about this:
>
> "I hate this patch. It makes already-ugly locking in module.c awful, and I
> can't see that it's correct. Why not just reduce the locking to cover
> the minimum needed?"
Umm. Did you read the first few emails I sent out on this thread
originally? They were actually _politer_ than your suggestion above (no
crap mentioned), and did exactly what you ask for. Let me quote the one
where I suggested tightening the lock, for example (along with saying that
I don't want to see pointless semantic changes):
I'm not re-applying it with the pointless semantic changes that are
visible to modules. It doesn't matter if they were informed, if it means
that they'll then just have some odd version dependency and add crazy
"#if LINUX_VERSION" tests that aren't even exact.
I also wonder exactly what that module_mutex() actually ends up
protecting. 99% of load_module() seems to be stuff that is purely about
local issues. Maybe we could tighten the actual lock section to just the
parts that actually expose the vmalloc'ed area to others?
It's generally pointless releasing a lock in the middle - that just makes
the lock even less valid. If it's valid to just release the lock (without
some retry logic starting everything from the beginning), it likely the
lock shouldn't have been held there in the first place.
See? How bad was that?
> Getting email from you is the least fun part of kernel hacking, and that
> sucks. It's never "this code sucks, let's make it better", and always
> "your code sucks and you're wrong" :(
See above. That wasn't what I said. That "you're wrong" was what I got to
when I got frustrated with you for complaining about me actually trying to
fix it.
And "crap locking" only came up in the long explanation to Andrew about
what the original problem was, and why the EBUSY error wasn't a problem.
And we all know the module locking was/is crap. You must have known it
too, since you apparently had a "fix up locking" patch from before.
So what was so bad about calling the locking crap? Really?
So yes, we both get frustrated here. But read the thread again, Rusty, and
look who it is that actually starts complaining about other peoples code.
I called the locking crap, you're the one who called something "wrong,
complex and fundamentally broken".
Kettle. Pot. Black.
> > I do agree with your 1/2, btw, the one you posted under protest after I
> > pointed out that the locking was crap. I was just explaining _why_ the
> > locking was crap, and what the problem was to Andrew.
>
> Ugly as dropping the lock was, it was in some ways less ambitious than
> either patch. I understand that you disagree.
I agree with "less ambitious". But I'd also call it "papering over bad
code", or "making bad locking much worse" or "likely introduce new and
even more subtle problems".
In fact, I might even go so far as using "crap".
Because it really is fundamentally wrong to drop a lock in the middle of
an operation. It may _work_, but it certainly doesn't make the code
better.
The thing is, when we have a problem in some code, we should try to make
sure that the fixed code is _better_ than the old code. Not just fix the
symptoms. Fix the underlying _reason_ for why the bug happened in the
first place.
And the underlying reason the bug happened is (I think we both agree)
simply that the locking was broken. Your patch fixed the symptoms, yes,
but it did so by making the locking _worse_. That's why I hated it. I
really don't think it was a good approach.
And I do think you agree in the end, and know that's true.
> > I happen to also think that my solution to the problem is actually better
> > and more straightforward than your one is.
>
> I don't think so, for several reasons.
>
> (1) You no longer print out which module you gave up waiting for.
> (2) You now need that code complexity for !CONFIG_MODULE_UNLOAD.
> (3) It's obviously *not* straightforward otherwise you'd see why it's buggy.
> (4) The fast booting nutters want more parallelism in module loading anyway.
> (5) The locking *is* getting hairy (we drop it once already), and my patch
> makes that more explicit and clearer.
I agree with 1-2, and even had a comment about #2 pointing that out, and
thinking that it would be good for other reasons (ie /proc/modules).
And as to #4, the "wait for everything after the fact" is actually the
faster approach, although in practice it probably doesn't matter (you'd
hope that modules depending on other modules that are still busy loading
is such a rare case that it does _not_ matter whether you wait for them to
load serially or batched up).
And as to #5, I would not actually suggest that we do "wait later _or_ fix
the locking", I'd do both. It really isn't a either-or situation, Rusty.
So I don't think those ones matter.
HOWEVER.
But as to #3, I think you bring up an interesting issue:
> Prior to your patch, noone could get a reference on a module before it had
> done init. So when init fails, we simply free the module.
>
> Now, you've changed that with your grab-and-check-later code. And you can't
> fix it for the !CONFIG_MODULE_UNLOAD, because there are no reference counts
> for that case.
I agree. And you're right, we'd have to move even more code from the
MODULE_UNLOAD case into !MODULE_UNLOAD to fix it. At which point I do
agree that if we want to keep !MODULE_UNLOD (and I do think we do), my
approach really doesn't work. Or we'd basically make !MODULE_UNLOAD a
pointless exercise where unloading is not allowed, but we still do
everything else the same way.
So I do agree that your later two-patch series is the way to go.
I would like to note that your original "fix things by dropping the lock"
patch that I hated so much had this exact bug too. Making this a good
example of _why_ it's basically always wrong to drop locks in the middle -
even if you claimed you knew and understood the locking.
So I do hope we can agree to call the module locking "total and utter
crap", ok?
And it really wasn't a personal complaint about your prowess. Crap locking
(or code in general) is crap, but calling the code crap shouldn't be
something you take personally. Especially not when there are various valid
historical reasons _why_ it is all crap.
And yes, my code was crap too. I wrote it for the MODULE_UNLOAD case, and
only added a comment saying the !unload case was broken, but didn't look
at just _how_ broken it was. My bad.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists