[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1364585774.31320.9.camel@thor.lan>
Date: Fri, 29 Mar 2013 15:36:14 -0400
From: Peter Hurley <peter@...leysoftware.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dave Jones <davej@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...riel.com>,
Davidlohr Bueso <davidlohr.bueso@...com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
hhuang@...hat.com, "Low, Jason" <jason.low2@...com>,
Michel Lespinasse <walken@...gle.com>,
Larry Woodman <lwoodman@...hat.com>,
"Vinod, Chegu" <chegu_vinod@...com>,
Stanislav Kinsbursky <skinsbursky@...allels.com>
Subject: Re: ipc,sem: sysv semaphore scalability
On Fri, 2013-03-29 at 12:26 -0700, Linus Torvalds wrote:
> On Fri, Mar 29, 2013 at 12:06 PM, Dave Jones <davej@...hat.com> wrote:
> >
> > Here's an oops I just hit..
> >
> > BUG: unable to handle kernel NULL pointer dereference at 000000000000000f
> > IP: [<ffffffff812c24ca>] testmsg.isra.5+0x1a/0x60
>
> Btw, looking at the code leading up to this, what the f*ck is wrong
> with the IPC stuff?
>
> It's using the generic list stuff for most of the lists, but then it
> open-codes the accesses.
>
> So instead of using
>
> for_each_entry(walk_msg, &msq->q_messages, m_list) {
> ..
> }
>
> the ipc/msg.c code does all that by hand, with
>
> tmp = msq->q_messages.next;
> while (tmp != &msq->q_messages) {
> struct msg_msg *walk_msg;
>
> walk_msg = list_entry(tmp, struct msg_msg, m_list);
> ...
> tmp = tmp->next;
> }
>
> Ugh. The code is near unreadable. And then it has magic memory
> barriers etc, implying that it doesn't lock the data structures, but
> no comments about them. See expunge_all() and pipelined_send().
>
> The code seems entirely random, and it's badly set up (annoyance of
> the day: crazy helper functions in ipc/msgutil.c to make sure that (a)
> you have to spend more effort looking for them, and (b) they won't get
> inlined).
>
> Clearly nobody has cared for the crazy IPC message code in a long time.
Exactly that's what my patch series does; clean this mess up.
This is what I wrote to Andrew a couple of days ago.
On Tue, 2013-03-26 at 22:33 -0400, Peter Hurley wrote:
I just figured out how the queue is being corrupted and why my series
> fixes it.
>
>
> With MSG_COPY set, the queue scan can exit with the local variable
'msg'
> pointing to a real msg if the msg_counter never reaches the
copy_number.
>
> The remaining execution looks like this:
>
> if (!IS_ERR(msg)) {
> ....
> if (msgflg & MSG_COPY)
> goto out_unlock;
> ....
>
> out_unlock:
> msg_unlock(msq);
> break;
> }
> }
> if (IS_ERR(msg))
> ....
>
> bufsz = msg_handler();
> free_msg(msg); <<---- msg never unlinked
>
>
> Since the msg should not have been found (because it failed the match
> criteria), the if (!IS_ERR(msg)) clause should never have executed.
>
> That's why my refactor fixes resolve this; because msg is not
> inadvertently treated as a found msg.
>
> But let's be honest; the real bug here is the poor structure of this
> function that even made this possible. The deepest nesting executes a
> goto to a label in the middle of an if clause. Yuck! No wonder this
> thing's fragile.
>
> So my recommendation still stands. The series that fixes this has been
> getting tested in linux-next for a month. Fixing this some other way
is
> just asking for more trouble.
>
> But why not just revert MSG_COPY altogether for 3.9?
Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists