Message-Id: <1237485040.4752.16.camel@kitka.ibm.com>
Date: Thu, 19 Mar 2009 18:50:40 +0100
From: Martin Peschke <mpeschke@...ux.vnet.ibm.com>
To: Tom Zanussi <tzanussi@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-s390@...r.kernel.org
Subject: Re: PROBLEM: relay - stale data copied to user space
On Wed, 2009-03-18 at 23:19 -0500, Tom Zanussi wrote:
> On Wed, 2009-03-18 at 16:07 +0100, Martin Peschke wrote:
> > This is my theory:
> > Timing matters. It's a race caused by improper protection of critical
> > sections in a producer-consumer scenario. A bug in the bookkeeping
> > allows a reader to read at a position that is just being written to.
> >
>
> It does look consistent with a reader reading an event that's been
> reserved but not yet written, or partially written e.g. if an event
> being written on one cpu was read by another before the first one
> finished.
So this is part of relay's design, and it's up to user space to make
sure that reader and writer are on the same CPU?
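The race described above, where a reader consumes space that has been reserved but not yet written, can be illustrated with a minimal reserve/commit sketch in C. This is not relay's code: `struct sketch_buf`, `sketch_init`, `sketch_reserve`, `sketch_commit`, `sketch_read` and `sketch_demo` are hypothetical names, and the sketch assumes a single writer per buffer with one outstanding reservation at a time and no wraparound. The point it shows is that a reader must bound its copy by an offset the writer publishes only after the data is in place, not by the reservation offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical single-writer buffer; NOT relay's data structure. */
struct sketch_buf {
    char data[BUF_SIZE];
    size_t reserved;          /* writer-private: end of space handed out */
    atomic_size_t committed;  /* end of fully written data; readers stop here */
};

static void sketch_init(struct sketch_buf *b)
{
    b->reserved = 0;
    atomic_init(&b->committed, 0);
}

/* Writer, step 1: claim len bytes. The space is not yet visible to readers. */
static char *sketch_reserve(struct sketch_buf *b, size_t len)
{
    if (len > BUF_SIZE - b->reserved)
        return NULL;              /* buffer full */
    char *p = b->data + b->reserved;
    b->reserved += len;
    return p;
}

/* Writer, step 2: after filling the reservation, publish it. The release
 * store orders the writes to data[] before the new committed offset. */
static void sketch_commit(struct sketch_buf *b)
{
    atomic_store_explicit(&b->committed, b->reserved, memory_order_release);
}

/* Reader: copy at most max bytes starting at from, never past committed.
 * Bounding the copy by b->reserved instead would let a reader copy bytes
 * that are reserved but not yet written, i.e. stale data. */
static size_t sketch_read(struct sketch_buf *b, size_t from, char *out, size_t max)
{
    size_t end = atomic_load_explicit(&b->committed, memory_order_acquire);
    if (from >= end)
        return 0;
    size_t n = end - from;
    if (n > max)
        n = max;
    memcpy(out, b->data + from, n);
    return n;
}

/* Demo: an uncommitted reservation yields nothing, a committed one yields data. */
static void sketch_demo(void)
{
    struct sketch_buf b;
    char out[BUF_SIZE];

    sketch_init(&b);
    char *p = sketch_reserve(&b, 5);
    assert(p != NULL);
    assert(sketch_read(&b, 0, out, sizeof out) == 0);  /* reserved, not written */

    memcpy(p, "event", 5);
    sketch_commit(&b);
    assert(sketch_read(&b, 0, out, sizeof out) == 5);
    assert(memcmp(out, "event", 5) == 0);
}
```

With several writers sharing one buffer (or nested writers), a single commit offset like this is no longer enough, which is one reason per-CPU buffers with one writer each are attractive.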
> Can you see if the below patch to blktrace userspace helps?
It appears to fix it. I will give it more testing in a larger
environment.
> Or failing that, explicitly using gettid() in place of getpid() in
> sched_setaffinity(). Or, failing that, you had mentioned previously
> that you would try to reproduce the problem on your laptop - were you
> able to do that? If so, it would help in debugging it further...
Reproducing it on my laptop didn't work out. But then, it's a single-CPU machine.
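For reference, here is a sketch of the sched_setaffinity() call Tom mentions, pinning the calling thread to one CPU. It assumes Linux with glibc. On Linux the pid argument of sched_setaffinity() is treated as a thread ID, so passing getpid() from a worker thread pins the main thread rather than the caller, while 0 or the value of gettid() targets the calling thread. `pin_self_to_cpu` and `first_allowed_cpu` are hypothetical helpers, not blktrace functions, and Tom's actual patch is not reproduced here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pin the *calling thread* to one CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_self_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* On Linux, sched_setaffinity()'s first argument is a thread ID.
     * gettid() (or 0) names this thread; getpid() names the main thread,
     * so using it from a worker thread would pin the wrong thread. */
    pid_t tid = (pid_t)syscall(SYS_gettid);
    return sched_setaffinity(tid, sizeof set, &set);
}

/* Hypothetical helper: lowest-numbered CPU this thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &set))
            return i;
    return -1;
}
```

Calling syscall(SYS_gettid) avoids depending on the gettid() wrapper, which glibc only added in version 2.30; for pthreads, pthread_setaffinity_np() is an alternative.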
Thanks,
Martin