Message-ID: <20080319091647.GA11369@schmichrtp.de.ibm.com>
Date: Wed, 19 Mar 2008 10:16:48 +0100
From: Christof Schmitt <christof.schmitt@...ibm.com>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: David Wilder <dwilder@...ibm.com>, linux-btrace@...r.kernel.org,
linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
wilder@...ibm.com
Subject: Re: blktrace/relay/s390: Oops in subbuf_splice_actor

On Mon, Mar 17, 2008 at 04:19:16PM +0100, Christof Schmitt wrote:
> On Mon, Mar 17, 2008 at 09:08:37AM +0100, Jens Axboe wrote:
> > On Fri, Mar 14 2008, Christof Schmitt wrote:
> > > On Fri, Mar 14, 2008 at 08:21:03AM -0700, David Wilder wrote:
> > > > Christof Schmitt wrote:
> > > > >On Fri, Mar 14, 2008 at 02:10:07PM +0100, Jens Axboe wrote:
> > > > >>On Fri, Mar 14 2008, Christof Schmitt wrote:
> > > > >>>On Fri, Mar 14, 2008 at 12:58:03PM +0100, Jens Axboe wrote:
> > > > >>>>That is indeed a bug, does this work for you?
> > > > >>>>
> > > > >>>>diff --git a/kernel/relay.c b/kernel/relay.c
> > > > >>>>index d080b9d..39d1fa8 100644
> > > > >>>>--- a/kernel/relay.c
> > > > >>>>+++ b/kernel/relay.c
> > > > >>>>@@ -1066,7 +1066,7 @@ static int subbuf_splice_actor(struct file *in,
> > > > >>>> unsigned int flags,
> > > > >>>> int *nonpad_ret)
> > > > >>>> {
> > > > >>>>- unsigned int pidx, poff, total_len, subbuf_pages, ret;
> > > > >>>>+ unsigned int pidx, poff, total_len, subbuf_pages, nr_pages, ret;
> > > > >>>> struct rchan_buf *rbuf = in->private_data;
> > > > >>>> unsigned int subbuf_size = rbuf->chan->subbuf_size;
> > > > >>>> uint64_t pos = (uint64_t) *ppos;
> > > > >>>>@@ -1098,7 +1098,9 @@ static int subbuf_splice_actor(struct file *in,
> > > > >>>> pidx = (read_start / PAGE_SIZE) % subbuf_pages;
> > > > >>>> poff = read_start & ~PAGE_MASK;
> > > > >>>>
> > > > >>>>- for (total_len = 0; spd.nr_pages < subbuf_pages; spd.nr_pages++) {
> > > > >>>>+ nr_pages = min_t(unsigned int, subbuf_pages, PIPE_BUFFERS);
> > > > >>>>+
> > > > >>>>+ for (total_len = 0; spd.nr_pages < nr_pages; spd.nr_pages++) {
> > > > >>>> unsigned int this_len, this_end, private;
> > > > >>>> unsigned int cur_pos = read_start + total_len;
> > > > > >>>With the patch, I can run dd and 'blktrace -h traceserver' without the
> > > > > >>>oops. But the output from blktrace contains only zeros and no
> > > > > >>>usable data for blkparse. Using blktrace to write the data directly to
> > > > > >>>disk, without the blktrace server, works. Is there anything I
> > > > > >>>should look for to help debug the problem?
> > > > >>We should probably get Tom in the loop, as he is the relay expert. I'll
> > > > >>make sure the above patch gets into 2.6.25, as it is definitely a bug
> > > > >>that needs fixing.
> > > > >
> > > > >http://relayfs.sourceforge.net/contact.html mentions Tom Zanussi, but
> > > > >his email address seems to be no longer valid. I copy Dave Wilder
> > > > >here, since he is mentioned as relay maintainer on the web page.
> > > > >
> > > > >Dave, can you have a look at this? I can easily reproduce the problem
> > > > >on s390 Linux for testing and getting more debug information.
> > > > >
> > > > >Christof
> > > > >--
> > > > >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > >the body of a message to majordomo@...r.kernel.org
> > > > >More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > > >Please read the FAQ at http://www.tux.org/lkml/
> > > > >
> > > >
> > > > I will take a look at the problem.
> > > > > Is there anything special about the test you are running that I need to know?
> > >
> > > > The test uses the blktrace network feature on a 64-bit s390
> > > > guest on z/VM. I don't know whether 64 bit and z/VM are related to the
> > > > problem; it is the only setup I tested. To reproduce, start
> > > > 'blktrace -l' somewhere as a server and issue
> > > blktrace -h tracehostname
> > > dd if=/dev/sda of=/dev/null
> > > to start blktrace and some I/O load.
> >
> > If the problem is in the splice part of relay, then the problem will
> > likely go away if you add the -s parameter to blktrace. Might be handy
> > to know if you are experimenting.
>
> I still have your patch applied. blktrace with -s returns the trace
> data as expected; blktrace without -s returns data, but
> nothing useful for blkparse.

I found two issues. First, subbuf_splice_actor does not initialize the
function pointer spd_release, but splice_to_pipe later calls it:

	while (page_nr < spd_pages)
		spd->spd_release(spd, page_nr++);

I would guess that the function pointer should point to
spd_release_page or something similar.
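
To make the first issue concrete, here is a minimal user-space sketch of
the release loop. It is not kernel code: struct demo_spd,
demo_release_page and demo_release_rest are hypothetical stand-ins I made
up for the splice descriptor and its callback, and the check for an
unset pointer is something the loop quoted above does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's splice descriptor. */
struct demo_spd {
	int *pages;		/* stand-in for the page array */
	unsigned int nr_pages;	/* number of valid entries in pages */
	void (*spd_release)(struct demo_spd *spd, unsigned int i);
};

/* Counts how many pages the release callback has been called for. */
unsigned int demo_released;

/* Hypothetical release callback: marks page i as dropped. */
void demo_release_page(struct demo_spd *spd, unsigned int i)
{
	spd->pages[i] = 0;
	demo_released++;
}

/*
 * Same shape as the loop in splice_to_pipe: release every page from
 * page_nr onwards. Returns -1 if the callback was never set; this
 * check exists only in the sketch.
 */
int demo_release_rest(struct demo_spd *spd, unsigned int page_nr)
{
	assert(spd != NULL);
	if (spd->spd_release == NULL)
		return -1;
	while (page_nr < spd->nr_pages)
		spd->spd_release(spd, page_nr++);
	return 0;
}
```

With spd_release left out of the initializer, demo_release_rest()
returns -1 in this sketch; without such a check the loop would call
through whatever value the pointer happens to hold.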

Second, the loop in splice_direct_to_actor seems to call do_splice_to
first, to get the trace data into the pipe, and then the actor, to move
the data out to the network socket. With some debug printks, I see that
splice_to_pipe waits in pipe_wait(pipe) until there is free space
again. But if the same process has to loop through both the read and
the write part, nobody is left to drain the pipe, so pipe_wait would
deadlock the process. Or am I missing something here?
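
The concern can be illustrated in user space with an ordinary pipe. This
is only an analogy under my own assumptions, not the kernel path:
demo_fill_pipe is a hypothetical helper that fills a pipe from a single
process with nobody reading. The write end is set to O_NONBLOCK so the
sketch returns instead of hanging; a blocking writer would sleep at the
point where write() reports EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Fill a pipe from one process while nobody reads from it.
 * Returns the number of bytes accepted before the pipe was full,
 * or -1 on an unexpected error.
 */
long demo_fill_pipe(void)
{
	int fds[2];
	char chunk[4096] = { 0 };
	long total = 0;

	if (pipe(fds) < 0)
		return -1;
	assert(fds[0] >= 0 && fds[1] >= 0);

	/* Non-blocking write end: a full pipe gives EAGAIN, not a sleep. */
	if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof(chunk));

		if (n < 0) {
			/* EAGAIN means full: a blocking writer waits here. */
			if (errno != EAGAIN)
				total = -1;
			break;
		}
		total += n;
	}

	close(fds[0]);
	close(fds[1]);
	return total;
}
```

The value returned is the pipe capacity on the system at hand. Once that
much data is queued, the writer can only make progress if something else
drains the pipe, which is the situation I am worried about above.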
Christof