Message-ID: <CAHbLzko1izwBERS6auEna+eAGzQVA7zkDihMjT=tt_EBdhfmaA@mail.gmail.com>
Date: Fri, 4 Feb 2022 10:08:57 -0800
From: Yang Shi <shy828301@...il.com>
To: Chaitanya Kulkarni <chaitanyak@...dia.com>
Cc: "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"axboe@...nel.dk" <axboe@...nel.dk>,
"hch@...radead.org" <hch@...radead.org>
Subject: Re: [v6 PATCH] block: introduce block_rq_error tracepoint

On Thu, Feb 3, 2022 at 6:46 PM Chaitanya Kulkarni <chaitanyak@...dia.com> wrote:
>
> Yang,
>
> On 2/3/22 12:12, Yang Shi wrote:
> > Currently, rasdaemon uses the existing tracepoint block_rq_complete
> > and filters out non-error cases in order to capture block disk errors.
> >
> > But there are a few problems with this approach:
> >
> > 1. Even though the kernel trace filter can do the filtering, enabling
> > this tracepoint still adds some overhead.
> >
> > 2. The filter is merely based on errno, which does not align with kernel
> > logic to check the errors for print_req_error().
> >
> > 3. block_rq_complete only provides the dev major and minor to identify
> > the block device, which is inconvenient to use in user space.
> >
> > So introduce a new tracepoint block_rq_error just for the error case.
> > With this patch, rasdaemon could switch to block_rq_error.
> >
>
> This patch looks good, but I've a question for you.
>
> We already have a tracepoint for request completion,
> block_rq_complete. We are adding a new tracepoint, block_rq_error,
> that reports much the same thing as block_rq_complete.
> The call sites are similar:
> trace_block_rq_complete(req, error, nr_bytes);
> trace_block_rq_error(req, error, nr_bytes);
>
> The only delta between block_rq_complete and block_rq_error is the
> cmd field in TP_STRUCT__entry() and the __get_str(cmd) argument in
> TP_printk() for block_rq_complete, which I don't think would cause
> any issue if we used them for block_rq_error as well.
Yes, I agree. It's just that our use case has no need for cmd.
>
> Question 1: What prevents us from using the same format for
> both block_rq_complete and block_rq_error?
Actually nothing, if we ignore cmd.
>
> Question 2: Assuming that block_rq_complete and block_rq_error
> use the same format, why can't we
>
> declare DECLARE_EVENT_CLASS(blk_rq_completion....)
> and use that class for both block_rq_complete and block_rq_error?
>
> If I remember correctly, we need to define an event class
> instead of duplicating a tracepoint with similar reporting.
Very good point; I overlooked it. The original post had the disk
name and no cmd field. The two tracepoints now look much more alike
than they did then, so I agree the duplication could be folded into
an event class.
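
To illustrate the event-class idea, here is a minimal user-space sketch. It
is only a model, not the kernel's DECLARE_EVENT_CLASS()/DEFINE_EVENT() macros
and not the patch itself. The struct rq_completion_entry,
rq_completion_format() and DEFINE_RQ_COMPLETION_EVENT() names are
hypothetical, and the field list is an assumption based on the arguments
(req, error, nr_bytes) shown in the call sites above, with cmd left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record layout shared by both events (cmd omitted). */
struct rq_completion_entry {
    unsigned int major;
    unsigned int minor;
    unsigned long long sector;
    unsigned int nr_sector;
    int error;
};

/* The "class": one layout and one formatter used by every event in it. */
int rq_completion_format(char *buf, size_t len, const char *event,
                         const struct rq_completion_entry *e)
{
    assert(buf != NULL && e != NULL);
    return snprintf(buf, len, "%s: %u,%u %llu + %u [%d]", event,
                    e->major, e->minor, e->sector, e->nr_sector, e->error);
}

/* Each event contributes only its name; the rest comes from the class. */
#define DEFINE_RQ_COMPLETION_EVENT(name)                                \
    int trace_##name(char *buf, size_t len, unsigned int major,         \
                     unsigned int minor, unsigned long long sector,     \
                     unsigned int nr_bytes, int error)                  \
    {                                                                   \
        struct rq_completion_entry e = {                                \
            .major = major,                                             \
            .minor = minor,                                             \
            .sector = sector,                                           \
            .nr_sector = nr_bytes >> 9, /* bytes to 512-byte sectors */ \
            .error = error,                                             \
        };                                                              \
        return rq_completion_format(buf, len, #name, &e);               \
    }

DEFINE_RQ_COMPLETION_EVENT(block_rq_complete)
DEFINE_RQ_COMPLETION_EVENT(block_rq_error)
```

In this model, trace_block_rq_complete() and trace_block_rq_error() differ
only in the event name they print. That is the property an event class would
give the two real tracepoints, without duplicating the field and format
definitions.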
>
> -ck
>
>