linux-kernel - Re: [v6 PATCH] block: introduce block_rq


Date:   Fri, 4 Feb 2022 02:46:28 +0000
From:   Chaitanya Kulkarni <chaitanyak@...dia.com>
To:     Yang Shi <shy828301@...il.com>
CC:     "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "hch@...radead.org" <hch@...radead.org>
Subject: Re: [v6 PATCH] block: introduce block_rq_error tracepoint

Yang,

On 2/3/22 12:12, Yang Shi wrote:
> Currently, rasdaemon uses the existing tracepoint block_rq_complete
> and filters out non-error cases in order to capture block disk errors.
> 
> But there are a few problems with this approach:
> 
> 1. Even kernel trace filter could do the filtering work, there is
>     still some overhead after we enable this tracepoint.
> 
> 2. The filter is merely based on errno, which does not align with kernel
>     logic to check the errors for print_req_error().
> 
> 3. block_rq_complete only provides dev major and minor to identify
>     the block device, it is not convenient to use in user-space.
> 
> So introduce a new tracepoint block_rq_error just for the error case.
> With this patch, rasdaemon could switch to block_rq_error.
> 

This patch looks good, but I have two questions for you.

We already have a tracepoint for request completion,
block_rq_complete. The new tracepoint, block_rq_error, reports
nearly the same information, and the call sites are similar:

trace_block_rq_complete(req, error, nr_bytes);
trace_block_rq_error(req, error, nr_bytes);

The only delta between block_rq_complete and block_rq_error is the
cmd field in TP_STRUCT__entry() and the matching __get_str(cmd) in
TP_printk() for block_rq_complete, which I don't think would cause
any issue if we used them for block_rq_error as well.

Question 1: What prevents us from using the same format for
both block_rq_complete and block_rq_error?

Question 2: Assuming block_rq_complete and block_rq_error use the
same format, why can't we declare
DECLARE_EVENT_CLASS(blk_rq_completion....) and use that class for
both block_rq_complete and block_rq_error?

If I remember correctly, we should define an event class
instead of duplicating a tracepoint with similar reporting.
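
To illustrate the idea, here is a minimal userspace sketch of the
"one class, many events" pattern. It is only an analogy and does not
use the kernel's DECLARE_EVENT_CLASS()/DEFINE_EVENT() macros: the
struct, the two helper macros, the class name block_rq_completion and
the output format are all hypothetical stand-ins chosen for this
example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fields both events would record. */
struct rq_completion_entry {
    unsigned int major, minor;
    unsigned long long sector;
    unsigned int nr_sector;
    int error;
};

/* "Class": the formatter is generated once and shared by its events. */
#define DEFINE_RQ_COMPLETION_CLASS(cls)                                \
    static int cls##_format(char *buf, size_t len, const char *event,  \
                            const struct rq_completion_entry *e)       \
    {                                                                  \
        return snprintf(buf, len, "%s: %u,%u (%llu + %u) [%d]", event, \
                        e->major, e->minor, e->sector, e->nr_sector,   \
                        e->error);                                     \
    }

/* "Event": a thin named wrapper that reuses the class formatter. */
#define DEFINE_RQ_EVENT(cls, name)                                     \
    static int trace_##name(char *buf, size_t len,                     \
                            const struct rq_completion_entry *e)       \
    {                                                                  \
        return cls##_format(buf, len, #name, e);                       \
    }

/* One class, two events: the format is written only once. */
DEFINE_RQ_COMPLETION_CLASS(block_rq_completion)
DEFINE_RQ_EVENT(block_rq_completion, block_rq_complete)
DEFINE_RQ_EVENT(block_rq_completion, block_rq_error)
```

Calling trace_block_rq_complete() and trace_block_rq_error() on the
same entry gives identical output apart from the event name, which is
the property that lets both events share a single definition.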

-ck