linux-kernel - Re: [PATCH] scsi: Add SCSI error events, sent as kobject uevents by mid-layer

Message-ID: <CAPE3x1735+kU2yRb18OroQoaXQddZTx6XjT+0pghBNYcO3h+zw@mail.gmail.com>
Date: Mon, 9 Jun 2025 12:02:00 -0700
From: Salomon Dushimirimana <salomondush@...gle.com>
To: James Bottomley <James.Bottomley@...senpartnership.com>
Cc: "Martin K . Petersen" <martin.petersen@...cle.com>, linux-scsi@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] scsi: Add SCSI error events, sent as kobject uevents by mid-layer

The team considered using the SMART tools. While HDD SMART data
reports errors on data commands, it does not cover the non-data
commands that we use internally. We are unable to use tracing due to
overflow / overrun issues. However, we are exploring other
alternatives, such as eBPF, that could fit well in our infrastructure.

Thanks,
Salomon Dushimirimana


On Thu, May 15, 2025 at 1:11 PM James Bottomley
<James.Bottomley@...senpartnership.com> wrote:
>
> On Thu, 2025-05-15 at 13:03 -0700, Salomon Dushimirimana wrote:
> > Hi,
> >
> > I agree with the recommended use of ftrace or blktrace for tracing.
>
> Great; what made me think of tracing is that your event emits for every
> error or retry which seemed like quite an overhead.  Conditioning it on
> a config parameter really isn't useful to distributions, so using the
> tracepoint system would solve both the quantity and the activation
> problem.
>
> > However, our primary goal for using uevents was not merely for
> > collecting trace information. We are using uevents as a notification
> > mechanism for userspace workflows to determine repair workflows (swap
> > / remove a failing device).
>
> If you're collecting stats for predictive failure, how is this proposed
> active mechanism more effective than the passive one of simply using
> the existing SMART monitor tools?
>
> Regards,
>
> James
>