Message-ID: <CAPE3x1735+kU2yRb18OroQoaXQddZTx6XjT+0pghBNYcO3h+zw@mail.gmail.com>
Date: Mon, 9 Jun 2025 12:02:00 -0700
From: Salomon Dushimirimana <salomondush@...gle.com>
To: James Bottomley <James.Bottomley@...senpartnership.com>
Cc: "Martin K . Petersen" <martin.petersen@...cle.com>, linux-scsi@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] scsi: Add SCSI error events, sent as kobject uevents by mid-layer

The team considered the use of SMART tools. While HDD SMART data
has info on data command errors, it is not suitable for the non-data
commands that we use internally. We are unable to use tracing due to
overflow / overrun issues. However, we are exploring other
alternatives, such as eBPF, that can fit well in our infrastructure.

Thanks,
Salomon Dushimirimana


On Thu, May 15, 2025 at 1:11 PM James Bottomley
<James.Bottomley@...senpartnership.com> wrote:
>
> On Thu, 2025-05-15 at 13:03 -0700, Salomon Dushimirimana wrote:
> > Hi,
> >
> > I agree with the recommended use of ftrace or blktrace for tracing.
>
> Great; what made me think of tracing is that your event emits for every
> error or retry which seemed like quite an overhead.  Conditioning it on
> a config parameter really isn't useful to distributions, so using the
> tracepoint system would solve both the quantity and the activation
> problem.
>
> > However, our primary goal for using uevents was not merely for
> > collecting trace information. We are using uevents as a notification
> > mechanism for userspace workflows to determine repair workflows (swap
> > / remove a failing device).
>
> If you're collecting stats for predictive failure, how is this proposed
> active mechanism more effective than the passive one of simply using
> the existing SMART monitor tools?
>
> Regards,
>
> James
>
