linux-kernel - IIO (+ more general?) Error condition handling (e.g. wire fell out errors)

Message-ID: <4E858F5A.5000901@cam.ac.uk>
Date:	Fri, 30 Sep 2011 10:43:54 +0100
From:	Jonathan Cameron <jic23@....ac.uk>
To:	"linux-iio@...r.kernel.org" <linux-iio@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: IIO (+ more general?) Error condition handling (e.g. wire fell out
 errors)

I'm not sure how general this is, hence sent to LKML and IIO.

Some of the devices we have can produce interrupts indicating
that everything has gone horribly wrong. Sometimes this is
also embedded in the main data stream.

The best examples are all the 'wonderful' ways the ad2s1210 resolver
can go wrong.  None of these would be expected to be part of normal
operating conditions.

So, the question is - do we want these to go through our main
events stream?  They are weird and wonderful and often don't
align well with our existing event codes.  A loss of data error,
for example (a wire came out), has no obvious match.  Others could
possibly be made to fit, but that would lose some of the semantics.
Loss of tracking could be reported as a rate of change threshold
event, but it is really meant to tell you the data could be garbage.

So three options come to mind:

1) Have a magic set of event codes for device class specific
fault events and push them through our main event path.  For the
reference of non-IIO types - that path is an anon file descriptor
obtained by an ioctl on the device's chrdev.  Events are a simple
timestamp / event code pair.

2) Add another anon file that you get via an ioctl as a separate
event reporting channel. We then define a big list of error codes
without aiming for any real structure.  Drivers would then need to
document what they can throw out (preferably under sysfs?)

3) Consider these out of band (from the out of band event data)
and look at other options for reporting them.
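
To make option 1 a little more concrete, here is a minimal sketch of how
fault codes could share the existing timestamp / event code pair.  Every
name in it (struct sketch_event, SKETCH_FAULT_FLAG, the enum values) is
hypothetical, and using the top bit of the code to mark faults is purely
an assumption for illustration; none of this is the actual IIO ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event layout: a simple timestamp / event code pair. */
struct sketch_event {
	uint64_t id;        /* event code */
	int64_t  timestamp; /* time the event was raised, in ns */
};

/* Keep the pair fixed-size so userspace can read() whole records. */
static_assert(sizeof(struct sketch_event) == 16, "unexpected padding");

/* Assumption: the top bit of the event code marks a fault event. */
#define SKETCH_FAULT_FLAG (UINT64_C(1) << 63)

/* Hypothetical device class specific fault numbers. */
enum sketch_fault {
	SKETCH_FAULT_LOSS_OF_DATA = 1,     /* e.g. a wire came out */
	SKETCH_FAULT_LOSS_OF_TRACKING = 2, /* data could be garbage */
};

/* Build the event code a driver would push for a given fault. */
static inline uint64_t sketch_fault_code(enum sketch_fault fault)
{
	return SKETCH_FAULT_FLAG | (uint64_t)fault;
}

/* Tell a fault apart from an ordinary (threshold etc.) event. */
static inline bool sketch_is_fault(const struct sketch_event *ev)
{
	return (ev->id & SKETCH_FAULT_FLAG) != 0;
}

/* Count fault events in a batch, e.g. one read() from the event fd. */
static size_t sketch_count_faults(const struct sketch_event *evs, size_t n)
{
	size_t faults = 0;

	for (size_t i = 0; i < n; i++)
		if (sketch_is_fault(&evs[i]))
			faults++;
	return faults;
}
```

With a layout like this, existing readers of the event stream would see
fault events as unknown codes unless they test the flag first, which is
part of the trade-off against the separate channel in option 2.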

Is there anything general out there for reporting hardware failures
that would be appropriate?  Sometimes these conditions are the sort
of thing that should cause a siren to go off.
They might be a sensor failure... or they might mean you have
a runaway train heading for a station...

(p.s. I hope no one is using the current driver for trains, though
that might explain British trains...)

Thanks,

Jonathan