[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230515065212.GA522679@hu-pkondeti-hyd.qualcomm.com>
Date: Mon, 15 May 2023 12:22:12 +0530
From: Pavan Kondeti <quic_pkondeti@...cinc.com>
To: Mukesh Ojha <quic_mojha@...cinc.com>
CC: <keescook@...omium.org>, <gregkh@...uxfoundation.org>,
<tony.luck@...el.com>, <gpiccoli@...lia.com>,
<johannes@...solutions.net>, <rafael@...nel.org>,
<linux-hardening@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<quic_shashim@...cinc.com>, <quic_pkondeti@...cinc.com>
Subject: Re: RFC design of device coredump collection on panic in Pstore
On Mon, May 08, 2023 at 09:21:00PM +0530, Mukesh Ojha wrote:
> Hi All,
>
> We are exploring a way where device driver(registered/interested and optional could be
> callback) can dump their data(consist of list of buffers or hardware registers) for post
> mortam debugging of a crash via dumping the content to pstore region. Some of the example
> data could be like clock dump/regulators/ etc.
>
> These stuff should be already part of entire RAM dump but in some cases it's however not
> feasible to capture the entire content of RAM, so was thinking if pstore region can be
> used to capture above information either in the form of elf or binary and how pstore
> can get this in human readable format can be discussed further.
>
> Also, existing devcoredump driver does not cover panic usecases so we thought of something
> like below RFC design where,
>
> 1. Device_coredump allocates some configurable contigous memory that can be controlled
> via CONFIG or bootargs and later registers for panic notifiers.
> 2. Notifier gets added.
> 3. Pstore adds device_coredump as its front-end via dumper registration similar
> to kmsg being dump today.
> 4. Successful registration of dumper.
> 5. A device driver(A-Z) can register their buffer to be dumped as part of panic.
> 6. buffer gets added to the dump list.
> 7. Panic occurs.
> 8. iterate over registered drivers and copy their dump list to its own memory and if
> it crosses device core dump memory log an error stop iterating.
> 9. Similar to devcore_dump() inline with kmsg_dump()
> 10.Copy the content to pstore region and this could be elf or raw binary that can be
> discussed.
>
>
> Device coredump memory(1) size could be passed from pstore and should be same as size
> of devcoredump frontend size given in DT or some other way.
>
> Let me know your concern and view on this.
>
>
> pstore device_coredump deviceA..Z panic
> ┼ │
> │ │ panic_notifier(1) │
> │ ├───────────────────┬─────────────────►│
> │ │ notifier added │ (2) │
> │ │◄──────────────────┼──────────────────┤
> pstore can give │ (3) │ │ │
> its region for │ dumper_registration │ │ │
> dump ├─────────────────────►│ │ │
> │ (4) │ │ │
> │◄─────────────────────┤ │ │
> │ register_dumper │ │ │
> │ │ │ │
> │ │ │ │
> │ │ │ init │
> │ │ ───┼── │
> │ │ (5) │ │
> │ │devcoredump_register(dev,buf)); │
> │ │◄──────────────────┐ │
> │ │ (6) │ │
> │ ├──────────────────►│ │
> │ │device/buf gets add│d │
> │ │ to the list │ │panic
> │ │ │ ──┼───
> │ │ │ │
> │ │(7) panic_notifier │ call │
> │ │◄──────────────────┼──────────────────┤
> │ │ prepare coredump │for the deviceA-Z │
> │ ├──────────────────►├────┐ │
> │ │ (8) │ │ │
> │ │ │ │ │
> │ │ │ │ │
> │ │ │◄───┘ │
> dumper will │ │ (9) │dev_coredump()
> write all the device dump to │◄──────────────────┼──────────────────┤
> pstore ┌◄───┬─────────────────┤ │ │
> region │ │ (10) │ │ │
> │ │ │ │ │
> │ │ │ │ │
> │◄───┘ │ │ │
> │ │ │ │
> │ │
>
>
I really like the idea of providing an interface for device drivers to
dump specific data for offline use. Currently in MSM/QCOM downstream
kernel, we do register for panic notifiers and dump relavant data for
later parsing/analysis. In some cases, this message gets printed to console
so that it would also available in pstore / extracted dmesg. However
having separate record/buffer for individual drivers would help.
I believe dev_coredump() is just an analogy here as it is meant for
taking a dump of the device when the driver thinks its device/firmware
is hung/crashed etc, AFAICT. IIUC, What we are looking for here a different
interface/semantics i.e querying devices to dump any information when
the *system* is crashed/paniced. May be something like dev_panicdump()
and a separate method in device_driver.
Also, are there any use cases for this dev_panicdump() outside the
pstore()? For ex: it would still be helpful to gather all the relavant
data of a device and put it a separate buffer. The list of such buffers
can be parsed offline via standard ramdump analysis.
Thanks,
Pavan
Powered by blists - more mailing lists