Message-ID: <20180403070412.GH3313@nanopsycho>
Date: Tue, 3 Apr 2018 09:04:12 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
Ganesh GR <ganeshgr@...lsio.com>,
Nirranjan Kirubaharan <nirranjan@...lsio.com>,
Indranil Choudhury <indranil@...lsio.com>
Subject: Re: [PATCH net-next v2 1/2] fs/crashdd: add API to collect hardware dump in second kernel
Mon, Apr 02, 2018 at 02:30:45PM CEST, rahul.lakkireddy@...lsio.com wrote:
>On Monday, April 04/02/18, 2018 at 14:41:43 +0530, Jiri Pirko wrote:
>> Fri, Mar 30, 2018 at 08:42:00PM CEST, ebiederm@...ssion.com wrote:
>> >Rahul Lakkireddy <rahul.lakkireddy@...lsio.com> writes:
>> >
>> >> On Friday, March 03/30/18, 2018 at 16:09:07 +0530, Jiri Pirko wrote:
>> >>> Sat, Mar 24, 2018 at 11:56:33AM CET, rahul.lakkireddy@...lsio.com wrote:
>> >>> >Add a new module crashdd that exports the /sys/kernel/crashdd/
>> >>> >directory in second kernel, containing collected hardware/firmware
>> >>> >dumps.
>> >>> >
>> >>> >The sequence of actions done by device drivers to append their
>> >>> >device-specific hardware/firmware logs to the /sys/kernel/crashdd/
>> >>> >directory is as follows:
>> >>> >
>> >>> >1. During probe (before hardware is initialized), device drivers
>> >>> >register to the crashdd module (via crashdd_add_dump()), with
>> >>> >callback function, along with buffer size and log name needed for
>> >>> >firmware/hardware log collection.
>> >>> >
>> >>> >2. Crashdd creates a driver's directory under
>> >>> >/sys/kernel/crashdd/<driver>. Then, it allocates the buffer with
>> >>>
>> >>> This smells. I need to identify the exact ASIC instance that produced
>> >>> the dump. To identify by driver name does not help me if I have multiple
>> >>> instances of the same driver. This looks wrong to me. This looks like
>> >>> a job for devlink where you have 1 devlink instance per 1 ASIC instance.
>> >>>
>> >>> Please see:
>> >>> http://patchwork.ozlabs.org/project/netdev/list/?series=36524
>> >>>
>> >>> I believe that the solution in the patchset could be used for
>> >>> your usecase too.
>> >>>
>> >>>
>> >>
>> >> The sysfs approach proposed here has been dropped in favour of exporting
>> >> the dumps as ELF notes in /proc/vmcore.
>> >>
>> >> Will be posting the new patches soon.
>> >
>> >The concern was actually how you identify which device the dump came from.
>> >Where you read the identifier changes, but whether it is sysfs or
>> >/proc/vmcore, the concern remains valid.
>>
>> Yeah. I still don't see how you link the dump and the device.
>
>In our case, the dump and the device are being identified by the
>driver’s name followed by its corresponding pci bus id. I’ve posted an
>example in my v3 series:
>
>https://www.spinics.net/lists/netdev/msg493781.html
>
>Here’s an extract from the link above:
>
># readelf -n /proc/vmcore
>
>Displaying notes found at file offset 0x00001000 with length 0x04003288:
>  Owner                        Data size   Description
>  VMCOREDD_cxgb4_0000:02:00.4  0x02000fd8  Unknown note type:(0x00000700)
>  VMCOREDD_cxgb4_0000:04:00.4  0x02000fd8  Unknown note type:(0x00000700)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
>  VMCOREINFO                   0x0000074f  Unknown note type: (0x00000000)
>
>Here, for my two devices, the dump’s names are
>VMCOREDD_cxgb4_0000:02:00.4 and VMCOREDD_cxgb4_0000:04:00.4.
>
>It’s really up to the callers to write their own unique name for the
>dump. The name is appended to “VMCOREDD_” string.
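
As a rough illustration of that naming scheme, the sketch below pulls the
device dump notes out of `readelf -n` text and splits each owner name into a
driver and a PCI address. The function name parse_vmcoredd_owners and the
assumption that the caller's suffix has the form <driver>_<pci address> are
illustrative only, taken from the cxgb4 example; since callers choose their
own names, other drivers may not follow this layout.

```python
PREFIX = "VMCOREDD_"


def parse_vmcoredd_owners(readelf_text):
    """Return (driver, pci_address) pairs for notes owned by VMCOREDD_*.

    Assumes the caller-chosen suffix is "<driver>_<pci address>", as in the
    cxgb4 example. That layout is a convention, not something enforced.
    """
    devices = []
    for line in readelf_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith(PREFIX):
            continue
        suffix = fields[0][len(PREFIX):]
        # Split on the last underscore so driver names containing "_" survive.
        driver, sep, address = suffix.rpartition("_")
        if not sep:
            continue  # name does not follow the assumed layout
        devices.append((driver, address))
    return devices


sample = """\
VMCOREDD_cxgb4_0000:02:00.4 0x02000fd8 Unknown note type:(0x00000700)
VMCOREDD_cxgb4_0000:04:00.4 0x02000fd8 Unknown note type:(0x00000700)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
VMCOREINFO 0x0000074f Unknown note type: (0x00000000)
"""

for driver, address in parse_vmcoredd_owners(sample):
    print(driver, address)
```

Splitting on the last underscore is a guess that only holds while the
device part of the name contains no underscore of its own.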
>
>> Rahul, did you look at the patchset I pointed out?
>
>For devlink, I think the dump name would be identified by
>bus_type/device_name; i.e. “pci/0000:02:00.4” for my example.
>Is my understanding correct?
Yes.
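
For comparison, a minimal sketch of how the two identifiers line up. The
helper names are hypothetical and belong to neither patchset; the only facts
used are the "VMCOREDD_" prefix and the bus_type/device_name form discussed
above. The bus is assumed to be PCI because the note name does not record it.

```python
NOTE_PREFIX = "VMCOREDD_"


def devlink_handle(bus_type, device_name):
    """Build a handle in the bus_type/device_name form, e.g. pci/0000:02:00.4."""
    return f"{bus_type}/{device_name}"


def handle_from_note_owner(owner, bus_type="pci"):
    """Map a "VMCOREDD_<driver>_<device>" note owner to a devlink-style handle.

    The "<driver>_<device>" suffix layout is an assumption based on the
    cxgb4 example; the bus type is not part of the note name, so it must be
    supplied by the caller (defaulting to "pci" here).
    """
    if not owner.startswith(NOTE_PREFIX):
        raise ValueError(f"not a device dump note: {owner!r}")
    _driver, sep, device_name = owner[len(NOTE_PREFIX):].rpartition("_")
    if not sep or not device_name:
        raise ValueError(f"unexpected note name layout: {owner!r}")
    return devlink_handle(bus_type, device_name)


print(handle_from_note_owner("VMCOREDD_cxgb4_0000:02:00.4"))  # pci/0000:02:00.4
```

The driver name is dropped in the conversion, since the handle identifies
the device instance by bus and address alone.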
>
>Thanks,
>Rahul