Message-ID: <20211227080350.GA469126@ogabbay-vm-u20.habana-labs.com>
Date: Mon, 27 Dec 2021 10:03:50 +0200
From: Oded Gabbay <ogabbay@...nel.org>
To: gregkh@...uxfoundation.org
Cc: linux-kernel@...r.kernel.org
Subject: [git pull] habanalabs pull request for kernel 5.17
Hi Greg,
This is the habanalabs pull request for the merge window of kernel 5.17.
It mainly enhances the driver to deal with extreme cases, such as
reset-during-reset, events during reset and allowing monitoring
applications to continue running during reset.
Full details are in the tag.
Thanks,
Oded
The following changes since commit 1bb866dcb8cf5054de88f592fc0ec1f275ad9d63:
Merge tag 'iio-for-5.17a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next (2021-12-22 12:33:01 +0100)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git misc-habanalabs-next-2021-12-27
for you to fetch changes up to ce80098db2439ee44403ec6fccd3a10be21c7aff:
habanalabs: support hard-reset scheduling during soft-reset (2021-12-26 14:42:31 +0200)
----------------------------------------------------------------
This tag contains habanalabs driver changes for v5.17:
- Support reset-during-reset. In case the f/w notifies the driver
that the f/w is going to reset the device, the driver should
support that even if it is in the middle of doing another
reset.
- Support events from f/w that arrive during device resets.
Previously these events were ignored, so critical errors were
neither reported nor handled by the driver.
- Don't kill processes that hold the control device open during
hard-reset of the device. The control device operations can't
crash if done during hard-reset. And usually, only monitoring
applications are using the control device, so killing them
defeats their purpose.
- Fix handling of hwmon nodes when working with legacy f/w
- Change the compute context pointer to be boolean. This pointer
was abused by multiple code paths that wanted fast access to
the compute context structure.
- Add uapi to fetch historical errors. This is necessary as errors
sometimes result in a hard-reset, in which the user application
is terminated.
- Optimize GAUDI's MMU cache invalidation.
- Add support for loading the latest f/w.
- Add uapi to fetch HBM replacement and pending rows information.
- Multiple bug fixes to the reset code.
- Multiple bug fixes for Multi-CS ioctl code.
- Multiple bug fixes for wait-for-interrupt ioctl code.
- Many small bug fixes and cleanups.
----------------------------------------------------------------
Bharat Jauhari (3):
habanalabs: handle abort scenario for user interrupt
habanalabs: rename reset flags
habanalabs: refactor wait-for-user-interrupt function
Dani Liberman (6):
habanalabs: change wait for interrupt timeout to 64 bit
habanalabs: add support for fetching historic errors
habanalabs: fix race condition in multi CS completion
habanalabs: add SOB information to signal submission uAPI
habanalabs: enable access to info ioctl during hard reset
habanalabs: keep control device alive during hard reset
Guy Zadicario (1):
habanalabs/gaudi: fix debugfs dma channel selection
Oded Gabbay (16):
habanalabs/gaudi: recover from CPU WD event
habanalabs: make hdev creation code more readable
habanalabs: prevent false heartbeat message
habanalabs: abort reset on invalid request
habanalabs: fix soft reset accounting
habanalabs: rename late init after reset function
habanalabs/gaudi: return EPERM on non hard-reset
habanalabs: free signal handle on failure
habanalabs: remove redundant check on ctx_fini
habanalabs: save ctx inside encaps signal
habanalabs: fix etr asid configuration
habanalabs: add helper to get compute context
habanalabs: remove compute context pointer
habanalabs: remove in_debug check in device open
habanalabs: fix hwmon handling for legacy f/w
habanalabs: replace some -ENOTTY with -EINVAL
Ofir Bitton (18):
habanalabs: expand clock throttling information uAPI
habanalabs: debugfs support for larger I2C transactions
habanalabs: handle device TPM boot error as warning
habanalabs: fix possible deadlock in cache invl failure
habanalabs: move device boot warnings to the correct location
habanalabs: add more info ioctls support during reset
habanalabs: change misleading IRQ warning during reset
habanalabs: handle events during soft-reset
habanalabs: return correct clock throttling period
habanalabs: add current PI value to cpu packets
habanalabs: sysfs support for two infineon versions
habanalabs: expose soft reset sysfs nodes for inference ASIC
habanalabs: modify cpu boot status error print
habanalabs: fix endianness when reading cpld version
habanalabs: fix comments according to kernel-doc
habanalabs: refactor reset information variables
habanalabs: add a lock to protect multiple reset variables
habanalabs: support hard-reset scheduling during soft-reset
Ohad Sharabi (11):
habanalabs: modify wait for boot fit in dynamic FW load
habanalabs: revise and document use of boot status flags
habanalabs: adding indication of boot fit loaded
habanalabs: use variable poll interval for fw loading
habanalabs: don't clear previous f/w indications
habanalabs: skip PLL freq fetch
habanalabs: skip read fw errors if dynamic descriptor invalid
habanalabs: wait again for multi-CS if no CS completed
habanalabs: clean MMU headers definitions
habanalabs: prevent wait if CS in multi-CS list completed
habanalabs: handle skip multi-CS if handling not done
Rajaravi Krishna Katta (2):
habanalabs: add dedicated message towards f/w to set power
habanalabs: Move frequency change thread to goya_late_init
Tomer Tayar (5):
habanalabs: align debugfs documentation to alphabetical order
habanalabs: add power information type to POWER_GET packet
habanalabs: pass reset flags to reset thread
habanalabs: add missing kernel-doc comments for hl_device fields
habanalabs: add CPU-CP packet for engine core ASID cfg
Yuri Nudelman (5):
habanalabs: print va_range in vm node debugfs
habanalabs: wrong VA size calculation
habanalabs: make last_mask an MMU property
habanalabs: add enum mmu_op_flags
habanalabs: partly skip cache flush when in PMMU map flow
farah kassabri (3):
habanalabs/gaudi: Fix collective wait bug
habanalabs: add new opcodes for INFO IOCTL
habanalabs: change wait_for_interrupt implementation
.../ABI/testing/debugfs-driver-habanalabs | 23 +-
drivers/misc/habanalabs/common/command_buffer.c | 46 ++-
.../misc/habanalabs/common/command_submission.c | 389 +++++++++++++++------
drivers/misc/habanalabs/common/context.c | 39 ++-
drivers/misc/habanalabs/common/debugfs.c | 97 +++--
drivers/misc/habanalabs/common/device.c | 387 ++++++++++----------
drivers/misc/habanalabs/common/firmware_if.c | 253 ++++++++++----
drivers/misc/habanalabs/common/habanalabs.h | 301 +++++++++++-----
drivers/misc/habanalabs/common/habanalabs_drv.c | 150 ++++----
drivers/misc/habanalabs/common/habanalabs_ioctl.c | 195 +++++++++--
drivers/misc/habanalabs/common/hw_queue.c | 5 +-
drivers/misc/habanalabs/common/hwmon.c | 209 +++++++++--
drivers/misc/habanalabs/common/irq.c | 14 +-
drivers/misc/habanalabs/common/memory.c | 78 +++--
drivers/misc/habanalabs/common/mmu/mmu.c | 25 ++
drivers/misc/habanalabs/common/mmu/mmu_v1.c | 18 +-
drivers/misc/habanalabs/common/sysfs.c | 56 ++-
drivers/misc/habanalabs/gaudi/gaudi.c | 313 ++++++++++++-----
drivers/misc/habanalabs/gaudi/gaudiP.h | 4 +-
drivers/misc/habanalabs/gaudi/gaudi_coresight.c | 4 +-
drivers/misc/habanalabs/goya/goya.c | 165 +++++++--
drivers/misc/habanalabs/goya/goyaP.h | 14 +-
drivers/misc/habanalabs/goya/goya_coresight.c | 4 +-
drivers/misc/habanalabs/goya/goya_hwmgr.c | 31 +-
drivers/misc/habanalabs/include/common/cpucp_if.h | 62 +++-
.../misc/habanalabs/include/common/hl_boot_if.h | 4 +
.../habanalabs/include/hw_ip/mmu/mmu_general.h | 19 +-
.../misc/habanalabs/include/hw_ip/mmu/mmu_v1_0.h | 18 +-
.../misc/habanalabs/include/hw_ip/mmu/mmu_v1_1.h | 20 +-
include/uapi/misc/habanalabs.h | 166 +++++++--
30 files changed, 2185 insertions(+), 924 deletions(-)