linux-kernel - Re: [PATCH v5 4/4] memory: tegra: Add MC error logging support for Tegra264

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aWTZBalnJQEBZnVn@orome>
Date: Mon, 12 Jan 2026 13:49:50 +0100
From: Thierry Reding <thierry.reding@...il.com>
To: Ketan Patil <ketanp@...dia.com>
Cc: krzk@...nel.org, jonathanh@...dia.com, linux-kernel@...r.kernel.org, 
	linux-tegra@...r.kernel.org
Subject: Re: [PATCH v5 4/4] memory: tegra: Add MC error logging support for
 Tegra264

On Fri, Dec 19, 2025 at 11:43:54AM +0000, Ketan Patil wrote:
> In Tegra264, different components from memory subsystems like Memory
> Controller Fabric (MCF), HUB, HUB Common (HUBC), Side Band Shim (SBS)
> and MC Channels have different interrupt lines for receiving memory
> controller error interrupts.
> 
> Add support for logging memory controller errors on Tegra264.
> - Add MC error handling flow for MCF, HUB, HUBC, SBS and MC Channel.
> - Each of these components have different interrupt lines for MC error.
> - Register interrupt handlers for interrupts from these different MC
> components.
> - There is no global interrupt status register in Tegra264 unlike older
> Tegra chips.
> - There are common interrupt status registers in case of MCF, HUB, HUBC
> from which figure out the slice number or hub number responsible for
> generating interrupt and then read interrupt status register to find out
> type of violation.
> - Introduce new SoC specific fields in tegra_mc_soc like interrupt mask
> values for MCF, HUB, HUBC etc., which are SOC specific.
> 
> Signed-off-by: Ketan Patil <ketanp@...dia.com>
> ---
>  drivers/memory/tegra/mc.c       |  49 +++-
>  drivers/memory/tegra/mc.h       |  78 +++++-
>  drivers/memory/tegra/tegra186.c |  10 +-
>  drivers/memory/tegra/tegra194.c |   2 +
>  drivers/memory/tegra/tegra20.c  |  10 +-
>  drivers/memory/tegra/tegra234.c |   2 +
>  drivers/memory/tegra/tegra264.c | 439 +++++++++++++++++++++++++++++++-
>  drivers/memory/tegra/tegra30.c  |   2 +
>  include/soc/tegra/mc.h          |   7 +
>  9 files changed, 569 insertions(+), 30 deletions(-)

Hi Ketan,

I share Jon's concerns about this becoming overwhelmingly complex to
understand and review. Let me try to provide a few ideas below on how we
could potentially improve this.

> diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c
[...]
> @@ -976,11 +976,36 @@ static int tegra_mc_probe(struct platform_device *pdev)
>  			}
>  		}
>  
> -		if (mc->soc->num_channels)
> -			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->intmask,
> -				     MC_INTMASK);
> -		else
> -			mc_writel(mc, mc->soc->intmask, MC_INTMASK);
> +		if (mc->soc->ops->num_interrupts > 1) {
> +			/* Unmask MCF interrupts */
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mcf_intmask, MCF_INTMASK_0);
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mcf_intmask,
> +				MCF_INTPRIORITY_0);
> +
> +			/* Unmask HUB and HUBC interrupts */
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hub_intmask,
> +				MSS_HUB_INTRMASK_0);
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hub_intmask,
> +				MSS_HUB_INTRPRIORITY_0);
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hubc_intmask,
> +				MSS_HUB_HUBC_INTMASK_0);
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->hubc_intmask,
> +				MSS_HUB_HUBC_INTPRIORITY_0);
> +
> +			/* Unmask SBS interrupts */
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->sbs_intmask,
> +				MSS_SBS_INTMASK_0);
> +
> +			/* Unmask MC channel interrupt */
> +			mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->mc_ch_intmask,
> +				MC_CH_INTMASK_0);
> +		} else {
> +			if (mc->soc->num_channels)
> +				mc_ch_writel(mc, MC_BROADCAST_CHANNEL, mc->soc->intmask,
> +					MC_INTMASK);
> +			else
> +				mc_writel(mc, mc->soc->intmask, MC_INTMASK);
> +		}

This isn't all that bad, but I think we could probably extract this into
a per-SoC operation that is called from here instead. If for nothing
else it would help with the indentation level.

Something like mc->ops->setup_interrupts() or whatever other flavour you
prefer should do.

>  	}
>  
>  	if (mc->soc->reset_ops) {
> diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h
> index 06ae3dd37a47..fabbebf8a36c 100644
> --- a/drivers/memory/tegra/mc.h
> +++ b/drivers/memory/tegra/mc.h
> @@ -25,6 +25,7 @@
>  #define MC_INT_DECERR_MTS				BIT(16)
>  #define MC_INT_DECERR_GENERALIZED_CARVEOUT		BIT(17)
>  #define MC_INT_DECERR_ROUTE_SANITY			BIT(20)
> +#define MC_INT_DECERR_ROUTE_SANITY_GIC_MSI		BIT(21)
>  
>  #define MC_INTMASK					0x04
>  #define MC_GART_ERROR_REQ				0x30
> @@ -63,6 +64,60 @@
>  
>  #define MC_GLOBAL_INTSTATUS				0xf24
>  
> +/* Registers for MSS HUB */
> +#define MSS_HUB_GLOBAL_INTSTATUS_0			0x6000
> +#define MSS_HUBC_INTR					BIT(0)
> +
> +#define MSS_HUB_HUBC_INTSTATUS_0			0x6008
> +#define MSS_HUB_INTRSTATUS_0				0x600c
> +#define MSS_HUB_HUBC_INTMASK_0				0x6010
> +#define MSS_HUB_HUBC_SCRUB_DONE_INTMASK			BIT(0)
> +
> +#define MSS_HUB_HUBC_INTPRIORITY_0			0x6014
> +#define MSS_HUB_INTRMASK_0				0x6018
> +#define MSS_HUB_COALESCER_ERR_INTMASK			BIT(0)
> +#define MSS_HUB_SMMU_BYPASS_ALLOW_ERR_INTMASK		BIT(1)
> +#define MSS_HUB_ILLEGAL_TBUGRP_ID_INTMASK		BIT(2)
> +#define MSS_HUB_MSI_ERR_INTMASK				BIT(3)
> +#define MSS_HUB_POISON_RSP_INTMASK			BIT(4)
> +#define MSS_HUB_RESTRICTED_ACCESS_ERR_INTMASK		BIT(5)
> +#define MSS_HUB_RESERVED_PA_ERR_INTMASK			BIT(6)
> +
> +#define MSS_HUB_INTRPRIORITY_0				0x601c
> +#define MSS_HUB_SMMU_BYPASS_ALLOW_ERR_STATUS_0		0x6020
> +#define MSS_HUB_MSI_ERR_STATUS_0			0x6024
> +#define MSS_HUB_POISON_RSP_STATUS_0			0x6028
> +#define MSS_HUB_COALESCE_ERR_STATUS_0			0x60e0
> +#define MSS_HUB_COALESCE_ERR_ADR_HI_0			0x60e4
> +#define MSS_HUB_COALESCE_ERR_ADR_0			0x60e8
> +#define MSS_HUB_RESTRICTED_ACCESS_ERR_STATUS_0		0x638c
> +#define MSS_HUB_RESERVED_PA_ERR_STATUS_0		0x6390
> +#define MSS_HUB_ILLEGAL_TBUGRP_ID_ERR_STATUS_0		0x63b0
> +
> +/* Registers for MC Channel */
> +#define MC_CH_INTSTATUS_0				0x82d4
> +#define MC_CH_INTMASK_0					0x82d8
> +#define WCAM_ERR_INTMASK				BIT(19)
> +
> +#define MC_ERR_GENERALIZED_CARVEOUT_STATUS_1_0		0xbc74
> +
> +/* Registers for MCF */
> +#define MCF_COMMON_INTSTATUS0_0_0			0xce04
> +#define MCF_INTSTATUS_0					0xce2c
> +#define MCF_INTMASK_0					0xce30
> +#define MCF_INTPRIORITY_0				0xce34
> +
> +/* Registers for SBS */
> +#define MSS_SBS_INTSTATUS_0				0xec08
> +#define MSS_SBS_INTMASK_0				0xec0c
> +#define MSS_SBS_FILL_FIFO_ISO_OVERFLOW_INTMASK		BIT(0)
> +#define MSS_SBS_FILL_FIFO_SISO_OVERFLOW_INTMASK		BIT(1)
> +#define MSS_SBS_FILL_FIFO_NISO_OVERFLOW_INTMASK		BIT(2)
> +
> +/* Bit field of MC_ERR_ROUTE_SANITY_STATUS_0 register */
> +#define MC_ERR_ROUTE_SANITY_RW				BIT(12)
> +#define MC_ERR_ROUTE_SANITY_SEC				BIT(13)
> +
>  /* Bit field of MC_ERR_STATUS_0 register */
>  #define MC_ERR_STATUS_RW				BIT(16)
>  #define MC_ERR_STATUS_SECURITY				BIT(17)
> @@ -70,12 +125,22 @@
>  #define MC_ERR_STATUS_WRITABLE				BIT(26)
>  #define MC_ERR_STATUS_READABLE				BIT(27)
>  
> +#define MC_ERR_STATUS_GSC_ADR_HI_MASK			0xffff
> +#define MC_ERR_STATUS_GSC_ADR_HI_SHIFT			16
> +#define MC_ERR_STATUS_RT_ADR_HI_SHIFT			15
> +
>  #define MC_ERR_STATUS_TYPE_SHIFT			28
>  #define MC_ERR_STATUS_TYPE_INVALID_SMMU_PAGE		(0x6 << 28)
>  #define MC_ERR_STATUS_TYPE_MASK				(0x7 << 28)
> +#define MC_ERR_STATUS_RT_TYPE_MASK			(0xf << 28)
> +#define MC_ERR_STATUS_RT_TYPE_SHIFT			28
>  
>  #define MC_ERR_STATUS_ADR_HI_SHIFT			20
> -#define MC_ERR_STATUS_ADR_HI_MASK			0x3
> +
> +#define ERR_GENERALIZED_APERTURE_ID_SHIFT		0
> +#define ERR_GENERALIZED_APERTURE_ID_MASK		0x1F
> +#define ERR_GENERALIZED_CARVEOUT_APERTURE_ID_SHIFT	5
> +#define ERR_GENERALIZED_CARVEOUT_APERTURE_ID_MASK	0x1F
>  
>  #define MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE(x)		((x) & 0x1ff)
>  #define MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE_MASK		0x1ff
> @@ -188,15 +253,18 @@ extern const struct tegra_mc_ops tegra30_mc_ops;
>  
>  #if defined(CONFIG_ARCH_TEGRA_186_SOC) || \
>      defined(CONFIG_ARCH_TEGRA_194_SOC) || \
> -    defined(CONFIG_ARCH_TEGRA_234_SOC) || \
> -    defined(CONFIG_ARCH_TEGRA_264_SOC)
> +    defined(CONFIG_ARCH_TEGRA_234_SOC)
>  extern const struct tegra_mc_ops tegra186_mc_ops;
>  #endif
>  
>  irqreturn_t tegra30_mc_handle_irq(int irq, void *data);
>  extern const irq_handler_t tegra30_mc_irq_handlers[];
> -extern const char * const tegra_mc_status_names[32];
> -extern const char * const tegra_mc_error_names[8];
> +extern const char * const tegra20_mc_status_names[32];
> +extern const char * const tegra20_mc_error_names[8];

I nearly wonder if these should be extracted into a per-SoC field like
mc->ops->status_names and mc->ops->error_names. I think we didn't do
this at the time because they were isolated to individual SoC
generations via the error handling functions, but if they are exposed
like this in the header, might as well abstract them out for a bit more
cleanliness, even if it's not going to be a great win.

> +int tegra186_mc_probe(struct tegra_mc *mc);
> +int tegra186_mc_probe_device(struct tegra_mc *mc, struct device *dev);
> +int tegra186_mc_resume(struct tegra_mc *mc);
> +void tegra186_mc_remove(struct tegra_mc *mc);

Looks like the only reason why we need to export these is so that we can
override the IRQ handlers for Tegra264. Have you considered maybe moving
the IRQ handlers into tegra_mc_soc? It looks to me like that would be an
equally good fit now. It's no longer really an "op" in the sense that we
use the others, but rather just another collection of pointers that are
different from chip to chip.

> diff --git a/drivers/memory/tegra/tegra186.c b/drivers/memory/tegra/tegra186.c
> @@ -916,5 +916,7 @@ const struct tegra_mc_soc tegra186_mc_soc = {
>  	.ch_intmask = 0x0000000f,
>  	.global_intstatus_channel_shift = 0,
>  	.mc_regs = &tegra20_mc_regs,
> +	.mc_addr_hi_mask = 0x3,

Can we perhaps derive the new values for this from the existing
num_address_bits? Also, is this the correct value for Tegra186? We use
the same value for Tegra20

> +static const char *const tegra264_mc_status_names[32] = {
> +	[6] = "EMEM address decode error",
> +	[8] = "Security violation",
> +	[12] = "VPR violation",
> +	[13] = "Secure carveout violation",
> +	[16] = "MTS carveout violation",
> +	[17] = "Generalized carveout violation",
> +	[20] = "Route Sanity error",
> +	[21] = "GIC_MSI error",
> +};

Comparing this to tegra20_mc_status_names[] there's significant overlap.
So I think maybe a better way to deal with this would be to complete the
list of status names and use them across all SoC generations. If for
some reason we really want to exclude certain ones for a given SoC
generation, it might be better to use some sort of validity mask or
whatever, rather than use different subsets of strings here.

I'm curious: if we look at something like "EMEM arbitration error" (9)
on Tegra20, which is not available here for Tegra264. Does it mean that
these errors don't exist on Tegra264 (i.e. they cannot happen)? We have
a few in tegra20_mc_status_names[] that no longer apply for even Tegra30
(such as the "GART page fault" (7)). I guess what I'm asking is if there
is a potential problem with listing all of the possible errors in one
table, since there's evidently no conflict in the enumeration values.

> +/*
> + * MC instance aperture mapping for hubc registers
> + */
> +static const int mc_hubc_aperture_number[5] = {
> +	7, 8, 9, 10, 11
> +};

I don't fully understand what this means. In the code this array is only
indexed into with constants from the individual IRQ callbacks, no loops
are iterating over this. So I think there's potential for improvement
here. For example, the indexed nature of these arrays doesn't provide
any context as to what they are. It's merely mapping one integer to
another, so it's both error prone and difficult to understand/review.

Instead, maybe we can define what a HUBC aperture is. If it's currently
only an "ID" or index, maybe it can be as simple as:

	struct tegra_mc_aperture {
		unsigned int index;
	};

	static const struct tegra_mc_aperture tegra264_mc_aperture_disp = {
		.index = 7,
	};

	static const struct tegra_mc_aperture tegra264_mc_aperture_system = {
		.index = 8,
	};

	static const struct tegra_mc_aperture tegra264_mc_aperture_vision = {
		.index = 9,
	};

	...

	static const struct tegra_mc_aperture tegra264_mc_apertures = {
		&tegra264_mc_aperture_disp,
		&tegra264_mc_aperture_system,
		&tegra264_mc_aperture_vision,
		...
	};

	static const struct tegra_mc_soc tegra264_mc_soc = {
		...
		.apertures = tegra264_mc_apertures,
		.num_apertures = ARRAY_SIZE(tegra264_mc_apertures),
		...
	};

This looks verbose at first and perhaps somewhat unnecessary, but I
think it adds so much more extra context. And I'm sure we'll run into
other data that can be added into these structures.

> +static void mcf_log_fault(struct tegra_mc *mc, u32 channel, unsigned long mcf_ch_intstatus)

First a couple of observations.

> +{
> +	unsigned int bit;

This could be called something like "interrupt" or "irq" to better
reflect what it does. It's also slightly confusing that we first iterate
over these bits in mcf_ch_intstatus to then convert back to a mask so
that we can match against the MC_INT_* definitions.

It would probably be better to do something similar to the apertures
above and collate all of the related information into some sort of
structure.

> +
> +	for_each_set_bit(bit, &mcf_ch_intstatus, 32) {
> +		const char *error = tegra264_mc_status_names[bit] ?: "unknown";
> +		u32 intmask = BIT(bit);
> +		u32 status_reg, status1_reg = 0, addr_reg, addr_hi_reg = 0;
> +		u32 addr_val, value, client_id, i, addr_hi_shift = 0, addr_hi_mask = 0, status1;
> +		const char *direction, *secure;
> +		const char *client = "unknown", *desc = "NA";
> +		phys_addr_t addr = 0;
> +		bool is_gsc = false, err_type_valid = false, err_rt_type_valid = false;
> +		u8 type;
> +		u32 mc_rw_bit = MC_ERR_STATUS_RW, mc_sec_bit = MC_ERR_STATUS_SECURITY;
> +
> +		switch (intmask) {
> +		case MC_INT_DECERR_EMEM:
> +			status_reg = mc->soc->mc_regs->mc_err_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_add;
> +			addr_hi_reg = mc->soc->mc_regs->mc_err_add_hi;
> +			err_type_valid = true;
> +			break;
> +
> +		case MC_INT_SECURITY_VIOLATION:
> +			status_reg = mc->soc->mc_regs->mc_err_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_add;
> +			addr_hi_reg = mc->soc->mc_regs->mc_err_add_hi;
> +			err_type_valid = true;
> +			break;

These two are exactly the same, so they could be combined.

> +
> +		case MC_INT_DECERR_VPR:
> +			status_reg = mc->soc->mc_regs->mc_err_vpr_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_vpr_add;
> +			addr_hi_shift = MC_ERR_STATUS_ADR_HI_SHIFT;
> +			addr_hi_mask = mc->soc->mc_addr_hi_mask;
> +			break;
> +
> +		case MC_INT_SECERR_SEC:
> +			status_reg = mc->soc->mc_regs->mc_err_sec_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_sec_add;
> +			addr_hi_shift = MC_ERR_STATUS_ADR_HI_SHIFT;
> +			addr_hi_mask = mc->soc->mc_addr_hi_mask;
> +			break;
> +
> +		case MC_INT_DECERR_MTS:
> +			status_reg = mc->soc->mc_regs->mc_err_mts_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_mts_add;
> +			addr_hi_shift = MC_ERR_STATUS_ADR_HI_SHIFT;
> +			addr_hi_mask = mc->soc->mc_addr_hi_mask;
> +			break;
> +
> +		case MC_INT_DECERR_GENERALIZED_CARVEOUT:
> +			status_reg = mc->soc->mc_regs->mc_err_gen_co_status;
> +			status1_reg = MC_ERR_GENERALIZED_CARVEOUT_STATUS_1_0;
> +			addr_reg = mc->soc->mc_regs->mc_err_gen_co_add;
> +			addr_hi_shift = MC_ERR_STATUS_GSC_ADR_HI_SHIFT;
> +			addr_hi_mask = MC_ERR_STATUS_GSC_ADR_HI_MASK;
> +			is_gsc = true;
> +			break;
> +
> +		case MC_INT_DECERR_ROUTE_SANITY:
> +			status_reg = mc->soc->mc_regs->mc_err_route_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_route_add;
> +			addr_hi_shift = MC_ERR_STATUS_RT_ADR_HI_SHIFT;
> +			addr_hi_mask = mc->soc->mc_addr_hi_mask;
> +			mc_sec_bit = MC_ERR_ROUTE_SANITY_SEC;
> +			mc_rw_bit = MC_ERR_ROUTE_SANITY_RW;
> +			err_rt_type_valid = true;
> +			break;
> +
> +		case MC_INT_DECERR_ROUTE_SANITY_GIC_MSI:
> +			status_reg = mc->soc->mc_regs->mc_err_route_status;
> +			addr_reg = mc->soc->mc_regs->mc_err_route_add;
> +			addr_hi_shift = MC_ERR_STATUS_RT_ADR_HI_SHIFT;
> +			addr_hi_mask = mc->soc->mc_addr_hi_mask;
> +			mc_sec_bit = MC_ERR_ROUTE_SANITY_SEC;
> +			mc_rw_bit = MC_ERR_ROUTE_SANITY_RW;
> +			err_rt_type_valid = true;
> +			break;

These last two are also the same and can be combined.

> +
> +		default:
> +			dev_err_ratelimited(mc->dev, "Incorrect MC interrupt mask\n");
> +			return;
> +		}
> +
> +		value = mc_ch_readl(mc, channel, status_reg);
> +		if (addr_hi_reg) {
> +			addr = mc_ch_readl(mc, channel, addr_hi_reg);
> +		} else {
> +			if (!is_gsc) {
> +				addr = ((value >> addr_hi_shift) & addr_hi_mask);
> +			} else {
> +				status1 = mc_ch_readl(mc, channel, status1_reg);
> +				addr = ((status1 >> addr_hi_shift) & addr_hi_mask);
> +			}
> +		}
> +
> +		addr <<= 32;
> +		addr_val = mc_ch_readl(mc, channel, addr_reg);
> +		addr |= addr_val;
> +
> +		if (value & mc_rw_bit)
> +			direction = "write";
> +		else
> +			direction = "read";
> +
> +		if (value & mc_sec_bit)
> +			secure = "secure";
> +		else
> +			secure = "non-secure";
> +
> +		client_id = value & mc->soc->client_id_mask;
> +		for (i = 0; i < mc->soc->num_clients; i++) {
> +			if (mc->soc->clients[i].id == client_id) {
> +				client = mc->soc->clients[i].name;
> +				break;
> +			}
> +		}
> +
> +		if (err_type_valid) {
> +			type = (value & mc->soc->mc_err_status_type_mask) >>
> +					MC_ERR_STATUS_TYPE_SHIFT;
> +			desc = tegra264_mc_error_names[type];
> +		} else if (err_rt_type_valid) {
> +			type = (value & MC_ERR_STATUS_RT_TYPE_MASK) >>
> +				MC_ERR_STATUS_RT_TYPE_SHIFT;
> +			desc = tegra_rt_error_names[type];
> +		}
> +
> +		dev_err_ratelimited(mc->dev, "%s: %s %s @%pa: %s (%s)\n",
> +				    client, secure, direction, &addr, error, desc);
> +		if (is_gsc) {
> +			dev_err_ratelimited(mc->dev, "gsc_apr_id=%u gsc_co_apr_id=%u\n",
> +					    ((status1 >> ERR_GENERALIZED_APERTURE_ID_SHIFT)
> +					    & ERR_GENERALIZED_APERTURE_ID_MASK),
> +					    ((status1 >> ERR_GENERALIZED_CARVEOUT_APERTURE_ID_SHIFT)
> +					    & ERR_GENERALIZED_CARVEOUT_APERTURE_ID_MASK));
> +		}
> +	}
> +
> +	/* clear interrupts */
> +	mc_ch_writel(mc, channel, mcf_ch_intstatus, MCF_INTSTATUS_0);
> +}

To further refactor this, here's another observation: this function is
split up into three steps: 1) collect a number of register offsets, bit
positions and flags, 2) perform some operations based on these
parameters, extract additional information and 3) generate a report of
the collected information.

I think one option for refactoring this would be to make this more data
driven. This is somewhat difficult because we have these SoC parameters
that we cannot easily put into another structure, but I think we can
employ a trick to get the same results. If we look at the above sequence
of steps, we can split this into three (or possibly even two or just
one) callbacks per type of interrupt:

	struct tegra_mc_error_handler {
		unsigned int id;
		const char *name;
		void (*get_offsets)(...);
		void (*collect_info)(...);
		void (*report)(...);
	};

If we can make the output of .collect_info() generic enough, we could
probably get rid of .report() and make that completely generic, taking
the generic output as input.

We might even be able to make .get_offsets() optional and have
.collect_info() handle everything itself. We can then define a table
such as this:

	static const struct tegra_mc_error_handler[] = {
		{
			.id = MC_INT_DECERR_EMEM_BIT,
			.name = "external memory decode error",
			.collect_info = tegra_mc_emem_collect,
		},
		...
	};

That said, we might even be able to derive the tegra_mc_error_names from
that new table instead of keeping a separate one.

With all of the above, we now have the means to simply iterate over the
bits set in the top-level interrupt status, find the handler
corresponding to each one and run the appropriate sequence.

Moving the code for this into separate functions will decrease the
complexity of each function and make this a *lot* easier to review and
understand in my opinion. We will need to take some care to reduce
duplication (since we have a few cases in the switch above that share
the same content, I suspect we'll be able to create .collect_info()
callbacks that can be reused multiple times, or we might be able to
split these callbacks into smaller pieces that can be more easily
reused).

> +
> +static irqreturn_t handle_mcf_irq(int irq, void *data)
> +{
> +	struct tegra_mc *mc = data;
> +	unsigned long mcf_common_intstat, mcf_intstatus;
> +	unsigned int slice;
> +
> +	/* Read MCF_COMMON_INTSTATUS0_0_0 from MCB block */
> +	mcf_common_intstat = mc_ch_readl(mc, MC_BROADCAST_CHANNEL, MCF_COMMON_INTSTATUS0_0_0);
> +	if (mcf_common_intstat == 0) {
> +		dev_err(mc->dev, "No interrupt in MCF\n");

I'd think this would be better as a warning than an error.

> +		return IRQ_NONE;
> +	}
> +
> +	for_each_set_bit(slice, &mcf_common_intstat, 32) {
> +		/* Find out the slice number on which interrupt occurred */
> +		if (slice > 4) {

This looks like it should be parameterized from the start.

> +			dev_err(mc->dev, "Invalid value in registeer MCF_COMMON_INTSTATUS0_0_0\n");

Make this more useful by adding the invalid value to the message. Also,
no need to be specific about the register name, since nobody will know
what that means. We also know quite specifically why the value is
invalid. A better error message would be:

	dev_err(mc->dev, "slice index out of bounds: %u\n", slice);

> +			return IRQ_NONE;
> +		}
> +
> +		mcf_intstatus = mc_ch_readl(mc, slice, MCF_INTSTATUS_0);
> +		if (mcf_intstatus != 0)
> +			mcf_log_fault(mc, slice, mcf_intstatus);
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void hub_log_fault(struct tegra_mc *mc, u32 hub, unsigned long hub_intstat)
> +{
> +	unsigned int bit;
> +
> +	for_each_set_bit(bit, &hub_intstat, 32) {
> +		const char *error = tegra_hub_status_names[bit] ?: "unknown";
> +		u32 intmask = BIT(bit), client_id;
> +		const char *client = "unknown";
> +		u32 status_reg, addr_reg = 0, addr_hi_reg = 0;
> +		u32 value, addr_val, i;
> +		phys_addr_t addr = 0;
> +
> +		switch (intmask) {
> +		case MSS_HUB_COALESCER_ERR_INTMASK:
> +			status_reg = MSS_HUB_COALESCE_ERR_STATUS_0;
> +			addr_reg = MSS_HUB_COALESCE_ERR_ADR_0;
> +			addr_hi_reg = MSS_HUB_COALESCE_ERR_ADR_HI_0;
> +			break;
> +
> +		case MSS_HUB_SMMU_BYPASS_ALLOW_ERR_INTMASK:
> +			status_reg = MSS_HUB_SMMU_BYPASS_ALLOW_ERR_STATUS_0;
> +			break;
> +
> +		case MSS_HUB_ILLEGAL_TBUGRP_ID_INTMASK:
> +			status_reg = MSS_HUB_ILLEGAL_TBUGRP_ID_ERR_STATUS_0;
> +			break;
> +
> +		case MSS_HUB_MSI_ERR_INTMASK:
> +			status_reg = MSS_HUB_MSI_ERR_STATUS_0;
> +			break;
> +
> +		case MSS_HUB_POISON_RSP_INTMASK:
> +			status_reg = MSS_HUB_POISON_RSP_STATUS_0;
> +			break;
> +
> +		case MSS_HUB_RESTRICTED_ACCESS_ERR_INTMASK:
> +			status_reg = MSS_HUB_RESTRICTED_ACCESS_ERR_STATUS_0;
> +			break;
> +
> +		case MSS_HUB_RESERVED_PA_ERR_INTMASK:
> +			status_reg = MSS_HUB_RESERVED_PA_ERR_STATUS_0;
> +			break;
> +
> +		default:
> +			dev_err_ratelimited(mc->dev, "Incorrect HUB interrupt mask\n");
> +			return;
> +		}
> +
> +		value = mc_ch_readl(mc, hub, status_reg);
> +		if (addr_reg) {
> +			addr = mc_ch_readl(mc, hub, addr_hi_reg);
> +			addr <<= 32;
> +			addr_val = mc_ch_readl(mc, hub, addr_reg);
> +			addr |= addr_val;
> +		}
> +
> +		client_id = value & mc->soc->client_id_mask;
> +		for (i = 0; i < mc->soc->num_clients; i++) {
> +			if (mc->soc->clients[i].id == client_id) {
> +				client = mc->soc->clients[i].name;
> +				break;
> +			}
> +		}
> +
> +		dev_err_ratelimited(mc->dev, "%s: @%pa: %s status:%u\n",
> +				    client, &addr, error, value);
> +	}
> +
> +	/* clear interrupts */
> +	mc_ch_writel(mc, hub, hub_intstat, MSS_HUB_INTRSTATUS_0);
> +}

Similar comments apply to this as to mcf_log_fault().

> +
> +static irqreturn_t handle_hub_irq(int irq, void *data, int mc_hubc_aperture_number)
> +{
> +	struct tegra_mc *mc = data;
> +	unsigned long hub_global_intstat, hub_intstat, hub_interrupted = 0;
> +	unsigned int hub_global_mask = 0x7F00, hub_global_shift = 8, hub;

I think you can shorten a lot of these variables by dropping the hub_
prefix. It's in a handle_*hub*_irq() function, so you don't need extra
context.

> +
> +	/* Read MSS_HUB_GLOBAL_INTSTATUS_0 from mc_hubc_aperture_number block */
> +	hub_global_intstat = mc_ch_readl(mc, mc_hubc_aperture_number, MSS_HUB_GLOBAL_INTSTATUS_0);

We seem to have different names for the same thing here. We use the
"memory controller channel read register" function and pass the aperture
number in as the channel number. Maybe we should settle on one name and
use that going forward?

Or are these different things historically? Are we reusing channels from
earlier chips for apertures on newer chips?

> +	if (hub_global_intstat == 0) {
> +		dev_err(mc->dev, "No interrupt in HUB/HUBC\n");

Again, this can probably just be a warning, no need for an error.

Thierry

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)