lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49b24812-dafc-4ff9-a79b-07d1e2c6364b@huawei.com>
Date:   Tue, 12 Apr 2022 09:39:32 +0100
From:   John Garry <john.garry@...wei.com>
To:     Yicong Yang <yangyicong@...wei.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        <gregkh@...uxfoundation.org>, <helgaas@...nel.org>,
        <alexander.shishkin@...ux.intel.com>, <lorenzo.pieralisi@....com>,
        <will@...nel.org>, <mark.rutland@....com>,
        <mathieu.poirier@...aro.org>, <suzuki.poulose@....com>,
        <mike.leach@...aro.org>, <leo.yan@...aro.org>,
        <jonathan.cameron@...wei.com>, <daniel.thompson@...aro.org>,
        <joro@...tes.org>, <shameerali.kolothum.thodi@...wei.com>,
        <robin.murphy@....com>, <peterz@...radead.org>, <mingo@...hat.com>,
        <acme@...nel.org>, <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>,
        <coresight@...ts.linaro.org>, <linux-pci@...r.kernel.org>,
        <linux-perf-users@...r.kernel.org>,
        <iommu@...ts.linux-foundation.org>
CC:     <prime.zeng@...wei.com>, <liuqi115@...wei.com>,
        <zhangshaokun@...ilicon.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH v7 2/7] hwtracing: Add trace function support for
 HiSilicon PCIe Tune and Trace device

>>> +static int hisi_ptt_alloc_trace_buf(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +    struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
>>> +    struct device *dev = &hisi_ptt->pdev->dev;
>>> +    int i;
>>> +
>>> +    hisi_ptt->trace_ctrl.buf_index = 0;
>>> +
>>> +    /* If the trace buffer has already been allocated, zero it. */
>>
>> I am not sure why this is not called from the probe
>>
> 
> The buffer allocation is done when necessary as driver will probe the device on booting but
> the user may never use it. In this condition it's a waste of memory if we allocate the buffers
> in probe. Currently we'll allocate 16M memory for 4 buffers.
> 

But that's just not how we do things. We setup the driver fully to be 
used in the probe. If the user cannot really afford the memory then 
he/she should not load the driver.

In addition, this driver would be used in a machine which will have 
gigbytes of memory, so I think that the memory mentioned here is 
relatively insignificant.

> So this function is called every time before we start trace. In the first time it will allocate
> the DMA buffers and it the other times it just zero the buffers to clear the data of last time.
> 
>>> +    if (ctrl->trace_buf) {
>>> +        for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; i++)
>>> +            memset(ctrl->trace_buf[i].addr, 0, HISI_PTT_TRACE_BUF_SIZE);
>>> +        return 0;
>>> +    }
>>> +
>>> +    ctrl->trace_buf = devm_kcalloc(dev, HISI_PTT_TRACE_BUF_CNT,
>>> +                       sizeof(struct hisi_ptt_dma_buffer), GFP_KERNEL);
>>
>> sizeof(*ctrl->trace_buf) may be better
>>
> 
> ok.
> 
>>> +    if (!ctrl->trace_buf)
>>> +        return -ENOMEM;
>>> +
>>> +    for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; ++i) {
>>> +        ctrl->trace_buf[i].addr = dmam_alloc_coherent(dev, HISI_PTT_TRACE_BUF_SIZE,
>>> +                                 &ctrl->trace_buf[i].dma,
>>> +                                 GFP_KERNEL);
>>> +        if (!ctrl->trace_buf[i].addr) {
>>> +            hisi_ptt_free_trace_buf(hisi_ptt);
>>> +            return -ENOMEM;
>>> +        }
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void hisi_ptt_trace_end(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +    writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +    hisi_ptt->trace_ctrl.started = false;
>>> +}
>>> +
>>> +static int hisi_ptt_trace_start(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +    struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
>>> +    u32 val;
>>> +    int i;
>>> +
>>> +    /* Check device idle before start trace */
>>> +    if (!hisi_ptt_wait_trace_hw_idle(hisi_ptt)) {
>>> +        pci_err(hisi_ptt->pdev, "Failed to start trace, the device is still busy\n");
>>> +        return -EBUSY;
>>> +    }
>>> +
>>> +    ctrl->started = true;
>>> +
>>> +    /* Reset the DMA before start tracing */
>>> +    val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +    val |= HISI_PTT_TRACE_CTRL_RST;
>>> +    writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +
>>> +    hisi_ptt_wait_dma_reset_done(hisi_ptt);
>>> +
>>> +    val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +    val &= ~HISI_PTT_TRACE_CTRL_RST;
>>> +    writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +
>>> +    /* Clear the interrupt status */
>>> +    writel(HISI_PTT_TRACE_INT_STAT_MASK, hisi_ptt->iobase + HISI_PTT_TRACE_INT_STAT);
>>> +    writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_INT_MASK);
>>> +
>>> +    /* Configure the trace DMA buffer */
>>
>> I am not sure why this sort of thing is done outside probing
>>

...

>>> +
>>> +    val = FIELD_GET(HISI_PTT_PMU_DIRECTION_MASK, event->attr.config);
>>> +    ret = hisi_ptt_trace_valid_config_onehot(val, hisi_ptt_trace_available_direction,
>>
>> how about put all those arrays in hisi_ptt_trace_valid_config_onehot() and pass some flag to say which array you want to use? Or something like that. Passing the arrays in this fashion is messy
>>
> 
> Since there are 3 configs (type, direction, format) with different available range and setting method (onehot, non-onehot),
> moving the arrays into the valid checking function means we need to recognize the config types (passed by the caller but need
> to know the available value array) and the checking method together. That may make the code more complex than now: 1st picking
> up the right array and judge wich checking method this array applied and 2nd do the checking.
> 
> Currently it's designed to decouple the checking method and the available value array. The hisi_ptt_trace_valid_config{_onehot}()
> won't care about which array to use since caller take responsibilty for this. So perhaps current approach is simple and clear
> enough.


A couple of points:
- hisi_ptt_trace_valid_config_type() only has 1x caller so can make it 
dedicated for that caller
- there is not much code in hisi_ptt_trace_valid_config_onshot(), so ok 
to duplicate if makes overall code look better

So I think dedicated functions make the code simpler, easier to follow, 
and maintain:

static int hisi_ptt_trace_valid_config_dir(u32 val)
{
	int i;
	/*
	 * The supported value of the direction parameter. See hisi_ptt.rst
	 * documentation for more details.
	 */
	static const u32 hisi_ptt_trace_available_direction[] = {
		0,
		1,
		2,
		3,
	};

	for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_available_direction); i++)
		if (val == hisi_ptt_trace_available_direction[i])
			return 0;

	return -EINVAL;
}

static int hisi_ptt_trace_valid_config_format(u32 val)
{
	int i;
	static const u32 hisi_ptt_trace_availble_format[] = {
		0,	/* 4DW */
		1,	/* 8DW */
	};

	for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_availble_format); i++)
		if (val == hisi_ptt_trace_availble_format[i])
			return 0;

	return -EINVAL;
}

static int hisi_ptt_trace_valid_config_type(u32 val)
{
	int i;
	/* Different types can be set simultaneously */
	static const u32 hisi_ptt_trace_available_type[] = {
		1,	/* posted_request */
		2,	/* non-posted_request */
		4,	/* completion */
	};

	for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_available_type); i++)
		val &= ~hisi_ptt_trace_available_type[i];

	if (val)
		return -EINVAL;
	return 0;
}

...

static int hisi_ptt_pmu_event_init(struct perf_event *event)
{

...

	val = FIELD_GET(HISI_PTT_PMU_DIRECTION_MASK, event->attr.config);
	ret = hisi_ptt_trace_valid_config_dir(val);
	if (ret < 0)
		goto out;
	ctrl->direction = val;

	val = FIELD_GET(HISI_PTT_PMU_TYPE_MASK, event->attr.config);
	ret = hisi_ptt_trace_valid_config_type(val);
	if (ret < 0)
		goto out;
	ctrl->type = val;

	val = FIELD_GET(HISI_PTT_PMU_FORMAT_MASK, event->attr.config);
	ret = hisi_ptt_trace_valid_config_format(val);
	if (ret < 0)
		goto out;
	ctrl->format = val;

	...
}
> 
>>
>>> +                         ARRAY_SIZE(hisi_ptt_trace_available_direction));
>>> +    if (ret < 0)
>>> +        goto out;
>>> +    ctrl->direction = val;
>>> +
>>> +    val = FIELD_GET(HISI_PTT_PMU_TYPE_MASK, event->attr.config);
>>> +    ret = hisi_ptt_trace_valid_config(val, hisi_ptt_trace_available_type,
>>> +                      ARRAY_SIZE(hisi_ptt_trace_available_type));
>>> +    if (ret < 0)
>>> +        goto out;
>>> +    ctrl->type = val;
>>> +
>>> +    val = FIELD_GET(HISI_PTT_PMU_FORMAT_MASK, event->attr.config);
>>> +    ret = hisi_ptt_trace_valid_config_onehot(val, hisi_ptt_trace_availble_format,
>>> +                         ARRAY_SIZE(hisi_ptt_trace_availble_format));
>>> +    if (ret < 0)
>>> +        goto out;
>>> +    ctrl->format = val;
>>> +
>>> +out:
>>> +    mutex_unlock(&hisi_ptt->mutex);
>>> +    return ret;
>>> +}
>>> +
>>> +static void *hisi_ptt_pmu_setup_aux(struct perf_event *event, void **pages,
>>> +                    int nr_pages, bool overwrite)
>>> +{
>>> +    struct hisi_ptt_pmu_buf *buf;
>>> +    struct page **pagelist;
>>> +    int i;
>>> +
>>> +    if (overwrite) {
>>> +        dev_warn(event->pmu->dev, "Overwrite mode is not supported\n");
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* If the pages size less than buffers, we cannot start trace */
>>> +    if (nr_pages < HISI_PTT_TRACE_TOTAL_BUF_SIZE / PAGE_SIZE)
>>> +        return NULL;
>>> +
>>> +    buf = kzalloc(sizeof(*buf), GFP_KERNEL);
>>> +    if (!buf)
>>> +        return NULL;
>>> +
>>> +    pagelist = kcalloc(nr_pages, sizeof(*pagelist), GFP_KERNEL);
>>> +    if (!pagelist) {
>>> +        kfree(buf);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    for (i = 0; i < nr_pages; i++)
>>> +        pagelist[i] = virt_to_page(pages[i]);
>>> +
>>> +    buf->base = vmap(pagelist, nr_pages, VM_MAP, PAGE_KERNEL);
>>> +    if (!buf->base) {
>>> +        kfree(pagelist);
>>> +        kfree(buf);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    buf->nr_pages = nr_pages;
>>> +    buf->length = nr_pages * PAGE_SIZE;
>>> +    buf->pos = 0;
>>> +
>>> +    kfree(pagelist);
>>> +    return buf;
>>> +}
>>> +
>>> +static void hisi_ptt_pmu_free_aux(void *aux)
>>> +{
>>> +    struct hisi_ptt_pmu_buf *buf = aux;
>>> +
>>> +    vunmap(buf->base);
>>> +    kfree(buf);
>>> +}
>>> +
>>> +static void hisi_ptt_pmu_start(struct perf_event *event, int flags)
>>> +{
>>> +    struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu);
>>> +    struct perf_output_handle *handle = &hisi_ptt->trace_ctrl.handle;
>>> +    struct hw_perf_event *hwc = &event->hw;
>>> +    struct hisi_ptt_pmu_buf *buf;
>>> +    int cpu = event->cpu;
>>> +    int ret;
>>> +
>>> +    hwc->state = 0;
>>> +    mutex_lock(&hisi_ptt->mutex);
>>> +    if (hisi_ptt->trace_ctrl.started) {
>>> +        pci_dbg(hisi_ptt->pdev, "trace has already started\n");
>>
>> doesn't perf core guard against this sort of thing?
>>
> 
> Maybe not as tested. The perf core will start the events 1)on the cpus user specified or
> 2)on all the cpus, but the PTT trace is intended to start once on one cpu.
> 
> For the 2) case, the driver will make default cpu to start the trace and block others
> in pmu::add(). For the 1) case we'll met the condition here. So the started status is
> test here to avoid a second start.

if this is a realistic and sensible usecase then it would be nice to 
handle in core perf code at some stage

> 
>>> +        goto stop;
>>> +    }
>>> +
>>> +    if (cpu == -1)
>>> +        cpu = hisi_ptt->trace_ctrl.default_cpu;
>>> +
>>> +    /*
>>> +     * Handle the interrupt on the same cpu which starts the trace to avoid
>>> +     * context mismatch. Otherwise we'll trigger the WARN from the perf
>>> +     * core in event_function_local().
>>> +     */
>>> +    WARN_ON(irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, HISI_PTT_TRACE_DMA_IRQ),
>>> +                 cpumask_of(cpu)));
>>> +
>>> +    ret = hisi_ptt_alloc_trace_buf(hisi_ptt);
>>> +    if (ret) {
>>> +        pci_dbg(hisi_ptt->pdev, "alloc trace buf failed, ret = %d\n", ret);
>>> +        goto stop;
>>> +    }
>>> +
>>> +    buf = perf_aux_output_begin(handle, event);
>>> +    if (!buf) {
>>> +        pci_dbg(hisi_ptt->pdev, "aux output begin failed\n");
>>> +        goto stop;
>>> +    }
>>> +
>>> +    buf->pos = handle->head % buf->length;
>>> +
>>> +    ret = hisi_ptt_trace_start(hisi_ptt);
>>> +    if (ret) {
>>> +        pci_dbg(hisi_ptt->pdev, "trace start failed, ret = %d\n", ret);
>>> +        perf_aux_output_end(handle, 0);
>>> +        goto stop;
>>> +    }
>>> +
>>> +    mutex_unlock(&hisi_ptt->mutex);
>>> +    return;
>>> +stop:
>>> +    event->hw.state |= PERF_HES_STOPPED;
>>> +    mutex_unlock(&hisi_ptt->mutex);
>>> +}
>>> +
>>> +static void hisi_ptt_pmu_stop(struct perf_event *event, int flags)
>>> +{
>>> +    struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu);
>>> +    struct hw_perf_event *hwc = &event->hw;
>>> +
>>> +    if (hwc->state & PERF_HES_STOPPED)
>>> +        return;
>>> +
>>> +    mutex_lock(&hisi_ptt->mutex);
>>> +    if (hisi_ptt->trace_ctrl.started) {
>>> +        hisi_ptt_trace_end(hisi_ptt);
>>> +        WARN(!hisi_ptt_wait_trace_hw_idle(hisi_ptt), "Device is still busy");
>>> +        hisi_ptt_update_aux(hisi_ptt, hisi_ptt->trace_ctrl.buf_index, true);
>>> +    }
>>> +    mutex_unlock(&hisi_ptt->mutex);
>>> +
>>> +    hwc->state |= PERF_HES_STOPPED;
>>> +    perf_event_update_userpage(event);
>>> +    hwc->state |= PERF_HES_UPTODATE;
>>> +}
>>> +
>>> +static int hisi_ptt_pmu_add(struct perf_event *event, int flags)
>>> +{
>>> +    struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu);
>>> +    struct hw_perf_event *hwc = &event->hw;
>>> +    int cpu = event->cpu;
>>> +
>>> +    /*
>>> +     * Only allow the default cpu to add the event if user doesn't specify
>>> +     * the cpus.
>>> +     */
>>> +    if (cpu == -1 && smp_processor_id() != hisi_ptt->trace_ctrl.default_cpu)
>>> +        return 0;
>>> +
>>> +    hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
>>> +
>>> +    if (flags & PERF_EF_START) {
>>> +        hisi_ptt_pmu_start(event, PERF_EF_RELOAD);
>>> +        if (hwc->state & PERF_HES_STOPPED)
>>> +            return -EINVAL;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void hisi_ptt_pmu_del(struct perf_event *event, int flags)
>>> +{
>>> +    hisi_ptt_pmu_stop(event, PERF_EF_UPDATE);
>>> +}
>>> +
>>> +static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +    u16 core_id, sicl_id;
>>> +    char *pmu_name;
>>> +    u32 reg;
>>> +
>>> +    hisi_ptt->hisi_ptt_pmu = (struct pmu) {
>>> +        .module        = THIS_MODULE,
>>> +        .capabilities    = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
>>> +        .task_ctx_nr    = perf_sw_context,
>>> +        .attr_groups    = hisi_ptt_pmu_groups,
>>> +        .event_init    = hisi_ptt_pmu_event_init,
>>> +        .setup_aux    = hisi_ptt_pmu_setup_aux,
>>> +        .free_aux    = hisi_ptt_pmu_free_aux,
>>> +        .start        = hisi_ptt_pmu_start,
>>> +        .stop        = hisi_ptt_pmu_stop,
>>> +        .add        = hisi_ptt_pmu_add,
>>> +        .del        = hisi_ptt_pmu_del,
>>> +    };
>>> +
>>> +    reg = readl(hisi_ptt->iobase + HISI_PTT_LOCATION);
>>> +    core_id = FIELD_GET(HISI_PTT_CORE_ID, reg);
>>> +    sicl_id = FIELD_GET(HISI_PTT_SICL_ID, reg);
>>> +
>>> +    pmu_name = devm_kasprintf(&hisi_ptt->pdev->dev, GFP_KERNEL, "hisi_ptt%u_%u",
>>> +                  sicl_id, core_id);
>>> +    if (!pmu_name)
>>> +        return -ENOMEM;
>>> +
>>> +    return perf_pmu_register(&hisi_ptt->hisi_ptt_pmu, pmu_name, -1);
>>> +}
>>> +
>>> +/*
>>> + * The DMA of PTT trace can only use direct mapping, due to some
>>> + * hardware restriction. Check whether there is an IOMMU or the
>>> + * policy of the IOMMU domain is passthrough, otherwise the trace
>>> + * cannot work.
>>> + *
>>> + * The PTT device is supposed to behind the ARM SMMUv3, which
>>
>> /s/ the ARM SMMUv3/an ARM SMMUv3/
>>
> ok.
>>> + * should have passthrough the device by a quirk.
>>> + */
>>> +static int hisi_ptt_check_iommu_mapping(struct pci_dev *pdev)
>>> +{
>>> +    struct iommu_domain *iommu_domain;
>>> +
>>> +    iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
>>> +    if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY)
>>> +        return 0;
>>> +
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +
>>> +static int hisi_ptt_probe(struct pci_dev *pdev,
>>> +              const struct pci_device_id *id)
>>> +{
>>> +    struct hisi_ptt *hisi_ptt;
>>> +    int ret;
>>> +
>>> +    ret = hisi_ptt_check_iommu_mapping(pdev);
>>> +    if (ret) {
>>> +        pci_err(pdev, "requires direct DMA mappings\n");
>>> +        return ret;
>>> +    }
>>> +
>>> +    hisi_ptt = devm_kzalloc(&pdev->dev, sizeof(*hisi_ptt), GFP_KERNEL);
>>> +    if (!hisi_ptt)
>>> +        return -ENOMEM;
>>> +
>>> +    mutex_init(&hisi_ptt->mutex);
>>> +    hisi_ptt->pdev = pdev;
>>> +    pci_set_drvdata(pdev, hisi_ptt);
>>> +
>>> +    ret = pcim_enable_device(pdev);
>>> +    if (ret) {
>>> +        pci_err(pdev, "failed to enable device, ret = %d\n", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    ret = pcim_iomap_regions(pdev, BIT(2), DRV_NAME);
>>> +    if (ret) {
>>> +        pci_err(pdev, "failed to remap io memory, ret = %d\n", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    hisi_ptt->iobase = pcim_iomap_table(pdev)[2];
>>> +
>>> +    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret) {
>>> +        pci_err(pdev, "failed to set 64 bit dma mask, ret = %d\n", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    pci_set_master(pdev);
>>> +
>>> +    ret = hisi_ptt_register_irq(hisi_ptt);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    ret = hisi_ptt_init_ctrls(hisi_ptt);
>>> +    if (ret) {
>>> +        pci_err(pdev, "failed to init controls, ret = %d\n", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    ret = hisi_ptt_register_pmu(hisi_ptt);
>>> +    if (ret) {
>>> +        pci_err(pdev, "failed to register PMU device, ret = %d", ret);
>>> +        return ret;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void hisi_ptt_remove(struct pci_dev *pdev)
>>> +{
>>> +    struct hisi_ptt *hisi_ptt = pci_get_drvdata(pdev);
>>> +
>>> +    /*
>>> +     * We have to manually unregister the PMU device rather than make it
>>> +     * devres managed to keep order that the PMU device's unregistration
>>> +     * is prior to the release of DMA buffers. As the DMA buffers are
>>> +     * devm allocated when necessary which is after the registration of
>>> +     * the PMU device.
>>> +     */
>>
>> do you really need to mention all this?
>>
> 
> I think yes. Otherwise people may ask why not register PMU device in managed
> way as well.

I expect devres work to be done after hisi_ptt_remove() so I would know 
this...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ