linux-kernel - Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <s5hwskybvjh.wl%tiwai@suse.de>
Date:	Mon, 09 Jun 2008 15:13:54 +0200
From:	Takashi Iwai <tiwai@...e.de>
To:	Benjamin Kidwell <benjkidwell@...oo.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected

At Mon, 09 Jun 2008 13:11:43 +0200,
I wrote:
> 
> At Sat, 7 Jun 2008 18:56:33 -0700 (PDT),
> Benjamin Kidwell wrote:
> > 
> > --- Takashi Iwai <> wrote:
> > 
> > > The problem is that your hardware reports the wrong DMA position.
> > > I didn't expect that this delay is so much, (IOW, the hardware is so
> > > buggy).  Maybe it should be put into its own workq.
> > 
> > I've been getting my process timing information from ps, top, and htop, which
> > I know can be unreliable in some ways (I've been using a tickless (NO_HZ=y)
> > kernel). I recompiled the g8a4bd4d kernel with that option off and a few more
> > changes to make my settings more conservative, and rechecked events/0. It
> > still reports at a consistent 2.5%, (down from about 4.3% with previous kernel
> > options) and after a short period of use (uptime 30 minutes) events/0 has had
> > 0:48 of time. In comparison, Xorg and Firefox (usually the biggest users)
> > report 1:12 and 0:44 in the same timeframe. 
> > 
> > So if I understand the purpose of the code, that means that my system has
> > generated 48 seconds worth of timing mistakes in audio playback relative to 25
> > minutes or so of audio stream, so that suggests the timing error of the
> > hardware is about 3%! That seems like huge latency for hardware so I see why
> > you wouldn't expect that.
> 
> Could you check the patch below in addition?  It's against my latest
> git tree and should be applicable as is to linux-next (since it merges
> from my tree), too.
> 
> This will show some latency time in the kernel messages.  If the
> average latency is about 20 or 30us, this is almost in the expected
> range.  Occasional high latencies should be the result of
> cond_resched(), so it can be ignored.
> 
> > I should also say that close listening has convinced me that my audio
> > performance has been IMPROVED by the patch - IOW, the code is working well at
> > improving the audio stream, even though its using a fair amount of CPU time to
> > do it. With the new code, I seem to hear less random clicks, interruptions, or
> > static that I had previously blamed on the source of the audio. I don't know
> > how to test the output to the speakers relative to the input source
> > quantitatively, though.
> 
> First we need to gather some hardware information to sort out the
> issue.  If the latency is almost constant, we may add some another
> trick, e.g. setting different IRQ-trigger pointers.  I'll prepare
> another patch.

Here is the patch to do another trick.
The driver tries to delay the IRQ timing in one sample so that the DMA
pointer returns the proper position.

It seems working on my test machine.  This patch can work together
with the last "latency measurement" patch, and you'll see no longer
latency things in the kernel message if this workaround really plays
its role.

Give it a try, and let me know if you still have any issue.


thanks,

Takashi


diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index dc68709..401a4fd 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -58,6 +58,7 @@ static int position_fix[SNDRV_CARDS];
 static int probe_mask[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS-1)] = -1};
 static int single_cmd;
 static int enable_msi;
+static int bdl_pos_adj = 1;
 
 module_param_array(index, int, NULL, 0444);
 MODULE_PARM_DESC(index, "Index value for Intel HD audio interface.");
@@ -77,6 +78,8 @@ MODULE_PARM_DESC(single_cmd, "Use single command to communicate with codecs "
 		 "(for debugging only).");
 module_param(enable_msi, int, 0444);
 MODULE_PARM_DESC(enable_msi, "Enable Message Signaled Interrupt (MSI)");
+module_param(bdl_pos_adj, int, 0644);
+MODULE_PARM_DESC(bdl_pos_adj, "BDL position adjustment offset");
 
 #ifdef CONFIG_SND_HDA_POWER_SAVE
 /* power_save option is defined in hda_codec.c */
@@ -310,6 +313,7 @@ struct azx_dev {
 	unsigned int opened :1;
 	unsigned int running :1;
 	unsigned int irq_pending: 1;
+	unsigned int irq_ignore: 1;
 };
 
 /* CORB/RIRB */
@@ -943,6 +947,11 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
 			azx_sd_writeb(azx_dev, SD_STS, SD_INT_MASK);
 			if (!azx_dev->substream || !azx_dev->running)
 				continue;
+			/* ignore the first dummy IRQ (due to pos_adj) */
+			if (azx_dev->irq_ignore) {
+				azx_dev->irq_ignore = 0;
+				continue;
+			}
 			/* check whether this IRQ is really acceptable */
 			if (azx_position_ok(chip, azx_dev)) {
 				azx_dev->irq_pending = 0;
@@ -977,14 +986,53 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
 
 
 /*
+ * set up a BDL entry
+ */
+static int setup_bdle(struct snd_pcm_substream *substream,
+		      struct azx_dev *azx_dev, u32 **bdlp,
+		      int ofs, int size, int with_ioc)
+{
+	struct snd_sg_buf *sgbuf = snd_pcm_substream_sgbuf(substream);
+	u32 *bdl = *bdlp;
+
+	while (size > 0) {
+		dma_addr_t addr;
+		int chunk;
+
+		if (azx_dev->frags >= AZX_MAX_BDL_ENTRIES)
+			return -EINVAL;
+
+		addr = snd_pcm_sgbuf_get_addr(sgbuf, ofs);
+		/* program the address field of the BDL entry */
+		bdl[0] = cpu_to_le32((u32)addr);
+		bdl[1] = cpu_to_le32(upper_32bit(addr));
+		/* program the size field of the BDL entry */
+		chunk = PAGE_SIZE - (ofs % PAGE_SIZE);
+		if (size < chunk)
+			chunk = size;
+		bdl[2] = cpu_to_le32(chunk);
+		/* program the IOC to enable interrupt
+		 * only when the whole fragment is processed
+		 */
+		size -= chunk;
+		bdl[3] = (size || !with_ioc) ? 0 : cpu_to_le32(0x01);
+		bdl += 4;
+		azx_dev->frags++;
+		ofs += chunk;
+	}
+	*bdlp = bdl;
+	return ofs;
+}
+
+/*
  * set up BDL entries
  */
 static int azx_setup_periods(struct snd_pcm_substream *substream,
 			     struct azx_dev *azx_dev)
 {
-	struct snd_sg_buf *sgbuf = snd_pcm_substream_sgbuf(substream);
 	u32 *bdl;
 	int i, ofs, periods, period_bytes;
+	int pos_adj = 0;
 
 	/* reset BDL address */
 	azx_sd_writel(azx_dev, SD_BDLPL, 0);
@@ -998,39 +1046,44 @@ static int azx_setup_periods(struct snd_pcm_substream *substream,
 	bdl = (u32 *)azx_dev->bdl.area;
 	ofs = 0;
 	azx_dev->frags = 0;
-	for (i = 0; i < periods; i++) {
-		int size, rest;
-		if (i >= AZX_MAX_BDL_ENTRIES) {
-			snd_printk(KERN_ERR "Too many BDL entries: "
-				   "buffer=%d, period=%d\n",
-				   azx_dev->bufsize, period_bytes);
-			/* reset */
-			azx_sd_writel(azx_dev, SD_BDLPL, 0);
-			azx_sd_writel(azx_dev, SD_BDLPU, 0);
-			return -EINVAL;
+	azx_dev->irq_ignore = 0;
+	if (bdl_pos_adj > 0) {
+		struct snd_pcm_runtime *runtime = substream->runtime;
+		pos_adj = (bdl_pos_adj * runtime->rate + 47999) / 48000;
+		if (!pos_adj)
+			pos_adj = 1;
+		pos_adj = frames_to_bytes(runtime, pos_adj);
+		if (pos_adj >= period_bytes) {
+			snd_printk(KERN_WARNING "Too big adjustment %d\n",
+				   bdl_pos_adj);
+			pos_adj = 0;
+		} else {
+			ofs = setup_bdle(substream, azx_dev,
+					 &bdl, ofs, pos_adj, 1);
+			if (ofs < 0)
+				goto error;
+			azx_dev->irq_ignore = 1;
 		}
-		rest = period_bytes;
-		do {
-			dma_addr_t addr = snd_pcm_sgbuf_get_addr(sgbuf, ofs);
-			/* program the address field of the BDL entry */
-			bdl[0] = cpu_to_le32((u32)addr);
-			bdl[1] = cpu_to_le32(upper_32bit(addr));
-			/* program the size field of the BDL entry */
-			size = PAGE_SIZE - (ofs % PAGE_SIZE);
-			if (rest < size)
-				size = rest;
-			bdl[2] = cpu_to_le32(size);
-			/* program the IOC to enable interrupt
-			 * only when the whole fragment is processed
-			 */
-			rest -= size;
-			bdl[3] = rest ? 0 : cpu_to_le32(0x01);
-			bdl += 4;
-			azx_dev->frags++;
-			ofs += size;
-		} while (rest > 0);
+	}
+	for (i = 0; i < periods; i++) {
+		if (i == periods - 1 && pos_adj)
+			ofs = setup_bdle(substream, azx_dev, &bdl, ofs,
+					 period_bytes - pos_adj, 0);
+		else
+			ofs = setup_bdle(substream, azx_dev, &bdl, ofs,
+					 period_bytes, 1);
+		if (ofs < 0)
+			goto error;
 	}
 	return 0;
+
+ error:
+	snd_printk(KERN_ERR "Too many BDL entries: buffer=%d, period=%d\n",
+		   azx_dev->bufsize, period_bytes);
+	/* reset */
+	azx_sd_writel(azx_dev, SD_BDLPL, 0);
+	azx_sd_writel(azx_dev, SD_BDLPU, 0);
+	return -EINVAL;
 }
 
 /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/