# Simple noise shaping DACs: some examples

A few days ago, @Brouhaha and I had a conversation on Twitter on simple noise shaping digital-to-analogue converters, a.k.a sigma-delta or delta-sigma DACs (depending on who you talk to).

These converters can generate high-quality analogue signals, such as CD-quality audio, by very quickly switching between a limited number of output voltages. One-bit DACs take this to the extreme: they can only output two voltages, e.g. 1V and -1V.

The advantage of such DACs is simplicity. A 1-bit DAC can be built using a single microcontroller or FPGA pin! It almost seems too good to be true — and it is!

Two big disadvantages are: 1) the fast switching causes a huge amount of noise to be generated, which we must take care of, and 2) the microcontroller or FPGA must be able to calculate and generate a new pin state at the switching frequency.

## Noise shaping

Naively generating a 1-bit output signal is very simple: we just take our desired signal and see if it’s value is larger or smaller than zero and output, for instance, 1V when it’s larger and -1V when it’s smaller. While the output signal will very coarsely resemble the desired signal, a lot of error (noise!) is present.

Luckily, and perhaps surprisingly, this noise can be moved, or shaped, to be vastly more pronounced at high frequencies than at low frequencies. A lowpass filter at the output of the DAC will then be able to remove much of the noise, resulting in a much cleaner signal. In fact, the audio DAC in your PC, tablet or phone you’re reading this on work on exactly these principles!

The noise shaping is achieved in three steps: 1) the 1-bit (digital) output signal is fed back and subtracted from the input signal, thereby generating a signal representing the error at the output. 2) the error is fed into an integrator, which forms an error accumulator: the output represent the total output error over time. 3) the accumulated error is quantised by the 1-bit quantiser and turned into an analogue signal.

The method outlined above ensures that, on average / observed over time, the 1-bit output signal tracks the desired input signal. In other words: if we apply some kind of moving averaging on the 1V/-1V output, we obtain the low-frequency desired signal. In practice, this averaging is done using an analogue filter, often in the form of one or two RC sections.

The more integrators are used, the higher the order of the noise shaping and the more noise is suppressed at low frequencies. Of course, the higher the order the more hardware and/or calculations are required. At a certain point, increasing the order has diminishing returns. It is rare to see noise shapers use more than seven integrators.

To keep things simple, we will limit our experiments to first- and second-order architectures.

## A 1-bit first-order DAC

Fig 1: A two-level (1-bit quantizer) first-order noise shaping DAC with dithering.

A first order DAC is shown in Fig 1. For testing purposes, we use a 1 kHz sine wave with a 75% amplitude (relative to full scale, i.e. -1 .. 1). The sample rate (fs) of the desired signal is chosen to be 50ksps and the desired output band is assumed to be from 0 to 10kHz.

Our performance metric is Signal-to-Noise-And-Distortion ratio (SINAD), which is expressed in dB. For reference: an ideal 16-bit converter should have a SINAD of around 96 dB when generating a single full-scale wave. An ideal 8-bit converter should have a SINAD of around 48 dB when generating a single full-scale sine wave.

The noise shaper operates at a clock rate of fclk = OSR*fs, where fs is the sample rate of the input signal and OSR is the oversampling factor. Common oversampling factors are 32, 64, 128 and 256.

The following plots were generated by calculating the output spectrum of the noise shaping DAC at different oversampling factors. All relevant system parameters are shown in the title of the plot:

The plots clearly show the desired signal (the peak at 1 kHz) and the noise, which is increasing with frequency. Note that the slope of the increase is independent of the oversampling ratio. But, the higher the oversampling ratio, the lower the noise at the 1kHz desired frequency. This is because, at higher oversampling ratios, there is more spectral room to put the noise, due to the Nyquist sampling theorem.

The effective number of bits (ENOB) for OSR = 32, 64, 128 and 256 are 7 bits, 9 bits, 10.5 bits and 12 bits, respectively.

## A 3-level (1.58-bit) first-order DAC

Instead of the usual 2-level quantizer, we can use a 3-level quantizer that outputs -1V, 0V and 1V. This will require at least two digital output pins but has the added benefit that a large-value DC decoupling capacitor can be avoided.

The architecture is exactly the same as the 2-level version, but is included here for completeness:

The following plots were generated by calculating the output spectrum of the noise shaping DAC at different oversampling factors. All relevant system parameters are shown in the title of the plot:

The noise level is around 1dB lower than in the 2-level case. It does not seem worth increasing the number of quantizer levels from 2 to 3 for noise reasons. In fact, I expect this performance gain is quickly lost because we now have to match the voltage levels of the two output pins. Any mismatch will result in additional in-band noise.

Given there is no noise advantage, the effective number of bits (ENOB) for this converter is approximately equal to the 2-level variant.

## A 1-bit second-order DAC

To get better noise shaping, moving to a second-order noise shaper architecture makes sense. The following block diagram show such a second-order device:

Note that I have added additional power-of-two gain factors. These are needed to limit the working range of the registers between -1.0 and 1.0. Any serious implementation will include saturation logic for each integrator.

Again, the system is evaluated by generating spectrum plots:

All the plots show an increase is noise shaping: the slope is steeper compared to the first-order architectures. As expected, the increased noise shaping also has a large positive effect on the SINAD figures. A second-order DAC with OSR=32 attains the same SINAD as a first-order converter with an OSR of 128!

A second-order OSR=128 DAC has a simulated/theoretical performance of a traditional 16-bit converter, after brick wall lowpass filtering at 10kHz.

The OSR=256 DAC theoretically has more than 18 effective bits!

The following plot shows part of the OSR=32 converter output and a filtered version (8th order butterworth filter):

## There be dragons here!

Wouldn’t it be nice if the actual performance was equal to the theoretical/simulated performance? Yes, yes it would — unfortunately, this will not be the case. Perhaps even more unfortunately, it won’t even be close!

There are a lot of limitations of the hardware that must be overcome in order to make good DACs. Some of these limitations are: high clock jitter sensitivity, non-zero ON resistance of the transistors in the output stage, slewing artifacts, out-of-band noise intermodulation and power supply noise injection, to name a few. Having to solve all these problems makes designing good DACs a highly skilled task indeed!

I write this not to keep the reader from experimenting, far from it. These are very nice architectures to explore, learn and have fun with! But be prepared and accept to get performance lower than your simulations show.

The bootloader for the HD6309 computer got enhanced today. It now supports SREC format as input, instead of the custom format:

# Retro Challenge #1

Today is April 1st; the start of the Retro Challenge 2017/04! (no joke).

During this Retro Challenge period, I am going to design and hopefully build a working computer around the HD6309 8-bit CPU.

I had ordered HD63C09P and HD63C09EP devices from different sellers on eBay. Some of the HD6309 CPUs showed up on my doorstep yesterday! Perfect timing!

# Details of the HD63C09P

The Hitachi HD6309 is a powerful 8-bit microprocessor that is an enhanced version of the well-known Motorola MC6809. The processor is pin compatible with the MC6809. Although there are some subtle differences, systems designed for the MC6809 will most likely also run with an HD6309.

HD6309P pinout

There are two versions of the HD6309; one with a built-in crystal oscillator/clock generator and one without. The latter version has the letter ‘E’ in the suffix of the part name, which probably means ‘External’. While the ‘E’ version is more flexible, it is also harder to use because it requires an externally generated quadrature clock and DMA handling logic — buyer beware!

The version I have is the HD63C09P, with the ‘C’ indicating that it will run at an internal clock rate of 3 MHz. The internal clock is generated from a 4x higher external clock or crystal connected to the XTAL and EXTAL pins.

The address space is 64K and all peripherals are memory mapped. Slow memory or peripherals can be accomodated by stretching the clock using the MRDY pin. The datasheet states that the maximum stretch duration is 5 microseconds. While this may be correct for the MC6809, I doubt this is true for a fully-static CMOS design like the HD6309.

What goes where in memory is mostly up to the system designer, except for the HD6309 interrupt vector table. This is fixed at the top of the memory map:

Two signals, BA and BS, tell the user about the state of the processor:

These signals are used by external DMA logic to figure out when the bus is available for use by another bus master.

## Internal structure

The internal structure of the HD6309 is shown by the figure below:

Note that this figure, taken from the official datasheet, does not show the extended features of the HD6309. It seems these features were never officially documented by Hitachi.

The programming model, showing the registers, is shown in the following figure:

Again, this is a copy from the official documentation. The HD6309 contains several additional registers not shown.

The 6309 has a two 8-bit accumulators (A and B) for computational use. These accumulators can by combined to form a 16-bit accumulator ‘D’. In addition, two 16-bit index registers, X and Y, are available for easy memory access. The CPU also has two stack pointers, S and U.

## Extensions

The HD6309 has more registers and instructions than a MC6809. Here is a brief overview of them:

• Two additional 8-bit registers: E and F.
• An additional 16-bit register W formed by combining E and F.
• A 32-bit register Q formed by combining D and W.
• 16×16 multiplication instruction.
• 32/16 bit and 16/8 bit hardware division instructions.
• Interruptable memory-to-memory block moves.
• Inter-register arithmetic and logical operations.
• Byte manipulation instructions.
• Single-bit operations.
• Several new indexed addressing modes.
• Illegal opcode trap interrupt.
• Division-by-zero trap interrupt.
• 16-bit arithmetic instructions.

When the 6309 starts or is reset, the CPU is in emulation mode; it behaves mostly as a 6809. Through a set of special instructions, the 6309 can be switched into native mode. In native mode, a few additional features become available and many instructions execute in fewer clock cycles.

You can find more information about all the different enhancements in ‘ The 6309 Book: Inside the 6309‘ by Chris Burke.

## Next..

This post was a general overview of the hardware and software capabilities of the HD6309 processor, mainly to get myself acquainted with the HD6309. The next steps are to figure out which peripherals are needed and what the memory map of the computer will look like.

# Introduction

In my day job, I develop electronic systems,  many of which have embedded microprocessors. The prototype or production systems need to be tested in both the prototype/development phase and during production at the manufacturing plant. These tests can be anything from “does this button work?” to “is the noise on this power rail below 50mV?”. In both cases, software on the system itself and an external PC is used to drive the tests. The external PC is the master controller and the system-under-test responds to command from the PC.

The commands sent from the PC are pretty simple: “pull this pin high/low” or “produce a sine wave on this DAC output”. So simple, in fact, that developing testing firmware for each product seems superfluous. There are, however, always specific tests unique to one product which means that, in practice, every product has its own testing firmware that has to be written an maintained.

In an attempt to reduce the workload of developing the testing firmware, I looked around for a more generic approach to writing testing software. I hope I’ve found a solution in Forth.

# Wat is Forth?

Forth is a stack-based extensible programming language and virtual machine invented by Charles Moore. A complete Forth system also has an interpreter through which programs can be interactively created, much like the BASIC interpreter on early personal computers.

The advantage of Forth is that most of the Forth system is programmed in a small subset of Forth itself, making it largely self hosted. This means that a Forth system can be created with modest effort, even in assembly language. In fact, this was one of the primary goals of the system — to be able to get a system up and running quickly when no other compilers or languages are available.

Nowadays, C compilers are ubiquitous and it might seem that there is no need for systems like Forth. However, Forth has one big advantage over compiled code: it is interactive.

In an automated hardware test setup, the interactivity is used by the external PC to control the device under test (DUT). This way, the test engineer can create, add or change test procedures without having to change the software running on the DUT — at least, that’s the idea!

# Forth stacks

The Forth system has two stacks: a data stack and a return stack. The data stack is by far the most important stack and is therefore referred to as the stack.

Forth commands, also called words, reside on the data stack. When a word is executed, the Forth virtual machine puts the next command’s address on the return stack. When the command has been executed, it pops the address from the return stack and continues executing the next command. The return stack is also used to store the parameters of do..while loops.

Say we want to add two numbers 123 and 456. In Forth this is achieved by entering:

123 456 +

The numbers 123 and 456 are pushed onto the stack first, followed by the forth word ‘+’. The virtual machine pops the first word off the stack, which is ‘+’ and executes it. In turn, the ‘+’ word pops two numbers from the stack, in this case 123 and 456, and adds them together, pushing the result onto the stack.

# Forth words

A Forth system has a number of primitive words. The actions associated with these words have been hard-coded into the virtual machine. Some do simple data or return stack manipulation, while others perform arithmetic operations or print data to the console.

A Forth system is extensible by defining new words, each of which are made up of words that have already been defined. These new words are added to the Forth dictionary.

For hardware testing, it makes sense to have primitive words for low-level functions such as configuring pins, reading from and writing to memory addresses, and writing to a debug UART. More high-level functions will be defined interactively to give the test engineer maximum flexibility whilst keeping the development effort of the Forth system to a minimum.

<to be continued>

# Extracting music from EMFINTRO

In October of 1991, a demo group called EMF released a small intro called ‘EMFINTRO’:

It was one of the first demos I remember seeing. This little gem featured excellent music by Purple Motion. The audio was generated through a DIY digital-to-analog converter made up of resistors connected to the parallel printer port of my PC. This crude music system was called a COVOX. At the time, it was a major step up from the beep-only speaker and I didn’t have a Soundblaster.

I always wanted to have the tracker module of the music but was never able to find it. Today I decided to try and extract the module from the executable.

First I downloaded and installed DOSBox and ran the intro. It worked and played the soundtrack through an emulated Soundblaster! Hoping for an easy score, I ran several module extractor programs on the executable. No luck there! The executable was compressed using PKLite, a popular exe-packer at the time.

Of course, this wasn’t a new problem. People developed tools to decompress the executables ever since PKLite was invented. Unfortunately, the EMF coders messed with the headers because none of the 7 tools I tried would unpack it. Some did nothing, some crashed DOSBox, still others didn’t recognise the PKLite version. So far for the easy way out…

It turns out that DOSBox has a version with a built-in debugger! So, I ran EMFINTRO and halted the simulated CPU by entering debug mode at the sound setup screen, hoping that the executable had decompressed itself completely in memory. Using the debugger, I dumped the entire 640K memory into a binary file and opened it up in a HEX viewer:

Succes! The executable has unpacked itself into memory. Now to search for the module..

Again, I ran module extraction software on the dumped binary file. No luck. It is either in a format that the tools don’t recognize, or it’s in a proprietary format. So I started searching for things that looked like a module; it’s surprising what you can find just by looking at the ASCII representation:

Bingo! There’s Purple Motion’s signature (PM) and a tracker name that I recognize: ScreamTracker. However, I’ve never seen this particular ‘!Scream!’ ID tag before. A bit of Googling revealed this to be a ScreamTracker 2 signature. I also found an example of a valid ScreamTracker 2 module (.STM) file and found out there should be 19 bytes containing the module name before the ‘!Scream!’ ID tag.

Knowing that module players generally don’t care how long the module file is; they’ll load only the bits they need, I had enough information to dump the tracker module.

Luckily there are still copies of ScreamTracker 2 on the internet! So, I downloaded it and installed it in DOSBox. Loading the dumped module into ST2 worked like a charm! It played it just like the intro itself!

To cut off the excess bytes of the original dump, I used ST2’s ‘save module’ to write the module to disk. Hey Presto! A 69Kb STM module appeared on my harddrive.

As a final test, I opened the module in OpenMPT, a modern module player. There are some differences in the way OpenMPT plays it. The channels are not panned correctly; something that is easy to fix. As an added bonus, I saved the module in ScreamTracker 3 format as this is more widely supported and has fewer playback bugs. Even good old WinAMP plays S3Ms more or less correctly but fails miserably playing the STM version.

I’ve submitted the STM and S3M versions to the Mod Archive. Or listen to it on SoundCloud.

# Pseudo sinusoidal waveform II

Mr Stenzel, CEO of Waldorf Music GmbH, decided to disclose his excellent pseudo sinusoidal formula: $4x \cdot (1-|x|)$.

Here is a plot of the function:

As you can see, this really looks like a sine wave! The following error plot reveals a maximum error of 0.056:

In terms of distortion, Mr. Stenzel is right. This function is 2.85 times better than the previous pseudo sine.

# Examining the Zyxel P2602H-D1A ADSL router

I found an old Zyxel ADSL modem in my junk pile and was going to throw it away. However, curiousity got the better of me and I cracked open the enclosure so I could see inside.

The interesting components are:
* Infineon ADM6996I 10M/100M ethernet switch/processor
* TI TNETV9-1PAG DC66ACTHW G4 chip near the telephone jacks
* 2x Silicon Labs Si3215-FM programmable CODEC with ringing
* Altera EPM3032A MAX3000 CPLD
* Eorex EM48AM1684VTA-75F 4MByte * 4 banks * 16 bits Synchronous DRAM
* EN29LV320B-70TCP 4 Megabit 256K x 16-bit Flash ROM (bottom of PCB)
* TI TNETD 7200ZZDW AR7W ADSL processor

On first glance, there doesn’t seem to be a lot of information on this line of processors. According to wikipedia https://en.wikipedia.org/wiki/TI-AR7 ,the processor is based on the MIPS 4KEc 32-Bit RISC processor core. Texas instruments sold the product line to Infineon. It was sold again to Lantiq.

There is little information about the booting process of the AR7, except for this tidbit: 4Kb PROM (0xBFC00000) and 4Kb RAM (0x80000000) on the chip for boot purposes.

I assume the PROM contains minimal code to boot the processor from the external flash chip. More info on the booting process: http://www.nulltrace.org/2013/04/mips-bootstrapping.html

On the PCB, there are a few populated headers that look suspiciously like JTAG or a serial console. More on that later…