Some notes on maze routers

A maze router is an algorithm that connects two or more points together on a grid. A cost function is used to make sure the connections are routed through the grid efficiently. The cost function can be adapted to fit the routing problem at hand; for example, blockages can be avoided by making the cost function return positive infinity.
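To make this concrete, here is a small Python sketch of cost-driven maze routing on a grid. It uses Dijkstra-style expansion; the classic Lee algorithm is the special case where every free cell costs 1. The grid encoding (cost per cell, `float("inf")` for blockages) is my own illustrative choice:

```python
from heapq import heappush, heappop

def maze_route(grid, start, goal):
    """Route from start to goal on a grid, minimising total cost.

    grid[y][x] holds the cost of entering a cell; float("inf")
    marks a blockage. Returns the cheapest path as a list of
    (x, y) cells, or None if no route exists.
    """
    frontier = [(0, start, [start])]
    visited = set()
    while frontier:
        cost, (x, y), path = heappop(frontier)
        if (x, y) == goal:
            return path
        if (x, y) in visited:
            continue
        visited.add((x, y))
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                step = grid[ny][nx]
                if step != float("inf"):  # blocked cells are never entered
                    heappush(frontier, (cost + step, (nx, ny), path + [(nx, ny)]))
    return None
```

With a blockage in the centre cell, the router simply expands around it.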

Maze routers have found application in routing electronic circuit boards and integrated circuits. The algorithm is not limited to this and can be adapted, through a problem-specific cost function, to many situations.

The original paper by Lee describing the algorithm, “An Algorithm for Path Connections and Its Applications”, IRE Transactions on Electronic Computers, EC-10 (2): 346–365, doi:10.1109/TEC.1961.5219222, is hard to read because of its heavy use of mathematics and abstract notation. Luckily, Wikipedia offers a simpler description of the basic algorithm.

Several enhancements to the original algorithm have been proposed. For example, Rubin discusses implementation details that make it more efficient, introducing a cost function that turns the Lee algorithm from a breadth-first search into a more directed depth-first search. Rubin's paper is far more readable than Lee's original and includes a concise description of the Lee maze routing algorithm:

In my view, the maze routing algorithm is really elegant and powerful; one might call it aMAZEing!

Retrochallenge 2019/10 #3

Booting FLEX

The FLEX operating system uses track 0 of each disk for its own housekeeping. Sectors 1 and 2 contain a boot program. Sector 3 holds the System Information Record (SIR), which tells FLEX, among other things, the geometry of the disk, the location of the first empty sector and the number of empty sectors. The directory information is stored from sector 5 onward.

The boot process consists of three stages. First, the boot ROM loads sectors 1 and 2 of the disk into memory; these sectors contain a secondary boot loader. Second, the boot loader loads the operating system, contained in the FLEX.SYS file. Third, the boot loader jumps to the FLEX entry point at address $CD00.

The boot loader in ROM can be rather simple. All it has to do is load two raw sectors (512 bytes) into memory and jump to the correct entry point. There is some freedom here; the exact load address is not critical as long as the code doesn’t use the area where FLEX will be loaded later. In practice, memory below $C000 is safe to use.

The purpose of the secondary boot loader is to load the FLEX.SYS file. This file is formatted like any other FLEX binary; it consists of records which hold a load address and a record size followed by up to 252 payload bytes.
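As a sketch of what the secondary loader has to handle, here is a Python parser for that record structure. Note that the record marker bytes ($02 for a data record, $16 for the transfer address) are my assumption from general FLEX documentation, not stated in the text above:

```python
def parse_flex_binary(data):
    """Parse a FLEX binary image into (load_address, payload) records.

    Assumed layout per record: a $02 marker, a two-byte big-endian
    load address, a one-byte size (up to 252), then the payload bytes.
    A $16 marker is assumed to precede a two-byte transfer address.
    """
    records = []
    transfer = None
    i = 0
    while i < len(data):
        marker = data[i]
        if marker == 0x02:                       # data record
            addr = (data[i + 1] << 8) | data[i + 2]
            size = data[i + 3]
            records.append((addr, bytes(data[i + 4:i + 4 + size])))
            i += 4 + size
        elif marker == 0x16:                     # transfer (entry) address
            transfer = (data[i + 1] << 8) | data[i + 2]
            i += 3
        else:                                    # skip padding bytes
            i += 1
    return records, transfer
```

Loading then amounts to copying each payload to its load address and finally jumping to the transfer address.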

Unlike the ROM boot loader, the secondary loader must adhere to the FLEX disk layout, as FLEX.SYS can be fragmented on the disk. To simplify things a little, boot sectors 1 and 2 contain a pointer to the starting track and sector where FLEX.SYS resides. The user can set this pointer using the LINK utility.

The track and sector of the FLEX.SYS file are written to track 0, sector 1 (the first sector on the disk), at offsets 5 and 6 respectively. This way, the secondary boot loader does not have to traverse the directory list, keeping the code small.
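In code, picking up that pointer is trivial (a minimal Python sketch, assuming the raw sector contents are available as a byte buffer):

```python
def flex_sys_location(sector1):
    """Return the (track, sector) of FLEX.SYS as stored by LINK.

    The pointer lives at offsets 5 and 6 of track 0, sector 1.
    """
    return sector1[5], sector1[6]
```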

The new boot ROM and secondary boot loader were tested in the simulator:

Booting FLEX

The code for the secondary boot loader can be found here.

Retrochallenge 2019/10 #2

I’m glad to report the development of the new bootloader is coming along nicely. At this moment, it can view memory using the ‘M’ command, run a program using the ‘R’ command and accept Motorola S-record dumps via the ‘S’ command. This is a bit fancier than the old one, which just prompted for an S-record dump.

The real benefit of the new bootloader is its table of ROM routine addresses, located at a fixed address. The entries of the table can change between bootloader versions without affecting the operation of user programs.

```
; ==================================================
; ROM ROUTINE JUMP TABLE
; ==================================================

ORG $FFA0
.dw CMDINT
.dw INCHAR
.dw INCHARE
.dw INCHECK
.dw OUTCHAR
.dw PRINTSTRING
.dw WRITEHEX
```

The address table is located at $FFA0 and contains addresses to the following routines:

• CMDINT: the command interpreter of the bootloader.
• INCHAR: read character from UART and put in A register.
• INCHARE: read character from UART with echo and put in A register.
• INCHECK: set zero flag to 1 if there are characters to be read from UART.
• OUTCHAR: write character in A register to the UART.
• PRINTSTRING: write a 0-terminated string, address in X reg, to the UART.
• WRITEHEX: write the value of A register as two-digit HEX to UART.

In the future, other ROM routines will be added.

Expansion port

During the 2017/04 Retrochallenge, I never got around to testing the I/O expansion port of the HD6309 computer. So I wrote a simple program to read or write to the expansion port.

A quick test with the oscilloscope revealed that, as expected, the I/O port did not work. There was a mistake in the output enable expression for the 74HCT541 bus driver, which meant the data never actually arrived at the expansion port pins.

The fix was simple and quick; I installed Atmel WinCUPL, the software that turns the PLD file into a bitstream file my TL866 GAL/EPROM programmer understands, and verified via simulation that my fix was indeed correct:

And indeed, the EXP_OE_N line goes low when the I/O expansion port is addressed.

After updating the I/O decoder GAL and checking with the oscilloscope I could see data arriving at the pins of the expansion port. One less problem to worry about!

An operating system

The updates to the bootloader ROM were aimed at getting more software to run on the HD6309. Loading software over the UART in S-record format, however, is not that convenient, even by the standards of the time. The HD6309 really needs mass storage and an operating system.

There seemed to be two major operating systems for the 6809, namely OS9 and FLEX. Of the two, OS9 is the more complex, offering multi-user and multi-tasking as standard. For this to work, the CPU needs to be interrupted several times a second via timer hardware to switch tasks and do housekeeping. The HD6309 computer does not have such a timer, so porting OS9 to it is not feasible without additional hardware. So that leaves us with FLEX!

Finding the source code for FLEX turned out to be a lot more challenging than I had expected. After a couple of nights scouring the web and turning up only binaries, I finally found the source code to a 6809 version of FLEX. This will be the basis for the HD6309’s operating system.

There are lots of documents on FLEX online. I found a user manual, a programmer’s manual and even a porting manual. Things are really well documented, from the memory layout to the floppy disk format. This makes life a lot easier.

Next time…

That’s it for now. Next time, more on the porting of FLEX and mass storage solutions!

Retrochallenge 2019/10 #1

I’ve entered the 2019/10 Retrochallenge. Two years ago I built an 8-bit computer based on the HD6309 processor from Hitachi.

Now it needs some software. The onboard 4k ROM contains a bootloader which, at the moment, only accepts Motorola SREC dumps so one can load software via the UART.

It could really use a classic monitor and elaborate I/O routines to support an operating system. So, for the first part of the Retrochallenge, I’m going to develop a new boot ROM.

I’m developing the boot ROM using Visual Studio Code and LWTOOLS. Rather than burning EEPROMs and testing it on the real system, I coded a simple simulator based on Ray Bellis’ 6809 simulation library. You can find it here.

Simple noise shaping DACs: some examples

A few days ago, @Brouhaha and I had a conversation on Twitter about simple noise shaping digital-to-analogue converters, a.k.a. sigma-delta or delta-sigma DACs (depending on who you talk to).

These converters can generate high-quality analogue signals, such as CD-quality audio, by very quickly switching between a limited number of output voltages. One-bit DACs take this to the extreme: they can only output two voltages, e.g. 1V and -1V.

The advantage of such DACs is simplicity. A 1-bit DAC can be built using a single microcontroller or FPGA pin! It almost seems too good to be true — and it is!

Two big disadvantages are: 1) the fast switching causes a huge amount of noise to be generated, which we must take care of, and 2) the microcontroller or FPGA must be able to calculate and generate a new pin state at the switching frequency.

Noise shaping

Naively generating a 1-bit output signal is very simple: we take our desired signal, check whether its value is larger or smaller than zero, and output, for instance, 1V when it’s larger and -1V when it’s smaller. While the output signal will very coarsely resemble the desired signal, a lot of error (noise!) is present.

Luckily, and perhaps surprisingly, this noise can be moved, or shaped, to be vastly more pronounced at high frequencies than at low frequencies. A lowpass filter at the output of the DAC can then remove much of the noise, resulting in a much cleaner signal. In fact, the audio DAC in the PC, tablet or phone you’re reading this on works on exactly these principles!

The noise shaping is achieved in three steps: 1) the 1-bit (digital) output signal is fed back and subtracted from the input signal, generating a signal that represents the error at the output; 2) the error is fed into an integrator, which forms an error accumulator: its output represents the total output error over time; 3) the accumulated error is quantized by the 1-bit quantizer and turned into an analogue signal.
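The three steps can be sketched in a few lines of Python (a minimal model for illustration, not an authoritative implementation; ±1 stands in for the 1V/-1V output levels):

```python
def first_order_shaper(signal):
    """First-order noise shaping: subtract the fed-back output from
    the input, accumulate the error, and quantize the accumulator."""
    acc = 0.0
    out = []
    for x in signal:
        # steps 1+2: error (input minus previous output) into the accumulator
        acc += x - (out[-1] if out else 0.0)
        # step 3: 1-bit quantizer
        out.append(1.0 if acc >= 0.0 else -1.0)
    return out
```

For a constant input, the long-run average of the ±1 output equals the input value, which is exactly the tracking property described below.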

The method outlined above ensures that, on average, the 1-bit output signal tracks the desired input signal. In other words: if we apply some kind of moving average to the 1V/-1V output, we obtain the low-frequency desired signal. In practice, this averaging is done using an analogue filter, often in the form of one or two RC sections.

The more integrators are used, the higher the order of the noise shaping and the more noise is suppressed at low frequencies. Of course, the higher the order, the more hardware and/or calculations are required. At a certain point, increasing the order has diminishing returns; it is rare to see noise shapers with more than seven integrators.

To keep things simple, we will limit our experiments to first- and second-order architectures.

A 1-bit first-order DAC

Fig 1: A two-level (1-bit quantizer) first-order noise shaping DAC with dithering.

A first-order DAC is shown in Fig 1. For testing purposes, we use a 1 kHz sine wave with a 75% amplitude (relative to full scale, i.e. -1 .. 1). The sample rate (fs) of the desired signal is chosen to be 50 ksps and the desired output band is assumed to be from 0 to 10 kHz.

Our performance metric is the Signal-to-Noise-And-Distortion ratio (SINAD), which is expressed in dB. For reference: an ideal 16-bit converter should have a SINAD of around 96 dB when generating a single full-scale sine wave; an ideal 8-bit converter, around 48 dB.

The noise shaper operates at a clock rate of fclk = OSR*fs, where fs is the sample rate of the input signal and OSR is the oversampling factor. Common oversampling factors are 32, 64, 128 and 256.

The following plots were generated by calculating the output spectrum of the noise shaping DAC at different oversampling factors. All relevant system parameters are shown in the title of the plot:

The plots clearly show the desired signal (the peak at 1 kHz) and the noise, which increases with frequency. Note that the slope of the increase is independent of the oversampling ratio. However, the higher the oversampling ratio, the lower the noise at the desired 1 kHz frequency. This is because, at higher oversampling ratios, there is more spectral room to put the noise, per the Nyquist sampling theorem.

The effective number of bits (ENOB) for OSR = 32, 64, 128 and 256 are 7 bits, 9 bits, 10.5 bits and 12 bits, respectively.
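The SINAD and ENOB figures above are related by the usual rule of thumb for an ideal N-bit converter (a small Python helper; note the "around 96 dB" figure earlier uses the rougher 6 dB-per-bit approximation):

```python
def sinad_db(bits):
    """Ideal SINAD of an N-bit converter with a full-scale sine input:
    6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76

def enob(sinad):
    """Effective number of bits from a SINAD figure in dB (inverse of above)."""
    return (sinad - 1.76) / 6.02
```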

A 3-level (1.58-bit) first-order DAC

Instead of the usual 2-level quantizer, we can use a 3-level quantizer that outputs -1V, 0V and 1V. This will require at least two digital output pins but has the added benefit that a large-value DC decoupling capacitor can be avoided.
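The 3-level quantizer is a small change to the quantize step of the loop (a Python sketch; the ±0.5 decision thresholds are my own illustrative choice, not taken from the design above):

```python
def quantize3(acc):
    """Map the accumulated error onto three output levels: -1, 0 or +1.
    The +/-0.5 thresholds split the input range into three equal bands."""
    if acc > 0.5:
        return 1.0
    if acc < -0.5:
        return -1.0
    return 0.0
```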

The architecture is exactly the same as the 2-level version, but is included here for completeness:

The following plots were generated by calculating the output spectrum of the noise shaping DAC at different oversampling factors. All relevant system parameters are shown in the title of the plot:

The noise level is around 1dB lower than in the 2-level case. It does not seem worth increasing the number of quantizer levels from 2 to 3 for noise reasons. In fact, I expect this performance gain is quickly lost because we now have to match the voltage levels of the two output pins. Any mismatch will result in additional in-band noise.

Given there is no noise advantage, the effective number of bits (ENOB) for this converter is approximately equal to the 2-level variant.

A 1-bit second-order DAC

To get better noise shaping, moving to a second-order noise shaper architecture makes sense. The following block diagram shows such a second-order device:

Note that I have added additional power-of-two gain factors. These are needed to limit the working range of the registers between -1.0 and 1.0. Any serious implementation will include saturation logic for each integrator.
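A minimal Python model of such a second-order loop is sketched below. The 0.5 gain factors and the ±1.0 saturation limits are illustrative choices in the spirit of the description, not the exact values from the diagram:

```python
def second_order_shaper(signal):
    """Second-order noise shaping: two cascaded error integrators,
    each scaled by a power-of-two gain and saturated to [-1, 1]."""
    def sat(v):
        return max(-1.0, min(1.0, v))

    i1 = i2 = 0.0
    y = 0.0
    out = []
    for x in signal:
        i1 = sat(i1 + 0.5 * (x - y))     # first integrator with feedback
        i2 = sat(i2 + 0.5 * (i1 - y))    # second integrator with feedback
        y = 1.0 if i2 >= 0.0 else -1.0   # 1-bit quantizer
        out.append(y)
    return out
```

As in the first-order case, the long-run average of the ±1 output tracks a constant input; the difference shows up in how steeply the quantization noise rises with frequency.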

Again, the system is evaluated by generating spectrum plots:

All the plots show an increase in noise shaping: the slope is steeper compared to the first-order architectures. As expected, the increased noise shaping also has a large positive effect on the SINAD figures. A second-order DAC with OSR=32 attains the same SINAD as a first-order converter with an OSR of 128!

After brick-wall lowpass filtering at 10 kHz, a second-order OSR=128 DAC has the simulated/theoretical performance of a traditional 16-bit converter.

The OSR=256 DAC theoretically has more than 18 effective bits!

The following plot shows part of the OSR=32 converter output and a filtered version (8th-order Butterworth filter):

There be dragons here!

Wouldn’t it be nice if the actual performance was equal to the theoretical/simulated performance? Yes, yes it would — unfortunately, this will not be the case. Perhaps even more unfortunately, it won’t even be close!

There are many hardware limitations to overcome in order to make a good DAC: high sensitivity to clock jitter, non-zero ON resistance of the transistors in the output stage, slewing artifacts, out-of-band noise intermodulation and power supply noise injection, to name a few. Having to solve all these problems makes designing good DACs a highly skilled task indeed!

I write this not to keep the reader from experimenting, far from it. These are very nice architectures to explore, learn from and have fun with! But be prepared to accept performance lower than your simulations show.

Retrochallenge 2017/10 #7: Final report

At the beginning of the Retrochallenge, I had the following goals:

• Make a Verilog implementation of the SP0256-AL2.
• Synthesize and test the design on a Terasic DE0 FPGA board.
• Cut corners.
• Learn Verilog.
• Have fun!

I think I succeeded on all five points.

The platform-agnostic Verilog code for the Speech256 is available on GitHub. I also have a Quartus II project available to run a demonstration on the Terasic DE0 board (not Digilent, as I mistakenly said many times before… derp).

I cut quite a few corners by not implementing the compressed ROM format of the SP0256-AL2, but using my own encoding and controller.

I learned Verilog in the process, although Clifford Wolf did have a few pointers on my coding style regarding non-blocking assignments…

And finally, I synthesized and tested the design on the DE0 board:

… and I had fun doing this; I hope you liked it too!

A big shout-out to John W. Linville for running the Retrochallenge 2017/10.

Until next time — Retrocompute!

Retrochallenge 2017/10 #6

Debugging

The last two days, I spent several hours debugging the filter engine. The filter didn’t want to behave, meaning the output values were going all over the place.

The debugging process involved setting up the filter with the ‘EH’ allophone and stepping through the changing filter states, one by one, to find the differences between a known-good model, which I had made in MATLAB, and the Verilog code. Using this method, I finally tracked the problem down to a signed/unsigned bug in the serial/parallel multiplier.

Will it float or will it sink?

The source-filter model, controller and top-level Speech256 block were mostly complete. To test the final design, all I needed to do was feed a string of allophones to the Speech256 top-level block and capture the simulated output to the DAC.

I set up the following allophones: 0x1B, 0x07, 0x2D, 0x35, 0x03, 0x2E, 0x1E, 0x33, 0x2D, 0x15, 0x03.

This is the result:

So YES! it floats!

Next: actually get it synthesized and running on the DE0 FPGA board…

Retrochallenge 2017/10 #5

I’ve been working on the controller part of the Speech256. The current status is that it is mostly working: it sends pitch, duration, voiced/unvoiced and filter coefficient control signals to the source-filter model.

Coefficient translation

To save command ROM space, the designers of the SP0256 command ROM chose to encode the filter coefficients in an 8-bit sign-magnitude format instead of the 10-bit format needed by the filter engine. To be clear, the Speech256 project does not use a copy of the SP0256 ROM; it uses a custom format. However, the same bit-depth reduction technique is used to keep the control ROM small.

The SP0256 contains an expansion circuit that translates the 8-bit format into the 10-bit format. The SP0256 seems to use the same translation table as the SP0250, which is documented in its Applications Manual:

The table pertains only to the (7-bit) magnitude part of the (8-bit) sign-magnitude value: positive and negative numbers are translated by the same table, leaving the sign untouched.

Although the manual claims it uses a lookup ROM, the content looks like a piece-wise linear curve, which can therefore be implemented without a ROM. I found four lines (C1 .. C4) that represent the table quite accurately:

The following code snippet translates a 7-bit magnitude into a 9-bit magnitude according to the table …

```
int expand(int x)   /* translate 7-bit magnitude to 9-bit magnitude */
{
    if (x < 38)
        return x * 8;               /* line C1 */
    if (x < 69)
        return 301 + (x - 38) * 4;  /* line C2 */
    if (x < 97)
        return 425 + (x - 69) * 2;  /* line C3 */
    return 481 + (x - 97);          /* line C4, 97 <= x < 128 */
}
```

… except for x=2, where the result should be 12 instead of 16. Note that the C1 line also has a slight offset error but I think this translation function is most likely good enough.

The convenient factors-of-two scaling makes me suspect the function was meant to be implemented directly in logic, not in a ROM.

Time is running out, so I’d better get on implementing the above!

Retrochallenge 2017/10 #4

Another small update of the Retrochallenge 2017/10 Speech256 project. The source part is finished! It can produce both the pulse wave and noise:

Here’s the audio for the brave:

Noise generator

In case you were wondering, the linear feedback shift register used for the noise generator is a single Verilog line:

`lfsr = {lfsr[15:0], lfsr[16] ^ lfsr[2]}`
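A quick Python model of the same register (a sketch; bit 0 is taken as the shift-in end, mirroring the Verilog concatenation) shows that these taps give a maximal-length sequence of 2^17 − 1 states:

```python
def lfsr_step(state):
    """One step of the 17-bit LFSR: shift left by one and feed
    bit16 XOR bit2 back into bit0, as in the Verilog line."""
    feedback = ((state >> 16) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0x1FFFF
```

Starting from any non-zero seed, the register cycles through all 131071 non-zero states before repeating, which is what makes it a good noise source.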

The updated Verilog source code is up on the Github page!

Next…

Up next will be integrating the filter engine and the source to complete the source-filter model of the synthesizer.

Retrochallenge 2017/10 #3

During the past few days I’ve managed to squeeze some time out of my schedule to make progress on the Verilog code for the Speech256 synthesizer.

Basically, I think I have the 12-pole filter working. This is the most complicated part of the project, with three shift registers holding the filter coefficients and the filter states:

The coefficient shift register is used as a roundabout, which avoids the use of an actual addressable structure, like a RAM. It also simplifies loading a new set of coefficients into the filter engine, as these can simply be served in a serial fashion, reducing the interface complexity to the Speech256 controller.

The multiplier shown is implemented using a shift-and-add technique, which has a reduced speed but a lower resource requirement compared to a fully parallel implementation.
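The shift-and-add technique itself is easy to sketch (an unsigned Python model for illustration; the actual Speech256 multiplier operates serially on signed values):

```python
def shift_add_multiply(a, b, bits=10):
    """Multiply two unsigned integers one bit at a time: for every
    set bit of b, add a correspondingly shifted copy of a to the
    result. Hardware does the same with one adder and a shifter,
    taking `bits` clock cycles instead of one."""
    result = 0
    for i in range(bits):
        if (b >> i) & 1:
            result += a << i
    return result
```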

The 12-pole filter control logic (not shown) was a bit tricky due to the serial nature of many of the components, see the meaningless screenshot here:

The most recent code is up on my Speech256 Github page..