Retrochallenge 2017/10 #1

Introduction

For this season’s Retro Challenge I’m going to make an implementation of the SP0256-AL2 speech synthesis chip in a Field Programmable Gate Array (FPGA).


What is an SP0256-AL2?

In the late 1970s and early 1980s, many companies, such as Texas Instruments, Votrax and General Instrument, produced speech synthesis chips. These chips ended up in toys (such as the Speak & Spell) and in expansion products for home computers. Back then, it was hot tech!

The SP0256-AL2 is a speech synthesis chip made by General Instrument. It has a built-in ROM containing 59 allophones, which are small snippets of speech; words are formed by concatenating allophones. It has a relatively simple 8-bit parallel interface and a built-in digital-to-analogue converter.

The SP0256-AL2 was used in a number of products, such as the Tandy Speech/Sound Cartridge, the Speech 64 module for the Commodore 64 and the Currah uSpeech interface for the ZX Spectrum.
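
To make the concatenation idea concrete, here is a small host-side C sketch that "speaks" a word by sending one allophone code after another. The codes come from the commonly reproduced AL2 allophone table and should be checked against the datasheet. al2_write() is a hypothetical stand-in for whatever drives the chip's address pins and load strobe; here it just prints the code. The range check assumes the ROM holds 64 entries (the 59 allophones plus 5 pauses), so every valid code fits in 6 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A few SP0256-AL2 allophone codes (verify against the datasheet). */
enum {
    AL2_PA1 = 0x00,  /* short pause */
    AL2_EH  = 0x07,
    AL2_AX  = 0x0F,
    AL2_HH1 = 0x1B,
    AL2_LL  = 0x2D,
    AL2_OW  = 0x35
};

/* Assumed: 59 allophones + 5 pauses = 64 codes, i.e. 6 address bits. */
#define AL2_NUM_CODES 64

/* Hypothetical output hook: real hardware would place the code on the
   address pins and pulse the load strobe when the chip is ready. */
static void al2_write(uint8_t code)
{
    printf("0x%02X ", code);
}

/* Send a word as a sequence of allophone codes.
   Returns the number of codes sent, or -1 if any code is out of range
   (in which case nothing is sent). */
static int al2_speak(const uint8_t *codes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (codes[i] >= AL2_NUM_CODES)
            return -1;
    }
    for (size_t i = 0; i < n; i++)
        al2_write(codes[i]);
    printf("\n");
    return (int)n;
}

/* One plausible allophone spelling of "hello", followed by a pause. */
static const uint8_t word_hello[] = {
    AL2_HH1, AL2_EH, AL2_LL, AL2_AX, AL2_OW, AL2_PA1
};
```

Allophone spellings are partly a matter of taste; the "hello" above is one possible spelling, not an official one.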

Why?

So why make an (FPGA) implementation of the SP0256-AL2?

  • I find the obsolete speech technology interesting.
  • SP0256-AL2 NOS ICs are expensive.
  • Stocks are going to run out in the future.
  • Some being sold are fake.
  • If I succeed, FPGA-based retro computer remakes can include this speech core.
  • I like a challenge.

Goals

The goals I have set for myself for this Retro Challenge are:

  • Produce a working equivalent of the SP0256-AL2 on a Terasic DE0 board.
  • Use Verilog as the hardware description language (I mostly know VHDL).
  • Make the Verilog and supporting code available on GitHub.
  • Avoid fully parallel multipliers in the implementation so it will fit into small FPGAs.
  • Cut corners wherever possible.
  • Blog about the process.
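
The goal of avoiding fully parallel multipliers trades speed for area. As an illustration only (modelled in C rather than Verilog, and not taken from the actual core), the function below multiplies the way a bit-serial shift-and-add unit would: each loop iteration stands for one clock cycle in which a single adder conditionally adds the shifted multiplicand. A 16×16-bit product then takes 16 cycles instead of one, but needs no array of partial-product adders.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two unsigned 16-bit values.
   Each loop iteration models one clock cycle of a serial multiplier:
   one multiplier bit is examined and at most one addition is done.
   If 'cycles' is non-null, it receives the number of cycles used. */
static uint32_t serial_multiply(uint16_t a, uint16_t b, unsigned *cycles)
{
    uint32_t acc = 0;     /* accumulator register */
    uint32_t addend = a;  /* multiplicand, shifted left once per cycle */
    unsigned n = 0;

    for (unsigned bit = 0; bit < 16; bit++) {
        if (b & (1u << bit))  /* current multiplier bit set? */
            acc += addend;    /* the single adder in the datapath */
        addend <<= 1;
        n++;
    }
    if (cycles)
        *cycles = n;
    return acc;
}
```

In an FPGA the system clock is typically many times faster than an audio sample rate, which is what makes spending several cycles per multiplication affordable.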

 

Cortex M0 GPIO hardfault

While bringing up a Cortex M0+ board the other day, I kept running into a hardfault as soon as I tried to write to the GPIO subsystem. The C code looked like this:

    // enable clocks to GPIO
    SIM_SCGC5 = SIM_SCGC5_PORTA | SIM_SCGC5_PORTB |
                SIM_SCGC5_PORTC | SIM_SCGC5_PORTD |
                SIM_SCGC5_PORTE | SIM_SCGC5_LPTIMER;
    
    // set PIN E18 as output
    PORTE_PCR18 = 0x0100; // MUX = 1: pin E18 is a GPIO
    PORTE_PDDR = 0x40000; // bit 18 set: pin 18 is an output
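
The two magic numbers in this code decode as follows, assuming the usual Kinetis register layout: 0x0100 puts the value 1 into the pin control register's MUX field (bits 10..8), which selects the GPIO function, and 0x40000 is bit 18 of the data direction register, which makes pin 18 an output. The macro names in this small C sketch are illustrative; the vendor headers provide their own equivalents.

```c
#include <stdint.h>

/* Kinetis PORTx_PCRn: the MUX (pin function) field occupies bits 10..8.
   Alternative 1 connects the pin to the GPIO module.
   These macro names are local to this sketch. */
#define PCR_MUX_SHIFT  8u
#define PCR_MUX_MASK   (7u << PCR_MUX_SHIFT)
#define PCR_MUX(n)     (((uint32_t)(n) << PCR_MUX_SHIFT) & PCR_MUX_MASK)

/* GPIO data direction register: one bit per pin, 1 = output. */
#define PDDR_PIN(pin)  ((uint32_t)1u << (pin))

/* Extract the MUX field from a PCR value. */
static unsigned pcr_mux(uint32_t pcr)
{
    return (unsigned)((pcr & PCR_MUX_MASK) >> PCR_MUX_SHIFT);
}
```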

The exception was triggered by “PORTE_PCR18 = 0x0100;”, and it took me a while to figure out what was going on.

On this Cortex part, writing to a peripheral that doesn’t have its clock enabled raises a hardfault exception. But I had enabled the clock, or so I thought...

As on any pipelined RISC processor, a write issued by the core can take a few clock cycles to reach a peripheral, depending on the depth of the pipeline. On top of that comes the latency of the peripheral itself. In the code above, the clocks to the GPIO subsystem are not yet enabled when the processor executes the write to PORTE_PCR18, and a hardfault is triggered.

The solution is to do something else between the write to SIM_SCGC5 and the first access to PORTE, or to insert a few NOPs, like this:

    // enable clocks to GPIO
    SIM_SCGC5 = SIM_SCGC5_PORTA | SIM_SCGC5_PORTB |
                SIM_SCGC5_PORTC | SIM_SCGC5_PORTD |
                SIM_SCGC5_PORTE | SIM_SCGC5_LPTIMER;

    asm volatile ("nop");
    asm volatile ("nop");    

    // set PIN E18 as output
    PORTE_PCR18 = 0x0100; // MUX = 1: pin E18 is a GPIO
    PORTE_PDDR = 0x40000; // bit 18 set: pin 18 is an output

No more hardfaults!
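
Counting NOPs works, but the number needed depends on pipeline and bus timing. A technique found in errata and application notes for other Cortex-M parts is to read the clock-gate register back (or to execute a DSB barrier, which CMSIS exposes as __DSB()) before touching the newly clocked peripheral. The sketch below shows the read-back variant; it is untested on this board, clock_gate_enable() is a hypothetical helper name, and the register is replaced by an ordinary volatile variable so the snippet also compiles on a desktop machine.

```c
#include <stdint.h>

/* Host stand-in for the memory-mapped SIM_SCGC5 register.
   On the target, pass the address of the real register instead. */
static volatile uint32_t sim_scgc5_stub;

/* Hypothetical helper: open the requested clock gates, then read the
   register back. On the target, the volatile read is issued after the
   store, which is intended to make sure the clock-gate write has taken
   effect before the caller touches the peripheral. */
static uint32_t clock_gate_enable(volatile uint32_t *scgc, uint32_t gates)
{
    *scgc |= gates;  /* read-modify-write leaves other gates untouched */
    return *scgc;    /* dummy read-back acts as the wait */
}
```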

Freescale Kinetis MK20DX-series FLASH erasing

I needed a small program to initialize two MK20DX256 CPUs that were acting up. Below is the part of the code that erases the flash. The mk20dx128.h file is from Paul Stoffregen’s Teensy 3.x Git repository; you will also want his linker scripts (.ld). The code was tested on MK20DX128 and MK20DX256 CPUs. Use at your own risk!

/* 
    MK20DX128/256 flash init/write program.
    
    Niels A. Moseley, 2015.
    
    1) Disable the watchdog
    2) If needed, erase the FLASH protection sector & stop. (requires reboot to reinit FLASH)
    3) Fully erase the FLASH.
    
    Note: this program MUST be loaded into RAM. Use at your own risk!

    This code is released into the public domain.
*/

#include <mk20dx128.h>
#include <stdint.h>

// NOTE: the entire program should execute from RAM
// address 0x1FFF8000


void flash_word(uint32_t address, uint32_t word);
void flash_erase(void);
void flash_erase_zerosector(void);

void unused_isr() __attribute__((interrupt("IRQ")));
void hardfault_isr() __attribute__((interrupt("IRQ")));
void boot_error() __attribute__((interrupt("IRQ")));
void systick_isr() __attribute__((interrupt("IRQ")));

__attribute__ ((section(".ramcode")))
void unused_isr()
{
    asm("bkpt #0");
    while(1) {};
}

__attribute__ ((section(".ramcode")))
void hardfault_isr()
{
    asm("bkpt #0");
    while(1) {};
}

__attribute__ ((section(".ramcode")))
void boot_error()
{
    asm("bkpt #0");
    while(1) {};
}
__attribute__ ((section(".ramcode")))
void systick_isr()
{
    
}

// NOTE: all of the peripheral interrupt handlers were deleted.
// we don't need them!
__attribute__ ((section(".vectorsram"), used))
void (* const gVectors[])(void) =
{
    (void (*)(void))0x20008000,	// 0 ARM: Initial Stack Pointer
    boot_error,	// 1 ARM: Initial Program Counter
    unused_isr,	// 2 ARM: Non-maskable Interrupt (NMI)
    hardfault_isr,	// 3 ARM: Hard Fault
    unused_isr,	// 4 ARM: MemManage Fault
    unused_isr,	// 5 ARM: Bus Fault
    unused_isr,	// 6 ARM: Usage Fault
    unused_isr,	// 7 --
    unused_isr,	// 8 --
    unused_isr,	// 9 --
    unused_isr,	// 10 --
    unused_isr,	// 11 ARM: Supervisor call (SVCall)
    unused_isr,	// 12 ARM: Debug Monitor
    unused_isr,	// 13 --
    unused_isr,	// 14 ARM: Pendable req serv(PendableSrvReq)
    systick_isr,	// 15 ARM: System tick timer (SysTick)
    
    unused_isr,	// 16 DMA channel 0 transfer complete
    unused_isr,	// 17 DMA channel 1 transfer complete
    unused_isr,	// 18 DMA channel 2 transfer complete
    unused_isr,	// 19 DMA channel 3 transfer complete
    unused_isr,	// 20 DMA channel 4 transfer complete
    unused_isr,	// 21 DMA channel 5 transfer complete
    unused_isr,	// 22 DMA channel 6 transfer complete
    unused_isr,	// 23 DMA channel 7 transfer complete
    unused_isr,	// 24 DMA channel 8 transfer complete
    unused_isr,	// 25 DMA channel 9 transfer complete
    unused_isr,	// 26 DMA channel 10 transfer complete
    unused_isr,	// 27 DMA channel 11 transfer complete
    unused_isr,	// 28 DMA channel 12 transfer complete
    unused_isr,	// 29 DMA channel 13 transfer complete
    unused_isr,	// 30 DMA channel 14 transfer complete
    unused_isr,	// 31 DMA channel 15 transfer complete
    unused_isr,	// 32 DMA error interrupt channel
    unused_isr,	// 33 --
    unused_isr,	// 34 Flash Memory Command complete
    unused_isr,	// 35 Flash Read collision
    unused_isr,	// 36 Low-voltage detect/warning
    unused_isr,	// 37 Low Leakage Wakeup
    unused_isr,	// 38 Both EWM and WDOG interrupt
    unused_isr,	// 39 --
    unused_isr,	// 40 I2C0
    unused_isr,	// 41 I2C1
    unused_isr,	// 42 SPI0
    unused_isr,	// 43 SPI1
    unused_isr,	// 44 --
    unused_isr,	// 45 CAN OR'ed Message buffer (0-15)
    unused_isr,	// 46 CAN Bus Off
    unused_isr,	// 47 CAN Error
    unused_isr,	// 48 CAN Transmit Warning
    unused_isr,	// 49 CAN Receive Warning
    unused_isr,	// 50 CAN Wake Up
    unused_isr,	// 51 I2S0 Transmit
    unused_isr,	// 52 I2S0 Receive
    unused_isr,	// 53 --
    unused_isr,	// 54 --
    unused_isr,	// 55 --
    unused_isr,	// 56 --
    unused_isr,	// 57 --
    unused_isr,	// 58 --
    unused_isr,	// 59 --
    unused_isr,	// 60 UART0 CEA709.1-B (LON) status
    unused_isr,	// 61 UART0 status
    unused_isr,	// 62 UART0 error
    unused_isr,	// 63 UART1 status
    unused_isr,	// 64 UART1 error
    unused_isr,	// 65 UART2 status
    unused_isr,	// 66 UART2 error
    unused_isr,	// 67 --
    unused_isr,	// 68 --
    unused_isr,	// 69 --
    unused_isr,	// 70 --
    unused_isr,	// 71 --
    unused_isr,	// 72 --
    unused_isr,	// 73 ADC0
    unused_isr,	// 74 ADC1
    unused_isr,	// 75 CMP0
    unused_isr,	// 76 CMP1
    unused_isr,	// 77 CMP2
    unused_isr,	// 78 FTM0
    unused_isr,	// 79 FTM1
    unused_isr,	// 80 FTM2
    unused_isr,	// 81 CMT
    unused_isr,	// 82 RTC Alarm interrupt
    unused_isr,	// 83 RTC Seconds interrupt
    unused_isr,	// 84 PIT Channel 0
    unused_isr,	// 85 PIT Channel 1
    unused_isr,	// 86 PIT Channel 2
    unused_isr,	// 87 PIT Channel 3
    unused_isr,	// 88 PDB Programmable Delay Block
    unused_isr,	// 89 USB OTG
    unused_isr,	// 90 USB Charger Detect
    unused_isr,	// 91 --
    unused_isr,	// 92 --
    unused_isr,	// 93 --
    unused_isr,	// 94 --
    unused_isr,	// 95 --
    unused_isr,	// 96 --
    unused_isr,	// 97 DAC0
    unused_isr,	// 98 --
    unused_isr,	// 99 TSI0
    unused_isr,	// 100 MCG
    unused_isr,	// 101 Low Power Timer
    unused_isr,	// 102 --
    unused_isr,	// 103 Pin detect (Port A)
    unused_isr,	// 104 Pin detect (Port B)
    unused_isr,	// 105 Pin detect (Port C)
    unused_isr,	// 106 Pin detect (Port D)
    unused_isr,	// 107 Pin detect (Port E)
    unused_isr,	// 108 --
    unused_isr,	// 109 --
    unused_isr,	// 110 Software interrupt    
};

__attribute__ ((section(".startramcode")))
void start(void)
{
    // disable Watchdog!
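    WDOG_UNLOCK = WDOG_UNLOCK_SEQ1;
    WDOG_UNLOCK = WDOG_UNLOCK_SEQ2;
    asm volatile ("nop");
    asm volatile ("nop");

    WDOG_STCTRLH = WDOG_STCTRLH_ALLOWUPDATE;    // WDOGEN cleared: watchdog off
    SIM_SCGC5 = 0x00043F82;                     // clocks active to all GPIO ports
    SIM_SCGC6 |= SIM_SCGC6_FTFL;                // clock the flash controller (FTFL)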

    // custom ISR vector table (table should be in RAM! 🙂 )    
    SCB_VTOR = (uint32_t)gVectors;
    
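    // Is the flash blank? Address 0 is a null pointer in C, so read it
    // through a volatile pointer and build with
    // -fno-delete-null-pointer-checks, otherwise the compiler may treat
    // the access as undefined behaviour.
    volatile uint32_t *ptr = (volatile uint32_t *)0x00000000;
    if (*ptr != 0xFFFFFFFF)
    {
        volatile uint32_t *protbits = (volatile uint32_t *)0x040C;
        if (*protbits != 0xFEFFFFFF)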
        {
            // mass erase won't work
            // first erase the flash sector that
            // contains the protection bits.
            flash_erase_zerosector();
        
            // unlock the flash!
            flash_word(0x040C, 0xfeffffff);
            // reboot needed!
        }
        else
        {
            // mass erase the flash!
            flash_erase();
        }
    }
    
    asm("bkpt #0");
    while (1) {};
}

__attribute__ ((section(".ramcode")))
void flash_word(uint32_t address, uint32_t word)
{    
    // wait for flash to be ready!
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {};
    
    // clear error flags
    FTFL_FSTAT = FTFL_FSTAT_RDCOLERR | FTFL_FSTAT_ACCERR | FTFL_FSTAT_FPVIOL;

    // program long word!    
    FTFL_FCCOB0 = 0x06;             // Program Longword
    FTFL_FCCOB1 = address >> 16;
    FTFL_FCCOB2 = address >> 8;
    FTFL_FCCOB3 = address;
    FTFL_FCCOB4 = word >> 24;
    FTFL_FCCOB5 = word >> 16;
    FTFL_FCCOB6 = word >> 8;
    FTFL_FCCOB7 = word;
    
    FTFL_FSTAT  = FTFL_FSTAT_CCIF;  // execute!
    
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {};    
}

__attribute__ ((section(".ramcode")))
void flash_erase()
{
    // wait for flash to be ready!
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {}; // wait
    
    // clear error flags
    FTFL_FSTAT = FTFL_FSTAT_RDCOLERR | FTFL_FSTAT_ACCERR | FTFL_FSTAT_FPVIOL;

    // mass erase!
    FTFL_FCCOB0 = 0x44;             // Erase All Blocks
    
    FTFL_FSTAT  = FTFL_FSTAT_CCIF;  // execute!
    
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {}; // wait
    
    // unlock flash!
    flash_word(0x040C, 0xfeffffff);
}

__attribute__ ((section(".ramcode")))
void flash_erase_zerosector()
{
    // wait for flash to be ready!
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {}; // wait
    
    // clear error flags
    FTFL_FSTAT = FTFL_FSTAT_RDCOLERR | FTFL_FSTAT_ACCERR | FTFL_FSTAT_FPVIOL;

    // erase sector
    FTFL_FCCOB0 = 0x09;             // Erase Flash Sector (sector at address 0)
    FTFL_FCCOB1 = 0;
    FTFL_FCCOB2 = 0;
    FTFL_FCCOB3 = 0;
    
    FTFL_FSTAT  = FTFL_FSTAT_CCIF;  // execute!
    
    while((FTFL_FSTAT & FTFL_FSTAT_CCIF) != FTFL_FSTAT_CCIF) {}; // wait
}
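
The three flash routines above share one pattern: wait for CCIF, clear the error flags, load a command code and its parameters into the FCCOB registers, then write CCIF to launch the command. The byte layout used by flash_word() can be checked off-target with a helper that fills an 8-byte array in the same order. fccob_program_longword() and the FTFL_CMD_* names are hypothetical; the command codes are the ones the program uses (0x06 Program Longword, 0x09 Erase Flash Sector, 0x44 Erase All Blocks). The alignment check assumes the Program Longword command rejects addresses that are not longword-aligned; verify against the reference manual.

```c
#include <stdint.h>

/* FTFL command codes used by the program above. */
enum {
    FTFL_CMD_PROGRAM_LONGWORD = 0x06,
    FTFL_CMD_ERASE_SECTOR     = 0x09,
    FTFL_CMD_ERASE_ALL_BLOCKS = 0x44
};

/* Fill fccob[0..7] in the same order flash_word() writes
   FTFL_FCCOB0..FTFL_FCCOB7: command byte, 24-bit address (MSB first),
   then the 32-bit data word (MSB first).
   Returns 0 on success, -1 if the address is not 4-byte aligned. */
static int fccob_program_longword(uint8_t fccob[8], uint32_t address, uint32_t word)
{
    if (address & 3u)
        return -1;
    fccob[0] = FTFL_CMD_PROGRAM_LONGWORD;
    fccob[1] = (uint8_t)(address >> 16);
    fccob[2] = (uint8_t)(address >> 8);
    fccob[3] = (uint8_t)address;
    fccob[4] = (uint8_t)(word >> 24);
    fccob[5] = (uint8_t)(word >> 16);
    fccob[6] = (uint8_t)(word >> 8);
    fccob[7] = (uint8_t)word;
    return 0;
}
```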

Freescale Kinetis K20 and openOCD, trouble at t’mill?

In my day job I develop hardware and firmware for sensor networks. Currently, I’m working on underwater acoustic MODEMs. My most recent design is a node based on two Kinetis K20 (ARM Cortex M4) processors: one acts as the system controller and the other as the core DSP/MODEM processor. The processors are daisy-chained via their JTAG interfaces for easy programming and debugging, or so I thought...

When the fully-populated PCBs returned from the board house a few weeks ago, there were some minor production issues but, on the whole, the board looked fine. After a visual inspection, I proceeded to try to load a test program into the CPUs using openOCD 0.7.0 and an Olimex ARM-USB-OCD-H.

openOCD correctly detected my two K20 devices on the JTAG bus, but after that it started spewing all kinds of “JTAG-DP STICKY” errors. Many hours of scope probing, source code staring and documentation reading later, it became clear that the K20 CPUs reset themselves upon executing an invalid instruction and/or a watchdog time-out, which is nothing out of the ordinary. However, the K20 CPUs also drive their own /RESET lines!

In the design, I tied the /RESET lines of the CPUs together to make a single system reset. This is common practice and fits with the fact that JTAG interface pods usually feature only a single /RESET line. With two unprogrammed (blank) K20s, however, this doesn’t work: when the first CPU comes out of reset for programming, so does the other. The second processor has no valid code and isn’t being controlled by openOCD, so it executes invalid instructions and a (global!) reset occurs. Needless to say, this isn’t good.

As a work-around, I developed my own flash programming firmware that runs entirely from RAM. The firmware disables the watchdog timer, performs a mass erase, sets the flash protection bits, and flashes a small program that makes the processor go into a spin lock at start-up. Using openOCD, I can now flash valid code into the two CPUs simultaneously. For this to work, I also had to remove the ‘cortex_m3 reset_config’ line from the K20.cfg openOCD configuration file, so that openOCD drives the /RESET pin instead of resetting the core through a software reset request.
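
The spin-lock program itself is not shown here. As a sketch of what such an image might contain (not the actual firmware), the function below builds the first five flash words: the initial stack pointer; the reset, NMI and HardFault vectors, all pointing at address 0x10 with the Thumb bit set; and, at 0x10, two Thumb `b .` instructions (0xE7FE), each of which branches to itself. The stack pointer is borrowed from the RAM flasher's vector table and suits a part whose RAM ends at 0x20007FFF. The words could be programmed with a routine like flash_word(); on Kinetis parts a real image also needs a sensible flash configuration field at 0x400..0x40F.

```c
#include <stdint.h>

#define SPIN_INITIAL_SP  0x20008000u  /* assumed top of RAM */
#define SPIN_CODE_ADDR   0x00000010u  /* where the loop lives */
#define THUMB_B_SELF     0xE7FEu      /* Thumb "b ." (branch to itself) */

/* Build the first five 32-bit words of a minimal "park the CPU" image. */
static void spin_image(uint32_t words[5])
{
    const uint32_t entry = SPIN_CODE_ADDR | 1u;  /* bit 0 set: Thumb state */

    words[0] = SPIN_INITIAL_SP;  /* 0x00: initial stack pointer */
    words[1] = entry;            /* 0x04: reset vector */
    words[2] = entry;            /* 0x08: NMI vector */
    words[3] = entry;            /* 0x0C: HardFault vector */
    /* 0x10: the MemManage vector slot, unused while that fault is
       disabled (its reset state), reused here for two "b ." opcodes.
       Little-endian: the low halfword sits at 0x10. */
    words[4] = (THUMB_B_SELF << 16) | THUMB_B_SELF;
}
```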

From the openOCD mailing list, I see that more people are running into this /RESET problem on the Kinetis series and I suspect Freescale isn’t making friends with this /RESET driving business. Hopefully this post will prevent people from spending hours trying to figure out what is wrong with their Kinetis-based design, especially when two or more processors are linked using JTAG and share a common /RESET line.

I still have trouble programming the flash of the second K20 when openOCD knows about the flash in the first CPU. The work-around is to not define the flash for the first CPU. It seems this “feature” is still present in openOCD 0.8.0 rc1.