# Pseudo sinusoidal waveform

I needed a small routine to generate a cyclic waveform resembling sin(x) to test a CODEC board. This is what I came up with:

float pseudoSin(float x)
{
return (1.0f-x*x)*x
}


This produces the following waveform over the domain [-1,1]:

The derivative of the function is $1-3x^2$. The maximum and minimum occur at $x = \sqrt{ \frac{1}{3} }$ and $x = -\sqrt{ \frac{1}{3} }$, respectively. The function has maximum and minimum values of $\frac{2}{3} \sqrt{ \frac{1}{3} } \approx 0.38490$ and $-\frac{2}{3} \sqrt{ \frac{1}{3} } \approx -0.38490$.

The amplitude normalisation factor is approximately 2.5981. For completeness, here is the error of the function with respect to a real sin(x):

The error maximum is about 16.3%.

# Introduction

In the past, I’ve written several blog posts on calculating sin(x) and cos(x) using polynomials. These methods can be very accurate but require many multiplications and additions. For usage in hand-held calculators, where a few evaluations per second are enough, these algorithms are fine. However, in high-speed signal processing and generation applications, such as software-defined radio or test equipment, a less computationally intensive method is required.

A simple way to generate waveforms is to read the contents of a table, which resides in memory, at a constant speed. Usually, a single cycle of the desired waveform is stored in the table. When the table has $N$ number of entries and the table entries are read at a speed of $F_s$ entries per second, the frequency of the generated waveform will be $F=F_s/N$ Hz.

# Algorithm #1: Table lookup

To generate a sine (or cosine) wave, we must simply fill the table with sin(x). The table entries $(k=0 \ldots N-1)$ are thus: $table[k] = \sin \left (2 \cdot \pi \cdot \frac{k}{N} \right )$.

/* Get integer types with well-defined number of bits */
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Define a table of 256 entries containing a single cycle waveform.
The table entries are not shown.
*/
float table[256];

/* Fill the table with a single sin(x) cycle */
void fillTable()
{
for(uint32_t i=0; i<256; i++)
table[i] = sin(2.0*3.1415927*(float)i/256.0);
}

int main(int, char**)
{
uint32_t sampleRate=1000;    // sample rate of 1000 samples-per-second
uint32_t desiredFreq=199;    // desired generated frequency
uint32_t phaseAccumulator=0; // fixed-point (16.16) phase accumulator

fillTable();

/* fixed-point (16.16) phase increment */
uint32_t phaseIncrement=(256*65536*desiredFreq/sampleRate);

while(true)        // loop forever
{
/* Increment the phase accumulator */
phaseAccumulator += phaseIncrement;

/* Limit the phase accumulator to 24 bits.
The lower 16 bits are the fractional table
index part, while the remaining 8 bits
are the integer index into the waveform
table.
*/
phaseAccumulator &= (256*65536)-1;

/* Read the table by removing right shifting
the lowest (fractional) bits into oblivion
and we are left with the remaining 8 bits.
This truncates the table index, without
rounding. */
uint32_t tableIndex = phaseAccumulator >> 16;
printf("%f\n", table[tableIndex]);
}
}



The code above will generate a sine wave and print the result to the console. The phase accumulator technique is basically an counter that periodically “overflows”, i.e. restarts, when the end of the table is reached. This is done by removing the highest bits from the 32-bit integer. For each new output sample, the phase accumulator is increased by the phase increment. The phase increment determines the frequency. This way, the frequency produced is independent from the number of table entries.

Which entry from the table is read is determined by bit-shifting the phase accumulator to the right by 16, which effectively removes the lower 16 bits from the 24-bit accumulator, leaving an 8-bit table index. The index will therefore always be within 0 .. 255!

The lower 16 bits of the phase accumulator represent the desired read position of the waveform “between” the table entries. In the example code above, this information is discarded. We will come back to this later.

## Evaluation: Algorithm #1

To evaluate the spectral purity of the generated sine wave, the output of the example program was analyzed using the FFT. The following plot shows the results:

The spectral purity of the signal is not spectacular, to say the least. There are strong spurious responses as high as -50 dBc. Relatively large errors are produced because of the finite length of the table.

# Algorithm #2: Table lookup and interpolation

We can easily get better results without increasing the number of table entries significantly by interpolating between the table values; we have the fractional part! The following algorithm uses linear interpolation:

/* Get integer types with well-defined number of bits */
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Define a table of 257! entries containing a single cycle waveform.
The additional table entry makes sure that table[index+1] never
causes an out-of-bound read.

The table entries are not shown.
Note: we now set table[256] = table[0]!
*/
float table[257];

/* Fill the table with a single sin(x) cycle */
void fillTable()
{
for(uint32_t i=0; i<256; i++)
table[i] = sin(2.0*3.1415927*(float)i/256.0);
table[256] = table[0];
}

int main(int, char**)
{
uint32_t sampleRate=1000;    // sample rate of 1000 samples-per-second
uint32_t desiredFreq=199;    // desired generated frequency
uint32_t phaseAccumulator=0; // fixed-point (16.16) phase accumulator

fillTable();

/* fixed-point (16.16) phase increment */
uint32_t phaseIncrement=(256*65536*desiredFreq/sampleRate);

while(true)        // loop forever
{
/* Increment the phase accumulator */
phaseAccumulator += phaseIncrement;

/* Limit the phase accumulator to 24 bits.
The lower 16 bits are the fractional table
index part, while the remaining 8 bits
are the integer index into the waveform
table.
*/
phaseAccumulator &= (256*65536)-1;

/* Calculate the table index. */
uint32_t index = phaseAccumulator >> 16;

/* Get the table entry and the one
directly following it.
*/
float v1 = table[index];
float v2 = table[index+1];

/* turn the fixed-point fractional part into
a float-point representation so we can
use it. */
float fmul = static_cast<float>(phaseAccumulator & 65535)/65536.0f;

/* perform the linear interpolation */
float out = v1 + (v2-v1)*fmul;

printf("%f\n", out);
}
}



## Evaluation: Algorithm #2

Using the same method as earlier, the performance of the algorithm is evaluated, resulting in the following figure:

Clearly, the linear interpolation increases the spectral purity significantly! There are fewer spurious responses and the highest one is better than -90 dBc. For software radio applications this good enough.

# Algorithm #3: Table lookup and circular interpolation

There are certain applications where -90 dBc of spectral purity isn’t enough, for instance, in test equipment, very high purity digital down-converters are needed for precise measurements, e.g. to reduce noise folding. To increase the spectral purity, we could increase the table size, or we could choose a better interpolation method.

One way of getting a better interpolation method would be to use following identity:

$\sin(A+B) = \sin(A) \cdot \cos(B) + \cos(A) \cdot sin(B)$

If we choose A to be derived from the upper 8 bits (integer part) of the phase accumulator, we can lookup $\sin(A)$ and $\cos(A)$ using our table and these lookups will be exact. Note that $\cos(A) = \sin(A + \frac{\pi}{2})$, so $\cos(A)$ is simply found 64 entries further in the table. Then, B is the fractional angle, which is represented by the lower 16 bits of the phase accumulator. As the fractional angle is always somewhere between the table entries, we cannot use our lookup table. We could use a second lookup table or we can use the well-known approximations of $\sin(x)$ and $\cos(x)$ for small angles:

$\sin(x) \approx x$ and $\cos(x) \approx 1-\frac{x^2}{2}$.

The following code example uses this principle:

/*Get integer types with well-defined number of bits */
#include <stdint.h>
#include <stdio.h>
#include <math.h>
/* Define a table of 256 entries containing a single cycle waveform.
The table entries are not shown.
*/
float table[256];

/* Fill the table with a single sin(x) cycle */
void fillTable()
{
for(uint32_t i=0; i<256; i++)
table[i] = sin(2.0*3.1415927*(float)i/256.0);
}

int main(int, char**)
{
uint32_t sampleRate=1000;    // sample rate of 1000 samples-per-second
uint32_t desiredFreq=199;    // desired generated frequency
uint32_t phaseAccumulator=0; // fixed-point (16.16) phase accumulator
fillTable();
/* fixed-point (16.16) phase increment */
uint32_t phaseIncrement=(256*65536*desiredFreq/sampleRate);

//while(true)        // loop forever
for(uint32_t i=0; i<65536; i++)
{
/* Increment the phase accumulator */
phaseAccumulator += phaseIncrement;
/* Limit the phase accumulator to 24 bits.
The lower 16 bits are the fractional table
index part, while the remaining 8 bits
are the integer index into the waveform
table.
*/
phaseAccumulator &= (256*65536)-1;

/* Calculate the table index. */
uint32_t index = phaseAccumulator >> 16;

/* Get the table entry and the one
directly following it.
*/
float v_sin = table[index];
float v_cos = table[(index+64) & 255];
float frac  = 2.0f*3.1415927f*static_cast<float>(phaseAccumulator & 65535)/65536.0f/256.0f;

// fractional sin/cos
float f_sin = frac;
float f_cos = 1.0f - 0.5f*frac*frac;

float result = v_sin*f_cos + v_cos*f_sin;
printf("%f\n", result);
}
}


## Evaluation: Algorithm #3

Again, the waveform was evaluated. The results are shown in the figure below:

The circular interpolation gives an improvement over linear interpolation at the cost of an additional table lookup and a few multiplications and additions. The spurious responses are now found at around -128 dBc, which is 38 dB better.

Further experiments show that increasing the table size to 512 entries produces spurious responses that are below -147 dBc. A table size of 128 entries gives responses below -110 dBc.

# Conclusions

The table-lookup and interpolate method of signal generation is a powerful way to generate high-purity signals without resorting to complex approximations.With simple linear interpolation, a 256-entry table can produce sine and cosine waveforms with spurious responses that are around 90 dB below the carrier. At the expense of a few additional operations, spurious responses can be as low as -128 dBc by using circular interpolation.

The presented algorithms are also well suited to generate quadrature signals as both sine and cosine waveforms can be generated simultaneously.

# Sin(x) and Cos(x) based on CORDIC

During the 50ies, Jack Volder invented the CORDIC algorithm. This clever algorithm can calculate hyperbolic and trigonometric functions, and do vector rotations. A highly desirable property of the algorithm is that it does not use any multiplications; it only needs add, subtract and right-shift operations and a small lookup-table.

While other, more cycle efficient, ways exist to calculate these functions on a general purpose CPU or a DSP, the CORDIC algorithm still has its place in small devices, such as 8-bit systems, or in very high speed systems, such as digital down-converters and numerically controlled oscillators.

As an experiment I implemented the CORDIC algorithm in C to calculate sin(x) and cos(x) using BCD arithmetic. I chose BCD arithmetic so I could get arbitrary precision, even on 8-bit processors.

Currently, the accuracy of the results is limited by the precision of the lookup-table but it already exceeds the capability of the built-in float-point processor of my PC.

The code is available on GitHub under the MIT license for your enjoyment!

# Introduction

The previous posts on sin() approximations has made me wonder which methods are implemented on 8-bit processors. So I took a look at both the AVR C library for the 8-bit megaAVR (used in the Arduino Uno) and the ZX Spectrum (an early Z80-based 8-bit computer) ROM disassembly.

# Sin(x): The AVR C lib version

After downloading the most recent sources from the Subversion repository, I started going through the directory tree to get a sense of what can be found where. The files of interest are located in the avr-libc/libm/fplib/ directory; all of them are written in AVR assembly language.

The first routine to look at is sin which can be found in sin.S (gee, what gave it away?). This routine calls __fp_rempio2, which changes the argument so it is within the right range for calling the actual sin() calculating function, __fp_sinus.

The __fp_sinus function does some more domain manipulation after which it starts setting up pointers to a table holding six coefficients:

Coefficient Value
$c_0$ 1.0000000000
$c_1$ -0.1666666664
$c_2$ 0.0083333315
$c_3$ -0.0001984090
$c_4$ 0.0000027526
$c_5$ -0.0000000239

After the setup code, __fp_powsodd is called, which evaluates the following polynomial with odd powers only:

$P(x) = c_0\cdot x + c_1\cdot x^3 + c_2\cdot x^5 + c_3\cdot x^7 + c_4\cdot x^9 + c_5\cdot x^{11}$.

The careful observer will have noticed that the coefficients are those of the Taylor series for $\sin(x)$.

## Performance

The maximum error of the Taylor approximation is determined by the first truncated term, which is then $\frac{(2\pi \cdot 0.25)^{13}}{13!} \approx 5.6922 \cdot 10^{-8}$.

Summarizing, the AVR lib C uses an 11th-order polynomial based on the Taylor series and domain symmetry to calculate sin(x).

# Sin(x): Z80 / ZX Spectrum version

According to the Complete Spectrum ROM Disassembly the main part of the sin() function is the series generator routine. This routine evaluates Chebyshev polynomials up to a specified order. The calculation of sin() is based on the first six polynomials:

$T_0(x) = 1$
$T_1(x) = x$
$T_2(x) = 2x^2 - 1$
$T_3(x) = 4x^3 -3x$
$T_4(x) = 8x^4 -8x^2 +1$
$T_5(x) = 16x^5 -20x^3 +5x$

These polynomials are weighted using the following coefficients:

Coefficient Value
$c_0$ 1.276278962
$c_1$ -.142630785
$c_2$ 0.004559008
$c_3$ -.000068294
$c_4$ 0.000000592
$c_5$ -.000000003

The approximation polynomial is equal to:

$P(z) = c_0 \cdot T_0(z) + 2c_1 \cdot T_1(z) + 2c_2 \cdot T_2(z) + 2c_3 \cdot T_3(z) + 2c_4 \cdot T_4(z) + \ldots$
$\qquad + 2c_5 \cdot T_5(z)$

,where $z=2w^2-1$ and $w=4x$ is between -1 and 1.

Finally, the approximation is calculated by $\sin(2\pi\cdot x) \approx P(z)\cdot w$.

The interesting thing about the ZX Spectrum code is that the Chebyshev polynomials $T_0 \ldots T_5$ are not evaluated one at a time, but that the entire calculation of $P(z)$ is done in one go. This code is almost as efficient as evaluating one polynomial!

## Performance

The performance of this approximation is very good, as shown by the following graphs:

The maximum error is approximately $-2\cdot 10^{-9}$, which is better than the AVR version! Not bad for an oldskool machine!

# Conclusion

Sometimes older is better!

BOCHS, the IA-32 CPU emulator, uses a 17th-order Taylor polynomial to calculate sin().

# Introduction

In the previous posts [1,2] about $\sin(2\pi \cdot x)$ approximations, I quickly discarded the Taylor series believing their approximation error characteristics were not ideal. However, this was based on using the Taylor derived polynomial for the entire domain $x \in [0 \ldots 1)$. In the ‘better sin(x)‘ post, the domain was divided in four parts and the symmetry of sin(x) was exploited to reduce the approximation error. This trick is, of course, also possible with Taylor-based polynomial.

I tried it with the following results..

# Taylor series of $\sin(2\pi \cdot x)$

The Taylor series of $\sin (2\pi \cdot x)$ developed around $x=0$ is:

$P(x) = 2\pi\cdot x - \frac{8\pi^3}{3!} \cdot x^3 + \frac{32\pi^5}{5!} \cdot x^5 - \ldots,$

which, given $N$ terms, can be compactly written as:

$P(x) = \sum_{i=1}^{N} (-1)^{(i+1)} \cdot {(2\pi)^{(2i - 1)} \over {(2i-1)!}} x^{(2i-1)}$.

A very useful property of this particular series is that the sign of each term is the opposite of the one that comes before it. In addition, the next term is always smaller than the previous one. This means that if we truncate the series at a certain term, the total approximation error is always smaller or equal to the first term we left out. So, we can choose the number of terms to include based on the desired approximation error!

The approximation error of our Taylor series is largest at the largest value of $x$ we’re going to use. By diving the domain by four, the largest value of $x$ we’ll encounter is $\frac{1}{4}$. Knowing this, we can make a table of the approximation error versus the number of terms included in our polynomial:

Terms, N P(x) order max. error
1 1 0.64596
2 3 0.07969
3 5 0.00468
4 7 0.00016
5 9 3.60e-6

Clearly, the maximum approximation error drops rapidly as the order of the polynomial is increased.

# Performance evaluation

## The third-order Taylor polynomial versus the ‘better sin(x)’ version.

Which approximation is better, the one from the previous post or a 3rd order Taylor approximation?

The following graph shows the spectrum of the approximated sinusoidal wave. It was generated using an 65536-point FFT and 2129 sine wave periods.

The third harmonic can be found at -35.0 dBc, which is 11.9 dB worse. The signal-to-noise ratio (SNR) of the Taylor-based version is 33.2 dB, which is 11.7 dB worse. Summarizing, you’re better off with the ‘better sin(x)’ approximation version.

## Ninth-order Taylor performance

If you want a high-purity sine oscillator and have cycles to spare, but don’t want the memory footprint of a table-lookup oscillator, you can use a 9th-order Taylor approximation.

Here are the performance graphs for the approximation:

The following graph shows the spectrum of the approximated sinusoidal wave. It was generated using an 65536-point FFT and 2129 sine wave periods.

The SNR is around 121.2 dB when all calculations are implemented using floating-point arithmetic.

# Conclusions

The Taylor series are well suited for approximating $\sin(2\pi \cdot x)$, especially when the domain is kept small. The precision of the approximation is easily determined and can be increased simply by adding additional terms. A 9th order polynomial is capable of reaching professional audio quality, in terms of SNR.

# Introduction

Last time I showed a simple algorithm for approximating $\sin(2 \cdot \pi \cdot x)$. The third harmonic was -28.6 dBc, which is not good enough for high-quality sound synthesis.

In this post, I will derive a more accurate approximation polynomial.

# Method

The first change we will make in the algorithm is to split the domain $[0..1)$ into four pieces, instead of two, and exploit additional symmetry properties of the sine function. This well-known trick allows us to decrease the input range over which our polynomial $P(x)$ must be accurate.

Now, we pick the order of the polynomial to use. Our previous algorithm used a second-order polynomial so we’ll try a third-order one here. The third-order polynomial has four coefficients, so we’ll be able to enforce four constraints:

$P(x) = \alpha_3 \cdot x^3 + \alpha_2 \cdot x^2 + \alpha_1 \cdot x + \alpha_0$.

## Setting constraints

To ensure the curve starts and ends at the correct amplitude, we should constrain the end points, i.e. $P(0)$ and $P(\frac{1}{4})$. These should be equal to $\sin \left( 2 \cdot \pi \cdot 0 \right )$ and $\sin \left( 2 \cdot \pi \cdot \frac{1}{4} \right )$, respectively. This leads to the following constraints:

$P(0) = 0$ and $P(\frac{1}{4}) = 1$.

Now, we still have two addition constraints we can set. We could either choose two additional points our polynomial must intersect, or we could set two constraints on the derivative of the polynomial, $P'(x)$:

$P'(x) = 3 \cdot \alpha_3 \cdot x^2 + 2 \cdot \alpha_2 \cdot x + \alpha_1$.

I chose to constrain the derivatives at the end points because it makes the piece-wise sin approximation continuous in the first derivative, which is desirable from a spectral point-of-view.

The derivative of $\sin(2\cdot \pi \cdot x)$ is $2 \cdot \pi \cdot \cos(2 \cdot \pi \cdot x)$ and we equate the derivative of the polynomial at $x=0$ and $x=\frac{1}{4}$ to it:

$P'(0) = 2 \cdot \pi$ and $P'(\frac{1}{4}) = 0$.

## Solving for the coefficients

Given our constraints, we can now solve the system of equations. The system we must solve is:

$\alpha_3 \cdot 0^3 + \alpha_2 \cdot 0^2 + \alpha_1 \cdot 0 + \alpha_0 = 0$.
$\alpha_3 \cdot \frac{1}{4}^3 + \alpha_2 \cdot \frac{1}{4}^2 + \alpha_1 \cdot \frac{1}{4} + \alpha_0 = 1$.
$3 \cdot \alpha_3 \cdot 0^2 + 2 \cdot \alpha_2 \cdot 0 + \alpha_1 = 2 \cdot \pi$.
$3 \cdot \alpha_3 \cdot \frac{1}{4}^2 + 2 \cdot \alpha_2 \cdot \frac{1}{4} + \alpha_1 = 0$,

The first and third equations give values for $\alpha_0 = 0$ and $\alpha_1 = 2 \cdot \pi$. Using these values and simplifying leads to:

$\alpha_3 \cdot \frac{1}{64} + \alpha_2 \cdot \frac{1}{16} + \frac{1}{2}\pi = 1$.
$\alpha_3 \cdot \frac{3}{16} + \alpha_2 \cdot \frac{1}{2} + 2\pi = 0$,

which, after some manipulation, gives:

$\alpha_3 = 32\pi - 128 \approx -27.469$ and $\alpha_2 = -16\pi + 48 \approx -2.2655$.

# How good is it?

The polynomial P(x) follows the $\sin(2\cdot \pi \cdot x)$ curve (for $x=[0..\frac{1}{4}]$) very accurately:

As can be seen from the error graph below, the error is less than +/- 1.1% (with respect to full-scale):

The following graph shows the spectrum of the approximated sinusoidal wave. It was generated using an 65536-point FFT and 2129 sine wave periods:

The spectrum graph shows that the third harmonic, which is found at -46.9 dBc, is much lower than the simple sine approximation from the previous post.

# Show us the code!

/*
A better sin(2*pi*x) approximation.
The caller must make sure that the argument lies
within [0..1).

Note: this can be optimized further by using Horner's
rule for evaluating polynomials, see:

http://en.wikipedia.org/wiki/Horner%27s_method

Omitted here for clarity.
*/

float sin_poly(float x)
{
const float a3 = -27.469f;
const float a2 = -2.2655f;
const float a1 = 6.2832f;

return a3*x*x*x + a2*x*x + a1*x;
}

float better_sin(float x)
{
if (x > 0.75f)
return -sin_poly(1.0f - x);
else if (x  > 0.5f)
return -sin_poly(x - 0.5f);
else if (x > 0.25f)
return sin_poly(0.5f - x);
else
return sin_poly(x);
};


# Conclusion

This approximation is quite good and it has a spectral purity approximately equivalent to a 7-bit signal.

# Introduction

When implementing digital oscillators or filters, a simple way of calculating $\sin(2\pi \cdot x)$ for $x=[0..1)$ is needed. Efficiency, in terms of the number of CPU cycles or hardware complexity, is favoured above accuracy. Several methods are known that produce good approximations to $\sin(\pi \cdot x)$. These methods can be divided into three categories: table-lookup algorithms, approximation functions and recursive resonator algorithms.

Table-lookup algorithms are cycle efficient and their accuracy can be easily increased by increasing the number of table entries. Accuracy can also be increased by introducing interpolation between table entries.

In some memory limited situations, such as small microcontroller and DSP targets, a table is undesirable and an approximation based on functions is preferred.

Here, I will show how to approximate $\sin(2 \cdot \pi\cdot x)$ based on a simple polynomial.

# Approximating sin(2*pi*x)

The well known Taylor series seems a good candidate for obtaining a polynomial that approximates $\sin(2\pi \cdot x)$ well. However, the approximation gets worse the further away the function argument gets from the point from which the series was derived. This generates a discontinuity when $\sin(2\pi \cdot x)$ “wraps around”. To avoid such discontinuities, the approximation must produce the same value for $x=0$ and $x=1$.

An interesting candidate is the polynomial $-16x^2 + 8x$, which approximates $\sin(2\pi \cdot x)$ quite well for $x=[0..\frac{1}{2}]$. The other half of the sinusoid is simply generated by mirroring the polynomial and adjusting for the offset, resulting in $16x^2 - 24x + 8$.

# How good is it?

The approximation is accurate to within +/- 6% of the total amplitude as is shown by the graphs below:

In sound synthesis and radio applications it is desirable to have low spurious tones. The following graph shows the spectrum of the approximated sinusoidal wave. It was generated using an 65536-point FFT and 2129 sine wave periods.

The spectrum is free of all even harmonics as the approximation is point symmetrical and the “grass” is due to high frequency components aliasing to lower frequencies. The first major spurious response is the third harmonic, which is found at -28.6 dBc.

# Show me the code!

/*
Fast sin(2*pi*x) approximation.
The caller must make sure that the argument lies
within [0..1).

Note: this can be optimized further by using Horner's
rule for evaluating polynomials, see:

http://en.wikipedia.org/wiki/Horner%27s_method

Omitted here for clarity.
*/

float fast_sin(float x)
{
if (x < 0.5f)
{
return -16.0f*x*x + 8.0f*x;
}
else
{
return 16.0f*x*x - 24.0f*x + 8.0f;
}
};


# Conclusion

This may not be the world’s best sin approximation, but it sure is simple. Next time we’ll look at a more accurate polynomial-based sin approximation.