Arduino/AVR and ZX Spectrum: sin() routines.

Introduction

The previous posts on sin() approximations has made me wonder which methods are implemented on 8-bit processors. So I took a look at both the AVR C library for the 8-bit megaAVR (used in the Arduino Uno) and the ZX Spectrum (an early Z80-based 8-bit computer) ROM disassembly.

Sin(x): The AVR C lib version

After downloading the most recent sources from the Subversion repository, I started going through the directory tree to get a sense of what can be found where. The files of interest are located in the avr-libc/libm/fplib/ directory; all of them are written in AVR assembly language.

The first routine to look at is sin which can be found in sin.S (gee, what gave it away?). This routine calls __fp_rempio2, which changes the argument so it is within the right range for calling the actual sin() calculating function, __fp_sinus.

The __fp_sinus function does some more domain manipulation after which it starts setting up pointers to a table holding six coefficients:

Coefficient Value
c_0 1.0000000000
c_1 -0.1666666664
c_2 0.0083333315
c_3 -0.0001984090
c_4 0.0000027526
c_5 -0.0000000239

After the setup code, __fp_powsodd is called, which evaluates the following polynomial with odd powers only:

P(x) = c_0\cdot x + c_1\cdot x^3 + c_2\cdot x^5 + c_3\cdot x^7 + c_4\cdot x^9 + c_5\cdot x^{11}.

The careful observer will have noticed that the coefficients are those of the Taylor series for \sin(x).

Performance

The maximum error of the Taylor approximation is determined by the first truncated term, which is then \frac{(2\pi \cdot 0.25)^{13}}{13!} \approx 5.6922 \cdot 10^{-8}.

Summarizing, the AVR lib C uses an 11th-order polynomial based on the Taylor series and domain symmetry to calculate sin(x).

Sin(x): Z80 / ZX Spectrum version

According to the Complete Spectrum ROM Disassembly the main part of the sin() function is the series generator routine. This routine evaluates Chebyshev polynomials up to a specified order. The calculation of sin() is based on the first six polynomials:

T_0(x) = 1
T_1(x) = x
T_2(x) = 2x^2 - 1
T_3(x) = 4x^3 -3x
T_4(x) = 8x^4 -8x^2 +1
T_5(x) = 16x^5 -20x^3 +5x

These polynomials are weighted using the following coefficients:

Coefficient Value
c_0 1.276278962
c_1 -.142630785
c_2 0.004559008
c_3 -.000068294
c_4 0.000000592
c_5 -.000000003

The approximation polynomial is equal to:

P(z) = c_0 \cdot T_0(z) + 2c_1 \cdot T_1(z) + 2c_2 \cdot T_2(z) + 2c_3 \cdot T_3(z) + 2c_4 \cdot T_4(z) + \ldots
\qquad + 2c_5 \cdot T_5(z)

,where z=2w^2-1 and w=4x is between -1 and 1.

Finally, the approximation is calculated by \sin(2\pi\cdot x) \approx P(z)\cdot w.

The interesting thing about the ZX Spectrum code is that the Chebyshev polynomials T_0 \ldots T_5 are not evaluated one at a time, but that the entire calculation of P(z) is done in one go. This code is almost as efficient as evaluating one polynomial!

Performance

The performance of this approximation is very good, as shown by the following graphs:

The maximum error is approximately -2\cdot 10^{-9}, which is better than the AVR version! Not bad for an oldskool machine!

Conclusion

Sometimes older is better!

Additional information

BOCHS, the IA-32 CPU emulator, uses a 17th-order Taylor polynomial to calculate sin().

Advertisements

3 thoughts on “Arduino/AVR and ZX Spectrum: sin() routines.

  1. “the Chebyshev polynomials T_0 \ldots T_5 are not evaluated one at a time, but that the entire calculation of P(z) is done in one go.”

    How?

    • Something like this I presume:

      #define T0(x) { 1 }
      #define T1(x) { x }
      #define T2(x) { 2 * x*x – 1 }
      #define T3(x) { 4 * x*x*x – 3 * x }
      #define T4(x) { 8 * x*x*x*x – 8 * x*x + 1 }
      #define T5(x) { 16 * x*x*x*x*x – 20 * x*x*x + 5 * x }

      #define C0 1.276278962
      #define C1 (-0.142630785)
      #define C2 0.004559008
      #define C3 (-0.000068294)
      #define C4 0.000000592
      #define C5 (-0.000000003)

      #define P(z) { C0 * T0(z) + 2 * C1 * T1(z) + 2 * C2 * T2(z) + 2 * C3 * T3(z) + 2 * C4 * T4(z) + 2 * C5 * T5(z) }

      x*x and x*x*x occurs two times each so the compiler might be able to cache the results of those while calculating P. You could also modify the macros to make sure the results of those calculation are always cached (create temporary variables for x*x and x*x*x).

  2. ZX sine | Arkeoblog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s