Cordic-based Sine Computer

Introduction

On this page we will present the design of a sine and cosine computer.

Specification

We will assume that the input angle is represented in radians, and that it is in the range between -π/2 and π/2. Other angles can be handled by adding a factor π first (modulo 2π) and changing the sign of the sine and cosine result.

The floating point numbers are represented as integers. Technically, this means rounding them to a fixed number of bits, and then multiplying them with the factor 2**F, where F is the number of bits after the point.

Both the sine and the cosine of the input angle will be computed. The interface of the module looks as follows:

def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset):

    """ Sine and cosine computer.

    This module computes the sine and cosine of an input angle. The
    floating point numbers are represented as integers by scaling them
    up with a factor corresponding to the number of bits after the point.

    Ports:
    -----
    cos_z0: cosine of the input angle
    sin_z0: sine of the input angle
    done: output flag indicating completion of the computation
    z0: input angle; -pi/2 <= z0 <= pi/2
    start: input that starts the computation on a posedge
    clock: clock input
    reset: reset input

    """

Unit test

We will first write a unit test for the design. The idea is to use the cos and sin functions from the math module to compute the expected results on a number of input angles, and to compare them with the outputs from the design under test. Here is the code:

from math import pi, sin, cos, log
import random

from myhdl import *

from SineComputer import SineComputer, SineComputer_v


def bench(fractionSize, errorMargin, nrTests=100):

    """ Test bench for SineComputer.

    fractionSize: number of bits after the point
    errorMargin: margin for rounding errors on result
    nrTests: number of tests vectors

    """

    # scaling factor to represent floats as integers
    M = 2**fractionSize

    # maximum angle
    ZMAX = int(round(M*pi/2))

    # error margin shorthand
    D = errorMargin

    # signals
    cos_z0 = Signal(intbv(0, min=-D, max=M+D))
    sin_z0 = Signal(intbv(0, min=-M-D, max=M+D))
    z0 = Signal(intbv(0, min=-ZMAX, max=ZMAX+1))
    done = Signal(False)
    start = Signal(False)
    clock = Signal(bool(0))
    reset = Signal(True)

    # design under test
    dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)

    # clock generator
    @always(delay(10))
    def clockgen():
        clock.next = not clock

    # test vector setup
    testAngles = [-pi/2, -pi/4, 0.0, pi/4, pi/2]
    testAngles.extend([random.uniform(-pi/2, pi/2) for i in range(nrTests)])

    # actual test 
    @instance
    def check():
        yield clock.negedge
        reset.next = False
        for z in testAngles:
            yield clock.negedge
            z0.next = int(round(M*z))
            start.next = True
            yield clock.negedge
            start.next = False
            yield done.posedge
            exp_cos_z0 = int(round(cos(z)*M))
            exp_sin_z0 = int(round(sin(z)*M))
            assert abs(cos_z0 - exp_cos_z0) < D
            assert abs(sin_z0 - exp_sin_z0) < D

        raise StopSimulation

    return dut, clockgen, check

As the design will perform its calculations in finite precision, there will be rounding errors. Therefore, the comparisons between expected and actual results are performed using an error margin. We may want to use the test bench for designs with different characteristics; therefore, the error margin is made a parameter.

The precision is specified in terms of the number of bits after the point, using the parameter fractionSize.

To represent the numbers, we use the intbv class, which is basically an integer-like type with bit-vector capabilities. Note that we constrain the intbv instances by specifying the range of valid integer values, not by a bit width. For high-level, algorithmic work, this is much easier. Moreover, it enables fine-grained range error checking at run-time.

By filling in the parameters of the bench function, we can construct an actual test bench that runs a simulation, as follows:

def test_bench():
    fractionSize = 18
    errorMargin = fractionSize
    tb = bench(fractionSize, errorMargin)
    sim = Simulation(tb)
    sim.run()

A function with a name starting with test_ is automatically recognized and run by a unit testing framework such as py.test.

Algorithm

To implement the design, we will use the Cordic algorithm, a very popular algorithm to compute trigonometric functions in hardware. On this page, we are mainly interested in the mechanical characteristics of the algorithm and their hardware implications. For more information and background on the algorithm itself, please consult other sources, such as this paper by Ray Andraka.

The Cordic algorithm is an iterative algorithm based on vector rotations over elementary angles. The algorithm normally operates in one of two modes. In rotation mode, it rotates a vector (x₀, y₀) in the Cartesian plane over an input angle z₀. The Cordic equations for this mode are:

x_{i +1} = x_i - y_id_i2^-i
y_{i +1} = y_i - x_id_i2^-i
z_{i +1} = y_i - d_itan^-i(2^-i)

where

d_i = -1 if z_i < 0, else +1.

These equations can be implemented with relatively simple hardware. This is the characteristic that makes the Cordic algorithm attractive. In particular, multiplications with a factor 2^-i are simply shift-right operations. Also, the arctangent values tan^-i(2^-i) can be precomputed and stored in a small look-up table.

The Cordic equations can be used for a variety of computations. For our purposes, it can be shown that

x_n = cos z₀
y_n = sin z₀

for the following initial conditions:

x₀ = 1 / A_n
y₀ = 0

where

A_n = Product[ sqrt(1 + 2^-2i) ] with i = 0 ... n-1

Design

The Cordic algorithm can be implemented in many ways, with various characteristics and advantages. On this page, we will implement a parallel, iterative processor, which is a fairly straightforward mapping of the equations into a bit-parallel data path and a state machine.

from math import atan, sqrt, ceil, floor, pi

from myhdl import *

t_State = enum("WAITING", "CALCULATING")

def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset):

    """ Sine and cosine computer.

    This module computes the sine and cosine of an input angle. The
    floating point numbers are represented as integers by scaling them
    up with a factor corresponding to the number of bits after the point.

    Ports:
    -----
    cos_z0: cosine of the input angle
    sin_z0: sine of the input angle
    done: output flag indicated completion of the computation
    z0: input angle
    start: input that starts the computation on a posedge
    clock: clock input
    reset: reset input

    """

    # angle input bit width
    W = len(z0)

    # angle input z0 represents number between -pi/2 and pi/2
    # scaling factor corresponds to the nr of bits after the point
    M = 2 ** (W-2)

    # nr of iterations equals nr of significant input bits
    N = W-1

    # calculate X0
    An = 1.0
    for i in range(N):
        An *= (sqrt(1 + 2**(-2*i)))

    # X0
    X0 = int(round(M*1/An))

    # tuple with elementary angles
    angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])

    # iterative cordic processor
    @instance
    def processor():

        x = intbv(0, min=sin_z0.min, max=sin_z0.max)
        y = intbv(0, min=sin_z0.min, max=sin_z0.max)
        z = intbv(0, min=z0.min, max=z0.max)
        dx = intbv(0, min=sin_z0.min, max=sin_z0.max)
        dy = intbv(0, min=sin_z0.min, max=sin_z0.max)
        dz = intbv(0, min=z0.min, max=z0.max)
        i = intbv(0, min=0, max=N)       
        state = t_State.WAITING

        while True:
            yield clock.posedge, reset.posedge

            if reset:
                state = t_State.WAITING
                cos_z0.next = 1
                sin_z0.next = 0
                done.next = False
                x[:] = 0
                y[:] = 0
                z[:] = 0
                i[:] = 0

            else:
                if state == t_State.WAITING:
                    if start:
                        x[:] = X0
                        y[:] = 0
                        z[:] = z0
                        i[:] = 0
                        done.next = False
                        state = t_State.CALCULATING

                elif state == t_State.CALCULATING:
                    dx[:] = y >> i
                    dy[:] = x >> i
                    dz[:] = angles[int(i)]
                    if (z >= 0):
                        x -= dx
                        y += dy
                        z -= dz
                    else:
                        x += dx
                        y -= dy
                        z += dz
                    if i == N-1:
                        cos_z0.next = x
                        sin_z0.next = y
                        state = t_State.WAITING
                        done.next = True
                    else:
                        i += 1

    return processor

The actual computation is done by the processor generator. Note that outside the generator function, we calculate some data such as the X0 constant, and the look-up table of elementary arctangents, represented by the angles tuple.

The internal number variables are represented by intbv instances. The dual nature of this class comes in very handy. On the one hand, we can constrain the instances as integer subtypes by specifying the valid integer range at construction time. On the other hand, we can access their two's complement representation as a bit vector, for example for slicing or right-shifting.

It seems obvious that a type that unifies the integer and the bit vector views should be very useful for hardware design. One would therefore expect a similar feature in other HDLs. However, I believe that it is actually a unique capability offered by MyHDL. Other HDLs seem to try to solve the issues by creating more and more integer and bit-vector like types. In MyHDL, a single type does it all - the intbv class.

py.test confirms that this is a valid impementation:

> py.test
testing-mode: inprocess
executable:   /usr/local/bin/python  (2.4.2-final-0)
using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown>

test_SineComputer.py[1] .

Automatic conversion to Verilog

Now that we have a working design (and only now!) we can attempt to convert it to Verilog automatically. In MyHDL, this is done by using the toVerilog function. For example, consider the instantiation of the design under test in the test bench:

    # design under test
    dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)

To convert the design instance to Verilog, this line can be replaced by the following:

    # design under test
    dut = toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)

As before, the dut object is a simulatable design instance, but as a side effect of the instantiation, an equivalent Verilog module file will be generated. The Verilog output is as follows:

module SineComputer (
    cos_z0,
    sin_z0,
    done,
    z0,
    start,
    clock,
    reset
);

output signed [19:0] cos_z0;
reg signed [19:0] cos_z0;
output signed [19:0] sin_z0;
reg signed [19:0] sin_z0;
output done;
reg done;
input signed [19:0] z0;
input start;
input clock;
input reset;



always @(posedge clock or posedge reset) begin: _SineComputer_processor
    reg [5-1:0] i;
    reg [1-1:0] state;
    reg signed [20-1:0] dz;
    reg signed [20-1:0] dx;
    reg signed [20-1:0] dy;
    reg signed [20-1:0] y;
    reg signed [20-1:0] x;
    reg signed [20-1:0] z;
    if (reset) begin
        state = 1'b0;
        cos_z0 <= 1;
        sin_z0 <= 0;
        done <= 0;
        x = 0;
        y = 0;
        z = 0;
        i = 0;
    end
    else begin
        // synthesis parallel_case full_case
        casez (state)
            1'b0: begin
                if (start) begin
                    x = 159188;
                    y = 0;
                    z = z0;
                    i = 0;
                    done <= 0;
                    state = 1'b1;
                end
            end
            1'b1: begin
                dx = $signed(y >>> $signed({1'b0, i}));
                dy = $signed(x >>> $signed({1'b0, i}));
                // synthesis parallel_case full_case
                case (i)
                    0: dz = 205887;
                    1: dz = 121542;
                    2: dz = 64220;
                    3: dz = 32599;
                    4: dz = 16363;
                    5: dz = 8189;
                    6: dz = 4096;
                    7: dz = 2048;
                    8: dz = 1024;
                    9: dz = 512;
                    10: dz = 256;
                    11: dz = 128;
                    12: dz = 64;
                    13: dz = 32;
                    14: dz = 16;
                    15: dz = 8;
                    16: dz = 4;
                    17: dz = 2;
                    default: dz = 1;
                endcase
                if ((z >= 0)) begin
                    x = x - dx;
                    y = y + dy;
                    z = z - dz;
                end
                else begin
                    x = x + dx;
                    y = y - dy;
                    z = z + dz;
                end
                if ((i == (19 - 1))) begin
                    cos_z0 <= x;
                    sin_z0 <= y;
                    state = 1'b0;
                    done <= 1;
                end
                else begin
                    i = i + 1;
                end
            end
        endcase
    end
end

endmodule

A discussion about some convertor features

With this example, some interesting features of the Verilog convertor can be illustrated.

Taking advantage of the elaboration phase

It is important to realize that the conversion occurs on a design instance. This means that the code has already been elaborated by the Python interpreter. Therefore, the convertor works on the simulatable data structure, which is a (hierarchical) list of generators. This means that only the source code of generator functions is converted. The pleasant consequence is that the restrictions of the "convertible subset" apply only to the code inside generator functions, not to any code outside them.

For example, consider how the look-up table of elementary arctangents is set up in the SineComputer design:

    # tuple with elementary angles
    angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])

This line uses things like a list comprehension and a call to the trigonometric function atan from the math library, At this point, this is beyond the scope of the convertible subset, and it may stay like that forever. But as this code is outside a generator function, it doesn't matter.

Inside the processor function, the lookup-up table is used as follows:

     z[:] = angles[int(i)]

This is just fine for the convertor. Note how this single-line lookup is expanded into a case statement in the Verilog output.

Note that we have been talking about the convertible subset, and not about the "synthesizable subset". The reason is that the convertible subset is much less restrictive. You can find more information about the convertible subset in the MyHDL manual.

Obviously, MyHDL code intended for synthesis also has to take synthesis-related restrictions into account. But again, these restrictions only apply to the code inside generator functions, because only that code is actually converted.

The described behavior is a unique feature of the MyHDL design flow. Outside generator functions, you can use Python's full power to describe designs. As long as the code inside them obeys the constraints of the convertible subset, the design instance can always be converted to Verilog. And if that code also obeys the constraints of the synthesizable subset, the result will be synthesizable Verilog.

Handling negative numbers

One important feature of the convertor is that it handles the details of signed and unsigned representations automatically. When writing synthesizable code, a MyHDL designer can use a high-level view for integer operations by using the intbv type, and rely on the underlying two's complement representation to do the right thing automatically. In contrast, a Verilog designer is forced to deal with low-level representational issues explicitly. This can become very tricky, especially with negative numbers and the signed representation.

First of all, note in the Verilog output that the convertor infers which variables have to be declared as signed. This is the easy part.

Now consider the following line in the MyHDL code of SineComputer:

    dx[:] = y >> i

I believe it's quite clear what this is supposed to do. With the underlying two's complement representation, it works for positive and negative values of y.

Now consider how this has been translated into Verilog:

dx = $signed(y >>> $signed({1'b0, i}));

The convertor has to deal with several potential pitfalls. The fundamental problem is that Verilog uses an unsigned interpretation by default, which is the opposite from what you should do to get the naturally expected results.

First, the convertor uses arithmetic shift (>>>) instead of bit-wise shift (>>), to have the sign bit shifted in instead of a zero.

Secondly, the second (unsigned) operand is typecasted to $signed, after adding a sign bit. Otherwise, Verilog will interprete all operands in a mixed expression as unsigned. Having said that, the Verilog literature seems to indicate that a shift operation is an exception to this rule. But the convertor doesn't take risks and inserts the typecast as a general measure. It may be redundant in this case.

Finally, the whole expression is typecasted to $signed. Actually, I would have expected that this typecast would not be necessary - after all, we are shifting a signed number. However, my current simulator (Cver) tells me that it is. It may be wrong, I don't know. Anyway, the cast is an additional safety net.

The message you should get from this discussion is the following: working with the signed representation in Verilog is tricky. I believe that writing the code in a natural, high-level way in MyHDL, and letting the convertor take care of the low-level representation issues, is a better option. (Of course, we still need to make sure that the convertor gets it right, which is hard enough.)

Verilog co-simulation

Clearly we will want to verify that the Verilog output from the convertor is correct. For this purpose, MyHDL supports co-simulation with Verilog.

To set up a co-simulation, we need to create a Cosimulation object for the Verilog design. The Verilog convertor makes this task easier. In addition to the Verilog code for the design itself, it also generates a Verilog test bench stub that defines an interface between the Verilog design and a Cosimulation object.

The following function creates a Cosimulation object for our design:

def SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset):
    toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)
    cmd = "cver -q +loadvpi=myhdl_vpi:vpi_compat_bootstrap " + \
          "SineComputer.v tb_SineComputer.v"
    return Cosimulation(cmd, **locals())

We start by doing the Verilog conversion itself first. Then, we define the command to start up the Verilog simulator. This is of course simulator-specific. The command shown is for the open-source Cver simulator. It loads a vpi module that defines the interface between the MyHDL simulator and the Verilog simulator. Also, both the Verilog code for the design and the test bench stub are compiled.

The Cosimulation object is then constructed with the command as its first parameter, followed by a number of keyword arguments. The keyword arguments make the link between signal names declared in the Verilog test bench stub and signals in the MyHDL code. When the signal names in MyHDL and Verilog are identical we can use a little trick and simply pass the local namespace dictionary locals() to the constructor.

In the test bench code, we replace the instantiation of the MyHDL module with this function:

    dut = SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset)

When running py.test again, we now run a co-simulation between the MyHDL test bench and the Verilog code:

testing-mode: inprocess
executable:   /usr/local/bin/python  (2.4.2-final-0)
using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown>

test_SineComputer.py[1] Copyright (c) 1991-2005 Pragmatic C Software Corp.
  All Rights reserved.  Licensed under the GNU General Public License (GPL).
  See the 'COPYING' file for details.  NO WARRANTY provided.

So the test bench confirms that the Verilog code is correct.

Historical note When this example was developed with MyHDL 0.5, this test failed, showing that the convertor had bugs. It turned out that the handling of signed variables with shift operations had not been implemented and tested properly. Therefore, this example has been the trigger to fix these bugs and develop MyHDL 0.5.1.

Implementation

To confirm synthesizablity, this example was synthesized with Xilinx ISE and targeted to a Spartan FPGA. For detailed information, you can review the synthesis report.