## Introduction

On this page we will present the design of a sine and cosine computer.

## Specification

We will assume that the input angle is represented in radians, and that it is in the range between `-π/2` and `π/2`. Other angles can be handled by adding a factor `π` first (modulo `2π`) and changing the sign of the sine and cosine result.

The floating point numbers are represented as integers. Technically, this means rounding them to a fixed number of bits, and then multiplying them with the factor `2**F`, where `F` is the number of bits after the point.

Both the sine and the cosine of the input angle will be computed. The interface of the module looks as follows:

```def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset):

""" Sine and cosine computer.

This module computes the sine and cosine of an input angle. The
floating point numbers are represented as integers by scaling them
up with a factor corresponding to the number of bits after the point.

Ports:
-----
cos_z0: cosine of the input angle
sin_z0: sine of the input angle
done: output flag indicating completion of the computation
z0: input angle; -pi/2 <= z0 <= pi/2
start: input that starts the computation on a posedge
clock: clock input
reset: reset input

"""
```

## Unit test

We will first write a unit test for the design. The idea is to use the `cos` and `sin` functions from the `math` module to compute the expected results on a number of input angles, and to compare them with the outputs from the design under test. Here is the code:

```from math import pi, sin, cos, log
import random

from myhdl import *

from SineComputer import SineComputer, SineComputer_v

def bench(fractionSize, errorMargin, nrTests=100):

""" Test bench for SineComputer.

fractionSize: number of bits after the point
errorMargin: margin for rounding errors on result
nrTests: number of tests vectors

"""

# scaling factor to represent floats as integers
M = 2**fractionSize

# maximum angle
ZMAX = int(round(M*pi/2))

# error margin shorthand
D = errorMargin

# signals
cos_z0 = Signal(intbv(0, min=-D, max=M+D))
sin_z0 = Signal(intbv(0, min=-M-D, max=M+D))
z0 = Signal(intbv(0, min=-ZMAX, max=ZMAX+1))
done = Signal(False)
start = Signal(False)
clock = Signal(bool(0))
reset = Signal(True)

# design under test
dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)

# clock generator
@always(delay(10))
def clockgen():
clock.next = not clock

# test vector setup
testAngles = [-pi/2, -pi/4, 0.0, pi/4, pi/2]
testAngles.extend([random.uniform(-pi/2, pi/2) for i in range(nrTests)])

# actual test
@instance
def check():
yield clock.negedge
reset.next = False
for z in testAngles:
yield clock.negedge
z0.next = int(round(M*z))
start.next = True
yield clock.negedge
start.next = False
yield done.posedge
exp_cos_z0 = int(round(cos(z)*M))
exp_sin_z0 = int(round(sin(z)*M))
assert abs(cos_z0 - exp_cos_z0) < D
assert abs(sin_z0 - exp_sin_z0) < D

raise StopSimulation

return dut, clockgen, check
```

As the design will perform its calculations in finite precision, there will be rounding errors. Therefore, the comparisons between expected and actual results are performed using an error margin. We may want to use the test bench for designs with different characteristics; therefore, the error margin is made a parameter.

The precision is specified in terms of the number of bits after the point, using the parameter `fractionSize`.

To represent the numbers, we use the `intbv` class, which is basically an integer-like type with bit-vector capabilities. Note that we constrain the `intbv` instances by specifying the range of valid integer values, not by a bit width. For high-level, algorithmic work, this is much easier. Moreover, it enables fine-grained range error checking at run-time.

By filling in the parameters of the `bench` function, we can construct an actual test bench that runs a simulation, as follows:

```def test_bench():
fractionSize = 18
errorMargin = fractionSize
tb = bench(fractionSize, errorMargin)
sim = Simulation(tb)
sim.run()
```

A function with a name starting with `test_` is automatically recognized and run by a unit testing framework such as `py.test`.

## Algorithm

To implement the design, we will use the Cordic algorithm, a very popular algorithm to compute trigonometric functions in hardware. On this page, we are mainly interested in the mechanical characteristics of the algorithm and their hardware implications. For more information and background on the algorithm itself, please consult other sources, such as this paper by Ray Andraka.

The Cordic algorithm is an iterative algorithm based on vector rotations over elementary angles. The algorithm normally operates in one of two modes. In rotation mode, it rotates a vector (x0, y0) in the Cartesian plane over an input angle z0. The Cordic equations for this mode are:

xi +1 = xi - yidi2-i
yi +1 = yi - xidi2-i
zi +1 = yi - ditan-i(2-i)

where

di = -1 if zi < 0, else +1.

These equations can be implemented with relatively simple hardware. This is the characteristic that makes the Cordic algorithm attractive. In particular, multiplications with a factor 2-i are simply shift-right operations. Also, the arctangent values tan-i(2-i) can be precomputed and stored in a small look-up table.

The Cordic equations can be used for a variety of computations. For our purposes, it can be shown that

xn = cos z0
yn = sin z0

for the following initial conditions:

x0 = 1 / An
y0 = 0

where

An = Product[ sqrt(1 + 2-2i) ] with i = 0 ... n-1

## Design

The Cordic algorithm can be implemented in many ways, with various characteristics and advantages. On this page, we will implement a parallel, iterative processor, which is a fairly straightforward mapping of the equations into a bit-parallel data path and a state machine.

```from math import atan, sqrt, ceil, floor, pi

from myhdl import *

t_State = enum("WAITING", "CALCULATING")

def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset):

""" Sine and cosine computer.

This module computes the sine and cosine of an input angle. The
floating point numbers are represented as integers by scaling them
up with a factor corresponding to the number of bits after the point.

Ports:
-----
cos_z0: cosine of the input angle
sin_z0: sine of the input angle
done: output flag indicated completion of the computation
z0: input angle
start: input that starts the computation on a posedge
clock: clock input
reset: reset input

"""

# angle input bit width
W = len(z0)

# angle input z0 represents number between -pi/2 and pi/2
# scaling factor corresponds to the nr of bits after the point
M = 2 ** (W-2)

# nr of iterations equals nr of significant input bits
N = W-1

# calculate X0
An = 1.0
for i in range(N):
An *= (sqrt(1 + 2**(-2*i)))

# X0
X0 = int(round(M*1/An))

# tuple with elementary angles
angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])

# iterative cordic processor
@instance
def processor():

x = intbv(0, min=sin_z0.min, max=sin_z0.max)
y = intbv(0, min=sin_z0.min, max=sin_z0.max)
z = intbv(0, min=z0.min, max=z0.max)
dx = intbv(0, min=sin_z0.min, max=sin_z0.max)
dy = intbv(0, min=sin_z0.min, max=sin_z0.max)
dz = intbv(0, min=z0.min, max=z0.max)
i = intbv(0, min=0, max=N)
state = t_State.WAITING

while True:
yield clock.posedge, reset.posedge

if reset:
state = t_State.WAITING
cos_z0.next = 1
sin_z0.next = 0
done.next = False
x[:] = 0
y[:] = 0
z[:] = 0
i[:] = 0

else:
if state == t_State.WAITING:
if start:
x[:] = X0
y[:] = 0
z[:] = z0
i[:] = 0
done.next = False
state = t_State.CALCULATING

elif state == t_State.CALCULATING:
dx[:] = y >> i
dy[:] = x >> i
dz[:] = angles[int(i)]
if (z >= 0):
x -= dx
y += dy
z -= dz
else:
x += dx
y -= dy
z += dz
if i == N-1:
cos_z0.next = x
sin_z0.next = y
state = t_State.WAITING
done.next = True
else:
i += 1

return processor
```

The actual computation is done by the `processor` generator. Note that outside the generator function, we calculate some data such as the `X0` constant, and the look-up table of elementary arctangents, represented by the `angles` tuple.

The internal number variables are represented by `intbv` instances. The dual nature of this class comes in very handy. On the one hand, we can constrain the instances as integer subtypes by specifying the valid integer range at construction time. On the other hand, we can access their two's complement representation as a bit vector, for example for slicing or right-shifting.

It seems obvious that a type that unifies the integer and the bit vector views should be very useful for hardware design. One would therefore expect a similar feature in other HDLs. However, I believe that it is actually a unique capability offered by MyHDL. Other HDLs seem to try to solve the issues by creating more and more integer and bit-vector like types. In MyHDL, a single type does it all - the `intbv` class.

`py.test` confirms that this is a valid impementation:

```> py.test
testing-mode: inprocess
executable:   /usr/local/bin/python  (2.4.2-final-0)
using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown>

test_SineComputer.py[1] .
```

## Automatic conversion to Verilog

Now that we have a working design (and only now!) we can attempt to convert it to Verilog automatically. In MyHDL, this is done by using the `toVerilog` function. For example, consider the instantiation of the design under test in the test bench:

```    # design under test
dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)
```

To convert the design instance to Verilog, this line can be replaced by the following:

```    # design under test
dut = toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)
```

As before, the `dut` object is a simulatable design instance, but as a side effect of the instantiation, an equivalent Verilog module file will be generated. The Verilog output is as follows:

```module SineComputer (
cos_z0,
sin_z0,
done,
z0,
start,
clock,
reset
);

output signed [19:0] cos_z0;
reg signed [19:0] cos_z0;
output signed [19:0] sin_z0;
reg signed [19:0] sin_z0;
output done;
reg done;
input signed [19:0] z0;
input start;
input clock;
input reset;

always @(posedge clock or posedge reset) begin: _SineComputer_processor
reg [5-1:0] i;
reg [1-1:0] state;
reg signed [20-1:0] dz;
reg signed [20-1:0] dx;
reg signed [20-1:0] dy;
reg signed [20-1:0] y;
reg signed [20-1:0] x;
reg signed [20-1:0] z;
if (reset) begin
state = 1'b0;
cos_z0 <= 1;
sin_z0 <= 0;
done <= 0;
x = 0;
y = 0;
z = 0;
i = 0;
end
else begin
// synthesis parallel_case full_case
casez (state)
1'b0: begin
if (start) begin
x = 159188;
y = 0;
z = z0;
i = 0;
done <= 0;
state = 1'b1;
end
end
1'b1: begin
dx = \$signed(y >>> \$signed({1'b0, i}));
dy = \$signed(x >>> \$signed({1'b0, i}));
// synthesis parallel_case full_case
case (i)
0: dz = 205887;
1: dz = 121542;
2: dz = 64220;
3: dz = 32599;
4: dz = 16363;
5: dz = 8189;
6: dz = 4096;
7: dz = 2048;
8: dz = 1024;
9: dz = 512;
10: dz = 256;
11: dz = 128;
12: dz = 64;
13: dz = 32;
14: dz = 16;
15: dz = 8;
16: dz = 4;
17: dz = 2;
default: dz = 1;
endcase
if ((z >= 0)) begin
x = x - dx;
y = y + dy;
z = z - dz;
end
else begin
x = x + dx;
y = y - dy;
z = z + dz;
end
if ((i == (19 - 1))) begin
cos_z0 <= x;
sin_z0 <= y;
state = 1'b0;
done <= 1;
end
else begin
i = i + 1;
end
end
endcase
end
end

endmodule
```

## A discussion about some convertor features

With this example, some interesting features of the Verilog convertor can be illustrated.

### Taking advantage of the elaboration phase

It is important to realize that the conversion occurs on a design instance. This means that the code has already been elaborated by the Python interpreter. Therefore, the convertor works on the simulatable data structure, which is a (hierarchical) list of generators. This means that only the source code of generator functions is converted. The pleasant consequence is that the restrictions of the "convertible subset" apply only to the code inside generator functions, not to any code outside them.

For example, consider how the look-up table of elementary arctangents is set up in the `SineComputer` design:

```    # tuple with elementary angles
angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])
```

This line uses things like a list comprehension and a call to the trigonometric function `atan` from the `math` library, At this point, this is beyond the scope of the convertible subset, and it may stay like that forever. But as this code is outside a generator function, it doesn't matter.

Inside the `processor` function, the lookup-up table is used as follows:

```     z[:] = angles[int(i)]
```

This is just fine for the convertor. Note how this single-line lookup is expanded into a case statement in the Verilog output.

Note that we have been talking about the convertible subset, and not about the "synthesizable subset". The reason is that the convertible subset is much less restrictive. You can find more information about the convertible subset in the MyHDL manual.

Obviously, MyHDL code intended for synthesis also has to take synthesis-related restrictions into account. But again, these restrictions only apply to the code inside generator functions, because only that code is actually converted.

The described behavior is a unique feature of the MyHDL design flow. Outside generator functions, you can use Python's full power to describe designs. As long as the code inside them obeys the constraints of the convertible subset, the design instance can always be converted to Verilog. And if that code also obeys the constraints of the synthesizable subset, the result will be synthesizable Verilog.

### Handling negative numbers

One important feature of the convertor is that it handles the details of signed and unsigned representations automatically. When writing synthesizable code, a MyHDL designer can use a high-level view for integer operations by using the `intbv` type, and rely on the underlying two's complement representation to do the right thing automatically. In contrast, a Verilog designer is forced to deal with low-level representational issues explicitly. This can become very tricky, especially with negative numbers and the signed representation.

First of all, note in the Verilog output that the convertor infers which variables have to be declared as `signed`. This is the easy part.

Now consider the following line in the MyHDL code of `SineComputer`:

```    dx[:] = y >> i
```

I believe it's quite clear what this is supposed to do. With the underlying two's complement representation, it works for positive and negative values of `y`.

Now consider how this has been translated into Verilog:

```dx = \$signed(y >>> \$signed({1'b0, i}));
```

The convertor has to deal with several potential pitfalls. The fundamental problem is that Verilog uses an unsigned interpretation by default, which is the opposite from what you should do to get the naturally expected results.

First, the convertor uses arithmetic shift (`>>>`) instead of bit-wise shift (`>>`), to have the sign bit shifted in instead of a zero.

Secondly, the second (unsigned) operand is typecasted to `\$signed`, after adding a sign bit. Otherwise, Verilog will interprete all operands in a mixed expression as unsigned. Having said that, the Verilog literature seems to indicate that a shift operation is an exception to this rule. But the convertor doesn't take risks and inserts the typecast as a general measure. It may be redundant in this case.

Finally, the whole expression is typecasted to `\$signed`. Actually, I would have expected that this typecast would not be necessary - after all, we are shifting a signed number. However, my current simulator (Cver) tells me that it is. It may be wrong, I don't know. Anyway, the cast is an additional safety net.

The message you should get from this discussion is the following: working with the signed representation in Verilog is tricky. I believe that writing the code in a natural, high-level way in MyHDL, and letting the convertor take care of the low-level representation issues, is a better option. (Of course, we still need to make sure that the convertor gets it right, which is hard enough.)

## Verilog co-simulation

Clearly we will want to verify that the Verilog output from the convertor is correct. For this purpose, MyHDL supports co-simulation with Verilog.

To set up a co-simulation, we need to create a Cosimulation object for the Verilog design. The Verilog convertor makes this task easier. In addition to the Verilog code for the design itself, it also generates a Verilog test bench stub that defines an interface between the Verilog design and a Cosimulation object.

The following function creates a Cosimulation object for our design:

```def SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset):
toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)
cmd = "cver -q +loadvpi=myhdl_vpi:vpi_compat_bootstrap " + \
"SineComputer.v tb_SineComputer.v"
return Cosimulation(cmd, **locals())
```

We start by doing the Verilog conversion itself first. Then, we define the command to start up the Verilog simulator. This is of course simulator-specific. The command shown is for the open-source Cver simulator. It loads a vpi module that defines the interface between the MyHDL simulator and the Verilog simulator. Also, both the Verilog code for the design and the test bench stub are compiled.

The Cosimulation object is then constructed with the command as its first parameter, followed by a number of keyword arguments. The keyword arguments make the link between signal names declared in the Verilog test bench stub and signals in the MyHDL code. When the signal names in MyHDL and Verilog are identical we can use a little trick and simply pass the local namespace dictionary `locals()` to the constructor.

In the test bench code, we replace the instantiation of the MyHDL module with this function:

```    dut = SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset)
```

When running py.test again, we now run a co-simulation between the MyHDL test bench and the Verilog code:

```testing-mode: inprocess
executable:   /usr/local/bin/python  (2.4.2-final-0)
using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown>

test_SineComputer.py[1] Copyright (c) 1991-2005 Pragmatic C Software Corp.