Citation
Bus interfaces for soc design

Material Information

Title:
Bus interfaces for soc design
Creator:
Kathuria, Charu
Publication Date:
Language:
English
Physical Description:
vii, 114 leaves ; 28 cm

Subjects

Subjects / Keywords:
Systems on a chip ( lcsh )
Microcomputers -- Buses ( lcsh )
Microcomputers -- Buses ( fast )
Systems on a chip ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 112-114).
General Note:
Department of Electrical Engineering
Statement of Responsibility:
by Charu Kathuria.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
262684231 ( OCLC )
ocn262684231
Classification:
LD1193.E53 2008m K37 ( lcc )

Full Text
BUS INTERFACES FOR SOC DESIGN
by
CHARU KATHURIA
B.E., Prabhu Dayal College of Engineering, 2004
A thesis submitted to the
University of Colorado at Denver
and Health Sciences Center
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering
2007


This thesis for the Master of Science
degree by
CHARU KATHURIA
has been approved
by
Robert D. Grabbe

Date


Charu Kathuria (MS, Electrical Engineering)
Bus Interface for SoC Design
Thesis directed by Associate Professor Robert Grabbe
ABSTRACT
System-on-a-chip or System On Chip (SoC or SOC) technology enables
the integration of all components of a computer or other electronic system
into a single integrated circuit, a single silicon chip. A SoC may include
subsystems such as digital, analog, mixed-signal, and radio-frequency
functions. These functional blocks connect through either a proprietary or
an industry-standard bus interface.
SoC technology has helped cut development time while increasing
product functionality, performance and quality. SoC is more cost-effective
because it increases fabrication yield and its packaging is simpler than
that of a multi-chip solution. However, there is a continued need
for improved performance in terms of fairness, efficiency and response time.
While there are numerous ways to optimize SoC design performance, the
objective of this thesis is to review various bus-architecture interfaces and


their performance characteristics that can assist optimization. A bus
interface allows the transfer of data from one subsystem to another. To
accomplish these objectives, different types of bus interface techniques
are studied in comparison with WISHBONE architecture, an open source,
public domain bus architecture.
WISHBONE is the most portable and flexible of these designs and provides
good bandwidth. Its performance is evaluated through simulations to show its
advantages over other interface techniques.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
Robert D. Grabbe


DEDICATION
I dedicate this thesis to my parents, Mr. Mahesh Kathuria and Mrs. Uma
Kathuria who gave me an appreciation of learning and taught me the
value of perseverance and resolve.


ACKNOWLEDGEMENT
Thanks to my advisor, Robert Grabbe, at the University of Colorado at Denver
and Health Sciences Center for his contribution and support to my research. None
of this would have been possible without his motivation and his inspiring
assistance. I also wish to thank all the members of my committee for their
valuable participation and insights. I wish to express my gratitude to all my
family and friends for their support and encouragement.


2. Performance Criteria and Limitations
To optimize system performance, the bus arbiter must carefully
decide how many bus cycles a process will take to complete.
Various performance criteria are considered when optimizing SoC design.
Performance includes fairness, efficiency and response time. Fairness
refers to ensuring that every master gets optimal access to the control
bus. Efficiency refers to ensuring that the control bus is 100% utilized by
not having wait states. Response Time refers to ensuring that the control
bus is used for the minimum time slice.
SoC performance depends on the time slices used by each master. Let
us assume there are n periodic events, and that event i occurs with
period P_i and requires C_i seconds of the control bus to handle each time
slice. The masters can be handled only if the schedulability condition holds:

C_1/P_1 + C_2/P_2 + ... + C_n/P_n <= 1

where
P_i = master i's average period between control-bus uses, and
C_i = master i's control-bus use within period P_i.


The shorter the time slice, the shorter the response time, which in turn
increases the performance of the SoC.
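As an illustration, the utilization test described above can be sketched in a few lines of Python. This is only a sketch; the thesis targets hardware designs, and the task values below are invented:

```python
def schedulable(tasks):
    """Return True if the control bus can service all masters.

    tasks: list of (P_i, C_i) pairs, where P_i is master i's average
    period between control-bus uses and C_i is its bus time per use.
    The utilization condition requires sum(C_i / P_i) <= 1.
    """
    return sum(c / p for p, c in tasks) <= 1.0

# Two masters: one uses the bus 2 ms every 10 ms, one 3 ms every 20 ms.
print(schedulable([(10, 2), (20, 3)]))   # utilization 0.35 -> True
print(schedulable([(10, 6), (20, 10)]))  # utilization 1.10 -> False
```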
2.1 Limitations:
It has been observed that a shared bus, when combined with a
synchronous handshake in an asynchronous setup, gives rise to timing
constraints. As a result, the performance of the SoC degrades. The reliability
of the synchronous handshake can be determined using the mean-time
between failure of the synchronizer used in the interface.
Interfacing asynchronous inputs to synchronous logic has always been
a problem. Communication between unlocked-clock or multi-clock
domains requires synchronization to avoid metastability, but results in
some uncertainty in timing. Metastability is the act of transitioning into a
relatively unstable state of a flip-flop or latch. Reducing metastability
failure without adding latency is more difficult as clock frequencies
continue to increase.
SoC integrates a variety of cores and I/O interfaces, which usually operate
at different clock frequencies. Data transfer is implemented using high-speed
serial communication lines and asynchronous block interfaces,


which in themselves represent an IP with flexible interconnect
implementation.
Asynchronous inputs to synchronous digital networks, arriving from other
clocked domains or from asynchronous events, may force a synchronous
circuit's flip-flops into a temporary metastable state. In this metastable
state, the flip-flop outputs may linger somewhere between logical high and
low values. The outputs of other digital circuits connected to these
metastable outputs may become unreliable themselves, resulting in state
machines that enter into inappropriate states or combinational logic circuits
that have erroneous outputs. Several digital circuit configurations are used
as synchronizers. They are analyzed along with the MTBF model and the
data sheet parameters that are provided by the programmable logic
vendors, as shown in Figure 2.1.
Figure 2.1: A Shift-Register Synchronizer


The derivation of the MTBF is straightforward, but the probabilistic
reasoning techniques presented by Jaynes show a consistent approach
for solving probability problems that provides a deeper insight using
probabilistic reasoning [4]. When the data input to the flip-flop is
asynchronous to the clock, it may create set-up and hold time violations
that cause system failure. The industry standard formula for MTBF due to
metastability for a flip-flop is given by: [7]

MTBF = e^(C2 * t_met) / (C1 * Fc * Fd)

Where:
e = 2.718281828,
t_met = time delay allowed for the metastability to resolve itself,
Fc = sampling clock frequency (the flip-flop's clock frequency),
Fd = frequency of changes in the input data,
C1 = a constant representing the metastability catching set-up time
window, and
C2 = a constant describing the speed with which the metastable condition
is resolved.
By calculating the number of failures and the time delay, we can find the
MTBF of the system.
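The calculation can be sketched in Python. The constants below are invented for illustration and are not vendor data-sheet values:

```python
import math

def mtbf(t_met, fc, fd, c1, c2):
    """Metastability MTBF = e^(C2 * t_met) / (C1 * Fc * Fd).

    t_met: settling time allowed for metastability to resolve (s)
    fc:    sampling (flip-flop) clock frequency (Hz)
    fd:    frequency of changes on the asynchronous data input (Hz)
    c1:    metastability catching window constant (s)
    c2:    resolution-speed constant of the flip-flop (1/s)
    """
    return math.exp(c2 * t_met) / (c1 * fc * fd)

# Illustrative numbers only: doubling the settling time, e.g. by
# cascading a second flip-flop, grows the MTBF exponentially.
m1 = mtbf(t_met=2e-9, fc=100e6, fd=1e6, c1=1e-10, c2=5e9)
m2 = mtbf(t_met=4e-9, fc=100e6, fd=1e6, c1=1e-10, c2=5e9)
print(m2 / m1)  # = e^(c2 * 2e-9) = e^10, about 2.2e4
```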


Metastability cannot be completely removed, but its failure rate can be
decreased by cascading flip-flops together as shown in Figure 2.2, which
increases the settling time available for the metastable condition to
resolve. This considerably improves the reliability of the system.
Figure 2.2: Synchronizing with a Cascaded Flip-Flop
Calculating MTBF using Inductive Reasoning is shown in Appendix A.
The result of the calculation is:

MTBF = β / (F_d · W · F_clock)

where F_d is the frequency of input-data transitions, F_clock is the
sampling clock frequency, W is the metastability window, and β is the
constant obtained in the derivation.
The above equation gives the Mean Time Between Failure (MTBF) of
synchronous circuits having asynchronous inputs, derived with the help of
conditional joint probability. To overcome the problem of metastable
failure within systems, designers have adopted the synchronizer as an interface


between the asynchronous input and the synchronous system. The
synchronizer samples the asynchronous input at the rate of the system
clock. The output of the synchronizer is thus synchronous with the rest of
the system. However, the use of a synchronizer does not eliminate the
possibility of a metastable failure; it is only possible to limit its occurrence
to within the synchronizer and thereby minimize its effect on the system.
The synchronizer brings a simple solution to the problem by partially
ordering transitions with respect to a succession of clock signals in order
to avoid conflicts between read/write actions. When characterization is
required, it reduces to consideration of two flip-flops that decrease reliance
on restrictive assumptions of flip-flop behavior in the shift register
synchronizer with the same fast sampling and settling times. Thus, when
designing synchronizers, the aim should be to provide an adequate
settling time. It may be possible to improve the performance of the
synchronizer with known techniques, such as using asymmetric threshold
devices interspersed between the appropriate flip-flops. However, the
study of these techniques is beyond the scope of this correspondence.
[13]


3. Overview of Bus Arbitration Techniques used in SoC
Designing a SoC involves integrating one or more microprocessors,
DSP units, an on-chip bus architecture, a memory system, and peripheral
blocks on a single chip. There are many techniques used in SoC design
that aim to provide improved performance and reliability by addressing key
design issues such as power consumption, bandwidth, cost effectiveness,
transmission loss, and cross-talk.
The major bus arbitration techniques used in SoC, which are listed below,
will be discussed in Sections 3.1 -- 3.5:
3.1 Priority-based selection
3.2 TDMA Bus
3.3 Round-Robin
3.4 Lottery Bus Architecture
3.5 Dynamic Parallel Fraction Control Bus
3.1 Priority-based selection:
Prior to the development of priority-based selection techniques, bus
control was given only to the master having highest priority, leading to the


problem of a low-priority master never being able to gain control of the bus
(unless all higher-priority masters had completely finished). The lower priority
master would then be unable to transfer its data. With priority-based
selection, the highest priority master will get the control of the bus first but
then the control bus is available in a time-slice manner to other masters
with lower priority tasks. To the user, this gives the appearance of
simultaneous access to the bus by multiple functions. However, some
masters receive more bus bandwidth than others due to their higher
priority.
Considering the performance criteria, priority-based selection is fairer to
the low priority master. The lower priority master can now gain access to
the bus but only after higher priority tasks are completed. It has a very
good efficiency, as there is no wastage of time-slices. The control bus is
accessible to all masters so response time is uniformly low.
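A minimal sketch of priority-based grant selection, in Python for illustration only (the master names and priority values here are invented):

```python
def priority_grant(requests, priorities):
    """Grant the bus to the highest-priority requesting master.

    requests:   set of master ids currently requesting the bus
    priorities: dict mapping master id -> priority (higher wins)
    Returns the granted master id, or None if no one is requesting.
    """
    if not requests:
        return None
    return max(requests, key=lambda m: priorities[m])

prio = {"M0": 3, "M1": 1, "M2": 2}
print(priority_grant({"M1", "M2"}, prio))  # M2 wins while M0 is idle
```

Lower-priority masters are served only while higher-priority masters are not requesting, which is exactly the fairness trade-off discussed above.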
3.2 Time Division Multiplexing Access (TDMA):
TDMA uses a shared bus concept. This allows the gap between one
master, M1, controlling the bus and another master, M2, taking control
of the bus to be utilized by M2. TDMA allows several masters to share the
bus by dividing the bus cycle into time slots. The TDMA arbiter
grants bus access to a master in every time period, which increases the


bandwidth of the SoC. While this avoids some time latency, the other
master(s) do have to wait for the control bus until the first master
completes the cycle, and so on. TDMA is fair to the lower priority masters.
The improvement in efficiency and response time is directly proportional to
the time-slice.
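A fixed TDMA slot table can be sketched as follows. The table contents are invented; a real arbiter would assign slots from the design's bandwidth budget:

```python
def tdma_owner(cycle, slot_table):
    """Return the master that owns the bus in the given cycle.

    slot_table: fixed, repeating assignment of time slots to masters,
    e.g. ["M1", "M1", "M2"] gives M1 two of every three slots.
    """
    return slot_table[cycle % len(slot_table)]

table = ["M1", "M1", "M2"]
print([tdma_owner(c, table) for c in range(6)])
# ['M1', 'M1', 'M2', 'M1', 'M1', 'M2']
```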
3.3 Round-Robin (RR) and Weighted Round Robin (WRR):
In RR designs, control of the bus is granted in a sequential manner. RR
designs grant the control bus to all masters equally after every time-slice
but do not prioritize this access. This leads to an imbalance in the
bandwidth of the system. RR methods decrease the priority value for each
master and tend to affect the performance of the system. In order to
address this deficit, WRR was developed. In WRR, each master can be
assigned a weight, an integer value that indicates the processing capacity.
Masters with higher weights acquire more connections than those with
lower weights and masters with equal weights acquire equal connections.
[1]
This technique prioritizes the masters and grants bus control first to the
highest priority master. RR is fair in granting access to the control bus but
it does not prioritize the masters. The improvement in efficiency and
response time is directly proportional to the time-slice.
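The weighted grant order described above can be sketched as a simple per-round expansion (the weights are invented for illustration; real WRR arbiters often interleave grants within a round):

```python
from itertools import cycle

def wrr_schedule(weights, n_grants):
    """Expand per-master weights into a repeating grant order.

    weights: dict mapping master id -> integer weight; a master with
    weight w receives w grants per round.
    """
    one_round = [m for m, w in weights.items() for _ in range(w)]
    src = cycle(one_round)
    return [next(src) for _ in range(n_grants)]

print(wrr_schedule({"M0": 2, "M1": 1}, 6))
# ['M0', 'M0', 'M1', 'M0', 'M0', 'M1']
```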


3.4 Lottery Bus Architecture (LBA):
Lottery Bus techniques have a lottery manager that assigns lottery
numbers to all masters requesting simultaneous access to the control
bus. The Lottery Bus then obtains the probability for each lottery
number and grants access to the control bus to the winning lottery
number, i.e. a unique master. The Lottery Bus represents a significant
improvement over priority-based selection, the TDMA bus, or RR, as it
increases the performance of the SoC by improving response time.
However, it is very difficult to keep the bandwidth consistent, as control
of the bus depends on the lottery. This type of probabilistic distribution
will not work well with competing masters that are all high priority or
masters needing the control bus right away. [1]
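A probabilistic ticket draw of this kind can be sketched as follows. The ticket counts are invented; a real lottery manager would update them per request:

```python
import random

def lottery_grant(tickets, rng=random):
    """Pick a requesting master with probability proportional to its
    ticket count (its share of the total lottery numbers).

    tickets: dict mapping master id -> number of tickets held
    """
    masters = list(tickets)
    weights = [tickets[m] for m in masters]
    return rng.choices(masters, weights=weights, k=1)[0]

# Over many draws, M0 (3 tickets) should win about 3x as often as M1.
rng = random.Random(0)
wins = [lottery_grant({"M0": 3, "M1": 1}, rng) for _ in range(10000)]
print(wins.count("M0") / len(wins))  # close to 0.75
```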
3.5 Dynamic Parallel Fraction Control (DPFC):
DPFC Bus consists of Data Pre-fetch Core Interface (DPCI) and
Fraction Control Bus (FCB) shown in Figure 3.1. FCB can be static or
dynamic, i.e. Static Fraction Control Bus (SFCB) or Dynamic Fraction
Control Bus (DFCB). Control and access of the bus will depend on the
higher priority and differences between the actual and final fraction given
to the masters. This increases the performance and the bandwidth of the
system. [1]


DPFC Bus removes the disadvantage of failing to utilize the down time
and the bandwidth between one master using the control bus and the
other waiting for the first one to complete. Another disadvantage DPFC
addresses is that masters otherwise have no control over bandwidth and
time latency, which creates problems with data corruption and timing.
Data Pre-fetch Core Interface (DPCI) is a bridge between the master and
the shared bus. DPCI consists of a write buffer, a read buffer and a
configuration unit. DPCI stores the data temporarily while various masters
request bus access. As soon as the bus request has been granted, DPCI
transfers the data using the control bus to shared memory. Similarly, when
the control bus is available and the master is ready to accept the data,
DPCI retrieves the data from the read buffer and transfers it to the master.
DPCI systems can be synchronized with the frequency of the master and
the shared memory to optimize system performance.


Figure 3.1: (a) Dynamic Parallel Fraction Control Bus using DPCI;
(b) Inside DPCI


4. Bus Interface Architecture
As shown in the previous section, the different types of bus
arbitration techniques can be used to increase the performance of the
SoC. Since the cores are designed with different bus arbitration criteria,
there can be inconsistency between their input/output port specifications.
To solve this problem, many companies devise bus interface architectures
that integrate several cores from different core designs. The objective of
this core architecture is to provide a reasonable definition of bus signals
that can be used in the integration of cores. Hence, each of these
architectures has a different set of signals. Following are the three
different types of bus architecture available.
4.1 On-chip Peripheral Bus (OPB)
OPB is a synchronous bus used to connect on-chip peripheral devices
without directly connecting with the processor core. OPB version 2.1 was
developed in 1996 and copyrighted by IBM Corporation. The processor
core can access the slave peripheral devices on this bus through a PLB
(Processor Local Bus) to an OPB Bridge unit, which is a separate core.
The OPB document available through IBM was used to study its
functionality [8].


Figure 4.1: On-chip Peripheral Bus Interconnection
A PLB is used to access memory through the bus interface units as PLB is
a high performance bus. Figure 4.1 illustrates the OPB interconnections
and shows how the components connect with each other. PLB has a Data
Cache Unit (DCU) and an Instruction Cache Unit (ICU) as its two masters.
It also has an External Peripheral Controller (EPC) and Memory Controller
(MC) as its two slaves. The OPB bus is a completely synchronous bus,
working independently of a master or a slave. A bridge between the PLB
and OPB enables data transfer by PLB masters to and from OPB slaves.


A Direct Memory Access (DMA) bus directly links with the PLB and
increases the data transfer rate for data intensive applications. The OPB
master, OPB slaves and internal peripheral devices communicate through
the OPB bus with the PLB master. The Device Control Register (DCR)
bus monitors the control bus status between the PLB and OPB masters
and slaves.
4.1.1 OPB Bus Signals
Master and slave signals connect directly to the OPB bus logic with
unique addresses. OPB consists of four different signal types:
4.1.1.1 Arbitration Signals
4.1.1.2 Bus Signals
4.1.1.3 Data Transfer Control Signals
4.1.1.4 DMA Peripheral Support Signals
4.1.1.1 Arbitration Signals:
OPB master and slave devices are linked by bus logic, i.e. AND and
OR gates. An OPB arbiter handles every master request on the OPB bus.
When an OPB master raises a demand for the bus to transfer data, it
must wait for an approval signal generated by the OPB arbiter. Sometimes
more than one master requests access to the control bus, or there may be
pending requests. At this point, all signal requests made by the masters are


linked together in sequence, i.e. ORed. This helps the control bus to
determine prolonged requests by the master.
DMA transfers data to the OPB without the use of the master address bus
or master select signals. DMA uses the master bus lock signal to hold the
control bus until the set-up and hold times of the data transfer are finished.
The OPB Arbiter asserts the OPB master bus grant (OPB_MnGrant)
signal to grant the control bus to the requesting master. If one master has
access to the bus, then the OPB Arbiter will lock the bus using OPB bus
lock (OPB_busLock) and hold it until the data transfer completes. At this point, no
other master can access the bus until the previous master leaves the bus
control or unlocks the OPB bus lock.
4.1.1.2 Bus Signals:
An OPB Address Bus (OPB_ABus) is used to transfer an address from a
master to a slave. Masters use the OPB Address Bus to select an exclusive slave
attached to the OPB Bus. The address bus is differentiated by its most
significant bit and least significant bit. The most significant bit address is
defined by bit 0 and least significant bit is defined by bit 31.


4.1.1.3 Data Transfer Control Signals:
Data Transfer Control Signals allow a master to discontinue or cancel
an access request prior to transferring the data to a slave by deasserting
Mn_select. This causes all the slaves that are receiving the data from the
master to discontinue the transfer and reset their states. In addition, the
master has to leave the bus in order for the other masters to be able to
connect.
When a master is transferring the data to a slave in a sequential manner,
an Mn_seqAddr signal is provided. The slave will receive an
OPB_seqAddr signal, which ensures no other master can interrupt the
signal until the data transfer is complete and there is another transfer in
sequence in the same direction with the same master.
Transfer acknowledgement means that the slave has accepted the data
transfer from the master, or has placed the data on the control bus, during
the write process as well as the read process. The slave can also generate an
error acknowledgement for any transfer or received data. An OPB Read
Not Write signal (OPB_RNW) is provided to determine whether the master
is to send or receive the data.


4.1.1.4 DMA Peripheral Support Signals:
A DMA Peripheral Support Signal is an optional support for OPB. A
DMA peripheral support signal can transfer to a DMA peripheral attached
to the bus controller, or it can transfer directly through the OPB bus to the
DMA peripherals. Before requesting a bus controller to transfer data, the
DMA should be programmed with the device's width, device location,
transfer direction, target address and timing parameters.
4.1.2 OPB Interfaces
The OPB Interface connects input and output signals through the OPB
bus. We can attach masters and slaves in various styles, regardless of
size, through the OPB Bus with few limitations. The types of interfaces
that will now be discussed are:
4.1.2.1 OPB Master Interface
4.1.2.2 OPB Slave Interface
4.1.2.3 OPB Arbiter Interface
4.1.2.4 Optional DMA Interface
4.1.2.1 OPB Master Interface:
An OPB Master Interface connects with different types of signals, e.g.
OPB master interface signal and OPB signals, to the OPB Bus Logic.


Sometimes the OPB master device can act as an OPB slave for another
OPB bus master's request.
4.1.2.2 OPB Slave Interface:
In this type of interface, all the OPB slave interface signals and OPB
signals connect the OPB slave to the OPB Bus Logic. The OPB slave
device can also transfer 8 bit or 16 bit packets of data.
4.1.2.3 OPB Arbiter Interface:
In this type of interface, the OPB arbiter interface input/output signals and
the OPB signals connect the OPB arbiter to the OPB Bus Logic. Some
signals that are used to implement the OPB arbiter interface function
include OPB_MnGrant, OPB_timeout and Mn_request.
4.1.2.4 Optional DMA Interface:
The Optional DMA Interface supports the request and
acknowledgement functions. The DMA Interface has direct access to the
DMA channel on the DMA controller, meaning that all the signal requests
and signal acknowledgements are directly linked with the channel.
Figure 4.2 below shows the connections between the DMA controller and
OPB DMA peripheral devices.


Figure 4.2: Optional DMA Interface (signals Sln_dmaReq and DMA_SlnAck
between the OPB DMA peripheral device and the DMA core)
4.1.3 OPB Timing Specifications
The OPB Timing Specifications clearly illustrate the operation of master
and slave signals on the OPB bus. Every signal is triggered at the positive
edge of OPB clock. All the signals are captured in the OPB master and
OPB slave on the rising edge of the OPB clock. Setup and Hold times are
dependent on the technology and the physical implementation of the bus
for the OPB input and output delays. The six OPB operations listed
below will be explained in brief.
4.1.4 OPB Operations
4.1.4.1 OPB Bus Arbitration Protocol
4.1.4.2 Data Transfer Protocol


4.1.4.3 Dynamic Bus Sizing
4.1.4.4 Connection of 32-bit and 64-bit devices
4.1.4.5 OPB Master Latency
4.1.4.6 Optional OPB DMA Transfers
4.1.4.1 OPB Bus Arbitration Protocol:
In this operation, the OPB Arbiter receives the request from the OPB
master to transfer the data to a slave. See Figure 4.3. At the positive/rising
edge of the OPB clock, the arbiter will grant access to the OPB bus for
data transfer on a highest priority basis. In order to avoid access conflicts
while the data transfer is taking place, the OPB bus lock activates so that
other masters cannot take over the control bus.
The OPB Arbiter controls the access of a master to the OPB bus for data
transfer. Even if an OPB master asserts a request for sequential transfer,
the OPB arbiter will not necessarily grant the bus to the same OPB master.
When a master gets the OPB bus, the OPB bus lock will activate. The
OPB Arbiter will unlock the OPB bus lock after one whole cycle is
complete and only then pursue the other masters' requests for the bus.
This arrangement increases the available bandwidth of the system and
uses the OPB bus efficiently.


Figure 4.3: OPB Bus Arbitration Bus Lock Signal (waveforms for OPB_Clk,
M1_request, OPB_M1Grant, M1_busLock, M1_select, and OPB_xferAck)
4.1.4.2 Data Transfer Protocol
Similar to OPB Bus Arbitration Protocol discussed above, in the OPB
Data Transfer Protocol, the arbiter also grants access to the control bus
based on highest priority at the rising edge of the OPB clock. Once
granted access to the bus, the master's bus lock will activate so that no
other master will be able to use the bus. This is critical for the preservation
of data. As soon as data transfer to the bus occurs, a slave will retrieve
the data and send an acknowledgement to the master via a transfer
acknowledge signal, xferAck.


To end a data transfer session, the active master can terminate the OPB
bus connection in the middle of a data transfer to the slave by simply
deasserting its select signal for the OPB bus for the next cycle. This will
force the slave to abort the transfer and return to its idle condition. The OPB
Arbiter then grants the control bus to another master with higher priority.
4.1.4.3 Dynamic Bus Sizing
Dynamic Bus Sizing allows different data bus widths. It is utilized when
the master requests a larger or smaller data width than the slave
provides, or when the slave's data width differs from the master's.
Dynamic Bus Sizing is controlled by six different
operations:
1. Half-word Transfer,
2. Full-word Transfer,
3. Double-word Transfer,
4. Half-word Acknowledge,
5. Full-word Acknowledge, and
6. Double-word Acknowledge.
If the slave's data width is smaller than the master's, the master
must divide the transfer into two or more parts, which are referred to as


conversion cycles. The master will perform the number of conversion
cycles needed to complete the data transfer i.e. Half-word Transfer, Full-
word Transfer and Double-word Transfer. During the write cycles, masters
are required to mirror write data to byte lanes onto which smaller width
slaves may be attached. During the read cycles, Byte, Half-word, and Full-
word slaves provide read data to the relevant byte lanes.
This will take place in separate cycles, so there is a possibility of
interruption by another master having a higher priority request. Even using
busLock, a master device cannot guarantee an uninterrupted dynamic bus
sizing operation. Therefore, a master should activate busLock for
every cycle of a data transfer to a slave; otherwise, overlapping OPB
arbitration may interrupt the operation and reduce the overall bandwidth.
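The conversion-cycle count described above can be sketched as follows (widths in bits; the simple width-ratio rule shown here is an assumption for illustration):

```python
def conversion_cycles(master_width, slave_width):
    """Number of transfer cycles a master needs when its data width
    exceeds the slave's (widths in bits, e.g. 8, 16, 32, 64).

    A 64-bit master writing to a 16-bit slave must split the transfer
    into 4 conversion cycles; an equal or wider slave needs only one.
    """
    if slave_width >= master_width:
        return 1
    return master_width // slave_width

print(conversion_cycles(64, 16))  # 4
print(conversion_cycles(32, 32))  # 1
```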
To conserve power, the slave should not activate its Slave Data Bus
enable (Sl_DBusEn) when data is not being transferred.
4.1.4.4 Connection of 32-bit and 64-bit devices with byte
enables
This type of OPB operation connects a 32-bit master device with a 64-bit
slave device, or a 32-bit slave device with a 64-bit master device


with byte enable. (Byte enable means it requires unique wiring.) Figure
4.4 and Figure 4.5 show the two different types of connections used to
transfer data from an OPB Master to an OPB Bus Logic.
In this operation, a Master can transfer data from a 32-bit OPB to a 64-bit
OPB depending on the Master Address Bus, which provides the signal to
the 4-bit Demultiplexer. Figure 4.4 and Figure 4.5 show an OPB signal for
a 32-bit Master with a 64-bit OPB byte enabled connection and a 64-bit
Master with 32-bit OPB byte enabled connection, respectively.
Figure 4.4: 32-bit Master with 64-bit OPB byte enable connection


Figure 4.5: 64-bit Master with 32-bit OPB byte enable connection
Also, the master can transfer data from a 64-bit OPB to a 32-bit OPB bus
logic thus providing signal to a 4-bit multiplexer depending on Master
Address Bus.
4.1.4.5 OPB Master Latency
OPB Master Latency controls the time in order to optimize the amount
of bandwidth for a given master. When long-locked sequences are used to
transfer data, then master latency counters are used. Master latency
counters are a type of programmable register, which limits a master tenure
on the OPB bus. Two registers are required to implement the master


latency function, a Latency Register (LR) and a Latency Counter (LC).
Both are designed so that they are cleared at reset. The LR
contains the count value (i.e. the permitted number of transfers allowed
when the master locks the bus) and the optional enable function for memory
mapping. It is programmed for read and write either through the DCR bus or
by memory mapping. The LC, on the other hand, counts transfer
acknowledges and is not accessible through code.
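A behavioral sketch of the LR/LC pair (class and method names here are my own; the real registers are hardware, so this is only an illustration of the counting rule):

```python
class MasterLatencyCounter:
    """Sketch of the LR/LC pair limiting a master's bus tenure.

    The Latency Register (LR) holds the permitted number of transfers
    per bus lock; the Latency Counter (LC) counts acknowledged
    transfers and signals when the master must release the bus.
    Both clear at reset, as the OPB text requires.
    """
    def __init__(self, limit):
        self.lr = limit   # programmed via DCR bus or memory mapping
        self.lc = 0       # not accessible through code on real OPB

    def reset(self):
        self.lc = 0

    def transfer_ack(self):
        """Count one transfer; return True if tenure is exhausted."""
        self.lc += 1
        return self.lc >= self.lr

mlc = MasterLatencyCounter(limit=3)
print([mlc.transfer_ack() for _ in range(3)])  # [False, False, True]
```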
4.1.4.6 Optional OPB DMA Transfers
As mentioned earlier, DMA peripherals are directly in contact with the
OPB via the DMA master. There are four types of optional DMA transfers:
1. DMA peripheral read or write,
2. DMA peripheral burst read or write,
3. DMA flyby read or write transfers, and
4. DMA flyby burst read or write transfers.
In Optional OPB DMA Transfers, Request/Acknowledge handshaking is
used. The term flyby implies that the data flows directly from the source
to the destination. Only memory slaves and DMA peripheral slaves of the
same size can perform DMA flyby transfers. DMA flyby operations are used
in designs that have a low bandwidth peripheral slave device with rare
requests. Otherwise, the data transfer will take one


extra cycle each time resulting in a noticeable decrease in speed.
Peripheral slave devices with higher bandwidth should use burst operation
performing multiple accesses per DMA master bus grant.
The data transfer process between DMA, DMA peripheral slave and
memory are carefully monitored to assure that timing of both ends (source
and destination) are compatible with each other.


4.2 WISHBONE
WISHBONE was released in 2001 as an open source bus
architecture in the public domain with no copyright. It is maintained by
Richard Herveille and recommended by OpenCores. (OpenCores
is a community of people who develop digital open source software and
hardware for academic and personal use.) WISHBONE is a flexible design
methodology for the interconnection of portable IP cores and can be used
with soft-core, firm-core or hard-core IP. WISHBONE enhances the
flexibility, reliability, robustness, and portability of the system by
eliminating SoC integration problems. The WISHBONE document
available through OpenCores was used to study its advantages over other
bus architectures [6].
In the past, microcomputer buses used printed circuit backplanes with
hardwired connections that set the boundaries of the interface. This
prohibited the use of different types of interfaces according to the
requirements of master and slave. WISHBONE surmounts this limitation by
facilitating the use of easily adjustable interconnection paths. It opens the
option of using different interconnections such as Point-To-Point Interface,
Data Flow, Shared Bus and Crossbar Switch, to the desired requirement
of master and slave interface. Interfaces are globally available for the user
and easily modified according to requirement of the SoC used.


WISHBONE, in combination with publicly available open source cores,
has created an entirely new genre of chip-level systems. The WISHBONE
public domain library for VHDL is available online, and the user can modify
it according to the specific requirements of the SoC. In a field previously
dominated by highly secure, embedded IP chips, the open source model
has actually proven superior to proprietary bus architecture. WISHBONE-
based designs can be visually inspected, provide greater flexibility for
customization and typically use the standard interconnections. They
connect an IP core to the surrounding interface logic more quickly and
easily compared to IP cores that require logic to connect cores together. In
contrast, the OPB bus is copyright by IBM and is not compatible with
different types of IP core. The OPB bus is discussed in Section 4.1.
WISHBONE is the bus architecture interface that connects digital circuits
together to form an integrated circuit chip. It connects the individually
developed IP cores and allows users to control the whole design.
WISHBONE creates an architecture that has a smooth transition path to
support new bus arbitration techniques. Different types of bus arbitration
techniques such as Priority Arbiter, Round-Robin Arbiter can be used in
shared bus or crossbar switch configuration.
WISHBONE works on the master and slave concept, meaning the
master initiates data transactions involving the slave. WISHBONE
creates a flexible architecture that allows address, data and bus cycles to
be tagged. Tags are user-defined signals that allow users to modify a bus
cycle with additional information.
WISHBONE designs work independently of any defined frequency range
and act as one large synchronous circuit that operates, ideally, with
infinite frequency bandwidth. However, since every integrated circuit has
its own maximum frequency, WISHBONE is limited by this variable. Refer
to Section 2.2 for a description of this limitation.
WISHBONE is used with FPGA and ASIC devices using hardware
description languages like VHDL and Verilog. These factors make
modification and optimization of an SoC interface/interconnection design
relatively easy to manage using WISHBONE. The WISHBONE public domain
library is available online at
http://www.pldworld.com/_hdl/2/_ip/~silicore.net/wishbone.htm and can be
used and modified according to the requirements.
4.2.1 Interconnections
WISHBONE imposes and develops compatibility between IP cores
using a master/slave topology. The interconnection must be capable of
supporting multiple masters and multiple slaves with an efficient arbitration
technique. It supports the following:
4.2.1.1 Point-to-Point Interconnection IP cores
4.2.1.2 Data Flow Interconnection IP cores
4.2.1.3 Shared Bus Interconnection IP cores
4.2.1.4 Crossbar Switch Interconnection IP cores
4.2.1.1 Point-to-Point Interconnection:
The term point-to-point interconnection simply means connecting
the master and slave interfaces in a simple, one-to-one manner. A Point-
to-Point Interconnection connects only one master to one slave. This is
the simplest, most concise and efficient way to connect two digital circuits
or WISHBONE IP cores together.

Figure 4.6: Point-to-Point Interconnection

Point-to-point interconnection helps to establish the speed and size of the
FPGA or ASIC devices. Synthesis tools and routers (an INTERCONN or
three-state buffers) help to connect the logic gates, and register transfer
logic (RTL) transfers data effectively from one end to the other.
4.2.1.2 Data Flow Interconnection:
A Data Flow Interconnection, similar to pipelining, connects data
in series, or in sequence. Data flows from one master-and-slave pair to the
next in a continuous manner. See Figure 4.7.
Figure 4.7: Data Flow Interconnection
This approach increases throughput through parallelism. If each IP core
acts as a floating-point processor and it is assumed that each core takes
the same time to solve its part of the problem, then a sequence of three
cores will increase the processing speed by three (3) times compared to a
single processor.
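The claimed speedup can be sketched with a toy throughput model (a software illustration with made-up numbers, not figures from the thesis benchmarks): once a three-stage data-flow pipeline is full, it approaches three times the throughput of a single sequential processor.

```python
# Behavioral sketch (not HDL): throughput of a 3-stage data-flow
# interconnection versus a single sequential processor.
# The item counts and stage time below are illustrative only.

def sequential_time(n_items, stage_time, n_stages=3):
    # One processor performs all stages for each item, one item at a time.
    return n_items * n_stages * stage_time

def pipelined_time(n_items, stage_time, n_stages=3):
    # Data-flow pipeline: the first item takes n_stages ticks to fill the
    # pipe, after which one item completes every tick.
    return (n_stages + n_items - 1) * stage_time

if __name__ == "__main__":
    items, t = 1000, 1.0
    speedup = sequential_time(items, t) / pipelined_time(items, t)
    print(round(speedup, 2))  # approaches 3x for large item counts
```

For 1000 items the speedup is 3000/1002, i.e. just under the ideal factor of three; the shortfall is the pipeline fill time.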
4.2.1.3 Shared Bus Interconnection:
In a Shared Bus Interconnection, a master is interfaced with many
slaves and can communicate through addressable bus cycles. For
example, the Versa Module Eurocard bus (VMEbus) and the Peripheral
Component Interconnect (PCI) bus use the Shared Bus configuration.
Figure 4.8 shows the Shared Bus Interconnection.
Figure 4.8: Shared Bus Interconnection
In Shared Bus Interconnections, a system arbiter manages control of
the bus so that only one master can access it at a time. No other
master can interrupt the data transfer. This approach uses fewer logic
gates (compared to the other interconnection approaches), prevents loss
of data and reduces cost by sharing resources. A Shared Bus
Interconnection can be implemented in different ways, using either three-
state buses (TRI-state buffers) or multiplexers. Multiplexers provide
flexibility to the system integrator, as more circuits are compatible with
multiplexers than with three-state buses.
The major drawback of the Shared Bus Interconnection is that a master
has to wait to acquire the bus, with the wait depending on the bus
arbitration technique used. This decreases the overall speed and
performance of the system.
4.2.1.4 Crossbar Switch:
The Crossbar Switch is based on the master and slave architecture
concept, in which channels connected in parallel provide a higher data rate
for the entire system. A Crossbar Switch increases the data transfer rate
by adding the individual data rates together. It allows
more than one master to use the communications link as long as two
masters do not access the same slave at the same time. This results in
increased data transfer rates compared to the other three
interconnections.
The Crossbar Switch does have one disadvantage in that it requires more
logic gates and routers to connect the masters and slaves at the same
time. Figure 4.9 shows the Crossbar Switch Interconnection.
Figure 4.9: Crossbar Switch Interconnection
4.2.2 System Arbiter
The system arbiter determines which master can use the bus cycle.
WISHBONE allows the use of different types of arbiters. As mentioned
earlier, WISHBONE offers a flexible integration solution that is easily
modified to a specific requirement. Chapter 3 described different bus
arbitration techniques; the WISHBONE examples in this chapter use the
Round-Robin and Priority-based techniques, and these examples are
explained later in the chapter. WISHBONE also allows a Point-to-Point
configuration to be used inside the shared bus configuration along with an
arbiter, which helps the architecture to be used in different ways with little
modification.
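The two arbitration policies mentioned above can be sketched as a small behavioral model (Python pseudocode of the policies only, not synthesizable logic; the function names are invented for this sketch):

```python
# Behavioral sketch of the two arbiter types a WISHBONE shared bus or
# crossbar can use. A request list holds True for "master is requesting
# the bus"; indices are master numbers.

def priority_arbiter(requests):
    # Grant the lowest-numbered (highest-priority) requesting master.
    for master, req in enumerate(requests):
        if req:
            return master
    return None  # no master is requesting

def round_robin_arbiter(requests, last_grant):
    # Grant the next requesting master after the previous grantee,
    # wrapping around, so every master eventually gets a turn.
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_grant + offset) % n
        if requests[master]:
            return master
    return None

if __name__ == "__main__":
    reqs = [True, True, False, True]
    print(priority_arbiter(reqs))        # always master 0
    print(round_robin_arbiter(reqs, 0))  # master 1
    print(round_robin_arbiter(reqs, 1))  # master 3
```

Under the priority policy, master 0 wins every contested cycle; under round-robin, the grant rotates, which is why a round-robin arbiter will not necessarily favor the "highest-priority" master.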
4.2.3 Bus Signals
WISHBONE master and slave interfaces are connectable in various
configurations. This allows WISHBONE interface signals and bus cycles to
be very flexible and reusable. These signals support the point-to-point, data
flow, shared bus and crossbar switch interconnections. They work with
three basic bus cycles, i.e. Single Read/Write, Block Read/Write and Read
Modify Write (RMW), and use a handshaking method that allows the
master or slave to modify the data transfer rate during the bus cycle.
RST_I (reset) and CLK_I (clock) on the master as well as the slave are
generated by SYSCON.
Figure 4.10: Simple WISHBONE master/slave bus signals (SYSCON drives
RST_I and CLK_I on both interfaces; the master's ADR_O(), DAT_O(),
WE_O(), SEL_O(), STB_O() and CYC_O() outputs connect to the slave's
corresponding inputs, and the slave's ACK_O() returns to the master's
ACK_I())
The ADR_O (address) signal from master to slave is independent of the
number of bytes transferred. The DAT_I and DAT_O (data transfer) signals
link the master and slave. When the WE_O (write) signal is high, data is
transferred from master to slave. The SEL_O (select) signal is optional and
depends on the interconnection used. The master can confirm a valid data
transfer by observing ACK_I (acknowledgement), which is activated by the
slave's ACK_O (acknowledgement by slave). The cycle signal (CYC_O)
marks the starting and stopping point of a Block bus cycle. CYC_O is also
used in shared bus and crossbar interconnections, as it informs the system
logic that the master wishes to use the bus or is through with the bus.
The slave uses one of the signals ACK_O, ERR_O or RTY_O to
accept a data transfer, reject a data transfer with an error, or signal a
retry, respectively. The ERR_O and RTY_O signals are
optional in slave configurations.
These are the basic bus signals used in the different types of WISHBONE
interconnections. Section 4.2.4 shows how these bus signals are used in
the different types of bus cycles.
4.2.4 WISHBONE Bus Cycles
WISHBONE supports three types of bus cycles that help the master
and slave transfer data. All three use the handshaking concept, which
lets the slave accept, reject or ask the master to retry a data transfer.
All signals are positive logic and are asserted by setting them to logic 1.
The bus cycles used by the master and slave to transfer data are:
4.2.4.1 Single Read/Write
4.2.4.2 Block Read/Write
4.2.4.3 Read Modify Write (RMW)
4.2.4.1 Single Read/Write:
The Single Read/Write bus cycle transfers a single datum between
the master and slave. It acts as a point-to-point configuration in which
one master connects with one slave.
In the Single-Read bus cycle, at the rising clock edge, the master presents
the address on ADR_O(), negates WE_O() to indicate a read, and asserts
SEL_O(), STB_O() and CYC_O(). The slave recognizes the request by
checking STB_I and the address inputs, and places valid data on its
DAT_O() lines. The slave also responds on the control bus by asserting its
acknowledge output, which the master sees as ACK_I. The master checks
ACK_I to see whether the slave has responded. After the data is
acknowledged by the slave, the master latches DAT_I() and deasserts the
STB_O signal by setting it to logic 0.
A Single-Write bus cycle works in the same manner, except that the
master initiates it by asserting WE_O and placing data on DAT_O. After
recognizing the data, the slave asserts ACK_O.
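The single-read handshake can be sketched as a software model (an illustration of the STB/ACK handshake only; the class and function names are invented for this sketch and are not part of the WISHBONE specification):

```python
# Software model of the WISHBONE single-read handshake: the master
# asserts STB_O/CYC_O with an address, the slave answers with data and
# an acknowledge, and the master then latches the data and deasserts STB.

class Slave:
    def __init__(self, memory):
        self.memory = memory  # dict: address -> data

    def respond(self, stb, adr):
        # Slave recognizes STB plus a valid address: returns (ack, data).
        if stb and adr in self.memory:
            return True, self.memory[adr]
        return False, None

def single_read(slave, adr):
    stb = cyc = True                     # master asserts STB_O and CYC_O
    ack, dat = slave.respond(stb, adr)   # slave drives DAT_O and ACK_O
    if ack:                              # master saw ACK_I: latch DAT_I
        stb = cyc = False                # deassert STB_O / CYC_O
        return dat
    raise TimeoutError("no ACK from slave")

if __name__ == "__main__":
    s = Slave({0x10: 0xAB})
    print(hex(single_read(s, 0x10)))  # 0xab
```

The model compresses the cycle into one function call; in hardware the same exchange is spread across clock edges, with the handshake letting either side stretch the transfer.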
4.2.4.2 Block Read/Write:
Block Read/Write cycles are essentially a string of two or more
SINGLE cycles strung together, which allows multiple data transfers.
CYC_O identifies the starting and stopping point of the Block cycle.
CYC_O also enables shared bus or crossbar interconnections, informing
the system logic that the master wishes to use the bus or is finished using
it.
4.2.4.3 Read Modify Write (RMW)
An RMW cycle allows a read and a write to a memory location in one bus
cycle, for use in multiprocessor or multitasking systems. The RMW cycle
allows multiple processes to share common resources. WISHBONE
prevents interference between two masters contending for the bus by
using the system arbiter; if two masters were able to access the bus at
the same time, the transfer could be corrupted.
In an RMW cycle, if one master is using the bus for reading,
manipulating and then writing back data, the second master
is prevented from using the bus during that period. The second
master can check the state of the cycle signal: if CYC_O() is 1, the second
master cannot take the bus, but if it is 0, the second master can access
the bus for reading, modifying and writing (in an RMW cycle) the data. In
addition, when the first master leaves the bus after reading, modifying and
writing, it must set CYC_O() to 0, allowing the second master
bus access. Otherwise, data will be retrieved incorrectly.
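The CYC_O lock described above can be sketched as a behavioral model (assumed, simplified semantics; not cycle-accurate and not part of the specification text):

```python
# Sketch of how CYC keeps a read-modify-write cycle atomic: while one
# master holds CYC high, the arbiter refuses the bus to the other master.

class SharedBus:
    def __init__(self, value=0):
        self.value = value
        self.cyc = 0        # 1 while some master owns an RMW cycle

    def try_rmw(self, modify):
        if self.cyc:        # another master holds CYC = 1: must wait
            return False
        self.cyc = 1        # claim the bus for the whole RMW cycle
        data = self.value           # read
        self.value = modify(data)   # modify, then write back
        self.cyc = 0        # release: CYC back to 0 for the next master
        return True

if __name__ == "__main__":
    bus = SharedBus(5)
    print(bus.try_rmw(lambda d: d + 1))   # True: first master succeeds
    bus.cyc = 1                           # pretend a master still owns it
    print(bus.try_rmw(lambda d: d * 2))   # False: second master locked out
    print(bus.value)                      # 6
```

The key point the model shows is the last paragraph above: if the owning master failed to drop CYC back to 0, no other master could ever complete its own RMW cycle.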
4.2.5 Feedback Bus Cycles Used in WISHBONE
WISHBONE has a unique feature called the feedback bus cycle. It
uses an INTERCONN module as a mediator for the signals between
master and slave, forming a loop between them. In this way, otherwise
unused bandwidth is utilized, which helps increase system performance in
small SoC devices. The INTERCONN helps to decrease the number of
cycles used to transfer data from master to slave. It works as an
asynchronous loop from the master to the slave, passing through the
INTERCONN, and from the slave back to the master in the same way,
again passing through the INTERCONN. Figure 4.11 below shows the
INTERCONN in the asynchronous cycle feedback path.
Figure 4.11: Asynchronous cycle feedback path
Synchronous cycle termination comes with some delay, which can be
reduced by using an INTERCONN; this removes a wait state from every
transfer. In this scheme, the INTERCONN informs the slave about the data
transfers in sequence, so that the slave does not take an extra cycle to
read the data. This is called advanced synchronous cycle termination and
has been shown to increase the performance of SoCs by 60% and improve
the timing by 1.5 ns, which in turn provides a higher operating frequency
and better bandwidth [6].
4.2.6 WISHBONE Timing Specification
The WISHBONE specification is designed to give the end user
very simple timing constraints. Only the maximum clock frequency is
needed to set up the logical signal path. When designing a circuit, all
input and output signals should be synchronized with, and stable at, the
rising edge of the clock input (CLK_I). This requires only a single
timing delay specification, the propagation delay from clock to setup,
Tpd,clk-su. CLK_I has to stay in step with all IP cores inside the
WISHBONE interface.

Tpd,clk-su = 1 / Fclk

Figure 4.12: Timing specification for Tpd,clk-su
This comes with some limitations but can be modified and corrected. Figure
4.12 above shows the timing specification for WISHBONE.
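Evaluating Tpd,clk-su = 1/Fclk for a few example frequencies (the first two values are arbitrary illustrations; 155.172 MHz is taken from the benchmark table later in this chapter):

```python
# The single WISHBONE timing constraint Tpd,clk-su = 1/Fclk, evaluated
# for a few clock frequencies.

def tpd_clk_su_ns(fclk_mhz):
    # Period in nanoseconds for a clock frequency given in MHz.
    return 1000.0 / fclk_mhz

if __name__ == "__main__":
    for f in (50, 100, 155.172):
        print(f, "MHz ->", round(tpd_clk_su_ns(f), 3), "ns")
```

At 155.172 MHz the constraint evaluates to about 6.444 ns, which matches the minimum period reported for the point-to-point benchmark in Table 4.1.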
4.2.7 WISHBONE Slave Input/Output Port Example
The design elegance of WISHBONE interfaces, and how they are
able to use fewer logic gates, is perhaps best illustrated with an example.
A WISHBONE slave output port example is shown in Figure 4.13.
Figure 4.13 shows an 8-bit WISHBONE slave output port built from a
D-type flip-flop register with a synchronous reset and a single AND gate.
The AND gate prevents erroneous data from being latched into the register
during read cycles. During a write cycle, data is acquired at the rising edge
of the clock [CLK_I] when both the strobe signal [STB_I] and the write
signal [WE_I] are asserted. The master can then observe the output data
lines [DAT_O]. The WISHBONE interface does not require any extra logic
gates, as WISHBONE is designed to work with the standard synchronous
and combinational logic available in most FPGA and ASIC target devices.
Figure 4.13: Simple 8-bit WISHBONE slave output port
Figure 4.14: Simple 8-bit WISHBONE slave output port, RTL schematic
(ports DAT_I(7:0), CLK_I, RST_I, STB_I, WE_I in; DAT_O(7:0) out)
With a WISHBONE interface, the input and output widths can be changed
according to the requirements of the application.
For reference, the VHDL code of the test bench for the simple 8-bit
WISHBONE slave output port, written with the Xilinx software, is given as
VHDL Code 1 in Appendix A.
4.2.8 Interfacing of WISHBONE with Memory
A WISHBONE interface is compatible with Random Access
Memory (RAM) and Read Only Memory (ROM). If the RAM or ROM setup is
similar to the WISHBONE interface, then higher bandwidth is available with
greater efficiency (compared to a typical non-WISHBONE setup). That is
because very few RAM or ROM primitives are used across the FPGA or
ASIC devices.
WISHBONE bus cycles such as Single Read/Write (SRW), Block
Read/Write (BRW) and Read Modify Write (RMW) are used to interface
FPGA and ASIC devices with RAM. The FASM synchronous RAM model
conforms to the connection shown in Figure 4.16, and the WISHBONE bus
cycles are all designed to interface directly to this type of RAM. During a
read cycle, synchronous RAM acts as an asynchronous ROM: it presents
the data addressed by the ADR() inputs without requiring a clock edge. In
contrast, a write cycle functions when the write enable (WE) input is
asserted and a rising clock edge occurs; the RAM then stores the input
data at the indicated address.
The various WISHBONE bus cycles (Single Read/Write, Block Read/Write
and Read Modify Write (RMW)) work in a way similar to the FASM
synchronous RAM model. The FASM synchronous RAM model is compatible
with WISHBONE interfaces, as it is designed to work with the standard
synchronous and combinational logic available on most FPGA and ASIC
devices. WISHBONE uses one AND gate to interconnect FPGA and ASIC
devices with RAM.
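The FASM-style behavior described above (asynchronous read, clocked write) can be sketched as a small software model (a behavioral illustration only, not the FASM specification or synthesizable HDL):

```python
# Behavioral sketch of a FASM-style synchronous RAM: reads are
# asynchronous (the address alone selects the output, no clock needed),
# while writes require WE asserted at a rising clock edge.

class FasmRam:
    def __init__(self, depth=8, width_bits=32):
        self.mem = [0] * depth
        self.mask = (1 << width_bits) - 1

    def read(self, adr):
        # Asynchronous read: behaves like a ROM lookup, clock ignored.
        return self.mem[adr]

    def rising_edge(self, we, adr, dat):
        # Synchronous write: data is stored only when WE is high at the edge.
        if we:
            self.mem[adr] = dat & self.mask

if __name__ == "__main__":
    ram = FasmRam()
    ram.rising_edge(we=True, adr=0b110, dat=0xABCD0123)
    ram.rising_edge(we=False, adr=0b110, dat=0xDEADBEEF)  # ignored: WE low
    print(hex(ram.read(0b110)))  # 0xabcd0123
```

The 8 x 32-bit dimensions mirror the RAM used in the figures; the second `rising_edge` call shows why deasserting WE protects the stored data during read cycles.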
Figure 4.15: FASM synchronous RAM, RTL schematic (outputs DAT_O(31:0)
and ACK_O)
There are two types of RAM primitives generally found on FPGA and ASIC
devices: in the first, synchronous data appears at the output on the rising
clock edge; in the second, asynchronous data appears at the output after
the address is presented to the RAM element.
Figure 4.16: FASM synchronous RAM connection in detail
Waveforms 1 & 2 provide two different results for the connection of RAM
inside the FPGA target devices. For reference, VHDL Codes 2 & 3
are shown in Appendix A.
During the read cycles in Figures 4.17 and 4.18, data is transferred from
the address indicated by ADR_O(); the clock input is ignored. In the write
cycle, when the write signal (WE_I()) and clock (CLK_I) are enabled, the
data is stored at the address indicated by ADR().
Waveform 2:
Figure 4.18 Waveform for FASM synchronous RAM with ADR = 101
4.2.9 SIMULATION and BENCHMARK
As mentioned earlier, WISHBONE provides portability and high speed with
FPGA and ASIC target devices. Consider three examples: a Point-to-Point
interface with one master and one slave, a Point-to-Point interface with
one master and two slaves, and a Shared Bus interface based on a Round-
Robin arbiter. Benchmarks were obtained for all three by simulating with
the Xilinx software. WISHBONE operates effectively with a Xilinx block
RAM, which provides the option of implementing the register RAM as
distributed RAM. The benchmarks demonstrate WISHBONE's portability,
speed and data transfer rate.
A Point-to-Point interface connecting a DMA engine as a master to an
8 x 32-bit register memory as a slave is illustrated in Figure 4.19.
As mentioned, in a Point-to-Point interface one master is directly connected
to one slave, so there is no need for an arbiter. SYSCON is used to
generate the reset and clock signals for the DMA as well as the RAM.
Point-To-Point Interface:
RTL Schematic:
Figure 4.19: RTL schematic for the Point-To-Point interface (top-level
ports: clk, run in; dat_i_sample(15:0) out)
In Figure 4.20, ADR(11:0) represents the address signal driven into the
RAM, with data input (DAT_I) to the RAM and data output (DAT_O) from
the RAM. ACK_O is the acknowledgement signal returned by the RAM to
the DMA. STB and WE are the strobe (STB_O) and write (WE_O) signals,
respectively, driven into the RAM. CLK is the clock (CLK_I) generated by
SYSCON.
Figure 4.20 RTL Schematic for Point-To-Point Interface in detail
Waveform 3:
Waveform 3 provides the result for the Point-To-Point interface. For
reference, VHDL Code 4, which implements the Point-To-Point interface,
is shown in Appendix A.
Point-to-Point with 1 Master and 2 Slaves:
RTL Schematic:
Figure 4.22: Point-to-Point RTL schematic
One master can connect with one or multiple slaves with the help of a
select signal.
SYSCON is used to generate the reset and clock signals for the DMA as
well as the RAM. Waveform 4 provides the result for this example.
Waveform 4:
Figure 4.24: Waveform of the Point-to-Point interface with 1 master and 2
slaves
VHDL Code 5, the VHDL for the Round-Robin arbiter of the Shared Bus
interface, is shown in Appendix A.
Shared Bus Interface with 2 Masters and 2 Slaves
RTL Schematic:
Figure 4.25: Shared Bus with priority-based arbiter, RTL schematic
When the Shared Bus interface with 2 masters and 2 slaves is based on a
Round-Robin arbiter, the arbiter will not necessarily grant the bus to the
highest-priority master, since Round-Robin depends on the sequence and
the time slice.
Figure 4.26: Shared Bus interface with priority-based arbiter, RTL
schematic in detail
SYSCON is used to generate the reset and clock signals for the DMA as
well as the RAM. Waveforms 5 & 6 provide the results for the priority-
based Shared Bus example.
Waveform 5:
Figure 4.27: Waveform of the Shared Bus interface with 2 masters and 2
slaves
Most of the signals were explained in the Point-to-Point interface example
above. The only additional signals are the arbiter signals, i.e. GNT (grant)
and CYC (cycle). These signals indicate which master is granted the bus
in which cycle.
Waveform 6:
Figure 4.28: Continuation waveform of the Shared Bus interface with 2
masters and 2 slaves
VHDL Code 6, the VHDL for the priority-based arbiter of the Shared Bus
interface, is shown in Appendix A.
Benchmarking Results
Simulations have demonstrated that the interface between WISHBONE
and the Xilinx target device (a Spartan-2) gives a good data rate and
uses little space. Table 4.1 below shows the benchmark for the three
interfaces discussed: the Point-to-Point interface, the Point-to-Point
interface with 1 master and 2 slaves, and the Shared Bus interface with
2 masters and 2 slaves.

Table 4.1: Benchmarking Results

Interface                                   Speed Grade   Minimum Period   Maximum Frequency
Point-to-Point                              -5            6.444 ns         155.172 MHz
Point-to-Point with 1 Master and 2 Slaves   -5            5.321 ns         187.926 MHz
Shared Bus with 2 Masters and 2 Slaves      -5            8.532 ns         117.204 MHz
In the Shared Bus with priority arbiter, only two masters and two slaves
were used. The table above shows that WISHBONE's time delay is very
small, and the design can be modified according to the requirement, which
will decrease the time slice.
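As a quick consistency check on Table 4.1, the maximum frequency should be approximately the reciprocal of the minimum period (the numbers below are transcribed from the table; agreement is within rounding):

```python
# Cross-check of Table 4.1: max frequency ~ 1 / min period.
# Values are (minimum period in ns, reported maximum frequency in MHz).

benchmarks = {
    "point_to_point":       (6.444, 155.172),
    "point_to_point_1m_2s": (5.321, 187.926),
    "shared_bus_2m_2s":     (8.532, 117.204),
}

for name, (period_ns, fmax_mhz) in benchmarks.items():
    recomputed = 1000.0 / period_ns  # ns -> MHz
    # The recomputed frequency agrees with the table to rounding error.
    assert abs(recomputed - fmax_mhz) < 0.05, name
    print(name, round(recomputed, 3))
```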
An advantage of the WISHBONE interface is its compatibility with many
different systems. WISHBONE's consolidation of backplane techniques
(Point-To-Point, Data Flow, Shared Bus and Crossbar Switch
interconnections) is readily observed in the microcomputer bus industry.
WISHBONE also uses tri-state buffers in a way that is consistent with the
WISHBONE interconnections.
In addition, WISHBONE allows the system integrator to alter the
interconnection logic and signal paths and is much simpler to customize.
Its interconnections are programmable in FPGA and ASIC target devices
and support hardware description languages like VHDL and Verilog. It has
been observed that the shared bus, when combined with a synchronous
handshake in an asynchronous setup, gives rise to timing constraints.
This can be corrected by calculating the metastability factor using
inductive reasoning, as discussed in Appendix A.
6 Conclusion
In order to achieve the performance required for systems that
produce results in real time with minimum latency, developers are seeking
ways to consolidate the interconnecting layers across the system. Data
often passes through many interconnect layers and protocols, which
creates undesirable latency, complexity, smaller bandwidth and higher
cost. An SoC bus architecture must perform efficiently for data to be
transported through an optimal interface. Utilizing WISHBONE's
capabilities, in particular its different interconnections and techniques, it is
generally simpler to customize a design to particular system requirements.
Comparing WISHBONE with the OPB specification, WISHBONE provides
the ability to modify and use different components according to the
requirements of the user, and hence gives flexibility to the system design,
whereas OPB is a fixed standard architecture that cannot be modified
easily. WISHBONE provides the option of different types of interconnection,
unlike the OPB structure, and a WISHBONE interconnection can use any of
the bus arbitration techniques explained in Chapter 3, whereas OPB
supports only a priority-based bus arbiter.
Compared with OPB, WISHBONE also provides simple and easy interface
signals, which can be used according to the requirements of the bus cycle
or the interconnection. All of these factors show the flexibility, portability
and reliability of the WISHBONE architecture over OPB and other related
architectures.
APPENDIX A.
A.1 Calculating MTBF using Inductive Reasoning
A conditional joint probability is used to derive the Mean Time
Between Failures (MTBF) of synchronous circuits having asynchronous
inputs. With parameters from vendor data sheets for programmable logic,
the derived model is used to evaluate the MTBF for several synchronizer
configurations [4].
The outputs of sequential circuit devices are not guaranteed to operate
properly unless their inputs are constant during a window of time T_setup
before the rising clock edge and a period of time T_hold after the rising
edge. Within a clock domain, this is accomplished by making the clock
period T_clock > T_hold + T_setup + T_propagation, where T_propagation
accounts for the propagation delay through circuit elements and the
transport delays encountered between them on the way to the next
clocked input.
However, when inputs arrive from clock domains of different frequency, or
from asynchronous events, they will randomly change state in the
T_setup-to-T_hold window, possibly causing system failures. Input
synchronizers are used to increase the MTBF of these inputs.
The event of the input occurring during the window W = T_hold + T_setup
is not the only criterion: only for some of the times that the input falls in
this interval does a flip-flop (or any general register) output move to an
improper level, called the metastable state, before being used by a
subsequent logic element in the system. When the enable input of a
particular flip-flop is not activated, the output of that flip-flop is routed
back to its input; thus, at every clock edge while its enable is not active,
the flip-flop resamples its own output. Sequential logic errors range from
incorrect logic level outputs and metastable outputs to state machines
entering a wrong or unintended state.
Figure A.1: Timing diagram for the implementation (signals clk,
enable(i), enable(i+1), asynchronous input, D(i))
Conditional Joint Probability
A basic circuit connection will be used to discuss the time at which the
input changes, called T_input, and the time that the circuit output requires
to recover from metastability, T_m. Also of interest is the resolution time
τ available for the circuit to recover without any problems occurring.
Whether designed in or left over, τ is the clock period less the setup, hold,
and propagation delays. Synchronizers seek to maximize this recovery
time. The timing waveforms show the relationships discussed.
The window W = T_hold + T_setup is combined into one region of the
T_clock period for simplicity of illustration. When the input falls into the
window W, the sequential element's output becomes metastable, but the
time that it stays metastable is random.
The metastable phenomenon is often likened to a smooth hill that a ball
must roll over with the energy from the input. If the input provides
inadequate energy, the ball may roll up the hill and then back down. With
more energy, it may roll to the top of the hill and balance until random
noise makes it roll down one side or the other, or it may have just enough
energy to clear the top of the hill and accelerate to the bottom. In normal
operation, the input supplies adequate energy that quickly moves the ball
to a stable position.
To set this problem up following probabilistic reasoning, the random events
must be identified, the probability space identified, and then the axioms,
theorems, and algebras allowed on the space applied.
The probability of an output error consists of the following events and their
probability spaces.
The first event is T_input occurring in the window W within the clock
period T_clock. The probability space is the period T_clock.
The second event is the resolution time T_q exceeding the time τ
available past the rising clock edge plus any propagation delay. The
probability space is T_q > 0.
As a joint conditional probability, this can be stated as

P(T_d ∈ W, T_q > τ | T_d ∈ T_clock, T_q > 0)

which is the probability of synchronization failure within the clock period.
Rate of Asynchronous Input Errors:
The error rate is the asynchronous input event rate F_d times the
probability that the input lands in the error window W = T_hold + T_setup
within the clock period T_clock and the output fails to resolve within τ:

Rate = F_d · P(T_d ∈ W, T_q > τ | T_d ∈ T_clock, T_q > 0)

These two events are considered independent, since a change in the
probability of the input occurring (and causing the output to become
metastable) does not change the probability of the output remaining
unstable:

Rate = F_d · P(T_d ∈ W | T_d ∈ T_clock) · P(T_q > τ | T_d ∈ W, T_q > 0)

The conditional part of the probability defines the experiment space. The
input event is assumed to occur with equal likelihood at any point in the
clock interval, giving the probability

P(T_d ∈ W | T_d ∈ T_clock) = W / T_clock

so that

Rate = F_d · (W / T_clock) · P(T_q > τ | T_d ∈ W, T_q > 0)    (1)

Now consider the distribution F(τ) = P(T_q ≤ τ) of the resolution time.
For a small interval Δτ,

P(τ < T_q ≤ τ + Δτ) = F(τ + Δτ) − F(τ)
                    = P(T_q > τ) · P(T_q ≤ τ + Δτ | T_q > τ)    (2)

Taking the density at the average point where metastability ends, with
mean resolution time τ̄,

P(T_q ≤ τ + Δτ | T_q > τ) = Δτ / τ̄

so that

F(τ + Δτ) − F(τ) = (Δτ / τ̄) · (1 − F(τ))

Letting Δτ → 0 gives the differential equation

dF(τ)/dτ + F(τ)/τ̄ = 1/τ̄

whose solution has the form F(τ) = K₁ + K₂·e^(−τ/τ̄). Applying the
boundary conditions F(0) = 0 and F(∞) = 1,

F(τ) = 1 − e^(−τ/τ̄),   E{T_q} = τ̄

P(T_q > τ | T_q > 0) = 1 − F(τ) = e^(−τ/τ̄)    (3)

From (1) and (3), we get

Rate of errors = F_d · (W / T_clock) · e^(−τ/τ̄)

MTBF = 1 / Rate of errors

MTBF = e^(τ/τ̄) / (F_d · W · F_clock)
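The derived model can be evaluated numerically. The sketch below uses placeholder parameters (chosen for illustration, not taken from any vendor data sheet) purely to show the exponential sensitivity of the MTBF to the resolution time τ:

```python
# Numeric illustration of the derived MTBF model:
#   MTBF = exp(tau / tau_bar) / (F_d * W * F_clk)
# All parameter values below are assumed placeholders.

import math

def mtbf_seconds(f_clk, f_d, window, tau, tau_bar):
    # f_clk: clock frequency [Hz]; f_d: asynchronous input event rate [Hz]
    # window: metastability window W = T_setup + T_hold [s]
    # tau: resolution time available in the clock period [s]
    # tau_bar: device's mean metastability decay constant [s]
    return math.exp(tau / tau_bar) / (f_d * window * f_clk)

if __name__ == "__main__":
    f_clk, f_d = 100e6, 1e6        # 100 MHz clock, 1 MHz input rate
    w, tau_bar = 0.5e-9, 0.2e-9    # 0.5 ns window, 0.2 ns decay constant
    for tau in (2e-9, 5e-9, 8e-9):
        print(tau, "->", f"{mtbf_seconds(f_clk, f_d, w, tau, tau_bar):.3e}", "s")
```

With these assumed numbers, stretching τ from 2 ns to 8 ns moves the MTBF from well under a second to astronomically long, which is why synchronizers seek to maximize the recovery time.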
The WISHBONE public domain library is available online at
http://www.pldworld.com/_hdl/2/_ip/~silicore.net/wishbone.htm, and is
used and modified according to the requirements below.
VHDL Code 1: Simple 8-bit WISHBONE Slave output port

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- VHDL identifiers may not begin with a digit, so the entity is
-- named slave8bit rather than 8bitSlave.
ENTITY slave8bit IS
    port (
        ACK_O : out std_logic;
        CLK_I : in  std_logic;
        DAT_I : in  std_logic_vector(7 downto 0);
        DAT_O : out std_logic_vector(7 downto 0);
        RST_I : in  std_logic;
        STB_I : in  std_logic;
        WE_I  : in  std_logic;
        PRT_O : out std_logic_vector(7 downto 0)
    );
END slave8bit;

ARCHITECTURE behavioral OF slave8bit IS
    signal Q : std_logic_vector(7 downto 0);
BEGIN
    -- Register with synchronous reset; data is latched only when both
    -- STB_I and WE_I are asserted (a WISHBONE write cycle).
    reg : PROCESS ( CLK_I )
    BEGIN
        if rising_edge( CLK_I ) then
            if RST_I = '1' then
                Q <= (others => '0');
            elsif (STB_I and WE_I) = '1' then
                Q <= DAT_I;
            end if;
        end if;
    END PROCESS reg;

    ACK_O <= STB_I;
    DAT_O <= Q;
    PRT_O <= Q;
END behavioral;
VHDL Code 2: FASM synchronous RAM test bench

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY testbench IS
END testbench;

ARCHITECTURE behavior OF testbench IS

    -- Component declaration
    COMPONENT MEM0001a
        PORT (
            ACK_O : out std_logic;
            ADR_I : in  std_logic_vector( 2 downto 0 );
            CLK_I : in  std_logic;
            DAT_I : in  std_logic_vector( 31 downto 0 );
            DAT_O : out std_logic_vector( 31 downto 0 );
            STB_I : in  std_logic;
            WE_I  : in  std_logic
        );
    END COMPONENT;

    signal ACK_O : std_logic;
    signal ADR_I : std_logic_vector( 2 downto 0 );
    signal CLK_I : std_logic;
    signal DAT_I : std_logic_vector( 31 downto 0 );
    signal DAT_O : std_logic_vector( 31 downto 0 );
    signal STB_I : std_logic;
    signal WE_I  : std_logic;

BEGIN

    -- Component instantiation
    uut : MEM0001a PORT MAP (
        ACK_O => ack_o,
        ADR_I => adr_i,
        CLK_I => clk_i,
        DAT_I => dat_i,
        DAT_O => dat_o,
        STB_I => stb_i,
        WE_I  => we_i
    );

    -- Free-running clock with a 20 ns period
    clk_i <= '1' after 10 ns when clk_i /= '1'
             else '0' after 10 ns;

    -- Test bench stimulus
    tb : PROCESS
    BEGIN
        -- wait until global set/reset completes
        wait for 100 ns;
        adr_i <= "110"; dat_i <= x"abcd0123";
        stb_i <= '1';   we_i  <= '1';
        wait for 20 ns;
        adr_i <= "110"; dat_i <= x"abcd0124";
        stb_i <= '0';   we_i  <= '0';
        wait for 20 ns;
        adr_i <= "110"; dat_i <= x"abcd0125";
        stb_i <= '1';   we_i  <= '1';
        wait for 20 ns;
        adr_i <= "110"; dat_i <= x"abcd0126";
        stb_i <= '0';   we_i  <= '0';
        wait for 20 ns;
        adr_i <= "110"; dat_i <= x"abcd0127";
        stb_i <= '1';   we_i  <= '1';
        wait for 20 ns;
        wait;  -- will wait forever
    END PROCESS tb;

    -- End of test bench
END;
VHDL Code 3:FASM synchronous RAM
RAM created
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;
entity ram08x32 is
port (
a: in std_Logic_vector(3 downto 0);
d: in std_Logic_vector(31 downto 0);
elk,we: in std_logic;
spo: out std_logic_Vector(31 downto 0));
end ram08x32;
architecture Behavioral of ram08x32 is
signal temp: std_logic;
begin
spo <= d( 31 downto 1) & temp;
temp <= clk and we;


end Behavioral;
VHDL Code 4: Point-to-Point
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;
ENTITY Point_to_Point_vhd IS
port(clk, run: in std_logic;
dat_i_sample:out std_logic_vector(15 downto 0));
END Point_to_Point_vhd;
ARCHITECTURE behavior OF Point_to_Point_vhd IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT SYSCON
PORT (
clk : IN std_logic;
run : IN std_logic;
clk_i : OUT std_logic;
rst_i : OUT std_logic
) ;


END COMPONENT;
COMPONENT Master_RWTest_4k
PORT (
clk_i : IN std_logic;
rst_i : IN std_logic;
dat_i : IN std_logic_vector(15 downto 0);
ack_i : IN std_logic;
stb_o : OUT std_logic;
we_o : OUT std_logic;
dat_o : OUT std_logic_vector(15 downto 0);
adr_o : OUT std_logic_vector(11 downto 0)
);
END COMPONENT;
-- Inputs
SIGNAL clk_i : std_logic := '0';
SIGNAL rst_i : std_logic := '0';
SIGNAL ack_i : std_logic := '0';
SIGNAL dat_i : std_logic_vector(15 downto 0) := (others => '0');
-- Outputs
SIGNAL stb_o : std_logic;
SIGNAL we_o : std_logic;
SIGNAL dat_o : std_logic_vector(15 downto 0);


SIGNAL adr_o : std_logic_vector(11 downto 0);
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT ram_4k_x_16
PORT (
stb_i : IN std_logic;
we_i : IN std_logic;
clk_i : IN std_logic;
rst_i : IN std_logic;
dat_i : IN std_logic_vector(15 downto 0);
adr_i : IN std_logic_vector(11 downto 0);
dat_o : OUT std_logic_vector(15 downto 0);
ack_o : OUT std_logic
);
END COMPONENT;
BEGIN
dat_i_sample<=dat_i;
-- Instantiate the Unit Under Test (UUT)
uutl: SYSCON PORT MAP(
clk => clk,
run => run,
clk_i => clk_i,
rst_i => rst_i
) ;


-- Instantiate the Unit Under Test (UUT)
uut2: Master_RWTest_4k PORT MAP(
clk_i => clk_i,
rst_i => rst_i,
stb_o => stb_o,
we_o => we_o,
dat_i => dat_i,
dat_o => dat_o,
adr_o => adr_o,
ack_i => ack_i
) ;
-- Instantiate the Unit Under Test (UUT)
uut3: ram_4k_x_16 PORT MAP(
stb_i => stb_o,
we_i => we_o,
clk_i => clk_i,
rst_i => rst_i,
dat_i => dat_o,
dat_o => dat_i,
adr_i => adr_o,
ack_o => ack_i
) ;
END;


SYSCON.VHD
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.ALL;
entity SYSCON is
port(clk, run: in std_logic;
clk_i, rst_i: out std_logic);
end SYSCON;
architecture Behavioral of SYSCON is
begin
rst_i <='0' when run = '1' else '1';
clk_i<=clk;
end Behavioral;
--------------MASTER_RWTEST_4K------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.ALL;
entity Master_RWTest_4k is
port(clk_i, rst_i:in std_logic;
stb_o, we_o:out std_logic;


dat_i:in std_logic_vector(15 downto 0);
dat_o:out std_logic_vector(15 downto 0);
adr_o:out std_logic_vector(11 downto 0);
ack_i: in std_logic);
end Master_RWTest_4k;
architecture Behavioral of Master_RWTest_4k is
signal count, countMinus1: std_logic_vector (15 downto 0);
signal run, count4095, resetCnt: std_logic;
type state_type is (init, write_1, wait_1,
wait_ack_1, read_1, error_1, wait_ack_2,
success_1);
signal pstate, nstate: state_type;
signal dataGood :std_logic;
begin
adr_o<=count(11 downto 0);
dat_o<= std_logic_vector( count);
process(pstate, count4095,dataGood, ack_i)
begin
-- default output values
stb_o <='0'; we_o <='0'; resetCnt<='0'; run<='0';


case pstate is
when init =>
resetCnt <= '1';
nstate <= write_1;
when write_1 =>
stb_o<='1'; we_o<='1'; run<='1';
if count4095 ='1' then
nstate <= wait_1;
elsif ack_i='1' then
nstate <= write_1;
else
nstate <= wait_ack_1;
end if;
when wait_ack_1 =>
if ack_i ='1' then
nstate <= write_1;
else
nstate <= wait_ack_1;
end if;
when wait_1 =>
resetCnt <= '1';
nstate <= read_1;
when read_1 =>


stb_o<='1'; run<='1';
if dataGood ='0' then
nstate <= error_1;
elsif count4095 ='1' then
nstate <= success_1;
elsif ack_i='1' then
nstate <= read_1;
else
nstate <= wait_ack_2;
end if;
when wait_ack_2 =>
if ack_i ='1' then
nstate <= read_1;
else
nstate <= wait_ack_2;
end if;
when error_1 =>
nstate <= error_1;
when success_1 =>
nstate <= success_1;
end case;
end process;
process (clk_i)


begin
if rising_edge (clk_i) then
if rst_i = '1' then
pstate <= init;
else
pstate <= nstate;
end if;
end if;
end process;
dataGood <= '1' when (unsigned(countMinus1) =
unsigned(dat_i)) else '0';
count4095 <= '1' when count = x"0FFF" else '0';
process(clk_i)
begin
if rising_edge (clk_i) then
countMinus1 <= x"0" & count(11 downto 0);
end if;
end process;
process (clk_i)
begin
if rising_edge (clk_i) then
if resetCnt = '1' then


count <= (others =>'0');
elsif run='1' then
count <=
std_logic_vector(unsigned(count) + to_unsigned(1, 16));
else
count<=count;
end if;
end if;
end process;
end Behavioral;
----------------RAM_4k_x_16---------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.ALL;
entity ram_4k_x_16 is
port(stb_i, we_i, clk_i, rst_i:in std_logic;
dat_i:in std_logic_vector(15 downto 0);
dat_o:out std_logic_vector(15 downto 0);
adr_i:in std_logic_vector(11 downto 0);
ack_o: out std_logic);
end ram_4k_x_16;
architecture Behavioral of ram_4k_x_16 is


type ram_type is array (4095 downto 0) of
std_logic_vector(15 downto 0);
signal ram:ram_type;
begin
ack_o <= stb_i;
process(clk_i)
begin
if rising_edge(clk_i) then
if (stb_i and we_i and not rst_i) ='1' then
ram(to_integer(unsigned(adr_i)))<=dat_i;
end if;
dat_o<=ram(to_integer(unsigned(adr_i)));
end if;
end process;
end Behavioral;
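Note that because the write to `ram` and the assignment to `dat_o` are both signal assignments inside the same clocked process, a simultaneous read and write of one address returns the pre-write contents (read-first behavior). If write-first behavior were preferred, one common sketch is to bypass the incoming data on a write hit; this variant is an illustration, not part of the thesis design:

```vhdl
-- Hypothetical write-first variant of the same RAM process:
-- on a write, dat_o is loaded with the new data instead of the
-- stale array contents.
process(clk_i)
begin
    if rising_edge(clk_i) then
        if (stb_i and we_i and not rst_i) = '1' then
            ram(to_integer(unsigned(adr_i))) <= dat_i;
            dat_o <= dat_i; -- forward the freshly written word
        else
            dat_o <= ram(to_integer(unsigned(adr_i)));
        end if;
    end if;
end process;
```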
VHDL Code 5: Point-to-Point with 2 Masters to 1 Slave
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.ALL;
entity Master_RWTest_4k is
port(clk_i, rst_i:in std_logic;


stb_o, we_o:out std_logic;
dat_i:in std_logic_vector(15 downto 0);
dat_o:out std_logic_vector(15 downto 0);
adr_o:out std_logic_vector(11 downto 0);
ack_i: in std_logic;
sel_o : OUT std_logic_vector(1 downto 0));
end Master_RWTest_4k;
architecture Behavioral of Master_RWTest_4k is
signal count, countMinus1: std_logic_vector (15 downto 0);
signal run, count4095, resetCnt: std_logic;
type state_type is (init, write_1, wait_1,
wait_ack_1, read_1, error_1, wait_ack_2,
success_1);
signal pstate, nstate: state_type;
signal dataGood :std_logic;
type secondStateType is (s0, s1);
signal pState2,nState2:secondStateType;
signal sel_o_internal:std_logic_vector(1 downto 0);
begin
sel_o<=sel_o_internal ;
process(pstate2,pstate)
93


begin
sel_o_internal<="00";
case pstate2 is
when s0 =>
sel_o_internal<="01";
if pstate=success_1 then
nstate2<=s1;
else
nstate2<=s0;
end if;
when s1 =>
sel_o_internal<="10";
nstate2<=s1;
end case;
end process;
process(clk_i)
begin
if rising_edge(clk_i) then
pState2<=nState2;
end if;
end process;
adr_o<=count(11 downto 0);


dat_o<= std_logic_vector( count);
process(pstate, count4095,dataGood, ack_i)
begin
--default output values
stb_o <='0';we_o <='0';resetCnt<='0';run<='0';
case pstate is
when init =>
resetCnt <= '1';
nstate <= write_1;
when write_1 =>
stb_o<='1'; we_o<='1'; run<='1';
if count4095 ='1' then
nstate <= wait_1;
elsif ack_i='1' then
nstate <= write_1;
else
nstate <= wait_ack_1;
end if;
when wait_ack_1 =>
if ack_i ='1' then
nstate<= write_1;
else


nstate <= wait_ack_1;
end if;
when wait_1 =>
resetCnt <= '1';
nstate <= read_1;
when read_1 =>
stb_o<='1'; run<='1';
if dataGood ='0' then
nstate <= error_1;
elsif count4095 ='1' then
nstate <= success_1;
elsif ack_i='1' then
nstate <= read_1;
else
nstate <= wait_ack_2;
end if;
when wait_ack_2 =>
if ack_i ='1' then
nstate <= read_1;
else
nstate <= wait_ack_2;
end if;
when error_1 =>
96


nstate<=error_1;
when success_1 =>
if sel_o_internal="10" then
nstate <= success_1;
else
resetCnt <='1';
nstate <= write_1;
end if;
end case;
end process;
process (clk_i)
begin
if rising_edge (clk_i) then
if rst_i = '1' then
pstate <= init;
else
pstate <= nstate;
end if;
end if;
end process;


dataGood <= '1' when (unsigned(countMinus1) =
unsigned(dat_i)) else '0';
count4095 <= '1' when count = x"0FFF" else '0';
process(clk_i)
begin
if rising_edge (clk_i) then
countMinus1 <= x"0" & count(11 downto 0);
end if;
end process;
process (clk_i)
begin
if rising_edge (clk_i) then
if resetCnt = '1' then
count <= (others =>'0');
elsif run='1' then
count <=
std_logic_vector(unsigned(count) + to_unsigned(1, 16));
else
count<=count;
end if;
end if;


end process;
end Behavioral;
-----------------RAM_4k_x_16--------------------
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
entity ram_4k_x_16 is
port(stb_i, we_i, clk_i, rst_i: in std_logic;
dat_i:in std_logic_vector(15 downto 0);
dat_o:out std_logic_vector(15 downto 0);
adr_i:in std_logic_vector(11 downto 0);
ack_o: out std_logic;
sel_i : IN std_logic);
end ram_4k_x_16;
architecture Behavioral of ram_4k_x_16 is
type ram_type is array (4095 downto 0) of
std_logic_vector(15 downto 0);
signal ram:ram_type;
begin
ack_o <= stb_i;
process(clk_i)
begin