Lesson 10: Reset, Registers, and Counters in Verilog

10.1 Introduction to Reset, Registers, and Counters

In digital design, three building blocks appear in virtually every FPGA project: resets, registers, and counters. Before writing complex state machines or communication protocols, a designer must have a solid understanding of how these primitives behave, how they are described in Verilog, and how synthesis tools map them onto FPGA hardware resources.

This chapter introduces each concept from first principles and progressively builds toward practical, production-ready design patterns used in real FPGA development.

Register

What Is a Register?

A register is a storage element that captures and holds a binary value on the active edge of a clock signal. In FPGA devices, registers are implemented using dedicated flip-flops (FFs) embedded throughout the logic fabric.

In Verilog, a register is inferred whenever a variable of type reg is assigned inside an always @(posedge clk) block. The synthesis tool recognizes the edge-triggered assignment and maps it to a physical flip-flop on the target device.

A minimal D flip-flop in Verilog looks like this:

module dff (
    input  wire clk,
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        q <= d;
    end
endmodule

Notice the use of the non-blocking assignment <= inside the always block. This is the correct operator for sequential (clocked) logic and is essential for correct simulation and synthesis behavior.
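The non-blocking operator matters most when one register feeds another. The sketch below chains two flip-flops; the module name shift2 is hypothetical and used only for illustration. Because both assignments use <=, q2 captures the value q1 held before the clock edge, which gives two distinct stages. If the same two lines used blocking =, q1 would update first and q2 would receive d on the same edge, so the chain would behave like a single stage.

```verilog
// Two-stage shift register (hypothetical example module)
module shift2 (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) begin
        q1 <= d;    // stage 1 samples the input
        q2 <= q1;   // stage 2 samples the value q1 had BEFORE this edge
    end
endmodule
```

After two clock edges, a value presented on d appears on q2, one cycle after it appears on q1.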

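Reset

What Is a Reset?

A reset is a control signal that forces registers and counters into a known, predefined state, both at startup and on demand during operation. This course uses an active-low reset named rst_n: while rst_n = 0, the registers are held at their reset values, and normal operation resumes once rst_n returns to 1. Section 10.2 compares the two reset styles, synchronous and asynchronous, in detail.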

Counter

What Is a Counter?

A counter is a register whose stored value increments (or decrements) by a fixed amount on each active clock edge. Counters are among the most frequently used circuits in digital design and appear in applications such as:

Generating timing delays and periodic events
Addressing memory locations sequentially
Controlling loop iterations in hardware state machines
Baud rate generation in communication interfaces (UART, SPI, I²C)
PWM signal generation for motor and LED control

A counter is structurally just a register with its output fed back into an adder. In Verilog, this feedback is expressed naturally within the always block:

module counter_4bit (
    input  wire       clk,
    input  wire       rst_n,
    output reg  [3:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= 4'b0000;
        else
            count <= count + 4'b0001;
    end
endmodule
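Counters can also decrement or pause. The following is a minimal sketch, assuming a hypothetical count-enable input en and direction input up (neither is part of counter_4bit above); the module name updown_counter is also hypothetical. The missing else for en = 0 is harmless here: inside a clocked block, an unassigned register simply keeps its value in the flip-flop. This is different from the combinational latch problem discussed later in this lesson.

```verilog
// Parameterized up/down counter with count enable (hypothetical example module)
module updown_counter #(
    parameter WIDTH = 4
) (
    input  wire             clk,
    input  wire             rst_n,   // active-low asynchronous reset
    input  wire             en,      // hypothetical count enable
    input  wire             up,      // hypothetical direction: 1 = up, 0 = down
    output reg  [WIDTH-1:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en) begin
            if (up)
                count <= count + 1'b1;   // wraps from all-ones to 0
            else
                count <= count - 1'b1;   // wraps from 0 to all-ones
        end
        // en = 0: no assignment, so the flip-flops keep their value
    end
endmodule
```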

How These Concepts Relate

Resets, registers, and counters are not independent topics — they form a hierarchy of abstraction:

Flip-Flop (FF) — the physical primitive provided by the FPGA fabric. A single-bit storage cell clocked by the system clock.

Register — one or more flip-flops grouped together to store a multi-bit value, usually sharing a common clock and reset signal.

Counter — a register augmented with combinational feedback logic (an adder or subtractor) that modifies its own value on each clock cycle.

Reset — a mechanism applied at the flip-flop level that initializes all registers and counters to a known state at startup or on demand.

Understanding this hierarchy helps designers reason about timing, area usage, and reliability of their FPGA implementations.
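The hierarchy can be made explicit by describing the register and its feedback logic separately. The sketch below builds a counter from a WIDTH-bit register with reset plus a combinational adder; the module names register_n and counter_structural are hypothetical. It behaves the same as the behavioral counter_4bit above, and synthesis will generally produce equivalent hardware for both.

```verilog
// Register layer: WIDTH flip-flops sharing one clock and one reset (hypothetical)
module register_n #(
    parameter WIDTH = 4
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk or negedge rst_n)
        if (!rst_n) q <= {WIDTH{1'b0}};
        else        q <= d;
endmodule

// Counter layer: the register plus combinational feedback logic (hypothetical)
module counter_structural #(
    parameter WIDTH = 4
) (
    input  wire             clk,
    input  wire             rst_n,
    output wire [WIDTH-1:0] count
);
    wire [WIDTH-1:0] next_count = count + 1'b1;   // combinational adder

    register_n #(.WIDTH(WIDTH)) u_reg (
        .clk   (clk),
        .rst_n (rst_n),
        .d     (next_count),
        .q     (count)
    );
endmodule
```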

FPGA Hardware Context

On Intel MAX-10 FPGAs (used in this course with the Terasic DE10-Lite board), each Logic Element (LE) contains one dedicated flip-flop. The Quartus Prime synthesis tool automatically infers flip-flops from your Verilog code and maps them to these hardware resources.

Key facts relevant to this chapter:

MAX-10 flip-flops support both synchronous and asynchronous clear (reset) natively — choosing the right reset style affects how efficiently the LE is utilized.
Multi-bit registers consume one LE per bit. A 16-bit register requires at least 16 LEs.
Counters synthesize efficiently because Quartus recognizes the increment pattern and applies carry-chain optimization across adjacent LEs.
The RTL Viewer and Technology Map Viewer in Quartus Prime are valuable tools for verifying that your Verilog code infers the hardware structure you intend.

A Note on the initial Block and Register Initialization

Designers familiar with simulation often ask whether the initial block can be used to set a register's starting value in synthesized designs. The answer depends on the target technology and requires careful understanding.

On FPGAs (Including Intel MAX-10)

Most FPGA synthesis tools, including Quartus Prime, Vivado, and the older ISE, recognize initial blocks for register initialization. The specified value is stored in the configuration bitstream and loaded into the flip-flop as its power-up state when the FPGA is configured.

reg [7:0] count;

initial begin
    count = 8'hFF;   // Quartus will honor this as the power-up value
end

However, there are important limitations:

The initial block is not a reset — it sets the register value only once at power-up or FPGA configuration. It is not re-applied when a reset signal is asserted during normal operation.
It does not replace a proper reset circuit. If the board is reset via a reset pin rather than a full power cycle, the initial value will not be re-applied.
Behavior can vary between FPGA families and tool versions. Always verify the inferred hardware using the RTL Viewer or Technology Map Viewer in Quartus Prime.

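On ASICs

ASIC synthesis tools ignore initial blocks. Standard-cell flip-flops are not loaded from a configuration bitstream, so their power-up state is undefined, and an explicit reset is the only way to give a register a known starting value.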

Recommended Practice

Even on FPGAs, always use an explicit reset signal rather than relying on an initial block for register initialization:

// Recommended: explicit reset, works correctly in all contexts
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        count <= 8'hFF;
    else
        count <= count + 1;
end

// Avoid in synthesizable RTL: simulation-only habit
initial begin
    count = 8'hFF;
end

The following table summarizes support across different contexts:

Context	`initial` Supported?	Recommended?
Icarus Verilog (simulation)	✓ Yes	✓ Yes
Quartus Prime / Intel MAX-10	✓ Yes (power-up only)	⚠ Use with caution
Vivado / Xilinx	✓ Yes (power-up only)	⚠ Use with caution
ASIC Synthesis	✗ No	✗ Never

Safe rule for designers: use the initial block only in testbenches, and always use an explicit reset signal in synthesizable RTL. This habit ensures your code is portable, predictable, and professionally correct across both FPGA and ASIC flows.

The reg Keyword: Combinational vs Sequential Logic

One of the most common points of confusion in Verilog is the meaning of the reg keyword. Many designers initially assume that reg always implies a flip-flop — this is incorrect. Understanding the distinction is fundamental to writing correct, synthesizable Verilog.

The Key Insight: `reg` Is a Data Type, Not a Hardware Primitive

In Verilog, reg simply means "a variable that can be assigned inside an always block." It does not automatically infer a flip-flop. What determines whether the synthesized hardware is combinational or sequential is the sensitivity list of the always block, not the reg keyword itself.

Combinational Logic Using `reg`

When a reg variable is assigned inside an always @(*) block, the synthesis tool infers combinational logic gates — no flip-flop is created. The output updates immediately whenever any input signal changes.

// reg here → NO flip-flop inferred, purely combinational
always @(*) begin
    if (sel)
        y = a;   // blocking assignment
    else
        y = b;
end

• always @(*) instructs the simulator to re-evaluate the block whenever any signal read inside the block changes.

• The blocking assignment = executes immediately, top to bottom, within the same simulation time step.

• The synthesis tool infers a multiplexer, not a flip-flop.

• The output y does not hold its value — it is purely a function of the current inputs.
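To see that reg describes how a variable is assigned rather than what hardware it becomes, compare two descriptions of the same multiplexer. In this hypothetical module (mux2_demo), y_reg is declared reg because it is assigned in an always @(*) block, while y_wire is declared wire because it is driven by a continuous assign. Synthesis should produce a 2-to-1 multiplexer for both, with no flip-flop.

```verilog
// Same 2-to-1 multiplexer described two ways (hypothetical example module)
module mux2_demo (
    input  wire       sel,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] y_reg,    // reg: assigned inside an always block
    output wire [7:0] y_wire    // wire: driven by a continuous assignment
);
    // Procedural description: declared reg, but no clock edge, so no flip-flop
    always @(*) begin
        if (sel)
            y_reg = a;
        else
            y_reg = b;
    end

    // Continuous-assignment description of the same multiplexer
    assign y_wire = sel ? a : b;
endmodule
```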

Sequential Logic Using `reg`

When a reg variable is assigned inside an always @(posedge clk) block, the synthesis tool infers a D flip-flop. The output updates only on the active clock edge and holds its value between edges.

// reg here → flip-flop IS inferred
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        q <= 8'h00;   // non-blocking assignment
    else
        q <= d;
end

• always @(posedge clk or negedge rst_n) instructs the simulator to evaluate the block only on a rising clock edge or on a falling edge of rst_n (the asynchronous reset).

• The non-blocking assignment <= schedules the update to occur at the end of the current simulation time step, correctly modeling concurrent flip-flop behavior.

• The synthesis tool infers a D flip-flop with an asynchronous active-low reset.

• The output q holds its value between clock edges.

The Danger Zone: Unintended Latches

If combinational logic is written with an incomplete conditional statement — for example, an if without a matching else — the synthesizer is forced to infer a latch. A latch is a level-sensitive storage element that is neither a clean combinational gate nor a proper flip-flop, and it typically indicates a design error.

// Latch inferred — missing else branch!
always @(*) begin
    if (en)
        y = d;
    // When en = 0, what is y? The tool must hold the value → latch
end

// Correct — output is defined in every branch, no latch
always @(*) begin
    if (en)
        y = d;
    else
        y = 8'h00;   // explicit default eliminates the latch
end

Quartus Prime will issue a warning whenever a latch is inferred. Treat all unintended latch warnings as errors and resolve them before proceeding.
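A second common way to keep a combinational block latch-free is to assign a default value at the top of the block and override it only where needed, so that every path through the block assigns the output. A sketch using a hypothetical 2-to-4 one-hot decoder with enable (the module name latch_free_decode is illustrative only):

```verilog
// 2-to-4 one-hot decoder with enable (hypothetical example module)
module latch_free_decode (
    input  wire [1:0] sel,
    input  wire       en,
    output reg  [3:0] onehot
);
    always @(*) begin
        onehot = 4'b0000;        // default: every path now assigns onehot
        if (en)
            onehot[sel] = 1'b1;  // override only the selected bit
        // no else needed: the default already covers en = 0, so no latch
    end
endmodule
```

This style scales well to blocks with many outputs or deeply nested conditions, where writing a matching else for every if becomes error-prone.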

Side-by-Side Comparison

	Combinational	Sequential
Sensitivity list	`always @(*)`	`always @(posedge clk)`
Assignment operator	`=` (blocking)	`<=` (non-blocking)
Hardware inferred	Multiplexer/logic gates	Flip-flop (register)
Output updates when	Any input changes	Clock edge arrives
Holds value?	✗ No	✓ Yes
Clock needed?	✗ No	✓ Yes

Design rule: the reg keyword tells Verilog how the variable is driven — the sensitivity list tells the synthesizer what hardware to build. Always use always @(*) with blocking = for combinational logic, and always @(posedge clk) with non-blocking <= for sequential logic. Mixing these conventions is one of the most frequent sources of simulation-synthesis mismatch encountered in FPGA design.

10.2 Reset Fundamentals: Synchronous vs Asynchronous

Reset is one of the most fundamental — and most frequently underestimated — mechanisms in digital system design. A poorly designed reset strategy can cause a system to start from an illegal state after power-up, fail to recover from runtime errors, or even result in bus lockups and damage to external devices. This section examines the behavioral differences, synthesis implications, and practical use cases of synchronous and asynchronous reset from first principles.

Why Is Reset Necessary?

A design cannot rely on flip-flops starting in a useful state. Although some FPGAs, including the Intel MAX-10, allow the power-up value of each flip-flop to be programmed into the configuration bitstream, this value is applied only at power-up (configuration) and does not address the following runtime scenarios:

A runtime error has occurred and the control logic must be forced back to a known safe state.
An external device requires reinitialization of a communication protocol (e.g., I2C bus reset, UART framing error recovery).
A watchdog timer has detected a system fault and triggered a system-wide reset.
A multi-clock-domain system requires all clock domains to be cleared simultaneously.

An explicit reset signal is therefore a mandatory requirement of any reliable digital system design, not an optional feature.

Synchronous Reset

The defining characteristic of a synchronous reset is that its effect takes place only on the active clock edge. In Verilog, a synchronous reset does not appear in the always sensitivity list — it is treated as an ordinary conditional input:

// Synchronous reset example
module sync_reset_dff (
    input  wire clk,
    input  wire rst_n,   // Active-low synchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin      // Only clk in sensitivity list
        if (!rst_n)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule

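After synthesis, a synchronous reset behaves as a 2-to-1 multiplexer inserted before the D input of the flip-flop, not as a connection to the flip-flop's asynchronous CLR pin. Quartus Prime may build this multiplexer in a LUT or map it onto the Logic Element's dedicated synchronous-clear input, but the logical structure is the same: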

• A 2-to-1 Mux is placed in front of the flip-flop's D input.

• When rst_n = 0, the Mux selects 0 and drives it to the D input.

• When rst_n = 1, the Mux selects the original data signal d.

• The flip-flop's dedicated CLR/PRE pins are not used, regardless of the reset state.

Timing behavior of synchronous reset:

• Even if rst_n is asserted in the middle of a clock cycle, the output q does not change immediately.

• The output is cleared only on the next rising clock edge after reset is asserted.

• The reset pulse width must be longer than one clock period; otherwise, the reset event may be missed entirely.
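The mux structure and timing behavior described above can be reproduced by writing the multiplexer explicitly. The following sketch (the module name sync_reset_dff_explicit is hypothetical) is intended to be behaviorally equivalent to sync_reset_dff: rst_n only selects what the flip-flop loads on the next edge, so asserting it between edges leaves q unchanged until the clock rises.

```verilog
// Synchronous reset written as an explicit 2-to-1 mux (hypothetical example module)
module sync_reset_dff_explicit (
    input  wire clk,
    input  wire rst_n,   // active-low synchronous reset
    input  wire d,
    output reg  q
);
    // The mux: selects 0 while reset is asserted, otherwise the data input
    wire d_mux = rst_n ? d : 1'b0;

    // A plain D flip-flop: no reset in the sensitivity list, CLR pin unused
    always @(posedge clk)
        q <= d_mux;
endmodule
```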

Asynchronous Reset

The defining characteristic of an asynchronous reset is that the reset takes effect immediately when asserted, without waiting for a clock edge. In Verilog, an asynchronous reset must appear in the always sensitivity list:

// Asynchronous reset example
module async_reset_dff (
    input  wire clk,
    input  wire rst_n,   // Active-low asynchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk or negedge rst_n) begin   // rst_n is in the sensitivity list
        if (!rst_n)
            q <= 1'b0;   // Takes effect immediately, no clock required
        else
            q <= d;
    end
endmodule

After synthesis, an asynchronous reset is connected directly to the flip-flop's hardware CLR (Clear) pin — a feature natively supported by every Logic Element (LE) in the Intel MAX-10. No additional Mux logic is required.

• The reset signal drives the flip-flop's CLR pin directly.

• The CLR pin is asynchronous: it responds immediately, independent of the clock.

• No additional LUT resources are consumed compared to the synchronous Mux approach.

Timing behavior of asynchronous reset:

• As soon as rst_n is pulled low, the output q is cleared to 0 after the flip-flop's propagation delay (typically a few nanoseconds).

• The reset takes effect even if no clock edge is present.

• The minimum reset pulse width is determined by the flip-flop's CLR minimum pulse width specification — typically far shorter than one clock period.

The Risk of Asynchronous Reset: Metastability at Deassertion

The primary risk of asynchronous reset occurs not during assertion, but during deassertion — when rst_n transitions from 0 back to 1. If this transition occurs too close to a rising clock edge — violating the flip-flop's recovery time specification — the flip-flop may enter metastability. This can cause individual flip-flops to exit reset at different times, leaving an FSM in an illegal initial state.

The standard solution is a Reset Synchronizer, which preserves the immediate assertion behavior of the asynchronous reset while ensuring that deassertion is synchronized to the clock:

// Reset Synchronizer: asynchronous assert, synchronous deassert
module reset_synchronizer (
    input  wire clk,
    input  wire async_rst_n,    // Asynchronous reset from external source
    output wire sync_rst_n      // Synchronized reset for internal logic
);
    reg meta_ff, sync_ff;

    always @(posedge clk or negedge async_rst_n) begin
        if (!async_rst_n)
            {sync_ff, meta_ff} <= 2'b00;          // Asynchronous assert: clear immediately
        else
            {sync_ff, meta_ff} <= {meta_ff, 1'b1}; // Synchronous deassert: two-stage chain
    end

    assign sync_rst_n = sync_ff;
endmodule

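This two-stage synchronizer chain reduces the metastability risk at deassertion to a negligible level: if the first stage goes metastable, it has a full clock period to resolve before the second stage samples it. It is the industry-standard reset circuit and should be included in any design that uses asynchronous reset.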

When to Use Each Reset Style

Choosing between synchronous and asynchronous reset is an engineering decision driven by design requirements, not personal preference. The following guidelines identify the appropriate choice for each scenario.

Use Synchronous Reset When:

Designing datapath modules — pipeline registers, data buffers, DSP accumulators, and similar modules do not require a reset in the absence of a clock. Synchronous reset integrates cleanly into Static Timing Analysis (STA), allowing the tool to fully characterize the reset path delay.
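The reset signal source may contain glitches: a synchronous reset is sampled only at clock edges, so a short glitch affects the flip-flops only if it happens to coincide with an edge, whereas any glitch on an asynchronous reset clears them. This makes synchronous reset more tolerant of a noisy reset source. A source that is asynchronous to the system clock should still be synchronized before use, because its transitions can violate setup and hold.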
Designing purely for FPGA with no ASIC portability requirement — Quartus Prime optimizes the LUT-Mux structure for synchronous reset efficiently, and the resulting design is straightforward to analyze with timing tools.
Implementing counters and sequence generators — these modules are typically reset by internal control logic (e.g., when the count reaches a terminal value), which is inherently a synchronous operation.

Use Asynchronous Reset When:

Designing system control FSMs (CPU datapath controller, bus arbiter) — if the control logic enters an illegal state, it must be cleared immediately without waiting for the next clock edge.
Implementing communication protocol controllers (I2C, UART, SPI, CAN) — protocol errors such as framing errors, bus lockups, and arbitration loss require recovery within nanoseconds. A synchronous reset delay could result in a protocol violation.
Handling power-on reset (POR) — the system clock is not yet stable at power-up. Only an asynchronous reset can initialize flip-flops before the clock is running.
Working with multi-clock-domain designs — a global reset must clear all clock domains simultaneously. Asynchronous reset guarantees this; synchronous reset cannot, because each domain responds only on its own clock edge.
Integrating with commercial IP cores (ARM Cortex-M, AXI, PCIe) — most commercial IP uses asynchronous reset interfaces conforming to the AMBA specification. Synchronous reset in custom modules requires additional reset bridging logic.
Watchdog timer-triggered system resets — a watchdog fires precisely because the system (including possibly the clock) has malfunctioned. Asynchronous reset is the only reliable choice.

Practical Case Studies

Case 1: UART Receive Controller (Asynchronous Reset)

The UART receiver FSM detects the start bit, samples data bits, and verifies the stop bit. Line noise can drive the FSM into an illegal state, locking up the entire receive process. An external reset signal must be able to clear the FSM immediately:

// UART RX FSM: asynchronous reset
// Protocol errors require immediate recovery; cannot wait for a clock edge
// Simplified for illustration: one data bit per clk cycle (no baud-rate timing or oversampling)
module uart_rx_fsm (
    input  wire clk,
    input  wire rst_n,       // Asynchronous reset from system Reset Synchronizer
    input  wire rx,
    output reg  frame_error
);
    localparam IDLE  = 2'd0;
    localparam START = 2'd1;
    localparam DATA  = 2'd2;
    localparam STOP  = 2'd3;

    reg [1:0] state;
    reg [2:0] bit_cnt;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state       <= IDLE;    // Immediately return to safe state
            bit_cnt     <= 3'd0;
            frame_error <= 1'b0;
        end else begin
            case (state)
                IDLE:  if (!rx) state <= START;
                START: state <= DATA;
                DATA:  begin
                    bit_cnt <= bit_cnt + 1;
                    if (bit_cnt == 3'd7) state <= STOP;
                end
                STOP:  begin
                    frame_error <= !rx;
                    state <= IDLE;
                end
            endcase
        end
    end
endmodule

Case 2: I2C Bus Arbitration Controller (Asynchronous Reset)

In the I2C protocol, if the master loses bus arbitration, it must release SDA and SCL immediately. Holding the bus beyond the allowed window locks the entire I2C bus. The reset response must occur well within one SCL period (10 µs at 100 kHz standard mode):

// I2C Master Arbitration Control — Asynchronous reset
// The bus must be released immediately upon arbitration loss
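module i2c_arb_ctrl (
    input  wire clk,
    input  wire rst_n,       // Asynchronous reset
    input  wire drive_req,   // Request to drive the bus (hypothetical, added so the outputs can be enabled)
    input  wire arb_lost,    // Arbitration lost flag
    output reg  sda_oe,      // SDA output enable (0 = release bus)
    output reg  scl_oe       // SCL output enable (0 = release bus)
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            sda_oe <= 1'b0;   // Release SDA immediately
            scl_oe <= 1'b0;   // Release SCL immediately
        end else if (arb_lost) begin
            sda_oe <= 1'b0;   // Arbitration lost: release the bus
            scl_oe <= 1'b0;
        end else if (drive_req) begin
            sda_oe <= 1'b1;   // Take control of the bus
            scl_oe <= 1'b1;
        end
    end
endmodule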

Case 3: Pipelined Multiplier Stage Register (Synchronous Reset)

Intermediate pipeline registers in a DSP datapath only need to be initialized at system startup. There is no requirement to reset them while the clock is stopped. Synchronous reset allows the STA tool to fully characterize the reset path and eliminates the timing uncertainty associated with asynchronous deassertion:

// Pipeline Multiplier Stage 1 Register — Synchronous reset
// Datapath registers do not require emergency clearing;
// synchronous reset provides cleaner timing analysis
module pipe_mult_stage1 (
    input  wire        clk,
    input  wire        rst_n,    // Synchronous reset
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [31:0] product_r
);
    always @(posedge clk) begin     // Only clk in sensitivity list
        if (!rst_n)
            product_r <= 32'd0;
        else
            product_r <= a * b;
    end
endmodule

Case 4: Baud Rate Generator Counter (Synchronous Reset)

The counter inside a baud rate generator is cleared by internal logic when it reaches its terminal count. This is inherently a synchronous operation, making synchronous reset the natural and correct choice:

// Baud Rate Generator Counter — Synchronous reset
// Counter rollover is a synchronous operation; synchronous reset is most appropriate
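module baud_gen #(
    parameter CLK_FREQ  = 50_000_000,
    parameter BAUD_RATE = 115200
) (
    input  wire clk,
    input  wire rst_n,
    output reg  baud_tick
);
    // Derived constants live in the module body: Verilog-2005 does not allow
    // localparam inside the #( ) parameter port list (SystemVerilog does)
    localparam DIVIDER = CLK_FREQ / BAUD_RATE - 1;   // 433 with the default values
    localparam CNT_W   = $clog2(DIVIDER + 1);        // +1 so DIVIDER itself always fits

    reg [CNT_W-1:0] count;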

    always @(posedge clk) begin    // Synchronous reset
        if (!rst_n) begin
            count     <= 'd0;
            baud_tick <= 1'b0;
        end else if (count == DIVIDER) begin
            count     <= 'd0;
            baud_tick <= 1'b1;
        end else begin
            count     <= count + 1;
            baud_tick <= 1'b0;
        end
    end
endmodule

Comparison Summary

Criteria	Synchronous Reset	Asynchronous Reset
Trigger condition	Rising clock edge + rst_n = 0	rst_n = 0 (immediate)
Sensitivity list	`always @(posedge clk)`	`always @(posedge clk or negedge rst_n)`
Synthesized hardware	Mux before D input (LUT or LE synchronous clear)	Directly drives FF CLR pin (no extra LUT)
Effective when clock is stopped	✗ No	✓ Yes
Power-on reset support	✗ Unreliable	✓ Reliable
Reset response time	Up to one full clock period	Immediate (nanoseconds)
Glitch tolerance	⚠ Partial: only glitches at a clock edge are captured	✗ Any glitch clears the FFs; requires external filtering
Multi-clock-domain consistency	✗ Each domain responds independently	✓ All domains cleared simultaneously
Static Timing Analysis (STA)	✓ Fully supported by tools	⚠ Requires recovery/removal constraints
Metastability risk	⚠ Only if rst_n is asynchronous to clk (synchronize it first)	⚠ Requires Reset Synchronizer at deassertion
IP interoperability	⚠ Depends on IP specification	✓ Conforms to AMBA/AXI standard
Typical applications	Datapath, counters, DSP pipelines	Control FSMs, protocol controllers, POR

Design Rules Summary

The following four principles guide reset strategy selection in professional FPGA design:

Rule 1: Use asynchronous reset for control logic; use synchronous reset for datapath logic.
Control modules (FSMs, protocol controllers) must recover to a safe state under any condition, including clock failure. Datapath modules only require initialization at startup, and synchronous reset produces cleaner timing analysis results.

Rule 2: Always pair asynchronous reset with a Reset Synchronizer.
The industry-standard pattern is "asynchronous assert, synchronous deassert." This preserves the immediate clearing capability of the asynchronous reset while guaranteeing stable, glitch-free deassertion synchronized to the clock.

Rule 3: Use active-low reset consistently throughout the entire design.
The naming convention rst_n is the industry standard. The hardware CLR pin of the Intel MAX-10 flip-flop is also active-low. Maintaining this convention throughout the design eliminates polarity confusion errors.

Rule 4: Verify that the reset pulse width meets device specifications.
A synchronous reset pulse must be wider than one clock period. An asynchronous reset pulse must meet the flip-flop's minimum CLR pulse width requirement. Both values can be found in the Intel MAX-10 Device Handbook.

10.3 Reset Best Practices in FPGA Design

The previous section established the behavioral and structural differences between synchronous and asynchronous reset. This section translates that theory into concrete, actionable design rules for professional FPGA development. Following these practices consistently will produce reset networks that are reliable, timing-clean, and maintainable across the full project lifecycle.

Active-Low vs Active-High Reset

A reset signal can be designed to assert on a logic low (active-low, rst_n = 0 triggers reset) or on a logic high (active-high, rst = 1 triggers reset). Both are functionally valid, but the industry standard — and the recommendation for this course — is active-low reset for the following reasons:

Hardware alignment: The dedicated CLR pin of the Intel MAX-10 flip-flop is natively active-low. Using active-low reset in RTL maps directly to this hardware pin without requiring an inverter, saving logic resources and avoiding additional propagation delay.
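Fail-safe behavior: Power-on reset circuits and reset supervisor ICs typically hold an active-low output low until the supply is stable, so an active-low reset keeps the system in a safe, initialized state throughout power-up. Note that a truly floating input has no guaranteed logic level; an active-low reset line defaults to the asserted state only if it is pulled low by a resistor or driven low by the reset circuit.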
Industry convention: The naming suffix _n (e.g., rst_n, nreset) is universally recognized as an active-low signal. Most commercial IP cores, AMBA bus interfaces, and reference designs follow this convention.

The following example demonstrates both polarities side by side. Note that the only differences are the reset edge in the sensitivity list (negedge rst_n vs posedge rst) and the condition used to check the reset state:

// Active-low reset (recommended)
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)          // Reset is active when rst_n = 0
        q <= 8'h00;
    else
        q <= d;
end

// Active-high reset (avoid unless required by IP interface)
always @(posedge clk or posedge rst) begin
    if (rst)             // Reset is active when rst = 1
        q <= 8'h00;
    else
        q <= d;
end

Design rule: adopt active-low reset as the project-wide standard. If an external IP core uses active-high reset at its interface, add a single inversion at the boundary rather than changing the internal reset polarity of your design.
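The boundary-inversion rule can be sketched as a thin wrapper. Here ip_active_high is a hypothetical stand-in for a vendor core with an active-high reset port, and ip_wrapper is a hypothetical project-side module that keeps rst_n on its own interface and inverts it exactly once, where it enters the IP.

```verilog
// Hypothetical stand-in for a third-party core with an active-high reset port
module ip_active_high (
    input  wire       clk,
    input  wire       rst,      // active-high asynchronous reset
    input  wire [7:0] d,
    output reg  [7:0] q
);
    always @(posedge clk or posedge rst)
        if (rst) q <= 8'h00;
        else     q <= d;
endmodule

// Project-side wrapper: rst_n everywhere else, one inversion at the IP boundary
module ip_wrapper (
    input  wire       clk,
    input  wire       rst_n,    // project-wide active-low reset
    input  wire [7:0] d,
    output wire [7:0] q
);
    ip_active_high u_ip (
        .clk (clk),
        .rst (~rst_n),          // the single polarity conversion
        .d   (d),
        .q   (q)
    );
endmodule
```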

Reset Synchronizer Design

As established in Section 10.2.4, asynchronous reset deassertion can cause metastability if it occurs too close to a clock edge. The solution is a Reset Synchronizer that implements the industry-standard pattern of asynchronous assert and synchronous deassert.

Why Two Stages?

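A single flip-flop synchronizer gives a possibly metastable output no time to resolve before it drives downstream logic. The second stage gives the first stage a full clock period to settle, which reduces the probability of metastability reaching the reset network to an acceptable level for virtually all practical designs. A third stage further increases the mean time between failures and is generally reserved for very high clock frequencies or designs with strict reliability requirements.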

// Standard two-stage Reset Synchronizer
// Place one instance per clock domain in the design
module reset_synchronizer (
    input  wire clk,
    input  wire async_rst_n,   // Raw asynchronous reset input
    output wire sync_rst_n     // Synchronized reset output for this clock domain
);
    // Declare as registers with no initial value — reset handles initialization
    reg stage1_ff;
    reg stage2_ff;

    always @(posedge clk or negedge async_rst_n) begin
        if (!async_rst_n) begin
            stage1_ff <= 1'b0;   // Asynchronous assert: both stages clear immediately
            stage2_ff <= 1'b0;
        end else begin
            stage1_ff <= 1'b1;          // Deassert propagates through stage 1
            stage2_ff <= stage1_ff;     // Then through stage 2 on next cycle
        end
    end

    assign sync_rst_n = stage2_ff;
endmodule

The behavior of this circuit is as follows:

• Assert (async_rst_n goes low): Both flip-flops are cleared immediately via the asynchronous CLR path. The output sync_rst_n goes low within nanoseconds regardless of the clock state.

• Deassert (async_rst_n goes high): The value 1'b1 propagates through stage1_ff on the first clock edge, then through stage2_ff on the second. The output sync_rst_n therefore rises synchronously with a clock edge, normally on the second edge after the release.

• Metastability handling: If stage1_ff enters metastability during deassertion, it has one full clock period to resolve before its output is sampled by stage2_ff. The probability that metastability persists for a full clock period is negligibly small at typical FPGA operating frequencies.
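If a design does need a deeper chain, the synchronizer can be parameterized. This sketch (the module name reset_sync_n and the parameter STAGES are hypothetical) keeps the same asynchronous-assert, synchronous-deassert behavior using STAGES flip-flops; STAGES = 2 matches reset_synchronizer above. After release, sync_rst_n rises on the STAGES-th clock edge.

```verilog
// N-stage reset synchronizer (hypothetical example module)
module reset_sync_n #(
    parameter STAGES = 2               // number of flip-flops in the chain, must be >= 2
) (
    input  wire clk,
    input  wire async_rst_n,           // raw asynchronous reset input
    output wire sync_rst_n             // synchronized reset for this clock domain
);
    reg [STAGES-1:0] chain;

    always @(posedge clk or negedge async_rst_n) begin
        if (!async_rst_n)
            chain <= {STAGES{1'b0}};              // asynchronous assert: clear every stage
        else
            chain <= {chain[STAGES-2:0], 1'b1};   // synchronous deassert: shift a 1 in
    end

    assign sync_rst_n = chain[STAGES-1];          // last stage drives the reset network
endmodule
```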

Multi-Clock-Domain Reset Distribution

In a design with multiple clock domains, each clock domain requires its own independent Reset Synchronizer instance. All synchronizer instances share the same external async_rst_n input but are clocked by their respective domain clocks:

// Top-level reset distribution for a two-clock-domain design
module top (
    input wire clk_50m,
    input wire clk_100m,
    input wire ext_rst_n     // Single external reset button
);
    wire rst_n_50m;    // Synchronized reset for the 50 MHz domain
    wire rst_n_100m;   // Synchronized reset for the 100 MHz domain

    // Independent synchronizer for each clock domain
    reset_synchronizer u_rst_sync_50m (
        .clk         (clk_50m),
        .async_rst_n (ext_rst_n),
        .sync_rst_n  (rst_n_50m)
    );

    reset_synchronizer u_rst_sync_100m (
        .clk         (clk_100m),
        .async_rst_n (ext_rst_n),
        .sync_rst_n  (rst_n_100m)
    );

    // Each sub-module receives the reset synchronized to its own clock domain
    // uart_ctrl u_uart (.clk(clk_50m),  .rst_n(rst_n_50m),  ...);
    // dsp_core  u_dsp  (.clk(clk_100m), .rst_n(rst_n_100m), ...);

endmodule

Reset Fan-out Management

A reset signal that must drive hundreds or thousands of flip-flops simultaneously creates a high-fan-out net. If this net is routed through the standard interconnect fabric, the routing delay can become so large that some flip-flops receive the reset signal significantly later than others — violating timing and potentially causing inconsistent reset behavior across the design.

Using Quartus Global Signals

Intel MAX-10 provides dedicated Global Signal routing resources that distribute a signal to every logic element on the device with minimal and uniform skew. Reset signals are ideal candidates for global routing. Quartus Prime automatically promotes high-fan-out nets to global signals in most cases, but this can also be specified explicitly in the .qsf assignment file:

# Quartus QSF: promote rst_n to a global signal
set_instance_assignment -name GLOBAL_SIGNAL "GLOBAL CLOCK" -to rst_n

After compilation, confirm that rst_n was promoted by checking the Global & Other Fast Signals table in the Fitter section of the Compilation Report. The Chip Planner can also be used to inspect how the signal is routed.

When to Insert Reset Buffers Manually

In very large designs where the global signal resources are already consumed by clock signals, manual reset buffering may be required. This involves inserting a buffer tree between the Reset Synchronizer output and the downstream logic. However, for the scale of designs in this course (Intel MAX-10, DE10-Lite), automatic global signal promotion is sufficient and manual buffering is not necessary.

Reset Constraints in Quartus Prime

Static Timing Analysis (STA) must be correctly configured to handle reset signals. Without proper constraints, the timing tool may either report false violations on the asynchronous reset path or — more dangerously — fail to check the recovery/removal timing altogether.

Verifying Reset Inference in RTL Viewer

Before applying timing constraints, confirm that Quartus has correctly inferred the intended reset hardware:

• Open Tools → Netlist Viewers → RTL Viewer after compilation.

• For synchronous reset: verify that a Mux is present before the D input of the flip-flop. The reset signal should appear as a Mux select input, not connected to the CLR pin.

• For asynchronous reset: verify that the reset signal connects directly to the CLR pin of the flip-flop symbol, with no Mux on the D path.

• Open Tools → Netlist Viewers → Technology Map Viewer (Post-Fitting) to confirm the same structure after place-and-route.

SDC Timing Constraints for Asynchronous Reset

The path from the raw external reset (ext_rst_n, which drives the Reset Synchronizer's async_rst_n input) to the CLR pins of the synchronizer's own flip-flops is asynchronous by design. It is not a standard data path. Declare it as a false path in the SDC constraints file so that the timing tool does not report meaningless violations on it:

# SDC constraints file (.sdc)
# Declare the raw asynchronous reset input as a false path
# The Reset Synchronizer handles the timing; no setup/hold check is needed here
set_false_path -from [get_ports {ext_rst_n}] -to [all_registers]

The synchronized reset (sync_rst_n) is handled differently. It is driven by a clocked register, so the timing tool automatically checks recovery and removal timing on the paths from the synchronizer output to the CLR pins of the downstream flip-flops. No additional SDC entry is required for these paths.

Checking Recovery and Removal Times

After applying the SDC constraints, verify the asynchronous reset timing as follows:

• Open Tools → Timing Analyzer.

• Run the Report Recovery and Report Removal analyses.

• Recovery time is the minimum time between reset deassertion and the next active clock edge. If the reset is released closer to the edge than this, the flip-flop may not exit reset cleanly on that edge.

• Removal time is the minimum time the reset must remain asserted after an active clock edge before it is released. This ensures the deassertion is not seen at that same edge.

• Recovery and removal act as the setup and hold checks for the reset deassertion edge. Both must show non-negative slack. A negative slack indicates a timing violation that must be resolved before the design is considered timing-clean.

Common Reset Design Errors

The following errors are common in practice and among the most difficult to diagnose because they often cause intermittent, non-reproducible failures rather than obvious, immediate faults.

Error 1: Reset Polarity Inversion

Mixing active-low and active-high reset conventions without an explicit inversion at the boundary is a common reason a system never initializes correctly. In the faulty code below, the polarity check is inverted. The register is held in reset while the reset button is released and runs only while the button is pressed, which is the exact opposite of the intended behavior.

// ERROR: rst_n is active-low but is used without inversion
// This module is always in reset when it should be running
always @(posedge clk or negedge rst_n) begin
    if (rst_n)       // Bug: should be (!rst_n)
        q <= 8'h00;
    else
        q <= d;
end

// CORRECT: check the inverted value for active-low reset
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        q <= 8'h00;
    else
        q <= d;
end
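One way to apply the "invert at the boundary" rule is to keep rst_n active-low everywhere. Create the active-high version with a single, named inversion exactly where an active-high block is connected. Both module names below (ip_active_high, rst_boundary_wrapper) are hypothetical stand-ins, not real vendor IP.

```verilog
// Hypothetical third-party block with an active-HIGH asynchronous reset
module ip_active_high (
    input  wire       clk,
    input  wire       rst,      // Active-high: 1 = reset
    input  wire [7:0] d,
    output reg  [7:0] q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= 8'h00;
        else
            q <= d;
    end
endmodule

// Wrapper: the design keeps rst_n (active-low); the inversion happens once, here
module rst_boundary_wrapper (
    input  wire       clk,
    input  wire       rst_n,    // Active-low, already synchronized upstream
    input  wire [7:0] d,
    output wire [7:0] q
);
    wire rst_ip = ~rst_n;       // Explicit, named polarity conversion at the boundary

    ip_active_high u_ip (
        .clk (clk),
        .rst (rst_ip),
        .d   (d),
        .q   (q)
    );
endmodule
```

Keeping the conversion in one named wire at the instantiation point makes the polarity change visible in code review and in the RTL Viewer. Scattered inline `!rst` expressions are much easier to miss.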

Error 2: Asynchronous Reset Crossing Clock Domains Without a Synchronizer

Connecting a raw asynchronous reset directly from one clock domain to registers in another clock domain without a Reset Synchronizer is one of the most serious reset design errors. The symptom is intermittent failure at power-up or after reset release, particularly at higher operating frequencies or across temperature and voltage corners.

// ERROR: raw async reset driven directly into a different clock domain
module domain_b (
    input wire clk_b,
    input wire raw_rst_n,   // This signal is asynchronous to clk_b — DANGEROUS
    output reg q
);
    always @(posedge clk_b or negedge raw_rst_n) begin
        if (!raw_rst_n) q <= 1'b0;
        else            q <= ~q;
    end
endmodule

// CORRECT: pass raw_rst_n through a Reset Synchronizer clocked by clk_b first
// reset_synchronizer u_sync (.clk(clk_b), .async_rst_n(raw_rst_n), .sync_rst_n(rst_n_b));
module domain_b (
    input wire clk_b,
    input wire rst_n_b,     // Synchronized reset — safe to use
    output reg q
);
    always @(posedge clk_b or negedge rst_n_b) begin
        if (!rst_n_b) q <= 1'b0;
        else          q <= ~q;
    end
endmodule

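Error 3: Reset Pulse Too Narrow

A reset pulse shorter than one period of the slowest clock in the reset network can be missed entirely by logic that samples reset on a clock edge, such as registers with synchronous reset. Part of the design is then left uninitialized while the rest has been reset. The symptom is a design that usually resets correctly but occasionally starts in a partially initialized state. Verify that the reset pulse is wider than the minimum required by the slowest clock domain it reaches.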

Error 4: Using Reset Inside a Combinational always @(*) Block

Reset is a sequential concept: it initializes storage elements, and a purely combinational always @(*) block contains none. A reset check placed there simply becomes one more input to the combinational logic. If any branch leaves the output unassigned, as in the example below, synthesis infers an unintended latch, and simulation no longer matches the intended hardware.

// ERROR: reset check inside combinational always block — infers latch
always @(*) begin
    if (!rst_n)
        y = 8'h00;
    else if (sel)
        y = a;
    // Missing else: latch inferred for the case where rst_n=1 and sel=0
end

// CORRECT: reset belongs in the clocked always block
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        y <= 8'h00;
    else if (sel)
        y <= a;
    else
        y <= b;
end

Error 5: FSM Has No Default State After Reset

An FSM that does not define a valid initial state in its reset condition may power up or restart in an unencoded state — one that does not correspond to any defined state in the case statement. This causes the FSM to remain inactive or behave unpredictably until an external stimulus forces it into a known state.

// ERROR: reset does not initialize the state register
// After reset, 'state' is undefined — FSM behavior is unpredictable
always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
        output_reg <= 8'h00;   // Data registers cleared, but state is not set
    end else begin
        case (state)
            // ...
        endcase
    end
end

// CORRECT: always initialize the state register explicitly in the reset branch
always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
        state      <= IDLE;    // FSM enters a defined, valid state after reset
        output_reg <= 8'h00;
    end else begin
        case (state)
            IDLE: // ...
            // ...
        endcase
    end
end
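The rule may be easier to check against a complete, compilable FSM. The three-state machine below (states IDLE, RUN, DONE; ports start and busy) is a hypothetical example, not a design from this course. The point is that the reset branch assigns state as well as the outputs, and a default branch steers the unused encoding back to IDLE.

```verilog
// Hypothetical three-state FSM illustrating explicit state initialization
module fsm_reset_example (
    input  wire clk,
    input  wire rst_n,
    input  wire start,
    output reg  busy
);
    localparam [1:0] IDLE = 2'd0,
                     RUN  = 2'd1,
                     DONE = 2'd2;

    reg [1:0] state;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state <= IDLE;          // Defined state after every reset
            busy  <= 1'b0;
        end else begin
            case (state)
                IDLE: if (start) begin
                          state <= RUN;
                          busy  <= 1'b1;
                      end
                RUN:  state <= DONE;
                DONE: begin
                          state <= IDLE;
                          busy  <= 1'b0;
                      end
                default: begin      // Recovers from the unused encoding 2'd3
                          state <= IDLE;
                          busy  <= 1'b0;
                      end
            endcase
        end
    end
endmodule
```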

Best Practices Summary

The following table consolidates all reset best practices covered in this section:

Practice	Rule
Reset polarity	Use active-low reset (rst_n) throughout the design. Invert at IP boundaries only.
Reset Synchronizer	Always use a two-stage Reset Synchronizer for asynchronous reset. One instance per clock domain.
Assert / Deassert pattern	Asynchronous assert, synchronous deassert. Never deassert an asynchronous reset without clock synchronization.
Fan-out management	Use Quartus Global Signal routing for the reset net. Verify promotion in the Resource Usage Summary.
SDC constraints	Apply set_false_path to the raw asynchronous reset input. Check recovery and removal slack in Timing Analyzer.
RTL verification	Confirm correct reset inference using RTL Viewer and Technology Map Viewer after every compilation.
FSM initialization	Always assign the state register to a defined initial state in the reset branch. Never leave state undefined after reset.
Reset pulse width	Verify that the reset pulse is wider than the minimum required by the slowest clock domain in the reset network.
Reset in combinational logic	Never place reset checks inside always @(*) blocks. Reset belongs exclusively in clocked always @(posedge clk) blocks.

10.4 Register Design Patterns (Single, Enable, Reset, Shift)


A register is the fundamental storage primitive of every synchronous digital system. In practice, registers are rarely used in their bare form — they are almost always augmented with control signals such as reset, clock enable, or shift capability to meet the requirements of the surrounding logic. This section presents six progressive register design patterns, from the simplest single-bit flip-flop to the full-featured shift register, each with complete Verilog implementations and their corresponding synthesized hardware structures on the Intel MAX-10 FPGA.

10.4.1 Single-Bit Register (Basic D Flip-Flop)

The most fundamental register is a single D flip-flop — a one-bit storage element that captures its input on every rising clock edge and holds that value until the next active edge. It is the building block from which all other register patterns are derived.

// Single-bit D flip-flop
// Captures input d on every rising clock edge
module dff_single (
    input  wire clk,
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        q <= d;
    end
endmodule

On the Intel MAX-10, this maps directly to one Logic Element (LE) using its embedded flip-flop. The LUT portion of the LE is unused for this pattern.

Typical applications:

• Pipeline stage separator between two combinational logic blocks

• Single-bit flag storage (interrupt pending, status bit)

• Input signal registration to eliminate combinational glitches

10.4.2 Register with Synchronous Reset

Adding a synchronous reset gives the register a deterministic initial state that is applied on the next rising clock edge after the reset signal is asserted. The reset condition must appear as the first branch in the if statement to establish its highest priority.

// Register with synchronous active-low reset
// Reset takes effect on the next rising clock edge after rst_n is asserted
module dff_sync_rst (
    input  wire clk,
    input  wire rst_n,   // Active-low synchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin   // Only clk in sensitivity list
        if (!rst_n)
            q <= 1'b0;            // Reset has highest priority
        else
            q <= d;
    end
endmodule

Synthesized hardware structure:

• A 2-to-1 Mux is inserted before the flip-flop's D input. When rst_n = 0, the Mux drives 0 to the D input; when rst_n = 1, it passes the data signal d.

• The flip-flop's dedicated CLR pin is not used. The reset is implemented entirely in the LUT logic feeding the D input.

• One LE is consumed: one LUT (for the Mux) plus one flip-flop.

Typical applications:

• Pipeline datapath registers where reset only occurs at startup

• DSP accumulators and intermediate result registers

• Registers driven by internal control logic where the clock is always active

10.4.3 Register with Asynchronous Reset

An asynchronous reset clears the register immediately when asserted, without waiting for the next clock edge. As established in Sections 10.2 and 10.3, this pattern should always be used together with a Reset Synchronizer at the top level to ensure safe deassertion behavior.

// Register with asynchronous active-low reset
// Reset takes effect immediately when rst_n is asserted, regardless of clk
module dff_async_rst (
    input  wire clk,
    input  wire rst_n,   // Active-low asynchronous reset
    input  wire d,
    output reg  q
);
    always @(posedge clk or negedge rst_n) begin   // rst_n in sensitivity list
        if (!rst_n)
            q <= 1'b0;   // Immediate clear — no clock edge required
        else
            q <= d;
    end
endmodule

Synthesized hardware structure:

• The reset signal connects directly to the flip-flop's hardware CLR pin — a feature natively supported by every LE in the Intel MAX-10.

• No LUT logic is consumed for the reset path. Only one flip-flop resource is used, making this the most area-efficient reset pattern.

• Compared to the synchronous reset pattern, this implementation saves one LUT input per bit.

Typical applications:

• All control FSM state registers

• Communication protocol controllers (UART, SPI, I2C)

• Any register that must be cleared at power-up or on watchdog timeout
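The practical difference between the two reset styles is easiest to see in simulation. The sketch below places one register of each kind side by side; the module name reset_style_compare is illustrative. If rst_n is asserted between clock edges, q_async clears at once, while q_sync keeps its value until the next rising edge of clk.

```verilog
// Side-by-side comparison of synchronous and asynchronous reset (illustrative)
module reset_style_compare (
    input  wire clk,
    input  wire rst_n,
    input  wire d,
    output reg  q_sync,    // Synchronous reset: cleared on the next clock edge
    output reg  q_async    // Asynchronous reset: cleared as soon as rst_n falls
);
    always @(posedge clk) begin
        if (!rst_n) q_sync <= 1'b0;
        else        q_sync <= d;
    end

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) q_async <= 1'b0;
        else        q_async <= d;
    end
endmodule
```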

10.4.4 Register with Clock Enable

A clock enable signal (en) allows the register to selectively ignore clock edges — the flip-flop only captures its input when en = 1. When en = 0, the output holds its current value.

// Register with clock enable
// Captures input d only when en = 1; holds current value when en = 0
module dff_enable (
    input  wire clk,
    input  wire en,    // Clock enable
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        if (en)
            q <= d;
        // Implicit else: q retains its value when en = 0
    end
endmodule

Important: Never Gate the Clock Signal Directly

A common mistake among designers new to FPGA design is to control when a register captures data by gating the clock signal itself. This is a serious design error that must be avoided:

// WRONG: gating the clock with combinational logic
// This creates a glitchy, skewed clock — never do this in FPGA design
wire gated_clk = clk & en;   // Dangerous: combinational glitch on clock

always @(posedge gated_clk) begin
    q <= d;
end

// CORRECT: use a clock enable signal inside the always block
// The flip-flop clock is always the clean system clock
always @(posedge clk) begin
    if (en)
        q <= d;
end

Gating the clock with combinational logic can put glitches on the clock net, which make the flip-flop trigger at unintended times. The logic that produces the gated clock also adds delay and skew relative to the ungated clock. Designs that gate clocks this way also tend to show simulation-synthesis mismatches. On Intel MAX-10, the correct approach is to use the en signal inside the always block. The synthesis tool maps it directly to the LE's dedicated Clock Enable (ENA) input pin, a hardware feature available on every flip-flop in the device that costs no LUT logic and leaves the clock path untouched.

Typical applications:

• Registers that should only update when valid data is available

• Baud rate sampling — capture RX data only on the correct sampling clock tick

• Power optimization — disable unnecessary register toggling

• Write-enable controlled configuration registers
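For the baud-rate sampling case, the enable usually comes from a free-running counter that asserts a one-cycle tick every N clock cycles. The sketch below keeps every flip-flop on the system clock and uses the tick only as en, which is the recommended alternative to gating or dividing the clock. The module and parameter names (tick_enable_reg, DIV) are illustrative assumptions.

```verilog
// Clock-enable register driven by a divide-by-DIV tick (illustrative sketch)
module tick_enable_reg #(
    parameter DIV = 4                 // Capture once every DIV clock cycles (DIV >= 2)
) (
    input  wire       clk,
    input  wire       rst_n,
    input  wire [7:0] d,
    output reg  [7:0] q,
    output reg        tick            // One-cycle enable pulse
);
    reg [$clog2(DIV)-1:0] cnt;

    // Free-running divider: tick is high for exactly one cycle in every DIV
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            cnt  <= 0;
            tick <= 1'b0;
        end else if (cnt == DIV - 1) begin
            cnt  <= 0;
            tick <= 1'b1;
        end else begin
            cnt  <= cnt + 1'b1;
            tick <= 1'b0;
        end
    end

    // Same clock as the divider; tick is used only as a clock enable
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 8'h00;
        else if (tick)
            q <= d;
    end
endmodule
```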

10.4.5 Register with Enable and Reset (Full-Featured Register)

Combining reset and clock enable produces the most commonly used register pattern in real FPGA designs. The priority ordering is critical: reset must always take precedence over enable. If enable were checked first, a disabled register could not be reset — leaving it in an unknown state after system initialization.

Version A: Asynchronous Reset with Enable (Recommended for Control Logic)

// Full-featured register: asynchronous reset + clock enable
// Priority: reset > enable > hold
module dff_async_rst_en (
    input  wire clk,
    input  wire rst_n,   // Active-low asynchronous reset (highest priority)
    input  wire en,      // Clock enable
    input  wire d,
    output reg  q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)      // Reset: immediate, unconditional
            q <= 1'b0;
        else if (en)     // Enable: capture data only when enabled
            q <= d;
        // Implicit else: hold current value when en = 0
    end
endmodule

Version B: Synchronous Reset with Enable (Recommended for Datapath Logic)

// Full-featured register: synchronous reset + clock enable
// Priority: reset > enable > hold
module dff_sync_rst_en (
    input  wire clk,
    input  wire rst_n,   // Active-low synchronous reset (highest priority)
    input  wire en,      // Clock enable
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin   // Only clk in sensitivity list
        if (!rst_n)      // Reset: cleared on next clock edge
            q <= 1'b0;
        else if (en)     // Enable: capture data only when enabled
            q <= d;
        // Implicit else: hold current value when en = 0
    end
endmodule

Synthesized hardware structure (asynchronous version):

• The rst_n signal drives the flip-flop's hardware CLR pin directly.

• The en signal drives the flip-flop's dedicated ENA (Clock Enable) pin directly.

• Both control signals are handled entirely in dedicated LE hardware — no LUT logic is consumed for either. This is the most resource-efficient implementation of a controlled register on the MAX-10.

Typical applications:

• Configuration registers in peripheral controllers

• FIFO write and read pointer registers

• Accumulator registers with conditional update logic

• Any register requiring both initialization guarantee and selective update

10.4.6 Shift Register

A shift register is a chain of flip-flops in which the output of each stage connects to the input of the next. On each clock edge, the stored bit pattern shifts one position along the chain. Shift registers are categorized by their input and output mode — serial or parallel — giving four standard types.

Type 1: SISO — Serial-In Serial-Out

The simplest shift register. Data enters one bit at a time and exits one bit at a time after propagating through the full chain. The primary application is a fixed-length delay line — the output is exactly N clock cycles behind the input.

// SISO Shift Register — 8-stage delay line
// Output is the input delayed by 8 clock cycles
module siso_shift_reg #(
    parameter DEPTH = 8
) (
    input  wire clk,
    input  wire rst_n,
    input  wire s_in,    // Serial input
    output wire s_out    // Serial output (delayed by DEPTH cycles)
);
    reg [DEPTH-1:0] shift_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            shift_reg <= {DEPTH{1'b0}};
        else
            shift_reg <= {shift_reg[DEPTH-2:0], s_in};  // Shift left, insert at LSB
    end

    assign s_out = shift_reg[DEPTH-1];   // Output from MSB (oldest bit)
endmodule

Typical applications:

• Fixed-latency pipeline delay compensation

• Edge detection (compare current and delayed versions of a signal)

• Pseudo-Random Bit Sequence (PRBS) generator with feedback taps
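The edge-detection application is a SISO chain of depth 2 plus one gate per edge type. The minimal sketch below assumes sig_in is already synchronous to clk; an asynchronous input would first need a synchronizer. The module name edge_detect is illustrative.

```verilog
// Rising/falling edge detector built on a 2-stage shift register (sketch)
module edge_detect (
    input  wire clk,
    input  wire rst_n,
    input  wire sig_in,      // Assumed already synchronous to clk
    output wire rise,        // One-cycle pulse on a 0 -> 1 transition
    output wire fall         // One-cycle pulse on a 1 -> 0 transition
);
    reg [1:0] hist;          // hist[0] = current sample, hist[1] = previous sample

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            hist <= 2'b00;
        else
            hist <= {hist[0], sig_in};   // Same shift-left, insert-at-LSB as the SISO register
    end

    assign rise =  hist[0] & ~hist[1];
    assign fall = ~hist[0] &  hist[1];
endmodule
```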

Type 2: SIPO — Serial-In Parallel-Out

Data enters one bit per clock cycle (serial) and the full N-bit word is available simultaneously at the output (parallel) after N clock cycles. This is the core mechanism of a serial receiver.

// SIPO Shift Register — 8-bit serial receiver
// Converts a serial bit stream into an 8-bit parallel word
module sipo_shift_reg (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       s_in,     // Serial input (LSB first)
    output reg  [7:0] p_out     // Parallel output — full 8-bit word
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            p_out <= 8'h00;
        else
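            p_out <= {s_in, p_out[7:1]};   // Shift right: the first bit received ends in bit 0 (LSB first)
                                           // For MSB-first links such as SPI, shift left: {p_out[6:0], s_in}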
    end
endmodule

Typical applications:

• UART receiver — assembles 8 serial data bits into one parallel byte

• SPI receiver — shifts in MISO bits and presents the full word to the data bus

• I2C byte receiver

Type 3: PISO — Parallel-In Serial-Out

A parallel N-bit word is loaded into the register in one clock cycle, then shifted out one bit per clock cycle. This is the core mechanism of a serial transmitter. A load control signal selects between loading new parallel data and shifting out existing data.

// PISO Shift Register — 8-bit serial transmitter
// Loads an 8-bit parallel word and shifts it out one bit at a time
module piso_shift_reg (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       load,     // 1 = load parallel data; 0 = shift out
    input  wire [7:0] p_in,     // Parallel data input
    output wire       s_out     // Serial output (MSB first)
);
    reg [7:0] shift_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            shift_reg <= 8'h00;
        else if (load)
            shift_reg <= p_in;              // Load parallel data in one cycle
        else
            shift_reg <= {shift_reg[6:0], 1'b0};  // Shift left, output MSB
    end

    assign s_out = shift_reg[7];   // MSB is transmitted first
endmodule

Typical applications:

• UART transmitter — loads a byte and shifts out each bit at the baud rate

• SPI transmitter — loads a word and shifts out MOSI bits on each SCK edge

• LED driver chain (e.g., 74HC595 serial interface)

Type 4: PIPO — Parallel-In Parallel-Out

All bits are loaded and read simultaneously. While this does not shift data in the traditional sense, it forms the basis of a standard multi-bit pipeline register — the most commonly instantiated register type in FPGA datapath design.

// PIPO Register — 8-bit pipeline stage register
// Captures all 8 input bits simultaneously on the clock edge
module pipo_register (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       en,       // Clock enable
    input  wire [7:0] d,        // Parallel data input
    output reg  [7:0] q         // Parallel data output
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 8'h00;
        else if (en)
            q <= d;
    end
endmodule

Typical applications:

• Pipeline stage separators in multi-stage arithmetic units

• Input/output port registers in bus interfaces

• Data holding registers in memory-mapped peripheral controllers

10.4.7 Design Pattern Summary

The following table provides a quick reference for every register pattern covered in this section, including each of the four shift register types:

Pattern	Sensitivity List	Control Signals	Synthesized Hardware	Typical Use Case
Single-Bit DFF	posedge clk	None	1 FF, 0 LUT	Signal registration, pipeline separator
Synchronous Reset	posedge clk	rst_n	1 FF + Mux (LUT)	Datapath registers, DSP pipeline
Asynchronous Reset	posedge clk or negedge rst_n	rst_n (CLR pin)	1 FF, 0 LUT	Control FSM state registers
Clock Enable	posedge clk	en (ENA pin)	1 FF, 0 LUT	Selective capture, write-enable registers
Enable + Async Reset	posedge clk or negedge rst_n	rst_n, en	1 FF, 0 LUT	Peripheral config registers, FIFO pointers
SISO Shift Register	posedge clk or negedge rst_n	rst_n	N FFs, 0 LUT	Delay line, edge detection, PRBS
SIPO Shift Register	posedge clk or negedge rst_n	rst_n	N FFs, 0 LUT	UART/SPI receiver, serial-to-parallel
PISO Shift Register	posedge clk or negedge rst_n	rst_n, load	N FFs + Mux (LUT)	UART/SPI transmitter, parallel-to-serial
PIPO Register	posedge clk or negedge rst_n	rst_n, en	N FFs, 0 LUT	Pipeline stages, bus interface registers

Design rule: always verify the synthesized hardware structure using the RTL Viewer and Technology Map Viewer in Quartus Prime after compilation. Confirm that control signals are mapped to the correct dedicated LE hardware pins (CLR for reset, ENA for clock enable) rather than being absorbed into LUT logic unnecessarily.

10.5 Parameterized Registers (parameter / localparam)

Apart from the SISO delay line, whose depth was already a parameter, the register examples in Section 10.4 used fixed bit-widths: a 1-bit flip-flop, an 8-bit pipeline register, and 8-bit serial-to-parallel and parallel-to-serial shift registers. Each was written as a separate module that must be edited by hand whenever a different width is needed. In professional FPGA design, this approach is impractical. A parameterized module is written once and instantiated at any width, depth, or configuration required by the surrounding design, without modifying the source code.

10.5.1 Syntax Quick Reference

This section assumes familiarity with the parameter, localparam, and `define constructs introduced in Lesson 05. The table below provides a brief recap of their key differences for quick reference. For full syntax details and scope rules, refer back to Lesson 05.

Construct	Scope	Overridable at Instantiation?	Typical Use
`define	Global (entire compilation)	✗ No	Global constants, conditional compilation
parameter	Module-level	✓ Yes — via #( ) at instantiation	Module configuration (width, depth, value)
localparam	Module-level	✗ No — internal constant only	Derived constants computed from parameters

The key design rule for this section is:

• Use parameter for values that the instantiating module must be able to configure — such as data width, register depth, or reset value.

• Use localparam for constants derived from those parameters — such as address bit-width calculated from depth using $clog2. These must not be overridden externally because they would break the internal logic.

• Avoid `define for module-level configuration. Its global scope makes it unsuitable for per-instance customization.

10.5.2 Parameterized N-bit Register

The first application is to rewrite the full-featured register from Section 10.4.5 as a parameterized module. The data width is exposed as a parameter so that any instantiating module can specify the exact width it needs without modifying the register source file.

// Parameterized N-bit register with asynchronous reset and clock enable
// Default width is 8 bits; override at instantiation as needed
module reg_n #(
    parameter WIDTH     = 8,              // Data width in bits
    parameter RST_VALUE = {WIDTH{1'b0}}   // Reset value — default all zeros
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= RST_VALUE;
        else if (en)
            q <= d;
    end
endmodule

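This single module covers every fixed-width register with asynchronous reset and clock enable from Section 10.4, including the full-featured register and the PIPO register. It works at any width and with any reset value, both chosen at instantiation time.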

Instantiation Examples

// 8-bit register with default reset value (all zeros)
reg_n #(
    .WIDTH     (8),
    .RST_VALUE (8'h00)
) u_data_reg (
    .clk   (clk),
    .rst_n (rst_n),
    .en    (wr_en),
    .d     (data_in),
    .q     (data_out)
);

// 16-bit register with non-zero reset value
reg_n #(
    .WIDTH     (16),
    .RST_VALUE (16'hFFFF)
) u_ctrl_reg (
    .clk   (clk),
    .rst_n (rst_n),
    .en    (ctrl_wr_en),
    .d     (ctrl_in),
    .q     (ctrl_out)
);

// 32-bit register — no source code changes required
reg_n #(
    .WIDTH (32)
) u_acc_reg (
    .clk   (clk),
    .rst_n (rst_n),
    .en    (acc_en),
    .d     (acc_in),
    .q     (acc_out)
);

10.5.3 Using localparam for Derived Constants

When a module contains internal constants that are mathematically derived from its parameters, those constants must be computed inside the module using localparam. This guarantees that derived values remain consistent with the primary parameter regardless of the width chosen at instantiation.

A common example is a parameterized shift register whose stages must be indexed or counted, for instance by a counter that tracks the shift position from 0 to DEPTH - 1. The required width is $clog2(DEPTH), the ceiling of log base 2 of the depth, which gives the minimum number of bits needed to represent that count. The listing below shows only the declaration: ADDR_W is the width such a counter or stage index would use.

// Parameterized SISO shift register using localparam for derived width
module siso_n #(
    parameter DEPTH = 8    // Shift register depth (number of pipeline stages)
) (
    input  wire clk,
    input  wire rst_n,
    input  wire s_in,
    output wire s_out
);
    // localparam: derived from DEPTH, cannot be overridden externally
    // $clog2(DEPTH) gives the number of bits needed to index DEPTH stages
    // (declared for illustration; this minimal delay line does not use it)
    localparam ADDR_W = $clog2(DEPTH);

    reg [DEPTH-1:0] shift_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            shift_reg <= {DEPTH{1'b0}};
        else
            shift_reg <= {shift_reg[DEPTH-2:0], s_in};
    end

    assign s_out = shift_reg[DEPTH-1];
endmodule

If DEPTH is changed from 8 to 32 at instantiation, ADDR_W automatically recalculates from 3 to 5. No internal constants need to be manually updated.
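The next sketch shows a localparam-derived width actually sizing a counter. It extends the SIPO receiver from Section 10.4.6 with a bit counter that asserts valid for one cycle after every WIDTH bits. The module name sipo_n, the valid output, and the LSB-first shift direction are assumptions for illustration.

```verilog
// Parameterized SIPO with a localparam-sized bit counter (illustrative sketch)
module sipo_n #(
    parameter WIDTH = 8                  // Bits per word (WIDTH >= 2)
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             s_in,        // Serial input, LSB first
    output reg  [WIDTH-1:0] p_out,       // Assembled parallel word
    output reg              valid        // High for one cycle when p_out is complete
);
    // Derived constant: bits needed to count 0 .. WIDTH-1
    localparam CNT_W = $clog2(WIDTH);

    reg [CNT_W-1:0] bit_cnt;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            p_out   <= {WIDTH{1'b0}};
            bit_cnt <= {CNT_W{1'b0}};
            valid   <= 1'b0;
        end else begin
            p_out <= {s_in, p_out[WIDTH-1:1]};    // Shift right: first bit ends in bit 0
            if (bit_cnt == WIDTH - 1) begin
                bit_cnt <= {CNT_W{1'b0}};
                valid   <= 1'b1;                  // Last bit shifted in on this edge
            end else begin
                bit_cnt <= bit_cnt + 1'b1;
                valid   <= 1'b0;
            end
        end
    end
endmodule
```

Changing WIDTH to 16 at instantiation widens p_out and resizes bit_cnt from 3 to 4 bits without any edit inside the module.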

Common localparam Patterns

// Common localparam derivation patterns used in register and counter design

// Pattern 1: counter bit-width from maximum count value
parameter  MAX_COUNT = 255;
localparam CNT_W     = $clog2(MAX_COUNT + 1);   // 8 bits for 0..255

// Pattern 2: byte count from total bit-width
parameter  DATA_W    = 32;
localparam BYTE_CNT  = DATA_W / 8;              // 4 bytes for 32-bit data

// Pattern 3: FSM state encoding width from number of states
parameter  NUM_STATES = 6;
localparam STATE_W    = $clog2(NUM_STATES);     // 3 bits for 6 states

// Pattern 4: terminal count for a divider counter
parameter  CLK_FREQ   = 50_000_000;
parameter  BAUD_RATE  = 115_200;
localparam DIVIDER    = CLK_FREQ / BAUD_RATE - 1;
localparam DIV_W      = $clog2(DIVIDER + 1);
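The values these patterns produce can be checked in a short simulation before relying on them in hardware. The wrapper module localparam_check below is purely illustrative. With integer division, 50_000_000 / 115_200 evaluates to 434, so DIVIDER is 433 and DIV_W is 9.

```verilog
// Illustrative module that evaluates the four localparam patterns above
module localparam_check;
    parameter  MAX_COUNT  = 255;
    localparam CNT_W      = $clog2(MAX_COUNT + 1);      // 8

    parameter  DATA_W     = 32;
    localparam BYTE_CNT   = DATA_W / 8;                  // 4

    parameter  NUM_STATES = 6;
    localparam STATE_W    = $clog2(NUM_STATES);          // 3

    parameter  CLK_FREQ   = 50_000_000;
    parameter  BAUD_RATE  = 115_200;
    localparam DIVIDER    = CLK_FREQ / BAUD_RATE - 1;    // 434 - 1 = 433
    localparam DIV_W      = $clog2(DIVIDER + 1);         // 9 bits for 0..433

    initial begin
        // Prints: CNT_W=8 BYTE_CNT=4 STATE_W=3 DIVIDER=433 DIV_W=9
        $display("CNT_W=%0d BYTE_CNT=%0d STATE_W=%0d DIVIDER=%0d DIV_W=%0d",
                 CNT_W, BYTE_CNT, STATE_W, DIVIDER, DIV_W);
    end
endmodule
```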

10.5.4 Parameterized PIPO Register Bank

A register bank consists of multiple independently addressable registers sharing a common clock and reset. Parameterizing both the data width and the number of registers produces a fully reusable memory-mapped register block suitable for peripheral controller designs.

// Parameterized register bank
// NUM_REGS individually addressable registers, each WIDTH bits wide
module reg_bank #(
    parameter WIDTH    = 8,     // Data width of each register
    parameter NUM_REGS = 4      // Number of registers in the bank
) (
    input  wire                       clk,
    input  wire                       rst_n,
    input  wire                       wr_en,              // Write enable
    input  wire [$clog2(NUM_REGS)-1:0] addr,             // Register address
    input  wire [WIDTH-1:0]           wr_data,           // Write data
    output reg  [WIDTH-1:0]           rd_data            // Read data
);
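    // localparam: address width derived from number of registers
    // (ANSI port declarations precede the module body, so the addr port uses the $clog2 expression directly)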
    localparam ADDR_W = $clog2(NUM_REGS);

    // Register array: NUM_REGS registers, each WIDTH bits wide
    reg [WIDTH-1:0] regs [0:NUM_REGS-1];

    integer i;

    // Write port: synchronous write with asynchronous reset
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (i = 0; i < NUM_REGS; i = i + 1)
                regs[i] <= {WIDTH{1'b0}};   // Clear all registers on reset
        end else if (wr_en) begin
            regs[addr] <= wr_data;
        end
    end

    // Read port: combinational (asynchronous) read
    always @(*) begin
        rd_data = regs[addr];
    end

endmodule

Instantiation Example

// 16-bit wide bank of 8 registers — suitable for a simple peripheral CSR block
reg_bank #(
    .WIDTH    (16),
    .NUM_REGS (8)
) u_csr_bank (
    .clk     (clk),
    .rst_n   (rst_n),
    .wr_en   (csr_wr_en),
    .addr    (csr_addr),
    .wr_data (csr_wr_data),
    .rd_data (csr_rd_data)
);

10.5.5 Parameterized PISO Shift Register (Practical Example)

To demonstrate how parameterization applies to the shift register patterns from Section 10.4, the PISO transmitter is rewritten here as a fully parameterized module. Both the data width and the shift direction (MSB-first or LSB-first) are configurable at instantiation.

// Parameterized PISO shift register
// Configurable width and shift direction
// MSB_FIRST = 1: transmit MSB first (SPI default)
// MSB_FIRST = 0: transmit LSB first (UART default)
module piso_n #(
    parameter WIDTH     = 8,   // Data width
    parameter MSB_FIRST = 1    // Shift direction: 1 = MSB first, 0 = LSB first
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             load,      // 1 = load parallel data; 0 = shift
    input  wire [WIDTH-1:0] p_in,      // Parallel data input
    output wire             s_out      // Serial output
);
    reg [WIDTH-1:0] shift_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            shift_reg <= {WIDTH{1'b0}};
        else if (load)
            shift_reg <= p_in;
        else begin
            if (MSB_FIRST)
                shift_reg <= {shift_reg[WIDTH-2:0], 1'b0};  // Shift left
            else
                shift_reg <= {1'b0, shift_reg[WIDTH-1:1]};  // Shift right
        end
    end

    assign s_out = MSB_FIRST ? shift_reg[WIDTH-1] : shift_reg[0];

endmodule

Instantiation Examples

// SPI transmitter: 8-bit, MSB first
piso_n #(
    .WIDTH     (8),
    .MSB_FIRST (1)
) u_spi_tx (
    .clk   (clk),
    .rst_n (rst_n),
    .load  (spi_load),
    .p_in  (tx_byte),
    .s_out (mosi)
);

// UART transmitter data path: 8-bit, LSB first (start and stop bits are not generated here)
piso_n #(
    .WIDTH     (8),
    .MSB_FIRST (0)
) u_uart_tx (
    .clk   (clk),
    .rst_n (rst_n),
    .load  (uart_load),
    .p_in  (tx_byte),
    .s_out (tx_serial)
);

10.5.6 Best Practices for Parameterized Design

The following practices apply to all parameterized modules in FPGA design:

• Always provide meaningful default values for every parameter. The default should represent the most commonly used configuration so that the module can be instantiated with minimal overrides for typical use cases.

• Never use magic numbers inside parameterized logic. Every constant that depends on a parameter must be derived from that parameter using localparam or a parameter expression. Hard-coded numbers inside parameterized logic will produce incorrect behavior when the parameter is changed.

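• Use $clog2 for all address and counter width calculations. Manual calculation of bit-widths is a maintenance hazard: if the depth or count changes, the bit-width must be recalculated by hand. $clog2 makes this automatic, but it must be applied to the right quantity. $clog2(N) is the width needed to index N items (0 to N-1); a counter that must hold the value N itself needs $clog2(N + 1) bits.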

• Use named parameter assignment at instantiation. Always use the .PARAM_NAME(value) syntax rather than positional assignment. Named assignment is self-documenting and immune to ordering errors when the parameter list changes.

• Verify the synthesized bit-widths in the Compilation Report. After compiling a parameterized module in Quartus Prime, check the Resource Usage Summary and RTL Viewer to confirm that the synthesized register widths and counter widths match the intended parameter values.

• Add parameter range comments to document valid ranges. Verilog does not enforce parameter value constraints at compile time. Document the valid range of each parameter in a comment immediately above its declaration so that designers instantiating the module know the supported configuration space.

// Example: well-documented parameter declarations with range comments
module reg_n #(
    parameter WIDTH     = 8,             // Data width: 1 to 64 bits
    parameter RST_VALUE = {WIDTH{1'b0}}  // Reset value: must fit within WIDTH bits
) (
    // ...
);
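    // Derived constant: an all-zero value exactly WIDTH bits wide.
    // This is not a range check: Verilog does not validate parameter values,
    // so the ranges documented above must be respected by the instantiating design.
    localparam ZERO = {WIDTH{1'b0}};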
    // ...
endmodule
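The $clog2 guideline has one subtlety worth showing in code: $clog2(N) gives the number of bits needed to index N items (0 to N-1), but a counter that must hold the value N itself needs $clog2(N + 1) bits. The sketch below is a hypothetical tick_counter module (its name, ports, and the MAX range are illustrative, not taken from a library) that derives its width this way and adds a range guard. Synthesis tools ignore $display and $finish, so the guard only takes effect when the module is simulated.

```verilog
// Hypothetical example: a counter that must be able to HOLD the value MAX.
// $clog2(MAX) bits can index MAX items (0..MAX-1); storing MAX itself
// needs $clog2(MAX + 1) bits (MAX = 16 needs 5 bits, not 4).
module tick_counter #(
    parameter MAX = 16                        // Terminal value: must be >= 1 (illustrative range)
) (
    input  wire                     clk,
    input  wire                     rst_n,
    input  wire                     en,
    output reg  [$clog2(MAX+1)-1:0] count,    // Counts 0, 1, ..., MAX, 0, ...
    output wire                     done      // High while count == MAX
);
    localparam W = $clog2(MAX + 1);           // Width derived from the parameter, no magic numbers

    // Simulation-only guard: synthesis ignores $display/$finish,
    // so an illegal MAX is caught only when the module is simulated.
    initial begin
        if (MAX < 1) begin
            $display("tick_counter: MAX = %0d is out of range", MAX);
            $finish;
        end
    end

    assign done = (count == MAX);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {W{1'b0}};
        else if (en)
            count <= done ? {W{1'b0}} : count + 1'b1;   // Wrap to 0 after holding MAX
    end
endmodule
```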

10.6 Multi-bit Registers

The register patterns introduced in Section 10.4 focused on single instances of fixed-width registers. In practice, designs frequently require collections of registers — arrays of storage elements that share a common clock and reset but are individually addressable or operated on as a group. This section covers the declaration, access, and synthesis of multi-bit register structures, from simple packed arrays to full register file designs.

10.6.1 Packed vs Unpacked Arrays

Verilog supports two distinct array declaration styles that are often confused with each other. Understanding the difference is essential for writing correct, synthesizable RTL and for predicting how Quartus Prime will map the declarations to hardware.

Packed Arrays

A packed array declares multiple bits as a single multi-bit variable. The bit dimensions appear to the left of the variable name. The entire packed array is treated as one contiguous register and is synthesized as a single group of flip-flops sharing a common clock and reset.

// Packed array examples
reg [7:0]  data_byte;     // One 8-bit register (8 flip-flops as one unit)
reg [15:0] data_word;     // One 16-bit register
reg [31:0] data_dword;    // One 32-bit register

// Packed array: entire variable assigned at once
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        data_byte <= 8'h00;
    else
        data_byte <= data_in;
end

// Packed array: individual bit access
wire msb = data_byte[7];          // Single bit select
wire [3:0] upper = data_byte[7:4]; // Part select (upper nibble)
wire [3:0] lower = data_byte[3:0]; // Part select (lower nibble)

Unpacked Arrays

An unpacked array declares multiple separate variables of the same type. The array dimensions appear to the right of the variable name. Each element is an independent register and is accessed individually using an index. Unpacked arrays are the standard way to declare register arrays and memory structures in synthesizable Verilog.

// Unpacked array examples
reg        flags [0:7];      // 8 separate 1-bit registers
reg [7:0]  regs  [0:15];    // 16 separate 8-bit registers
reg [31:0] mem   [0:255];   // 256 separate 32-bit registers (register file / RAM)

// Unpacked array: individual element access using an index
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        regs[0] <= 8'h00;       // Reset one specific register
    else if (wr_en)
        regs[addr] <= wr_data;  // Write to register selected by addr
end

// Unpacked array: read access is combinational
assign rd_data = regs[addr];

Note that Verilog does not allow a single assignment statement to initialize or clear an entire unpacked array. Each element must be accessed individually, typically using a for loop inside an always block:

// Correct: use a for loop to reset all elements of an unpacked array
integer i;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
        for (i = 0; i < 16; i = i + 1)
            regs[i] <= 8'h00;   // Reset each register individually
    end else if (wr_en) begin
        regs[addr] <= wr_data;
    end
end

Side-by-Side Comparison

Property	Packed Array	Unpacked Array
Dimension position	Left of variable name	Right of variable name
Treated as	Single multi-bit variable	Collection of independent variables
Whole-array assignment	data <= 8'h00 ✓ Allowed	✗ Not allowed — use for loop
Individual bit access	data[3], data[7:4]	regs[i], regs[i][3:0]
Synthesized as	Single register (flip-flop group)	Register array (multiple flip-flop groups)
Quartus inference (MAX-10)	Logic Elements (FFs)	LE flip-flops, or M9K when the read is synchronous
Typical use	Data bus, status/control register	Register file, lookup table, small RAM

10.6.2 Multi-bit Register Operations

Several Verilog operators are specifically useful when working with multi-bit registers. Understanding these operators allows complex register manipulations to be expressed concisely and synthesized efficiently.

Bit-Select and Part-Select

reg [15:0] status_reg;

// Bit-select: access a single bit by index
wire tx_busy  = status_reg[0];   // Bit 0: TX busy flag
wire rx_ready = status_reg[1];   // Bit 1: RX ready flag
wire error    = status_reg[7];   // Bit 7: error flag

// Part-select: access a contiguous range of bits
wire [3:0] irq_flags  = status_reg[11:8];   // Bits 11..8: interrupt flags
wire [3:0] mode_field = status_reg[15:12];  // Bits 15..12: mode selection

// Part-select on left-hand side: write to a specific field
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        status_reg <= 16'h0000;
    else if (irq_wr_en)
        status_reg[11:8] <= irq_data;   // Update only the IRQ field
end
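A fixed part-select such as status_reg[11:8] needs constant bounds. When the field position is chosen at run time, the Verilog-2001 indexed part-select, written [base +: width], accepts a variable base as long as the width is constant. A minimal sketch, using a hypothetical nibble_writer module that updates one 4-bit field of a 16-bit register:

```verilog
// Hypothetical example: update one 4-bit field of a 16-bit register, where the
// field number is chosen at run time. [base +: width] requires a constant
// width but allows a variable base.
module nibble_writer (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        wr_en,
    input  wire [1:0]  sel,        // Field number: 0 = bits [3:0], ..., 3 = bits [15:12]
    input  wire [3:0]  nibble,     // New field value
    output reg  [15:0] data
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            data <= 16'h0000;
        else if (wr_en)
            data[sel*4 +: 4] <= nibble;   // Only the selected 4 bits change
    end
endmodule
```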

Concatenation

The concatenation operator {} joins multiple signals or constants into a single wider bus. It is one of the most frequently used operators in register and shift register design.

reg [7:0] high_byte, low_byte;
reg [15:0] combined;

// Concatenation: join two 8-bit registers into one 16-bit register
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        combined <= 16'h0000;
    else
        combined <= {high_byte, low_byte};  // high_byte becomes [15:8], low_byte [7:0]
end

// Concatenation in shift register: insert new bit at LSB, discard MSB
reg [7:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        shift_reg <= 8'h00;
    else
        shift_reg <= {shift_reg[6:0], serial_in};  // Shift left, insert at bit 0
end

// Concatenation to swap byte order (endian conversion)
wire [15:0] swapped = {combined[7:0], combined[15:8]};

Replication

The replication operator {N{x}} repeats a signal or constant N times. It is commonly used to initialize multi-bit registers and to perform sign extension.

// Replication for register initialization
reg [31:0] wide_reg;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        wide_reg <= {32{1'b0}};   // Equivalent to 32'h00000000
    else
        wide_reg <= data_in;
end

// Replication for sign extension: extend an 8-bit signed value to 16 bits
wire [7:0]  signed_byte = 8'hA5;           // 1010_0101 = -91 in two's complement
wire [15:0] sign_extended = {{8{signed_byte[7]}}, signed_byte};
// signed_byte[7] = 1 (negative), so upper 8 bits = 8'hFF
// Result: 16'hFFA5

// Replication in parameterized reset
parameter WIDTH = 16;
reg [WIDTH-1:0] param_reg;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        param_reg <= {WIDTH{1'b0}};   // Works for any WIDTH value
    else
        param_reg <= d;
end
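The sign-extension pattern generalizes to any widths by making the replication count a parameter expression. The sketch below uses a hypothetical sign_extend module and assumes OUT_W is strictly greater than IN_W, since some tools reject a replication count of zero:

```verilog
// Hypothetical example: parameterized sign extension by replicating the sign bit.
module sign_extend #(
    parameter IN_W  = 8,    // Input width (>= 1)
    parameter OUT_W = 16    // Output width (must be > IN_W)
) (
    input  wire [IN_W-1:0]  in,
    output wire [OUT_W-1:0] out
);
    // Copy the sign bit in[IN_W-1] into the OUT_W - IN_W upper bits
    assign out = {{(OUT_W-IN_W){in[IN_W-1]}}, in};
endmodule
```

With the default widths, an input of 8'hA5 produces 16'hFFA5, matching the hand-worked example above, while 8'h5A produces 16'h005A.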

10.6.3 Register Arrays

A register array is an unpacked array of multi-bit registers, declared and accessed as described in Section 10.6.1. It forms the basis of any addressable storage structure: configuration register banks, lookup tables, and small on-chip memories.

// Parameterized register array with synchronous write and asynchronous read
// Models a small block of addressable registers (e.g., peripheral CSR block)
module reg_array #(
    parameter DATA_W   = 8,     // Data width of each register
    parameter NUM_REGS = 16,    // Number of registers
    localparam ADDR_W  = $clog2(NUM_REGS)   // Address width
) (
    input  wire                clk,
    input  wire                rst_n,
    // Write port
    input  wire                wr_en,
    input  wire [ADDR_W-1:0]   wr_addr,
    input  wire [DATA_W-1:0]   wr_data,
    // Read port
    input  wire [ADDR_W-1:0]   rd_addr,
    output wire [DATA_W-1:0]   rd_data
);
    // Register array declaration
    reg [DATA_W-1:0] regs [0:NUM_REGS-1];

    integer i;

    // Synchronous write with asynchronous reset
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (i = 0; i < NUM_REGS; i = i + 1)
                regs[i] <= {DATA_W{1'b0}};   // Clear all registers on reset
        end else if (wr_en) begin
            regs[wr_addr] <= wr_data;
        end
    end

    // Asynchronous (combinational) read
    assign rd_data = regs[rd_addr];

endmodule

Quartus Synthesis Inference for Register Arrays

Quartus Prime infers different hardware structures for register arrays depending on the access pattern:

• Logic Elements (Flip-Flops): inferred when the array is small, has multiple write ports, or has access patterns that prevent RAM inference. Every bit occupies one dedicated flip-flop in an LE.

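• Distributed RAM (MLAB): on device families that provide MLAB blocks (for example Cyclone V), Quartus can map an array with one synchronous write port and one or more asynchronous read ports into MLABs, which is more area efficient than flip-flops for larger arrays. The Intel MAX-10 has no MLABs, so on the DE10-Lite such arrays are implemented as LE flip-flops plus read multiplexers.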

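• Block RAM (M9K): can be inferred when both the write and the read are synchronous, typically for larger arrays. M9K blocks require a registered read address, so an array with an asynchronous read cannot be placed in them. For the register arrays in this section, asynchronous read is used to prevent unintended Block RAM inference.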

Always verify the inferred hardware using the Resource Usage Summary and RTL Viewer in Quartus Prime after compilation to confirm whether the array was mapped to flip-flops or to RAM resources.

10.6.4 Register File Design

A register file is a structured collection of registers with one or more independent read and write ports, designed to be accessed by index (address). It is the central storage element of a processor datapath — the general-purpose registers of a CPU are implemented as a register file. Understanding register file design is therefore a prerequisite for any CPU or ALU implementation in later chapters.

The standard register file configuration for a simple RISC-style datapath has:

• One synchronous write port — data is written on the rising clock edge when write enable is asserted

• Two asynchronous read ports — two source operands can be read simultaneously in the same cycle, without waiting for a clock edge

• Register 0 hardwired to zero — a common convention in RISC architectures (MIPS, RISC-V) where register 0 always reads as zero regardless of writes

// Dual-read, single-write Register File
// Typical use: RISC processor general-purpose register bank
// Register 0 is hardwired to zero (writes to reg 0 are ignored)
module register_file #(
    parameter DATA_W  = 32,    // Data width (e.g., 32-bit RISC architecture)
    parameter NUM_REG = 32,    // Number of registers (e.g., 32 for MIPS/RISC-V)
    localparam ADDR_W = $clog2(NUM_REG)
) (
    input  wire                clk,
    input  wire                rst_n,

    // Write port
    input  wire                wr_en,       // Write enable
    input  wire [ADDR_W-1:0]   wr_addr,     // Write address (destination register)
    input  wire [DATA_W-1:0]   wr_data,     // Write data

    // Read port A (source operand 1, e.g., rs1)
    input  wire [ADDR_W-1:0]   rd_addr_a,
    output wire [DATA_W-1:0]   rd_data_a,

    // Read port B (source operand 2, e.g., rs2)
    input  wire [ADDR_W-1:0]   rd_addr_b,
    output wire [DATA_W-1:0]   rd_data_b
);
    // Register storage array
    reg [DATA_W-1:0] rf [0:NUM_REG-1];

    integer i;

    // Synchronous write port with asynchronous reset
    // Register 0 is never written — it is permanently zero
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (i = 0; i < NUM_REG; i = i + 1)
                rf[i] <= {DATA_W{1'b0}};
        end else if (wr_en && (wr_addr != {ADDR_W{1'b0}})) begin
            rf[wr_addr] <= wr_data;   // Write to any register except register 0
        end
    end

    // Asynchronous read ports
    // Register 0 always returns zero regardless of stored value
    assign rd_data_a = (rd_addr_a == {ADDR_W{1'b0}}) ? {DATA_W{1'b0}} : rf[rd_addr_a];
    assign rd_data_b = (rd_addr_b == {ADDR_W{1'b0}}) ? {DATA_W{1'b0}} : rf[rd_addr_b];

endmodule

Read-During-Write Behavior

When a read and a write occur to the same address in the same clock cycle, the behavior depends on whether the read is synchronous or asynchronous:

• Asynchronous read (this implementation): any register that samples rd_data_a or rd_data_b on the same clock edge as the write captures the old value. The new value appears on the read port shortly after that edge and is therefore captured on the following edge.

• Synchronous read: the read returns either the old or the new value depending on how the RTL is written and how the memory is implemented, so this behavior must be explicitly specified and verified. A synchronous read also makes the array eligible for Block RAM (M9K) inference, which changes the read latency.

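For a single-cycle processor datapath, asynchronous read with old-value behavior is the natural choice: an instruction reads its operands combinationally and writes its result at the end of the same cycle. Pipelined datapaths that read a register in the same cycle it is being written typically add a bypass (forwarding) path so that the reader sees the new value.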
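When a design needs new-data behavior on a same-cycle read and write, one option is a bypass (write-through) path that forwards wr_data to the read port whenever the read address matches the address being written. The following is a minimal sketch under that assumption, using a hypothetical single-read-port rf_bypass module rather than the two-port register_file above; register 0 still reads as zero.

```verilog
// Hypothetical sketch: register file with one asynchronous read port that
// returns NEW data when reading the address written in the same cycle.
module rf_bypass #(
    parameter DATA_W  = 8,
    parameter NUM_REG = 8                          // Power of two, >= 2
) (
    input  wire                        clk,
    input  wire                        rst_n,
    input  wire                        wr_en,
    input  wire [$clog2(NUM_REG)-1:0]  wr_addr,
    input  wire [DATA_W-1:0]           wr_data,
    input  wire [$clog2(NUM_REG)-1:0]  rd_addr,
    output wire [DATA_W-1:0]           rd_data
);
    reg [DATA_W-1:0] rf [0:NUM_REG-1];
    integer i;

    // Synchronous write; register 0 is never written
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (i = 0; i < NUM_REG; i = i + 1)
                rf[i] <= {DATA_W{1'b0}};
        end else if (wr_en && (wr_addr != 0)) begin
            rf[wr_addr] <= wr_data;
        end
    end

    // Bypass: forward a same-cycle write to the addressed register
    wire hit = wr_en && (wr_addr == rd_addr) && (rd_addr != 0);

    assign rd_data = (rd_addr == 0) ? {DATA_W{1'b0}} :
                     hit            ? wr_data        : rf[rd_addr];
endmodule
```

The cost is an address comparator and an extra multiplexer level on the read path, which is why the plain old-value register file remains the simpler default.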

10.6.5 Common Mistakes with Multi-bit Registers

Mistake 1: Whole-Array Assignment to an Unpacked Array

Assigning a constant or expression directly to an entire unpacked array is not supported in synthesizable Verilog. Each element must be individually assigned, typically inside a for loop.

// ERROR: whole-array assignment — not supported in synthesis
reg [7:0] regs [0:15];
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        regs <= 0;   // Illegal: cannot assign to an unpacked array as a whole
end

// CORRECT: use a for loop to reset each element individually
integer i;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
        for (i = 0; i < 16; i = i + 1)
            regs[i] <= 8'h00;
    end
end

Mistake 2: Out-of-Range Array Index

Reading an unpacked array with an index outside the declared range returns X in simulation, and an out-of-range write is ignored; the synthesized behavior is not defined by the language, so hardware and simulation may disagree. Always make the index signal wide enough to address every element but, where possible, unable to exceed the last one.

// Potential issue: addr may be wider than needed, allowing out-of-range access
reg [7:0]  regs [0:15];   // 16 registers, valid index: 0..15
reg [7:0]  addr;          // 8-bit address — can represent 0..255, but only 0..15 are valid

// If addr > 15, a simulator returns X; the synthesized result is unspecified
assign rd_data = regs[addr];   // Risk: addr could be 16..255

// CORRECT: derive the address width from the register count.
// NUM_REGS = 16 is a power of two, so a 4-bit address covers exactly 0..15
localparam NUM_REGS = 16;
localparam ADDR_W   = $clog2(NUM_REGS);  // 4 bits: can represent 0..15 exactly

reg [7:0]          regs [0:NUM_REGS-1];
reg [ADDR_W-1:0]   addr;   // 4-bit address: physically cannot exceed 15

assign rd_data = regs[addr];   // Safe: addr is bounded by its width
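A $clog2-derived width rules out over-indexing only when NUM_REGS is a power of two. With NUM_REGS = 12, $clog2 still yields a 4-bit address, and the codes 12 to 15 fall outside the array. A hedged sketch of one way to handle this, using a hypothetical guarded_read module that drops out-of-range writes, returns zero for out-of-range reads, and flags the condition on addr_err (the array has no reset, to keep the example short):

```verilog
// Hypothetical example: 12 registers need a 4-bit address, but a 4-bit
// address can also encode 12..15. Those codes are detected and neutralized.
module guarded_read #(
    parameter DATA_W   = 8,
    parameter NUM_REGS = 12                        // Need not be a power of two
) (
    input  wire                         clk,
    input  wire                         wr_en,
    input  wire [$clog2(NUM_REGS)-1:0]  addr,
    input  wire [DATA_W-1:0]            wr_data,
    output wire [DATA_W-1:0]            rd_data,
    output wire                         addr_err   // High when addr >= NUM_REGS
);
    reg [DATA_W-1:0] regs [0:NUM_REGS-1];

    assign addr_err = (addr >= NUM_REGS);

    always @(posedge clk) begin
        if (wr_en && !addr_err)                    // Out-of-range writes are dropped
            regs[addr] <= wr_data;
    end

    // Out-of-range reads return zero instead of indexing past the array
    assign rd_data = addr_err ? {DATA_W{1'b0}} : regs[addr];
endmodule
```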

Mistake 3: Packed/Unpacked Dimension Confusion

Swapping the packed and unpacked dimensions produces a structurally different declaration that synthesizes differently and is accessed differently. This is a common source of subtle bugs that are difficult to trace.

// These two declarations look similar but are completely different:

reg [7:0] A [0:3];   // Unpacked: 4 separate 8-bit registers
                     // Access: A[0], A[1], A[2], A[3]
                     // A[0] = one 8-bit register; A[0][3] = bit 3 of register 0

reg [3:0] B [0:7];   // Unpacked: 8 separate 4-bit registers
                     // Access: B[0]..B[7]
                     // B[0] = one 4-bit register; B[0][3] = MSB of register 0

// DO NOT confuse with:
reg [7:0] C;         // Packed: one single 8-bit register
                     // Access: C[7:0], C[3], C[7:4]
                     // No unpacked dimension: C[0] is legal but selects bit 0, not a register

Mistake 4: Inferring Unintended Block RAM

If both the write and read of a register array are synchronous (clock-edge triggered), Quartus may infer Block RAM (M9K) instead of LE flip-flops. Block RAM has a one-cycle read latency that can disrupt datapath timing if not accounted for. To prevent unintended Block RAM inference, use an asynchronous (combinational) assign statement for the read port as shown in Sections 10.6.3 and 10.6.4.

// Risk of Block RAM inference: both write and read are synchronous
always @(posedge clk) begin
    if (wr_en)
        regs[wr_addr] <= wr_data;
end
always @(posedge clk) begin
    rd_data <= regs[rd_addr];   // Synchronous read — may infer Block RAM
end

// Prevents Block RAM inference: read is asynchronous
always @(posedge clk) begin
    if (wr_en)
        regs[wr_addr] <= wr_data;   // Synchronous write
end
assign rd_data = regs[rd_addr];     // Asynchronous read: LE flip-flops on MAX-10 (MLAB on families that have it)

10.6.6 Summary

Structure	Declaration Style	Synthesized As	Typical Use
Packed register	reg [N-1:0] name	N flip-flops (single group)	Data bus, status register, counter
Register array	reg [W-1:0] name [0:N-1]	LE flip-flops (MLAB on families that have it), or M9K with synchronous read	CSR block, lookup table
Register file	Array + dual read ports	LE flip-flops (MLAB on families that have it)	CPU general-purpose registers
Concatenation	{a, b}	Wiring (no logic)	Byte assembly, shift register
Replication	{N{x}}	Wiring (no logic)	Reset initialization, sign extension
Part-select	name[h:l]	Wiring (no logic)	Field extraction, partial update

10.7 Counter Design Fundamentals

Section 10.8 presents a comprehensive set of counter implementations. Before working through those variants, this section establishes the theoretical and practical foundations that apply to every counter design: the hardware model of a counter, the implications of different count encodings, overflow and wrap-around mechanics, timing analysis of the counter critical path, and how to verify counter behavior using Quartus Prime tools. Designers who understand these fundamentals will be able to select, implement, and debug counter circuits with confidence.

10.7.1 What Is a Counter? (Hardware Perspective)

From a hardware perspective, a counter is not a fundamentally new type of circuit — it is a register augmented with a feedback path through combinational arithmetic logic. Every counter consists of exactly three elements:

• A register — stores the current count value across clock cycles. On the Intel MAX-10, each bit of the counter occupies one flip-flop within a Logic Element.

• An arithmetic unit — computes the next count value from the current count value. For a simple binary up counter, this is an adder that computes count + 1.

• A feedback path — connects the register output back to the arithmetic unit input, forming a closed loop. This feedback loop is the structural feature that distinguishes a counter from a plain register.

// The feedback loop is explicit in Verilog:
// count (register output) appears on both the left-hand side (storage)
// and the right-hand side (feedback into adder)
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        count <= {WIDTH{1'b0}};
    else if (en)
        count <= count + 1'b1;   // count feeds back into the adder
        //       ^^^^^
        //       This is the feedback path
end

This feedback loop is clearly visible in the RTL Viewer in Quartus Prime as a wire looping from the register output back to the adder input. Verifying the presence of this loop after synthesis is a reliable way to confirm that the synthesis tool has correctly inferred a counter rather than a combinational function.

Counter vs Register: The Critical Difference

Property	Register	Counter
Next-state logic	External — driven by surrounding logic	Internal — derived from current count value
Feedback path	None	Output feeds back to arithmetic input
Autonomous operation	No — holds value unless driven	Yes — self-advances on every enabled clock edge
RTL Viewer appearance	Linear data flow	Closed feedback loop visible

10.7.2 Binary Encoding vs Other Encodings

The choice of count encoding determines the hardware cost, switching activity, maximum operating frequency, and suitability for specific applications. Three encodings are relevant to FPGA counter design.

Binary Encoding

Binary encoding is the default and most common choice. Each count value is represented as its standard binary equivalent. It is the most area-efficient encoding: it uses the minimum number of flip-flops, and the increment operation maps directly to the dedicated carry-chain logic in the Intel MAX-10.

• Area: minimum — N flip-flops for an N-bit counter

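• Switching activity: higher bits toggle less frequently than lower bits (bit 0 toggles every cycle, bit k toggles once every 2^k cycles, so bit N-1 toggles once every 2^(N-1) cycles). Because the carry propagates from the LSB toward the MSB, several bits change together at every carry boundary (for example 0111 to 1000), and all N bits change at the rollover from 2^N - 1 to 0

• Cross-domain safety: not safe, because several bits can change on the same edge and a receiving clock domain may capture a mix of old and new bits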

• Use when: general-purpose counting, address generation, timing, and all applications where cross-domain transfer is not required

Gray Code Encoding

In Gray code, only one bit changes between consecutive count values. This property eliminates the multi-bit transition problem at rollover and makes Gray-coded counters the standard choice for pointers that must be read across a clock domain boundary.

• Area: N flip-flops plus XOR conversion logic

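• Switching activity: minimum, exactly one bit changes per increment, including the wrap from the last value back to zero when the sequence covers all 2^N values

• Cross-domain safety: safe when the value is passed through a synchronizer, because a single-bit change sampled mid-transition resolves to either the old or the new count, both of which are valid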

• Use when: asynchronous FIFO read/write pointers, any counter value that must be transferred between clock domains
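A common way to build such a Gray counter is to keep a binary counter, convert its next value with gray = bin ^ (bin >> 1), and register the result. The sketch below follows that approach with a hypothetical gray_counter module. Note the design choice: it stores both the binary and the Gray value (2 x WIDTH flip-flops) so that the Gray output comes directly from flip-flops rather than from XOR gates; keeping only the Gray register saves flip-flops but moves a Gray-to-binary conversion into the increment path.

```verilog
// Hypothetical sketch: Gray-code counter built from a binary counter.
// The next binary value is converted to Gray and registered, so the
// Gray output is driven straight from flip-flops.
module gray_counter #(
    parameter WIDTH = 4
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    output reg  [WIDTH-1:0] gray
);
    reg  [WIDTH-1:0] bin;
    wire [WIDTH-1:0] bin_next  = bin + 1'b1;
    wire [WIDTH-1:0] gray_next = bin_next ^ (bin_next >> 1);  // Binary-to-Gray conversion

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin  <= {WIDTH{1'b0}};
            gray <= {WIDTH{1'b0}};      // Gray code of 0 is 0
        end else if (en) begin
            bin  <= bin_next;
            gray <= gray_next;
        end
    end
endmodule
```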

One-Hot Encoding

In one-hot encoding, exactly one bit is set at any time. An N-state counter requires N flip-flops, one per state, rather than ceil(log2(N)) flip-flops as in binary. Despite its higher flip-flop count, one-hot encoding often produces faster and simpler decode logic because state detection requires examining only one bit rather than a multi-bit comparison.

• Area: N flip-flops for N states (more than binary)

• Decode logic: trivial — no comparator required, each state output is directly one flip-flop output

• Maximum frequency: often higher than binary for small N because the next-state logic is simpler

• Use when: FSM state registers (Quartus syn_encoding attribute), ring counters, small N where decode speed matters more than flip-flop count
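A ring counter is the simplest one-hot counter: a single set bit rotates one position on each enabled edge, and each state is decoded by reading one flip-flop. A minimal sketch with a hypothetical onehot_ring module follows; it relies on reset to load a legal state and does not recover if an illegal pattern (zero or several bits set) ever appears.

```verilog
// Hypothetical sketch: N-state one-hot ring counter. state[k] is high
// only in state k, so no comparator is needed to decode a state.
module onehot_ring #(
    parameter N = 4                                 // Number of states (>= 2)
) (
    input  wire         clk,
    input  wire         rst_n,
    input  wire         en,
    output reg  [N-1:0] state
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            state <= {{(N-1){1'b0}}, 1'b1};          // Start in state 0
        else if (en)
            state <= {state[N-2:0], state[N-1]};     // Rotate left by one position
    end
endmodule
```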

Encoding Comparison Summary

Encoding	Flip-Flops for M States	Bits Changed per Step	CDC Safe?	Typical Application
Binary	ceil(log2(M))	1 to ceil(log2(M)) (varies)	✗ No	General purpose, timing, address
Gray Code	ceil(log2(M)) + XOR logic	Exactly 1 (M a power of two)	✓ Yes, via synchronizer	Async FIFO pointers, CDC counters
One-Hot	M (one per state)	Always exactly 2	✗ No	FSM states, ring counter, small M

10.7.3 Overflow and Wrap-around Behavior

When a counter reaches its maximum value and increments one more time, the result wraps around. The wrap-around behavior depends on whether the counter uses natural binary overflow or a forced terminal-count reset, and the choice has a direct impact on the amount of logic required.

Natural Wrap-around (Binary Overflow)

A binary counter naturally wraps from its maximum value (2^N - 1) back to zero due to arithmetic overflow — the carry out of the MSB is simply discarded. No comparison logic is needed. This is the most efficient counter structure because the adder already handles the wrap-around implicitly.

// Natural wrap-around: no comparator, just an adder
// 4-bit counter: counts 0, 1, 2, ..., 14, 15, 0, 1, 2, ...
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        count <= 4'b0000;
    else if (en)
        count <= count + 1'b1;   // At 4'b1111 + 1 = 4'b0000 (overflow discarded)
end

Natural wrap-around is only possible when the terminal count value is exactly 2^N - 1 (a power of two minus one). Any other terminal count requires a comparator.

Forced Reset (Modulo-N, Non-Power-of-Two)

When the desired count range is not a power of two (e.g., count 0 to 9 for BCD, or 0 to 433 for a baud rate generator), a comparator must be added to detect the terminal count and force the counter back to zero. This comparator adds logic to the critical path.

// Forced reset at terminal count: comparator required
// Modulo-10 counter: counts 0, 1, 2, ..., 9, 0, 1, 2, ...
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        count <= 4'b0000;
    else if (en) begin
        if (count == 4'd9)       // Comparator: detect terminal count
            count <= 4'b0000;    // Forced reset to zero
        else
            count <= count + 1'b1;
    end
end

The comparator (count == 4'd9) adds one level of combinational logic between the register output and the next-state Mux, which increases the critical path delay compared to natural wrap-around.

Hardware Cost Comparison

Wrap-around Type	Terminal Count	Comparator Needed?	Extra Logic	Critical Path Impact
Natural overflow	2^N - 1	✗ No	None	Minimal
Forced reset	Any other value (e.g., 9 for Modulo-10)	✓ Yes	Comparator + Mux	Increased

10.7.4 Timing Analysis of Counters

The counter is one of the most common sources of critical path violations in FPGA designs. Understanding why, and how Intel MAX-10 mitigates the problem, is essential for designing high-speed counters.

The Critical Path in a Binary Counter

The critical path of a binary counter runs through the carry chain of the adder. In a ripple-carry adder, the carry bit propagates serially from the LSB to the MSB — each stage must wait for the carry from the stage below it before its output is valid. The total combinational delay is therefore proportional to the counter width:

• An 8-bit counter has a carry chain of 8 stages — relatively fast

• A 32-bit counter has a carry chain of 32 stages — significantly slower

• The maximum clock frequency (Fmax) decreases as counter width increases, because a wider adder requires more carry propagation time within one clock period

Intel MAX-10 Carry Chain Optimization

The Intel MAX-10 Logic Element includes a dedicated carry-chain connection that links adjacent LEs in a column. The increment operation of a binary counter maps directly to this carry chain, bypassing the general-purpose routing fabric and achieving much lower carry propagation delay than would be possible with LUT-based logic alone.

Quartus Prime automatically applies this optimization when it recognizes the increment pattern (count + 1 in a clocked always block). In the Technology Map Viewer (Post-Fitting), the counter bits appear as LE cells linked through their carry-out and carry-in connections rather than through general routing; the physical placement of those LEs in adjacent positions within one column can be inspected in the Chip Planner.

Fmax vs Counter Width

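Even with carry-chain optimization, a wider counter has a longer critical path. The following are conservative rules of thumb for the Intel MAX-10; actual Fmax depends on the speed grade, placement, and surrounding logic, so always confirm it in the Timing Analyzer: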

• Counters up to 16 bits typically meet timing at 50 MHz without any optimization effort.

• Counters of 32 bits may require timing-driven placement or pipelining to meet 100 MHz or higher targets.

• Counters wider than 32 bits should be split into cascaded stages (see Section 10.7.5) to keep the carry chain length within a manageable range for the target clock frequency.

The Comparator as an Additional Critical Path Element

In Modulo-N counters and BCD counters, the comparator that detects the terminal count adds delay of its own. Both the adder and the comparator start from the register output, and the comparator drives the select input of the reset Mux, so the critical path is the slower of the two paths plus the Mux delay. The result is typically a lower Fmax than a plain binary counter of the same width.

One common optimization is to perform the terminal-count comparison one count early (count == MODULO - 2) and store the result in a flip-flop. The registered flag is then high exactly while count holds its terminal value MODULO - 1, and it drives the reset Mux select directly on the next enabled edge. This removes the comparator from the register-to-Mux path at the cost of one additional flip-flop; the comparator now only has to meet timing into that flip-flop.

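// Optimized Modulo-N counter: registered terminal count detection
// tc is computed one count early (count == MODULO-2) and registered, so it is
// high exactly while count == MODULO-1, with no comparator ahead of the reset Mux
module modulo_fast #(
    parameter MODULO = 100,            // Count range 0..MODULO-1; must be >= 2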
    localparam WIDTH = $clog2(MODULO)
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    output reg  [WIDTH-1:0] count,
    output reg              tc       // Registered terminal count: high while count == MODULO-1
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= {WIDTH{1'b0}};
            tc    <= 1'b0;
        end else if (en) begin
            if (tc) begin
                count <= {WIDTH{1'b0}};   // Reset triggered by registered tc
                tc    <= 1'b0;            // count restarts at 0, never the terminal value for MODULO >= 2
            end else begin
                count <= count + 1'b1;
                tc    <= (count == MODULO - 2);  // Detect terminal-1 one cycle early
            end
        end
    end
endmodule

10.7.5 Counter Cascading for Wide Counters

When a counter wider than 32 bits is required, or when a very large modulus is needed, the design can be split into cascaded stages. The terminal count output of the lower stage drives the enable input of the upper stage, effectively multiplying the count range without creating an excessively long carry chain.

// Cascaded 48-bit counter built from two 24-bit stages
// Lower stage counts 0..2^24-1; upper stage advances once per lower overflow
module counter_48bit (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        en,
    output wire [47:0] count     // Full 48-bit count value
);
    wire [23:0] count_lo, count_hi;
    wire        ovf_lo;           // Overflow of lower 24 bits

    // Lower 24-bit stage: always enabled when en is active
    up_counter #(.WIDTH(24)) u_lo (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (en),
        .count (count_lo),
        .ovf   (ovf_lo)
    );

    // Upper 24-bit stage: advances only when lower stage overflows
    up_counter #(.WIDTH(24)) u_hi (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (ovf_lo),   // Enable from lower stage overflow
        .count (count_hi),
        .ovf   ()          // Upper overflow not used here
    );

    assign count = {count_hi, count_lo};

endmodule

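Each stage has its own independent carry chain limited to 24 bits, which keeps the carry-propagation path short. The enable signal between stages (ovf_lo) is not registered, however. In up_counter it is computed combinationally as en & (&count), a 24-input reduction AND that takes several levels of LUT logic on a 4-input-LUT device. That delay therefore sits in series with the upper stage's enable. If this inter-stage path limits Fmax, the overflow can be detected one cycle early and registered, using the same technique as the registered terminal count shown at the start of this excerpt.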
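Because up_counter computes ovf combinationally as en & (&count), the u_hi enable in counter_48bit is driven through a 24-input reduction AND. The following is a minimal sketch of the registered alternative. It assumes a hypothetical module counter_cascade_reg with hypothetical parameters LO_W and HI_W, and writes both stages inline instead of reusing up_counter. The "lower stage is at its maximum" condition is registered one cycle early, the same early-detection idea as the registered tc output earlier in this section, so the upper stage's enable is only a 2-input AND of en and a flip-flop output.

```verilog
// Hypothetical sketch: two-stage cascaded counter with a registered inter-stage carry.
// The lower-stage "at maximum" flag is predicted one cycle early and stored in a flip-flop.
module counter_cascade_reg #(
    parameter LO_W = 24,   // Width of the lower stage (assumed parameter)
    parameter HI_W = 24    // Width of the upper stage (assumed parameter)
) (
    input  wire                 clk,
    input  wire                 rst_n,   // Active-low asynchronous reset
    input  wire                 en,      // Count enable
    output wire [LO_W+HI_W-1:0] count    // Full count value {upper, lower}
);
    reg [LO_W-1:0] count_lo;
    reg [HI_W-1:0] count_hi;
    reg            lo_at_max;            // 1 exactly while count_lo == all ones

    // Lower stage: when count_lo is all ones minus one, the next value is all ones,
    // so set lo_at_max on the same edge that count_lo reaches its maximum.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count_lo  <= {LO_W{1'b0}};
            lo_at_max <= 1'b0;
        end else if (en) begin
            count_lo  <= count_lo + 1'b1;
            lo_at_max <= (count_lo == {LO_W{1'b1}} - 1'b1);
        end
    end

    // Upper stage: advances on the same edge that count_lo wraps from all ones to zero.
    // Its enable is en AND a register output, not a wide combinational reduction.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count_hi <= {HI_W{1'b0}};
        else if (en && lo_at_max)
            count_hi <= count_hi + 1'b1;
    end

    assign count = {count_hi, count_lo};
endmodule
```

With LO_W = HI_W = 24 this should produce the same 48-bit sequence as counter_48bit. The cost is one extra flip-flop plus an equality comparator on the lower stage, whose delay now ends at a register instead of feeding the upper stage's enable.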

10.7.6 Verifying Counter Behavior in Quartus Prime

After synthesizing a counter module, four verification steps should be performed before proceeding to integration or lab testing.

Step 1: Confirm Feedback Loop in RTL Viewer

• Open Tools → Netlist Viewers → RTL Viewer.

• Locate the counter register. Confirm that a wire loops from the register output back through an adder and into the register input — this is the feedback path that defines a counter.

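• If no feedback loop is visible, the counter has not been inferred as a register. Check that the always block is edge-triggered (posedge clk in the sensitivity list), that the assignment uses the non-blocking operator (<=), and that count actually drives a used output, because Quartus removes registers that have no fanout.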

Step 2: Confirm Carry Chain in Technology Map Viewer

• Open Tools → Netlist Viewers → Technology Map Viewer (Post-Fitting).

• Locate the counter bits. Confirm that they are placed in adjacent LEs within the same LE column, connected by the dedicated carry chain (shown as a vertical carry connection between LEs).

• If the counter bits are scattered across non-adjacent LEs, the carry chain optimization has not been applied. Check that the increment expression is written as count + 1'b1 and not as a more complex expression that prevents recognition.

Step 3: Check Fmax in Timing Analyzer

• Open Tools → Timing Analyzer and run Report Fmax Summary.

• Locate the clock domain of the counter. The reported Fmax must exceed the target operating frequency (e.g., 50 MHz for the DE10-Lite) with positive slack.

• If Fmax is below the target, examine the Report Critical Path output to identify whether the bottleneck is the carry chain, the terminal count comparator, or the reset Mux.

Step 4: Simulate Count Sequence in Simulation Waveform Editor

• Open File → New → University Program VWF (Vector Waveform File) or use ModelSim with a testbench.

• Apply a clock signal and assert rst_n low for two to three clock cycles, then deassert.

• Assert en and verify that the count sequence advances correctly, produces the expected terminal count, and wraps around to zero at the correct value.

• Test the boundary conditions explicitly: verify the transition from the terminal count value back to zero, and verify that the counter holds its value when en = 0.

10.7.7 Design Rules Summary

Design Decision	Rule
Encoding choice	Use binary for general-purpose counting. Use Gray code when the counter value crosses a clock domain boundary. Use one-hot only for small FSM-style counters where decode speed is critical.
Wrap-around strategy	Use natural binary overflow when the terminal count is 2^N - 1. Use forced reset with a comparator for all other terminal count values.
Counter width	Use $clog2 to calculate the minimum required bit-width. Cascade stages for counters wider than 32 bits.
Increment expression	Write the increment as a plain addition such as count + 1'b1 so that Quartus Prime infers an adder and maps it onto the carry chain. The sized 1'b1 literal also avoids width-truncation warnings.
Critical path	Verify Fmax in Timing Analyzer. If the comparator is on the critical path, consider registering the terminal count detection one cycle early.
Synthesis verification	Confirm the feedback loop in RTL Viewer and the carry chain placement in Technology Map Viewer after every compilation.
Simulation	Always simulate the full count sequence including the wrap-around transition and the boundary conditions at terminal count ± 1.

10.8 Up, Down, Modulo-N, Gray Code, and BCD Counters

This section implements the most commonly used counter types in FPGA design. Each counter is presented with a complete, synthesizable Verilog module, notes on its hardware structure where relevant, and the practical applications where it is typically deployed. Where a width or modulus is involved, the examples are parameterized using the conventions established in Section 10.5, and all of them follow the reset best practices from Sections 10.2 and 10.3.

10.8.1 Up Counter

The up counter is the most fundamental sequential circuit in digital design. It increments its stored value by one on every active clock edge and wraps back to zero after reaching its maximum value. The wrap-around occurs naturally due to binary arithmetic overflow — no additional comparison logic is required, making this the most area-efficient counter type.

// Parameterized N-bit Up Counter with asynchronous reset
// Counts from 0 to (2^WIDTH - 1), then wraps back to 0 automatically
module up_counter #(
    parameter WIDTH = 8    // Counter width in bits; range: 1..32
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             en,      // Count enable
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             ovf      // Overflow flag: pulses high for one cycle at wrap
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en)
            count <= count + 1'b1;
    end

    // Overflow: asserted when count is at maximum AND enable is active
    assign ovf = en & (&count);   // (&count) = reduction AND = 1 only when all bits are 1
endmodule

Hardware structure:

• The increment logic (count + 1) is synthesized as an adder using the dedicated carry-chain logic within the Intel MAX-10 Logic Elements. Quartus automatically chains adjacent LEs to propagate the carry bit across the full WIDTH, producing a highly efficient ripple-carry or fast-carry adder depending on the device family.

• The overflow signal (ovf) is computed using a reduction AND (&count), which evaluates to 1 only when every bit of count is 1 (i.e., the count has reached 2^WIDTH - 1).

Typical applications:

• Clock divider and baud rate generator base counter

• Memory address sequencer for sequential read/write operations

• Event counter for measuring frequency or pulse count

• Pipeline stage cycle counter

10.8.2 Down Counter

The down counter decrements its stored value on every active clock edge. When the count reaches zero, it wraps back to its maximum value through natural binary underflow. A terminal count (TC) flag is commonly added to signal when the counter has reached zero, making it suitable for timeout and delay generation circuits.

// Parameterized N-bit Down Counter with asynchronous reset
// Counts from (2^WIDTH - 1) down to 0, then wraps back to (2^WIDTH - 1)
module down_counter #(
    parameter WIDTH = 8    // Counter width in bits; range: 1..32
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset (loads max value)
    input  wire             en,      // Count enable
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             tc       // Terminal count: high for one cycle when count = 0
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b1}};   // Reset to maximum value (all ones)
        else if (en)
            count <= count - 1'b1;
    end

    // Terminal count: asserted when count is zero AND enable is active
    assign tc = en & ~(|count);   // (|count) = reduction OR = 0 only when all bits are 0
endmodule

Typical applications:

• Countdown timer — assert TC when the programmed delay has elapsed

• Watchdog timeout detector

• Retry counter — count down the number of allowed retransmissions in a communication protocol

• Hardware loop counter for fixed-iteration operations

10.8.3 Up/Down Counter

The up/down counter counts in either direction under the control of a direction signal (dir). When dir = 1 the counter increments; when dir = 0 it decrements. Both overflow and underflow wrap around naturally.

// Parameterized N-bit Up/Down Counter with asynchronous reset
// dir = 1: count up; dir = 0: count down
module updown_counter #(
    parameter WIDTH = 8    // Counter width in bits; range: 1..32
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             en,      // Count enable
    input  wire             dir,     // Direction: 1 = up, 0 = down
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             ovf,     // Overflow:  count was at max and counted up
    output wire             unf      // Underflow: count was at 0 and counted down
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en) begin
            if (dir)
                count <= count + 1'b1;   // Count up
            else
                count <= count - 1'b1;   // Count down
        end
    end

    // Overflow: counting up from maximum value
    assign ovf = en &  dir & (&count);

    // Underflow: counting down from zero
    assign unf = en & ~dir & ~(|count);

endmodule

Typical applications:

• Quadrature encoder interface — increment on forward rotation, decrement on reverse rotation to track absolute position

• Volume or brightness control with up/down button inputs

• Servo motor position controller

• FIFO fill-level counter — increment on write, decrement on read to track occupancy
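The FIFO fill-level application maps directly onto the up/down pattern: count up on a write, count down on a read, and hold when both or neither occur. The sketch below inlines that logic in a hypothetical fifo_level module. The DEPTH parameter and the full/empty outputs are illustrative assumptions, and wr and rd are assumed to be already qualified (no write when full, no read when empty).

```verilog
// Hypothetical sketch: FIFO occupancy counter using the up/down counter pattern.
// Assumes wr/rd are already qualified by the FIFO's full/empty logic.
module fifo_level #(
    parameter DEPTH = 16,                  // FIFO depth (assumed parameter)
    parameter WIDTH = $clog2(DEPTH + 1)    // Enough bits to hold 0..DEPTH; do not override
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             wr,      // A write is accepted this cycle
    input  wire             rd,      // A read is accepted this cycle
    output reg  [WIDTH-1:0] level,   // Current number of stored entries
    output wire             full,
    output wire             empty
);
    // Count only when exactly one of wr/rd is active;
    // a simultaneous read and write leaves the occupancy unchanged.
    wire en  = wr ^ rd;
    wire dir = wr;                   // 1 = up (write only), 0 = down (read only)

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            level <= {WIDTH{1'b0}};
        else if (en) begin
            if (dir)
                level <= level + 1'b1;
            else
                level <= level - 1'b1;
        end
    end

    assign full  = (level == DEPTH);
    assign empty = (level == 0);
endmodule
```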

10.8.4 Modulo-N Counter

A Modulo-N counter counts from 0 to N-1 and then resets to 0 on the next clock edge. Unlike the natural wrap-around of a binary counter, the terminal value N-1 is arbitrary and does not need to be a power of two minus one. This requires a comparator to detect when the count has reached N-1 and force a synchronous reset to zero on the following cycle.

The Modulo-N counter is one of the most frequently used counters in practical FPGA design because it generates a precisely controlled periodic event every N clock cycles.

// Parameterized Modulo-N Counter with asynchronous reset
// Counts from 0 to (MODULO - 1), then resets to 0
// The terminal count pulse (tc) is asserted for one cycle when count = MODULO - 1
module modulo_counter #(
    parameter MODULO = 10,                        // Count modulus; must be >= 2
    parameter WIDTH  = $clog2(MODULO)             // Derived; do not override (plain Verilog disallows localparam here)
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             en,      // Count enable
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             tc       // Terminal count: high for one cycle at MODULO-1
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en) begin
            if (count == MODULO - 1)
                count <= {WIDTH{1'b0}};   // Reset to 0 at terminal count
            else
                count <= count + 1'b1;
        end
    end

    // Terminal count: high during the cycle in which count = MODULO - 1;
    // the counter returns to 0 on the next enabled clock edge
    assign tc = en & (count == MODULO - 1);
endmodule

Practical Example: Baud Rate Generator Using Modulo-N Counter

One of the most common uses of a Modulo-N counter is generating a precise baud rate tick from a higher-frequency system clock. The modulus is calculated as the ratio of the system clock frequency to the desired baud rate:

// Baud rate generator: 50 MHz system clock, 115200 baud
// MODULO = 50,000,000 / 115,200 = 434 (integer division truncates 434.03)
// The baud_tick output pulses high for one cycle every 434 clock cycles
module baud_rate_gen #(
    parameter CLK_FREQ  = 50_000_000,
    parameter BAUD_RATE = 115_200,
    parameter MODULO    = CLK_FREQ / BAUD_RATE,   // Derived; do not override
    parameter WIDTH     = $clog2(MODULO)          // Derived; do not override
) (
    input  wire clk,
    input  wire rst_n,
    output wire baud_tick   // One pulse per baud period
);
    wire [WIDTH-1:0] count;

    modulo_counter #(
        .MODULO (MODULO)
    ) u_baud_cnt (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (1'b1),       // Always counting
        .count (count),
        .tc    (baud_tick)   // tc fires every MODULO cycles = one baud period
    );
endmodule
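Verilog integer division truncates, so it is worth checking the resulting tick rate against the target. The figures below follow from the default parameters of baud_rate_gen. Rounding to the nearest integer instead could be written as (CLK_FREQ + BAUD_RATE/2) / BAUD_RATE, which also gives 434 here.

```latex
\begin{aligned}
\text{MODULO} &= \left\lfloor \frac{50\,000\,000}{115\,200} \right\rfloor = \lfloor 434.03 \rfloor = 434 \\
f_{\text{tick}} &= \frac{50\,000\,000}{434} \approx 115\,207.4\ \text{Hz} \\
\text{error} &= \frac{115\,207.4 - 115\,200}{115\,200} \approx +0.0064\,\%
\end{aligned}
```

The same two-line check shows whether truncation introduces a significant error for other clock and baud combinations.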

Typical applications:

• Baud rate generator for UART, SPI, I2C clock dividers

• PWM period counter — sets the PWM frequency

• Sampling clock generator for ADC interfaces

• Display refresh counter — trigger a new display frame every N clock cycles

• Seven-segment display multiplexer timing controller
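The PWM period counter appears in several application lists in this lesson. The following is a minimal sketch of how a Modulo-N counter and a comparator combine into a PWM output. The module name pwm_gen, the PERIOD parameter, and the duty port are hypothetical. The counter is written inline instead of instantiating modulo_counter so that duty can be one bit wider if needed to represent 100 % (duty = PERIOD).

```verilog
// Hypothetical sketch: PWM generator = modulo-PERIOD counter + duty comparator.
module pwm_gen #(
    parameter PERIOD = 1000,                 // PWM period in clock cycles (>= 2)
    parameter WIDTH  = $clog2(PERIOD + 1)    // Wide enough to hold duty = PERIOD
) (
    input  wire             clk,
    input  wire             rst_n,    // Active-low asynchronous reset
    input  wire [WIDTH-1:0] duty,     // High time in clock cycles, 0..PERIOD
    output reg              pwm_out
);
    reg [WIDTH-1:0] count;

    // Modulo-PERIOD counter: 0 .. PERIOD-1, then back to 0
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (count == PERIOD - 1)
            count <= {WIDTH{1'b0}};
        else
            count <= count + 1'b1;
    end

    // Registered compare: high for 'duty' cycles out of every PERIOD cycles
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            pwm_out <= 1'b0;
        else
            pwm_out <= (count < duty);
    end
endmodule
```

duty = 0 keeps the output low, and duty = PERIOD keeps it high. Registering the comparison adds one cycle of latency but keeps comparator glitches off the output pin.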

10.8.5 Gray Code Counter

A Gray code counter produces a sequence in which only one bit changes between consecutive count values. This property is critical in applications where the counter value is read by logic in a different clock domain — if two or more bits changed simultaneously (as in binary counting), a reader in another domain might sample a transitional value where some bits have already changed and others have not, producing a completely incorrect reading.

The standard implementation counts in binary internally and converts to Gray code at the output. The conversion from binary to Gray code is a simple XOR operation:

// Parameterized Gray Code Counter with asynchronous reset
// Internal binary counter converted to Gray code at the output
// Used primarily for asynchronous FIFO read/write pointer generation
module gray_counter #(
    parameter WIDTH = 4    // Counter width in bits; range: 2..16
) (
    input  wire             clk,
    input  wire             rst_n,    // Active-low asynchronous reset
    input  wire             en,       // Count enable
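    output reg  [WIDTH-1:0] gray_out  // Registered Gray code output
);
    // Internal binary counter and its next value
    reg  [WIDTH-1:0] bin_count;
    wire [WIDTH-1:0] bin_next = bin_count + 1'b1;

    // Binary to Gray code conversion: G[i] = B[i] XOR B[i+1]
    // MSB of Gray code equals MSB of binary.
    // gray_out is updated in the same register stage as bin_count, so it is
    // driven directly by flip-flops and carries no XOR glitches into a
    // synchronizer in another clock domain.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin_count <= {WIDTH{1'b0}};
            gray_out  <= {WIDTH{1'b0}};
        end else if (en) begin
            bin_count <= bin_next;
            gray_out  <= bin_next ^ (bin_next >> 1);
        end
    end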

endmodule

Gray Code Sequence (4-bit Example)

Decimal	Binary	Gray Code	Bits Changed
0	0000	0000	—
1	0001	0001	bit 0
2	0010	0011	bit 1
3	0011	0010	bit 0
4	0100	0110	bit 2
5	0101	0111	bit 0
6	0110	0101	bit 1
7	0111	0100	bit 0

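Notice that in every row only a single bit changes from the previous value, regardless of the decimal count value. The same holds for the rows not shown (8 to 15) and for the wrap-around from 15 (Gray 1000) back to 0 (Gray 0000). This is the defining property of Gray code.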
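On the receiving side of a clock-domain crossing, the synchronized Gray value usually has to be converted back to binary, for example to compute FIFO occupancy. The inverse of G = B ^ (B >> 1) is a running XOR from the MSB downward: each binary bit B[i] is the XOR of Gray bits G[WIDTH-1] through G[i]. Below is a minimal combinational sketch with a hypothetical module name, gray2bin.

```verilog
// Hypothetical helper: Gray-to-binary conversion (e.g. after a CDC synchronizer).
// B[k] = G[WIDTH-1] ^ G[WIDTH-2] ^ ... ^ G[k]
module gray2bin #(
    parameter WIDTH = 4                 // Pointer width in bits
) (
    input  wire [WIDTH-1:0] gray,       // Gray code value
    output wire [WIDTH-1:0] bin         // Equivalent binary value
);
    genvar k;
    generate
        for (k = 0; k < WIDTH; k = k + 1) begin : g_bit
            // Reduction XOR of all Gray bits from the MSB down to bit k
            assign bin[k] = ^gray[WIDTH-1:k];
        end
    endgenerate
endmodule
```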

Typical applications:

• Asynchronous FIFO read/write pointers — the most critical application; Gray code ensures that a pointer sampled across a clock domain boundary is always a valid adjacent count value

• Rotary encoder position decoding

• CDC-safe event counters in multi-clock-domain systems

10.8.6 BCD Counter (Binary Coded Decimal)

A BCD counter counts in decimal — each decade (digit) counts from 0 to 9 and then resets to 0 while generating a carry to the next higher decade. Each decimal digit is represented by a 4-bit binary value, giving a range of 0000 (0) to 1001 (9). The values 1010 (10) through 1111 (15) are illegal in BCD and must never appear.

A multi-digit BCD counter is constructed by cascading single-decade BCD counters, where the terminal count output of one decade drives the enable input of the next.

Single-Decade BCD Counter (0 to 9)

// Single-decade BCD counter: counts 0 to 9, then resets to 0
// tc (terminal count) pulses high when count reaches 9
module bcd_decade (
    input  wire       clk,
    input  wire       rst_n,   // Active-low asynchronous reset
    input  wire       en,      // Count enable (driven by tc of previous decade)
    output reg  [3:0] digit,   // BCD digit output (0 to 9)
    output wire       tc       // Terminal count: high for one cycle when digit = 9
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            digit <= 4'd0;
        else if (en) begin
            if (digit == 4'd9)
                digit <= 4'd0;   // Reset to 0 after reaching 9
            else
                digit <= digit + 4'd1;
        end
    end

    assign tc = en & (digit == 4'd9);
endmodule

Three-Decade BCD Counter (000 to 999)

// Three-decade BCD counter: counts 000 to 999
// Cascaded from three single-decade modules
// digit0 = ones, digit1 = tens, digit2 = hundreds
module bcd_counter_3digit (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       en,
    output wire [3:0] digit2,   // Hundreds digit
    output wire [3:0] digit1,   // Tens digit
    output wire [3:0] digit0,   // Ones digit
    output wire       tc        // Terminal count: high for one cycle when count = 999 and en = 1
);
    wire tc0, tc1;

    // Ones decade: always enabled when en is active
    bcd_decade u_ones (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (en),
        .digit (digit0),
        .tc    (tc0)
    );

    // Tens decade: enabled only when ones decade reaches 9
    bcd_decade u_tens (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (tc0),    // Carry from ones decade
        .digit (digit1),
        .tc    (tc1)
    );

    // Hundreds decade: enabled only when tens decade reaches 9
    bcd_decade u_hundreds (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (tc1),    // Carry from tens decade
        .digit (digit2),
        .tc    (tc)
    );

endmodule
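The three-decade cascade generalizes to any number of digits with a generate loop. The following is a sketch under stated assumptions: the module name bcd_counter_n, the DIGITS parameter, and the packed digits output (ones digit in bits [3:0]) are all hypothetical. It implements the per-decade logic of bcd_decade inline, and each decade's carry is formed the same way as tc in bcd_decade.

```verilog
// Hypothetical generalization of bcd_counter_3digit to DIGITS decades.
// Decade k occupies digits[4*k +: 4]; k = 0 is the ones digit.
module bcd_counter_n #(
    parameter DIGITS = 3                   // Number of decades
) (
    input  wire                clk,
    input  wire                rst_n,      // Active-low asynchronous reset
    input  wire                en,         // Count enable for the ones decade
    output wire [4*DIGITS-1:0] digits,     // Packed BCD digits
    output wire                tc          // High when every digit is 9 and en = 1
);
    // carry[k] enables decade k; carry[DIGITS] is the overall terminal count
    wire [DIGITS:0] carry;
    assign carry[0] = en;

    genvar k;
    generate
        for (k = 0; k < DIGITS; k = k + 1) begin : g_decade
            reg [3:0] d;                   // This decade's BCD digit

            always @(posedge clk or negedge rst_n) begin
                if (!rst_n)
                    d <= 4'd0;
                else if (carry[k])
                    d <= (d == 4'd9) ? 4'd0 : d + 4'd1;   // 0..9, then back to 0
            end

            assign digits[4*k +: 4] = d;
            // Carry out only when this decade is enabled while at 9
            assign carry[k+1] = carry[k] & (d == 4'd9);
        end
    endgenerate

    assign tc = carry[DIGITS];
endmodule
```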

Typical applications:

• Seven-segment display counter: each BCD digit drives one seven-segment display digit through a simple BCD-to-segment decoder

• Digital clock (seconds, minutes, hours in BCD format)

• Industrial event counter with decimal readout

• Frequency counter display

10.8.7 Ring and Johnson Counters (Optional Reading)

Ring and Johnson counters are shift-register-based counters that generate specific non-binary sequences without requiring adder logic. They are included here as optional reference material for designers who encounter them in timing generation or phase control applications.

Ring Counter

A ring counter circulates a single 1 bit through an N-bit shift register. After N clock cycles, the pattern repeats. Only N states are possible (not 2^N), and each state has exactly one bit set — making it equivalent to a one-hot state machine.

// 4-bit Ring Counter (rotates right)
// Sequence after reset: 0001 → 1000 → 0100 → 0010 → 0001 → ...
// Each output bit is active for exactly one clock cycle per period
module ring_counter #(
    parameter WIDTH = 4    // Number of stages; range: 2..16
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    output reg  [WIDTH-1:0] ring    // One-hot ring output
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            ring <= {{WIDTH-1{1'b0}}, 1'b1};   // Initialize with single 1 at LSB
        else if (en)
            ring <= {ring[0], ring[WIDTH-1:1]}; // Rotate right
    end
endmodule

Johnson Counter (Twisted-Ring Counter)

A Johnson counter is similar to a ring counter but feeds back the complement of the MSB to the LSB input. This doubles the number of states to 2N and produces a sequence where only one bit changes per clock cycle — a Gray-code-like property without requiring XOR conversion logic.

// 4-bit Johnson Counter
// Sequence: 0000 → 0001 → 0011 → 0111 → 1111 → 1110 → 1100 → 1000 → 0000 → ...
// 2N = 8 unique states for N = 4
module johnson_counter #(
    parameter WIDTH = 4    // Number of stages; produces 2*WIDTH unique states
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    output reg  [WIDTH-1:0] johnson
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            johnson <= {WIDTH{1'b0}};
        else if (en)
            johnson <= {johnson[WIDTH-2:0], ~johnson[WIDTH-1]};  // Shift left; feed inverted MSB into LSB
    end
endmodule

Typical applications:

• Ring counter: LED chaser / running light pattern, one-hot FSM controller, phase-accurate timing signal generator

• Johnson counter: divide-by-2N frequency divider, phase generator producing 2N evenly spaced phases, stepper motor commutation sequence

10.8.8 Counter Type Selection Guide

The following table summarizes the key characteristics of each counter type to guide design decisions:

Counter Type	Count Sequence	Wrap Behavior	Extra Logic Required	Primary Use Case
Up Counter	0 → 2^N-1	Natural overflow	None	General purpose, address generation
Down Counter	2^N-1 → 0	Natural underflow	None	Timeout, countdown timer
Up/Down Counter	Bidirectional	Natural both ways	Direction Mux	Position tracking, FIFO occupancy
Modulo-N Counter	0 → N-1	Forced reset at N-1	Comparator	Baud rate, PWM period, display timing
Gray Code Counter	Gray sequence	Natural (binary internal)	XOR conversion	Async FIFO pointers, CDC-safe counters
BCD Counter	0 → 9 per decade	Forced reset at 9	Comparator per decade	Decimal display, digital clock
Ring Counter	One-hot rotation	Circular shift	None (shift register)	LED chaser, one-hot FSM
Johnson Counter	2N Gray-like states	Circular shift + invert	None (shift register)	Phase generator, frequency divider

10.9 Counter with Enable and Reset

The counter variants in Section 10.8 each demonstrated a specific counting behavior in its simplest form. In a real FPGA design, a counter module must also handle a complete set of control signals — reset, clock enable, load, and direction — with a well-defined priority ordering that guarantees correct behavior under every combination of inputs. This section builds a series of progressively complete counter modules, culminating in a production-ready parameterized counter that can serve as the standard counter component for all subsequent lab and project work in this course.

10.9.1 Control Signal Priority

When multiple control signals are asserted simultaneously, the counter must apply them in a strictly defined priority order. This ordering is not a matter of convention — it has a concrete hardware justification. The priority from highest to lowest is:

• Reset (rst_n) — highest priority: the counter must reach a known safe state regardless of the state of any other signal. If reset could be blocked by enable or load, the system could become unrecoverable in a fault condition.

• Load — second priority: loading a preset value is a deliberate override of the count sequence. It takes precedence over normal counting but must yield to reset.

• Enable — third priority: allows or suppresses counting. When enable is inactive and no load or reset is asserted, the counter holds its current value.

• Hold (implicit) — lowest priority: when no control signal is active, the counter retains its current value. This is the default behavior and requires no explicit logic.

In Verilog, this priority ordering is expressed directly as a cascaded if / else if chain inside the always block. The first branch checked has the highest priority:

// Priority template for a fully controlled counter
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)          // Priority 1: reset (unconditional)
        count <= RESET_VAL;
    else if (load)       // Priority 2: load preset value
        count <= preset;
    else if (en)         // Priority 3: count
        count <= count + 1'b1;
    // Implicit else: hold current value (priority 4)
end

10.9.2 Counter with Synchronous Reset and Enable

The synchronous reset + enable counter is the standard choice for datapath counters where the reset event is always aligned with the system clock — such as baud rate generators, PWM period counters, and pipeline cycle counters.

// Parameterized Up Counter — synchronous reset + clock enable
// Suitable for datapath applications (baud rate, PWM, pipeline timing)
module counter_sync_rst #(
    parameter WIDTH = 8    // Counter width in bits
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low synchronous reset
    input  wire             en,      // Clock enable
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             ovf      // Overflow: count reached maximum
);
    always @(posedge clk) begin      // Synchronous: only clk in sensitivity list
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en)
            count <= count + 1'b1;
    end

    assign ovf = en & (&count);
endmodule

Synthesized hardware structure:

• Reset is implemented as a Mux at the flip-flop's D input, after the adder, that selects either zero or count + 1; the flip-flop's asynchronous CLR pin is not used.

• Enable maps to the flip-flop's dedicated ENA pin — no LUT consumed.

• Increment logic uses the dedicated carry chain across adjacent LEs.

Typical applications:

• Baud rate generator base counter (always clocked, reset only at startup)

• PWM period counter (the count at which the synchronous reset is applied sets the PWM period)

• Pipeline timing counter (synchronized to pipeline clock)

10.9.3 Counter with Asynchronous Reset and Enable

The asynchronous reset + enable counter is the standard choice for control logic counters where the reset must take effect immediately — independent of the clock — such as timeout detectors, watchdog counters, and protocol retry counters.

// Parameterized Up Counter — asynchronous reset + clock enable
// Suitable for control logic (timeout, watchdog, protocol retry counter)
module counter_async_rst #(
    parameter WIDTH = 8    // Counter width in bits
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             en,      // Clock enable
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             ovf      // Overflow: count reached maximum
);
    always @(posedge clk or negedge rst_n) begin   // rst_n in sensitivity list
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en)
            count <= count + 1'b1;
    end

    assign ovf = en & (&count);
endmodule

Synthesized hardware structure:

• Reset connects directly to the flip-flop's hardware CLR pin — immediate effect, no LUT consumed.

• Enable maps to the flip-flop's dedicated ENA pin — no LUT consumed.

• This is the most area-efficient counter structure on the MAX-10: both control signals use dedicated LE hardware pins, leaving all LUT inputs available for the increment logic.

Typical applications:

• Timeout counter — must clear immediately when a watchdog fires

• Protocol retry counter — must reset on bus error regardless of clock

• Debounce counter — must reset immediately when button state changes

10.9.4 Loadable Counter (Preset Value)

A loadable counter adds a load control signal that forces the counter to a user-specified preset value on the next active clock edge. This makes the counter's starting point configurable at runtime — an essential feature for programmable timers, adjustable PWM duty cycles, and variable-rate event generators.

// Parameterized Loadable Up Counter — asynchronous reset + load + enable
// Priority: reset > load > enable > hold
module counter_loadable #(
    parameter WIDTH = 8    // Counter width in bits
) (
    input  wire             clk,
    input  wire             rst_n,   // Active-low asynchronous reset
    input  wire             load,    // Load preset value (priority 2)
    input  wire             en,      // Count enable (priority 3)
    input  wire [WIDTH-1:0] preset,  // Value to load when load = 1
    output reg  [WIDTH-1:0] count,   // Current count value
    output wire             tc       // Terminal count: count = all ones while en = 1
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};   // Reset overrides everything
        else if (load)
            count <= preset;          // Load overrides counting
        else if (en)
            count <= count + 1'b1;    // Count when enabled
        // Implicit else: hold current value
    end

    assign tc = en & (&count);
endmodule

Typical applications:

• Programmable timer: load the value 2^WIDTH - N, then enable counting; because tc asserts when the count reaches all ones, it flags expiry on the N-th enabled cycle after the load

• Variable PWM duty cycle: load a new compare value at the start of each PWM period to change the duty cycle dynamically

• Adjustable baud rate: load a new divider value to switch baud rates without resetting the system
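For the programmable-timer use of counter_loadable, the preset follows from the fact that this is an up counter whose tc asserts at all ones. Assuming en stays high after the load, an interval of N cycles (1 ≤ N ≤ 2^WIDTH) requires:

```latex
\begin{aligned}
\text{preset} &= 2^{\text{WIDTH}} - N \\
\text{WIDTH} = 8,\ N = 100: \quad \text{preset} &= 256 - 100 = 156
\end{aligned}
```

The counter then holds 156, 157, ..., 255 on consecutive cycles, which is 255 - 156 + 1 = 100 values, and tc is high during the last of them.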

10.9.5 Production-Ready Counter Module

The following module integrates all control signals into a single, fully parameterized counter suitable for direct use in course lab work and project designs. It combines asynchronous reset, clock enable, synchronous load, and bidirectional counting, with a complete set of status outputs.

// Production-Ready Parameterized Counter
// Supports: asynchronous reset, clock enable, synchronous load, up/down direction
// Priority: rst_n (async) > load > en > hold
// Status outputs: overflow, underflow, terminal count
module counter_full #(
    parameter WIDTH     = 8,              // Counter width in bits
    parameter RST_VALUE = {WIDTH{1'b0}}   // Reset value (default: all zeros)
) (
    input  wire             clk,
    input  wire             rst_n,    // Active-low asynchronous reset
    input  wire             en,       // Clock enable
    input  wire             load,     // Synchronous load (overrides counting)
    input  wire             dir,      // Count direction: 1 = up, 0 = down
    input  wire [WIDTH-1:0] preset,   // Preset value loaded when load = 1
    output reg  [WIDTH-1:0] count,    // Current count value
    output wire             ovf,      // Overflow:  counting up   from all-ones
    output wire             unf,      // Underflow: counting down from all-zeros
    output wire             tc_up,    // Terminal count up:   count = all-ones
    output wire             tc_down   // Terminal count down: count = all-zeros
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= RST_VALUE;       // Priority 1: async reset
        else if (load)
            count <= preset;          // Priority 2: synchronous load
        else if (en) begin
            if (dir)
                count <= count + 1'b1;  // Priority 3a: count up
            else
                count <= count - 1'b1;  // Priority 3b: count down
        end
        // Implicit priority 4: hold (en = 0, load = 0, no reset)
    end

    // Status outputs (combinational)
    assign tc_up   =  (&count);                  // All bits = 1
    assign tc_down = ~(|count);                  // All bits = 0
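    assign ovf     = en & ~load &  dir & tc_up;   // Up overflow (suppressed when load overrides counting)
    assign unf     = en & ~load & ~dir & tc_down; // Down underflow (suppressed when load overrides counting)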

endmodule

Port Description

Port	Direction	Description
clk	Input	System clock; all operations except the asynchronous reset occur on the rising edge
rst_n	Input	Active-low asynchronous reset — clears counter to RST_VALUE immediately
en	Input	Clock enable — counter advances only when en = 1
load	Input	Synchronous load — loads preset on next rising edge, overrides counting
dir	Input	Direction — 1 = count up, 0 = count down
preset	Input	Preset value loaded into counter when load = 1
count	Output	Current count value
ovf	Output	Overflow flag — pulses high when counting up from maximum value
unf	Output	Underflow flag — pulses high when counting down from zero
tc_up	Output	Terminal count up — high when count = all-ones
tc_down	Output	Terminal count down — high when count = all-zeros

10.9.6 Practical Application: Programmable Hardware Timer

A programmable hardware timer is one of the most common peripheral modules in embedded FPGA designs. It allows software or higher-level logic to set a timeout period in clock cycles and receive a flag when that period has elapsed. The following implementation does not instantiate counter_loadable. Instead, it reuses the same control priority from Section 10.9.4 (reset > load > count > hold), implemented inline as a down counter so that expiry is detected when the count reaches zero.

// Programmable Hardware Timer
// Operation:
//   1. Set period_val to the desired timeout in clock cycles minus 1
//   2. Assert start to begin timing (internally loads period_val and enables counter)
//   3. expired pulses high for one clock cycle when the countdown reaches zero
//   4. The timer automatically reloads and restarts if auto_reload = 1
//   5. Assert rst_n low at any time to immediately cancel and reset the timer
module hw_timer #(
    parameter WIDTH = 16   // Timer width; period_val can be at most 2^WIDTH - 1
) (
    input  wire             clk,
    input  wire             rst_n,       // Active-low asynchronous reset
    input  wire             start,       // Start / restart the timer
    input  wire             auto_reload, // 1 = restart automatically on expiry
    input  wire [WIDTH-1:0] period_val,  // Timeout period in clock cycles (minus 1)
    output wire             running,     // High while timer is active
    output wire             expired      // Pulses high for one cycle on timeout
);
    reg  [WIDTH-1:0] count;
    reg              active;
    wire             tc;

    // Active flag: set by start, cleared on expiry (unless auto_reload)
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            active <= 1'b0;
        else if (start)
            active <= 1'b1;
        else if (tc && !auto_reload)
            active <= 1'b0;   // Stop after one period if auto_reload is off
    end

    // Countdown counter: loads period_val on start or auto-reload expiry
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (start || (tc && auto_reload))
            count <= period_val;          // Load period on start or auto-reload
        else if (active)
            count <= count - 1'b1;        // Count down while active
    end

    // Terminal count: counter has reached zero
    assign tc      = active & ~(|count);
    assign expired = tc;
    assign running = active;

endmodule
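The timing of hw_timer can be checked with a small cycle-level model. The Python sketch below uses a hypothetical helper and starts just after reset release. It evaluates the combinational tc from the current state, then updates count and active together, as the two non-blocking always blocks do. It shows two things. In auto-reload mode, expired repeats every period_val + 1 cycles, which is why period_val is set to the desired period minus 1. Holding start high reloads the counter on every edge, so expired never fires.

```python
def simulate_hw_timer(period_val, start_seq, auto_reload, width=16):
    """Cycle-level model of hw_timer; returns the cycles in which expired = 1.

    start_seq[i] is the value of start during clock cycle i. The model
    begins just after reset release (count = 0, active = 0).
    """
    mask = (1 << width) - 1
    count, active = 0, False
    expired_cycles = []
    for cycle, start in enumerate(start_seq):
        tc = active and count == 0          # assign tc = active & ~(|count);
        if tc:
            expired_cycles.append(cycle)
        # Next state computed from pre-edge values, like non-blocking <=
        if start:
            next_active = True
        elif tc and not auto_reload:
            next_active = False             # one-shot: stop after one period
        else:
            next_active = active
        if start or (tc and auto_reload):
            next_count = period_val & mask  # load on start or auto-reload
        elif active:
            next_count = (count - 1) & mask # count down while active
        else:
            next_count = count
        count, active = next_count, next_active
    return expired_cycles


pulse = [1] + [0] * 19                                   # start high for one cycle
print(simulate_hw_timer(4, pulse, auto_reload=True))     # [5, 10, 15]
print(simulate_hw_timer(4, pulse, auto_reload=False))    # [5]
print(simulate_hw_timer(4, [1] * 20, auto_reload=True))  # []
```

With period_val = 4, the auto-reload period is 5 cycles. The last call illustrates why start should be a single-cycle pulse.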

Usage Example

// Instantiate a 1-second one-shot timer on a 50 MHz system clock
// period_val = 50,000,000 - 1 = 49,999,999 clock cycles
hw_timer #(
    .WIDTH (26)   // $clog2(50_000_000) = 26 bits required
) u_1sec_timer (
    .clk         (clk),
    .rst_n       (rst_n),
    .start       (timer_start),
    .auto_reload (1'b0),                    // One-shot: stop after one period
    .period_val  (26'd49_999_999),          // 1 second at 50 MHz
    .running     (timer_running),
    .expired     (timer_expired)
);

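// Instantiate a 1 kHz auto-reload periodic timer (1 ms period)
// period_val = 50,000 - 1 = 49,999 clock cycles
// start must be a single-cycle pulse: tying it to 1'b1 would reload
// period_val on every clock edge, and the timer would never expire.
// start_1ms is high only during the first clock cycle after reset release.
reg  start_done;
always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
        start_done <= 1'b0;
    else
        start_done <= 1'b1;
end
wire start_1ms = ~start_done;

hw_timer #(
    .WIDTH (16)   // $clog2(50_000) = 16 bits required
) u_1ms_timer (
    .clk         (clk),
    .rst_n       (rst_n),
    .start       (start_1ms),               // One start pulse after reset
    .auto_reload (1'b1),                    // Periodic: restart automatically
    .period_val  (16'd49_999),              // 1 ms at 50 MHz
    .running     (),                        // Not used
    .expired     (tick_1ms)                 // 1 kHz tick output
);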

10.9.7 Design Rules Summary

Design Decision	Rule
Control signal priority	Always enforce reset > load > enable > hold in the if / else if chain. Never allow enable or load to block a reset.
Reset type	Use asynchronous reset for control logic counters (timeout, watchdog, retry). Use synchronous reset for datapath counters (baud rate, PWM, pipeline).
Load vs Reset	Use load to set a non-zero starting value at runtime. Use reset only to return to the fixed reset value. Never repurpose reset to load an arbitrary value.
Terminal count output	Always derive terminal count flags combinationally from the count value using reduction operators (&count for all-ones, ~|count for all-zeros). Never register the terminal count inside the counter: this adds one cycle of latency that the downstream logic must compensate for.
Enable gating	Always use en inside the always block to map to the LE's dedicated ENA pin. Never gate the clock signal directly with en.
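Parameterization	Use parameter for WIDTH and RST_VALUE. Use $clog2 to compute the minimum bit-width from a modulus or period value. Note that $clog2(N) covers the values 0 to N-1; holding the value N itself requires $clog2(N+1) bits.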
Simulation	Test all combinations of simultaneous control signals: reset during load, reset during count, load during count. Verify that priority ordering is correctly enforced in simulation before synthesis.
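$clog2(N) returns the ceiling of log2(N), which is the number of bits needed for the values 0 to N-1. The Python helpers below are a sketch that mirrors that rule and computes period_val and WIDTH from a clock frequency and a period in seconds. The names clog2 and timer_params are hypothetical. The calculation assumes a countdown timer whose period is period_val + 1 cycles, like hw_timer. The results match the 26-bit and 16-bit widths used in the hw_timer usage examples.

```python
# Width and period helpers that mirror Verilog $clog2 (hypothetical names).

def clog2(n):
    """Ceiling of log2(n), like Verilog $clog2; returns 0 for n <= 1."""
    return (n - 1).bit_length() if n > 1 else 0


def timer_params(clk_hz, period_s):
    """Return (period_val, WIDTH) for a countdown timer of period_s seconds.

    Assumes the timer counts period_val down to 0 and then reloads,
    so one period is period_val + 1 clock cycles.
    """
    cycles = round(clk_hz * period_s)       # clock cycles per period
    period_val = cycles - 1
    width = max(1, clog2(period_val + 1))   # bits needed to hold period_val
    return period_val, width


print(clog2(10))                            # 4: a mod-10 counter needs 4 bits
print(timer_params(50_000_000, 1.0))        # (49999999, 26)
print(timer_params(50_000_000, 0.001))      # (49999, 16)
```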

10.10 Simulation and Waveform Verification

Every register and counter module developed in this chapter must be verified through simulation before it is submitted for synthesis or deployed on the DE10-Lite board. Simulation confirms that the RTL code produces the intended logical behavior under all relevant input conditions — including boundary values, simultaneous control signal assertions, and reset sequences. This section provides a complete simulation workflow using Icarus Verilog and GTKWave, the tools used throughout this course.

10.10.1 Why Simulate Before Synthesizing?

Synthesis and simulation serve different purposes and are not interchangeable verification methods. A design that compiles without errors in Quartus Prime is not necessarily correct. A clean compile only means the RTL is syntactically valid and structurally mappable to hardware. Simulation is the primary way to confirm that the logic behaves as intended before committing to hardware.

What Simulation Verifies

• Functional correctness — does the output match the expected sequence?

• Control signal priority — does reset override enable? Does load override counting?

• Boundary conditions — does the counter wrap correctly at the terminal value?

• Timing relationships — do outputs update on the correct clock edge?

• Reset behavior — does asynchronous reset take effect immediately, without waiting for a clock edge?

Simulation-Synthesis Mismatch

A simulation-synthesis mismatch occurs when the simulated behavior of a module differs from the behavior of the synthesized hardware. This is one of the most difficult categories of bug to diagnose because both the simulation and the synthesis appear to succeed without errors. The most common causes in register and counter design are:

• Blocking assignment ( = ) used in sequential logic: if one clocked always block reads a variable that another clocked block assigns with =, the result depends on which block the simulator happens to execute first on that clock edge (a race condition). Real flip-flops all sample their inputs on the same edge, so the hardware always sees the old value, while the simulation may see the new one.

• Non-blocking assignment ( <= ) used in combinational logic — simulation schedules all updates to the end of the time step, which may delay the combinational output by one simulation delta and cause incorrect results in downstream logic.

• initial block used for register initialization — simulation applies the initial value at time zero, but the synthesized hardware may not reproduce this behavior after a reset event, as discussed in Section 10.1.6.

• Incomplete sensitivity list — if always @(*) is replaced with a manually specified list that omits a signal, simulation will not re-evaluate the block when that signal changes, but synthesis will always produce combinational logic that responds to all inputs.
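The blocking-assignment race described above can be illustrated without a simulator. This Python sketch uses a hypothetical helper and does not reflect how any particular simulator is implemented. It models two clocked always blocks triggered by the same rising edge: block A writes q1, and block B copies q1 into q2. It runs them in both possible orders. With blocking assignments, the value of q2 depends on the order. With non-blocking assignments, q2 always receives the old q1, which is what two real flip-flops sharing a clock produce.

```python
# Models two clocked always blocks that fire on the same rising edge:
#   block "A":  always @(posedge clk) q1 = d;    (or q1 <= d;)
#   block "B":  always @(posedge clk) q2 = q1;   (or q2 <= q1;)
# The simulator may run A and B in either order (hypothetical helper).

def clock_edge(state, d, blocks_order, nonblocking):
    new = dict(state)
    # Non-blocking: every right-hand side reads the pre-edge values, and all
    # updates land together. Blocking: each update is visible immediately.
    src = state if nonblocking else new
    for block in blocks_order:
        if block == "A":
            new["q1"] = d
        else:
            new["q2"] = src["q1"]
    return new


before = {"q1": 0, "q2": 0}
for order in (("A", "B"), ("B", "A")):
    blk = clock_edge(before, 1, order, nonblocking=False)
    nba = clock_edge(before, 1, order, nonblocking=True)
    print(f"order {order}: blocking q2 = {blk['q2']}, non-blocking q2 = {nba['q2']}")
# order ('A', 'B'): blocking q2 = 1, non-blocking q2 = 0
# order ('B', 'A'): blocking q2 = 0, non-blocking q2 = 0
```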

10.10.2 Testbench Structure for Registers and Counters

A testbench is a non-synthesizable Verilog module that instantiates the design under test (DUT), drives its inputs, and optionally checks its outputs. The template below, suitable for the register and counter modules in this chapter, is organized into five parts: signal declarations, DUT instantiation, clock generation, waveform dump, and a stimulus block that applies the reset sequence followed by the functional tests.

// Standard Testbench Template for Chapter 10 Register and Counter Modules
// Replace the DUT module name and port connections to adapt for any DUT
`timescale 1ns / 1ps   // Time unit: 1 ns; time precision: 1 ps

module tb_template;

    // -----------------------------------------------------------------------
    // 1. Signal declarations (mirror the DUT port list)
    // -----------------------------------------------------------------------
    reg         clk;
    reg         rst_n;
    reg         en;
    // Add additional DUT input signals here

    wire [7:0]  count;    // DUT output signals declared as wire
    wire        ovf;

    // -----------------------------------------------------------------------
    // 2. DUT instantiation
    // -----------------------------------------------------------------------
    // Replace with the actual module name and correct port connections
    counter_async_rst #(
        .WIDTH (8)
    ) u_dut (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (en),
        .count (count),
        .ovf   (ovf)
    );

    // -----------------------------------------------------------------------
    // 3. Clock generation: 50 MHz (period = 20 ns)
    // -----------------------------------------------------------------------
    initial clk = 1'b0;
    always #10 clk = ~clk;   // Toggle every 10 ns -> 20 ns period -> 50 MHz

    // -----------------------------------------------------------------------
    // 4. Waveform dump for GTKWave
    // -----------------------------------------------------------------------
    initial begin
        $dumpfile("tb_template.vcd");   // Output VCD file name
        $dumpvars(0, tb_template);      // Dump all signals in this testbench
    end

    // -----------------------------------------------------------------------
    // 5. Stimulus: reset sequence followed by functional tests
    // -----------------------------------------------------------------------
    initial begin
        // Apply asynchronous reset across the first two rising clock edges
        rst_n = 1'b0;
        en    = 1'b0;
        #35;            // Hold reset for 35 ns (spans the edges at 10 ns and 30 ns)
        rst_n = 1'b1;   // Release reset
        #20;            // Wait one full clock cycle before applying stimulus

        // --- Functional test stimulus goes here ---
        en = 1'b1;
        #200;           // Count for 10 clock cycles (10 x 20 ns)

        en = 1'b0;
        #40;            // Hold for 2 cycles to verify hold behavior

        en = 1'b1;
        #400;           // Continue counting

        // End simulation
        #20;
        $finish;
    end

endmodule

Key points in this template:

• The `timescale directive must appear before the module declaration in the testbench file. Without it, Icarus Verilog falls back to its default time unit (1 s), so a delay such as #10 no longer means 10 ns.

• Reset is held for 35 ns, which spans the rising edges at 10 ns and 30 ns, so it is asserted across at least one complete clock cycle. Releasing it at 35 ns, midway between edges, also avoids a race between the reset release and a clock edge.

• $dumpfile and $dumpvars generate a VCD (Value Change Dump) file that GTKWave reads to display the waveform.

• All DUT inputs are declared as reg in the testbench (so they can be driven in initial and always blocks); all DUT outputs are declared as wire.
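An input change or reset release that lands exactly on a rising clock edge creates a race in simulation, because the simulator may process the stimulus before or after the clocked always blocks. The Python helpers below are a sketch with hypothetical names and integer nanosecond times. They list the rising edges produced by this template's clock and flag stimulus times that coincide with one. For the template, reset spans the edges at 10 ns and 30 ns, and the input changes at 35, 55, 255, 295, and 695 ns all fall between edges.

```python
# Checks a testbench stimulus schedule against the clock generated by
#   initial clk = 0;  always #HALF clk = ~clk;
# (hypothetical helpers; times are integer nanoseconds)

def rising_edges(half_period, t_end):
    """Rising-edge times up to t_end: half_period, 3*half_period, ..."""
    return list(range(half_period, t_end + 1, 2 * half_period))


def racy_times(change_times, half_period, t_end):
    """Return the stimulus times that coincide exactly with a rising edge."""
    edges = set(rising_edges(half_period, t_end))
    return [t for t in change_times if t in edges]


edges = rising_edges(10, 60)
print(edges)                                         # [10, 30, 50]
print([e for e in edges if e < 35])                  # [10, 30]: edges during reset
print(racy_times([35, 55, 255, 295, 695], 10, 800))  # []
print(racy_times([35, 770], 10, 800))                # [770]
```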

10.10.3 Simulating the Full-Featured Register (Section 10.4.5)

The following testbench targets the dff_async_rst_en module from Section 10.4.5. It verifies reset priority, enable behavior, and the hold state in sequence.

`timescale 1ns / 1ps

module tb_dff_async_rst_en;

    reg        clk, rst_n, en;
    reg        d;
    wire       q;

    // DUT instantiation
    dff_async_rst_en u_dut (
        .clk   (clk),
        .rst_n (rst_n),
        .en    (en),
        .d     (d),
        .q     (q)
    );

    // 50 MHz clock
    initial clk = 0;
    always #10 clk = ~clk;

    // Waveform dump
    initial begin
        $dumpfile("tb_dff_async_rst_en.vcd");
        $dumpvars(0, tb_dff_async_rst_en);
    end

    initial begin
        // --- Phase 1: Power-on reset ---
        rst_n = 0; en = 0; d = 0;
        #35;
        rst_n = 1;
        #20;

        // --- Phase 2: Verify enable = 1, d = 1 -> q should capture 1 ---
        en = 1; d = 1;
        #20;   // One clock cycle
        // Expected: q = 1

        // --- Phase 3: Verify enable = 0 -> q should hold ---
        en = 0; d = 0;
        #40;   // Two clock cycles
        // Expected: q remains 1 (hold, not capturing d = 0)

        // --- Phase 4: Verify enable = 1, d = 0 -> q should capture 0 ---
        en = 1;
        #20;
        // Expected: q = 0

        // --- Phase 5: Verify async reset priority over enable ---
        en = 1; d = 1;
        #20;   // Capture d = 1 -> q = 1
        rst_n = 0;   // Assert reset mid-cycle (not on clock edge)
        #5;          // 5 ns later: q should already be 0 (async, no clock needed)
        rst_n = 1;
        #15;
        // Expected: q went to 0 immediately after rst_n asserted,
        //           without waiting for the next clock edge

        // --- Phase 6: Normal operation after reset release ---
        en = 1; d = 1;
        #20;
        // Expected: q = 1

        #20;
        $finish;
    end

endmodule

Expected GTKWave Waveform Behavior

• Phase 1: q stays at 0 during reset, then holds 0 after release (en = 0).

• Phase 2: q transitions to 1 on the next rising clock edge after en is asserted.

• Phase 3: q remains 1 across two clock edges despite d = 0 — confirming hold behavior.

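• Phase 4: q falls to 0 on the next rising clock edge after en is reasserted with d = 0.

• Phase 5 (critical): q drops to 0 at the same simulation time that rst_n goes low, not on the next clock edge (RTL simulation has no propagation delay). This confirms that the reset is truly asynchronous. Because en = 1 and d = 1 when rst_n is released, q returns to 1 on the following rising edge.

• Phase 6: q remains 1.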
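The difference between asynchronous and synchronous reset in Phase 5 can be reproduced with a short event-driven model. The Python sketch below uses a hypothetical helper and makes two simplifying assumptions. First, q starts at 0, whereas a real simulation shows X until the first reset event. Second, no input change coincides with a clock edge. It replays the input changes of tb_dff_async_rst_en against both reset styles. With the asynchronous reset, q clears at 155 ns and returns to 1 at the 170 ns edge. With a synchronous reset, the 5 ns reset pulse falls entirely between clock edges and is never seen.

```python
# Event-driven sketch of dff_async_rst_en vs a synchronous-reset variant.
# Hypothetical helper; assumes q starts at 0 (a real simulation shows X until
# the first reset event) and that no input change coincides with a clock edge.

def simulate_dff(events, t_end, half_period=10, async_reset=True):
    """Return a list of (time, q) pairs, one for each change of q."""
    edges = range(half_period, t_end + 1, 2 * half_period)
    timeline = sorted([(t, 1, name, v) for t, name, v in events] +
                      [(t, 0, "clk", 1) for t in edges])
    inputs = {"rst_n": 0, "en": 0, "d": 0}
    q, changes = 0, []
    for t, _, name, value in timeline:
        if name == "clk":                     # rising clock edge
            if not inputs["rst_n"]:
                new_q = 0                     # reset seen at the edge (both styles)
            elif inputs["en"]:
                new_q = inputs["d"]           # capture d
            else:
                new_q = q                     # hold
        else:
            inputs[name] = value
            # Only an asynchronous reset acts between clock edges.
            new_q = 0 if (async_reset and name == "rst_n" and value == 0) else q
        if new_q != q:
            changes.append((t, new_q))
            q = new_q
    return changes


# Input changes of tb_dff_async_rst_en: (time in ns, signal, value)
DFF_TB_EVENTS = [
    (0, "rst_n", 0), (0, "en", 0), (0, "d", 0),
    (35, "rst_n", 1),
    (55, "en", 1), (55, "d", 1),
    (75, "en", 0), (75, "d", 0),
    (115, "en", 1),
    (135, "d", 1),
    (155, "rst_n", 0), (160, "rst_n", 1),
]

print(simulate_dff(DFF_TB_EVENTS, 215, async_reset=True))
# [(70, 1), (130, 0), (150, 1), (155, 0), (170, 1)]
print(simulate_dff(DFF_TB_EVENTS, 215, async_reset=False))
# [(70, 1), (130, 0), (150, 1)]
```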

10.10.4 Simulating the Production-Ready Counter (Section 10.9.5)

The following testbench targets the counter_full module from Section 10.9.5. It systematically exercises all control signal combinations and boundary conditions.

`timescale 1ns / 1ps

module tb_counter_full;

    // Parameters matching the DUT
    localparam WIDTH = 4;   // Use 4-bit width for readable waveform

    reg                  clk, rst_n, en, load, dir;
    reg  [WIDTH-1:0]     preset;
    wire [WIDTH-1:0]     count;
    wire                 ovf, unf, tc_up, tc_down;

    // DUT instantiation
    counter_full #(
        .WIDTH     (WIDTH),
        .RST_VALUE ({WIDTH{1'b0}})
    ) u_dut (
        .clk     (clk),
        .rst_n   (rst_n),
        .en      (en),
        .load    (load),
        .dir     (dir),
        .preset  (preset),
        .count   (count),
        .ovf     (ovf),
        .unf     (unf),
        .tc_up   (tc_up),
        .tc_down (tc_down)
    );

    // 50 MHz clock
    initial clk = 0;
    always #10 clk = ~clk;

    // Waveform dump
    initial begin
        $dumpfile("tb_counter_full.vcd");
        $dumpvars(0, tb_counter_full);
    end

    initial begin
        // ---------------------------------------------------------------
        // Phase 1: Power-on async reset
        // ---------------------------------------------------------------
        rst_n = 0; en = 0; load = 0; dir = 1; preset = 0;
        #35;
        rst_n = 1;
        #20;
        // Expected: count = 0 after reset

        // ---------------------------------------------------------------
        // Phase 2: Count up — verify basic increment and tc_up flag
        // ---------------------------------------------------------------
        en = 1; dir = 1;
        #(20 * 16);  // Count through all 16 values (0..15 for 4-bit)
        // Expected: count cycles 0 -> 1 -> ... -> 15 -> 0
        // ovf pulses high for one cycle when count was 15 and incremented

        // ---------------------------------------------------------------
        // Phase 3: Disable counting — verify hold behavior
        // ---------------------------------------------------------------
        en = 0;
        #60;   // 3 clock cycles
        // Expected: count does not change

        // ---------------------------------------------------------------
        // Phase 4: Load preset value — verify load priority over enable
        // ---------------------------------------------------------------
        en = 1; load = 1; preset = 4'd10;
        #20;   // One clock cycle
        load = 0;
        // Expected: count = 10 on the cycle after load is asserted

        // ---------------------------------------------------------------
        // Phase 5: Count down from loaded value — verify unf flag
        // ---------------------------------------------------------------
        dir = 0;   // Count down
        #(20 * 12);
        // Expected: count decrements 10 -> 9 -> ... -> 0 -> 15 -> 14 (12 edges)
        // unf is high during the cycle when count = 0, before it wraps to 15

        // ---------------------------------------------------------------
        // Phase 6: Verify reset priority over load
        // ---------------------------------------------------------------
        load = 1; preset = 4'd7;
        #5;      // Mid-cycle (t = 700 ns): assert reset while load is active
        rst_n = 0;
        #20;     // Hold reset across the rising edge at 710 ns while load = 1
        // Expected: count goes to 0 immediately and stays 0 at the 710 ns edge
        // (reset wins over load; preset = 7 is never loaded)
        rst_n = 1; load = 0;   // Released at 720 ns, midway between edges
        #20;     // Edge at 730 ns: en = 1, dir = 0, so count wraps 0 -> 15

        // ---------------------------------------------------------------
        // Phase 7: Verify reset priority over enable + counting
        // ---------------------------------------------------------------
        en = 1; dir = 1;
        #40;           // Count up for 2 cycles
        rst_n = 0;     // Assert async reset mid-cycle (t = 780 ns)
        #5;
        // Expected: count immediately clears to 0 without waiting for clock
        rst_n = 1;     // Released at 785 ns, 5 ns before the next rising edge
        #20;

        // ---------------------------------------------------------------
        // Phase 8: Verify simultaneous load and enable (load has priority)
        // ---------------------------------------------------------------
        en = 1; load = 1; preset = 4'd5; dir = 1;
        #20;
        load = 0;
        // Expected: count = 5 (load wins over enable; counter does not increment)
        #40;
        // Expected: count increments from 5

        #20;
        $finish;
    end

endmodule

Simulation Test Coverage Summary

Phase	Test Condition	Expected Observation
1	Power-on async reset	count = 0 immediately, no clock edge required
2	Count up, full range	0 → 15 → 0 wrap, ovf pulses at wrap
3	Enable = 0	Count holds for 3 cycles
4	Load while enabled	Count jumps to preset = 10, load wins over enable
5	Count down from 10	10 → 0 → 15 wrap, unf pulses at wrap
6	Reset while loading	Count = 0 immediately, reset wins over load
7	Async reset mid-cycle	Count = 0 within nanoseconds, no clock edge needed
8	Load and enable simultaneously	Count = preset (load priority); counting resumes next cycle
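Phase 2 lasts 16 clock cycles and Phase 5 lasts 12, so the expected count values can be computed with modular arithmetic. In the sketch below, which uses a hypothetical Python helper, Phase 5 passes through the wrap at 0 and ends at 14. That is where the waveform should be when Phase 6 begins.

```python
# Expected count after each enabled rising edge (hypothetical helper).

def expected_counts(start, edges, up, width=4):
    """Count values after `edges` enabled rising edges, wrapping modulo 2**width."""
    step = 1 if up else -1
    mask = (1 << width) - 1
    return [(start + step * k) & mask for k in range(1, edges + 1)]


phase2 = expected_counts(0, 16, up=True)    # Phase 2: 16 edges counting up from 0
phase5 = expected_counts(10, 12, up=False)  # Phase 5: 12 edges counting down from 10
print(phase2[-3:])   # [14, 15, 0]
print(phase5)        # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 15, 14]
```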

10.10.5 Running Simulation with Icarus Verilog

The following shell commands compile and simulate the counter testbench with Icarus Verilog from a terminal. The same pattern applies to any module in this chapter; replace the filenames as needed.

# Step 1: Compile the DUT and testbench together
# General form: iverilog -o <sim_output> <dut_file>.v <testbench_file>.v [additional_files]
iverilog -o sim_counter counter_full.v tb_counter_full.v

# Step 2: Run the simulation (generates the .vcd waveform file)
vvp sim_counter

# Step 3: Open the waveform in GTKWave
gtkwave tb_counter_full.vcd &

In the Jupyter notebook environment used in this course, wrap the commands in conda run to ensure the correct environment is active:

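# Jupyter notebook cell: compile and simulate using the course conda environment
import subprocess

# Compile the DUT and testbench (check=True raises CalledProcessError on failure)
subprocess.run([
    "conda", "run", "-n", "py312_vsc_base",
    "iverilog", "-o", "sim_counter", "counter_full.v", "tb_counter_full.v"
], check=True)

# Run the simulation; capture its output and print it in the cell
result = subprocess.run([
    "conda", "run", "-n", "py312_vsc_base",
    "vvp", "sim_counter"
], check=True, capture_output=True, text=True)
print(result.stdout)

# GTKWave is launched separately from the desktop environment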

10.10.6 GTKWave Waveform Interpretation Guide

After opening the VCD file in GTKWave, add all relevant signals to the waveform view and apply the following interpretation techniques specific to register and counter verification.

Identifying Clock Edge Alignment

• All synchronous signal transitions (count increment, enable capture, load) must occur on the rising edge of clk. In GTKWave, zoom into a transition and confirm that the output changes align with the rising clock edge — not before it and not significantly after it.

• If an output changes between clock edges (not on a rising edge), it indicates a combinational output (correct for assign statements), the response to an asynchronous reset (correct when rst_n falls), or an unintended latch (incorrect for sequential logic).

Verifying Asynchronous Reset

• After asserting rst_n = 0 mid-cycle (Phase 5 of tb_dff_async_rst_en; Phases 6 and 7 of tb_counter_full), zoom into the waveform and confirm that the output (q or count) goes to its reset value at the same simulation time that rst_n goes low, independent of the clock.

• If the output only clears on the next rising clock edge, the RTL is modeling a synchronous reset. Confirm that negedge rst_n appears in the sensitivity list and that the if (!rst_n) branch appears first in the always block.

Verifying Enable Hold Behavior

• When en = 0, the count waveform must appear as a flat horizontal line — no transitions across any number of clock edges. Any transition while en = 0 indicates that the enable logic has not been correctly implemented.

Measuring Counter Period and Terminal Count Pulse Width

• Use GTKWave's Marker tool (press Ctrl+M to place a marker) to measure the time between two identical count values — this gives the counter period in nanoseconds.

• While the counter runs freely (en = 1, load = 0), the terminal count flags (tc_up, ovf) must be exactly one clock period wide. A wider pulse indicates incorrect comparator logic (although tc_up legitimately stays high if en is deasserted while count is all-ones); a pulse that never appears indicates the terminal count value is not being reached.
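If the flag and count values are also sampled once per clock cycle, for example printed with $display from the testbench, these checks can be automated. The Python helpers below are a sketch with hypothetical names. pulse_widths returns the width of each high pulse in cycles, and period_cycles measures the distance between two occurrences of a count value. The sampling method is an assumption and is not part of the course flow.

```python
# Post-processing helpers for a flag or count sampled once per clock cycle
# (hypothetical names; how the samples are captured is an assumption).

def pulse_widths(samples):
    """Length, in clock cycles, of every run of 1s in samples."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def period_cycles(counts, value):
    """Clock cycles between the first two occurrences of value, or None."""
    hits = [i for i, c in enumerate(counts) if c == value]
    return hits[1] - hits[0] if len(hits) >= 2 else None


counts = [c % 16 for c in range(48)]          # free-running 4-bit counter
ovf = [1 if c == 15 else 0 for c in counts]   # ovf with en = 1, dir = 1
print(pulse_widths(ovf))                      # [1, 1, 1]
print(period_cycles(counts, 3) * 20, "ns")    # 320 ns at 50 MHz
```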

Common Waveform Anomalies and Their Causes

Waveform Observation	Likely Cause	Fix
Output changes between clock edges	Unintended latch or combinational feedback	Check for incomplete if/else in always @(*)
Reset only takes effect on clock edge	Asynchronous reset coded as synchronous	Add negedge rst_n to sensitivity list
Count changes when en = 0	Enable not checked or wrong polarity	Verify else if (en) branch in always block
Count never reaches terminal value	Width too small or modulo value incorrect	Check WIDTH and MODULO parameter values
ovf pulse appears one clock cycle after the wrap	Terminal count flag registered instead of decoded combinationally	Use a combinational assign for ovf
Load does not override enable	Wrong priority in if/else if chain	Ensure load is checked before en
All outputs show X (unknown) throughout	Reset never asserted (rst_n not driven low at time 0)	Drive rst_n = 0 at time 0, then release it with rst_n = 1 after the reset period

10.10.7 Simulation Verification Checklist

Apply the following checklist to every register and counter module before proceeding to synthesis or board testing. Each item must be confirmed in the GTKWave waveform.

Check Item	What to Verify in GTKWave
Power-on reset	Output reaches the reset value while rst_n is low (X may appear briefly at time 0); transitions correctly after rst_n release
Async reset immediacy	Output clears at the same simulation time that rst_n goes low, not on the next clock edge
Clock enable hold	Output is flat (no transitions) across multiple clock edges when en = 0
Count sequence	Each rising clock edge with en = 1 advances the count by exactly one step of the expected sequence
Wrap-around / terminal count	Counter wraps at the correct value; ovf / unf pulses for exactly one clock cycle
Load priority over enable	When both load and en are asserted, output = preset (not count + 1)
Reset priority over load	When rst_n is asserted while load is active, output = reset value (not preset)
Direction control	Count increments when dir = 1 and decrements when dir = 0
No X states after reset	No signals show unknown (X) or high-impedance (Z) values after rst_n is released
Simulation-synthesis match	Blocking vs non-blocking assignments are used correctly; no initial blocks in synthesizable RTL
