Pipeline and Vector Processing MCQ Questions and Answers
1. Which of the following best describes the primary goal of instruction pipelining?
a) Reduce power consumption of each instruction
b) Increase clock frequency by simplifying stages
c) Increase instruction throughput by overlapping execution of multiple instructions
d) Reduce the number of instructions in a program
2. In a classic 5-stage RISC instruction pipeline (IF, ID, EX, MEM, WB), which stage performs register read and instruction decode?
a) IF (Instruction Fetch)
b) ID (Instruction Decode / Register Read)
c) EX (Execute)
d) MEM (Memory access)
3. The ideal speedup of an n-stage pipeline over non-pipelined execution is approximately:
a) n/2
b) n – 1
c) n
d) n²
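Note (question 3): the textbook reasoning behind this one. With n stages of delay t each and m instructions, non-pipelined execution takes m*n*t while the pipeline takes (n + m - 1)*t, so

    speedup = m*n*t / ((n + m - 1)*t) = m*n / (n + m - 1)

which approaches n, the number of stages, as m grows large.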
4. What is a structural hazard in pipelining?
a) Data dependence between instructions
b) Resource conflict because two stages need the same hardware at the same time
c) Control flow change due to branch
d) Exception or interrupt during instruction
5. Which technique resolves RAW (read-after-write) hazards without stalling by using results directly from intermediate pipeline registers?
a) Speculative execution
b) Delay slots
c) Forwarding (bypassing)
d) Register renaming
6. A pipeline stall (bubble) is introduced primarily to handle:
a) Faster memory accesses
b) Reduced instruction size
c) Data or control hazards that cannot be immediately resolved
d) Increased clock speed
7. Which of the following is a control hazard mitigation technique?
a) Out-of-order completion
b) Register forwarding
c) Branch prediction
d) Loop unrolling
8. The minimum possible CPI (cycles per instruction) in a perfectly pipelined machine with no hazards is:
a) 0
b) 1
c) Number of pipeline stages
d) 0.5
9. In an arithmetic pipeline for floating-point addition, which stage is most likely to perform alignment of operands?
a) Rounding stage
b) Normalization stage
c) Pre-alignment or alignment stage (shift the mantissa of the smaller-exponent operand)
d) Exception handling stage
10. Which statement best describes latency and throughput in pipelined processors?
a) Latency is the rate of completing instructions; throughput is time for one instruction
b) Latency is time for a single instruction; throughput is number of instructions completed per unit time
c) Latency and throughput are same for pipelined systems
d) Throughput is always less than latency
11. What is the primary difference between superscalar and pipelined processors?
a) Superscalar has fewer stages than pipelined
b) Superscalar issues multiple instructions per cycle; pipelining overlaps stages of different instructions
c) Pipelined processors never have hazards
d) Superscalar cannot do forwarding
12. Which one is a structural limitation that prevents ideal pipelining?
a) Branch prediction
b) Insufficient functional units for concurrent stages
c) Register forwarding
d) Larger instruction memory
13. Instruction-level parallelism (ILP) is best described as:
a) Parallelism across multiple machines
b) Parallel execution of independent instructions within a single program
c) Vector operations on arrays
d) Using multiple threads for the same task
14. Which of the following describes a write-after-read (WAR) hazard?
a) Read-after-write (RAW)
b) An anti-dependence, where a later write could clobber a value before an earlier instruction has read it, if the two are reordered
c) Output dependence
d) Control hazard
15. Register renaming principally eliminates which hazard?
a) Control hazards
b) Structural hazards
c) Write-after-read (WAR) and write-after-write (WAW) false dependencies
d) Memory consistency hazards
16. In an in-order pipeline with no forwarding, how many stall cycles (minimum) does an immediately following dependent instruction typically require?
a) 0
b) 1
c) Several cycles, depending on stage latencies; typically the consumer must stall until the result is written back
d) Negative cycles (it speeds up)
17. The term “pipeline depth” refers to:
a) Number of instructions in cache
b) Number of registers in register file
c) Number of stages in the pipeline
d) Number of functional units
18. Which of the following increases pipeline hazards but can increase throughput?
a) Fewer pipeline stages
b) Simpler compiler
c) Deeper pipelines (more stages)
d) Removing branch prediction
19. Delayed branching (delay slot) is:
a) Hardware-only technique
b) A technique to fix data hazards by reordering registers
c) A compiler/hardware cooperation where instruction(s) following a branch are always executed
d) A way to increase register file size
20. Which is true about a VLIW (Very Long Instruction Word) architecture?
a) It issues only one instruction at a time
b) It requires dynamic hardware scheduling
c) It relies on the compiler to pack independent operations into one long instruction word
d) It does not benefit from pipelining
21. In vector processing, “stride” refers to:
a) Number of vector registers
b) Step between successive memory elements accessed by a vector instruction
c) Vector length limit
d) Clock cycles per vector element
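Note (question 21): a small C illustration of stride (the function names are invented for this note). Summing a row of a row-major matrix is unit-stride; summing a column touches every n-th element, i.e. a stride of n.

    #include <stddef.h>

    /* Unit-stride: consecutive elements, the address advances by one element. */
    double sum_row(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];                 /* stride 1 */
        return s;
    }

    /* Strided access: one column of an n-by-n row-major matrix. */
    double sum_column(const double *a, size_t n, size_t col) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i * n + col];       /* stride of n elements */
        return s;
    }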
22. Which architecture is specialized for executing vector operations efficiently?
a) Scalar RISC CPU
b) CISC with no vector units
c) Vector processor (vector supercomputer / SIMD array)
d) Microcontroller
23. Array processors (SIMD) differ from vector processors in that:
a) Array processors cannot operate on arrays
b) Array processors have multiple processing elements operating in lockstep on different data, while vector processors apply long vector instructions to a single pipeline
c) Vector processors have no pipelining
d) There is no difference
24. Peak arithmetic throughput of a pipelined floating-point multiply unit is best measured in:
a) Bytes per second
b) Cache misses per second
c) Floating-point operations per cycle (or per second when multiplied by clock)
d) Branch mispredictions per second
25. What is the main benefit of chaining in vector processors?
a) Reducing instruction cache misses
b) Allowing result of one vector operation to be used by another without writing back to registers (overlapping execution)
c) Increasing vector stride
d) Simplifying vector instruction encoding
26. Which technique helps reduce branch penalty in deep pipelines?
a) Out-of-order commit
b) Dynamic branch prediction with speculative execution
c) Increasing register count only
d) Reducing instruction size
27. A scoreboard in CPU pipeline control is primarily used for:
a) Improving branch prediction accuracy
b) Tracking instruction issue, resource status, and hazards for dynamic scheduling
c) Handling cache coherence
d) Managing TLB entries
28. Tomasulo’s algorithm primarily provides:
a) Static scheduling at compile time
b) Dynamic scheduling and register renaming to avoid false dependencies
c) A method for cache replacement
d) A procedure for handling interrupts
29. Loop unrolling helps pipelining by:
a) Reducing register file size
b) Increasing available ILP and reducing branch overhead per iteration
c) Introducing more branches
d) Decreasing code size always
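Note (question 29): a minimal unrolling sketch in C (factor 4, assuming n is a multiple of 4 for brevity; the names are invented for this note). The unrolled body exposes four independent multiplies per iteration and amortizes the loop branch over four elements.

    #include <stddef.h>

    /* Original loop: one multiply and one branch per element. */
    void scale(float *y, const float *x, float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            y[i] = k * x[i];
    }

    /* Unrolled by 4: four independent multiplies per iteration,
       one loop branch per four elements (assumes n % 4 == 0). */
    void scale_unrolled(float *y, const float *x, float k, size_t n) {
        for (size_t i = 0; i < n; i += 4) {
            y[i]     = k * x[i];
            y[i + 1] = k * x[i + 1];
            y[i + 2] = k * x[i + 2];
            y[i + 3] = k * x[i + 3];
        }
    }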
30. In an arithmetic pipeline performing multiply-add (A·B + C), the most effective performance improvement is achieved when:
a) Memory access time increases
b) The pipeline can perform fused multiply-add (FMA) without separate rounding between multiply and add
c) Branch prediction is disabled
d) Pipeline depth is reduced to one
31. Which of the following is a vector register file organization advantage?
a) Smaller vector length supported
b) Ability to hold long vectors in registers to reduce memory traffic
c) Removing need for control logic
d) Preventing forwarding
32. “Cross-lane” communication in SIMD array processors refers to:
a) Communication between OS processes
b) Data transfer between cache levels
c) Data transfer between different processing elements / lanes in the array
d) Branch misprediction handling
33. Memory bandwidth limitation most directly affects which of the following in vector processing?
a) Branch prediction accuracy
b) Register renaming effectiveness
c) Achievable vector throughput (sustained performance)
d) Number of pipeline stages
34. Which of the following is an advantage of pipelined arithmetic units compared to non-pipelined ones?
a) Higher per-instruction latency always
b) Higher throughput since new operations can be started each cycle
c) No need for forwarding
d) They use less silicon area always
35. A “scoreboarding” approach will stall instruction issue when:
a) There are no register files
b) There is a structural hazard or RAW dependency blocking safe issue
c) Branch predictor is perfect
d) Memory bandwidth increases
36. Which best describes scalar vs vector instructions?
a) Scalar instructions operate on multiple data elements at once
b) Scalar instructions operate on single data items; vector instructions operate on sequences of data items using one instruction
c) Vector instructions are only for control flow
d) Scalar instructions require multiple ALUs
37. How does predication reduce branch penalties?
a) By increasing branch frequency
b) By stalling the pipeline more often
c) By converting branches into conditional instructions that execute both paths without control transfer
d) By disabling speculative execution
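Note (question 37): a C-level sketch of the idea (function names invented for this note). The branchy form below must be predicted on every call; the branch-free form computes both candidates and selects one, which compilers commonly lower to a conditional-move or predicated instruction rather than a branch.

    /* Branchy version: a control transfer the pipeline must predict. */
    int abs_branch(int x) {
        if (x < 0)
            return -x;
        return x;
    }

    /* Branch-free version: both candidates are computed and one is selected
       by the condition, so there is no control transfer to mispredict. */
    int abs_select(int x) {
        int neg = -x;
        return (x < 0) ? neg : x;   /* usually a conditional select, not a branch */
    }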
38. Which of the following best describes interleaved memory for vector processors?
a) Single-bank memory that serializes accesses
b) Multiple memory banks allowing multiple consecutive vector elements to be accessed in parallel
c) Cacheless memory only for scalar CPUs
d) Redundant memory copies for fault tolerance
39. In an instruction pipeline, a “hazard detection unit” typically resides in which stage?
a) MEM
b) ID (Instruction Decode)
c) WB
d) IF
40. In vector processors, “strip-mining” refers to:
a) Removing vector registers
b) Breaking a long vector operation into smaller chunks that fit hardware vector length
c) Multiplying vectors element-wise
d) A cache replacement policy
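Note (question 40): a minimal strip-mining sketch, assuming a hypothetical hardware vector length MAX_VL and a stand-in routine vector_add_vl that models one hardware vector instruction (both names are invented here). The long operation is processed in chunks no larger than the hardware vector length, with a possibly shorter final strip.

    #include <stddef.h>

    #define MAX_VL 64   /* hypothetical hardware vector length, in elements */

    /* Stand-in for one hardware vector instruction: add vl elements, vl <= MAX_VL. */
    static void vector_add_vl(double *a, const double *b, size_t vl) {
        for (size_t i = 0; i < vl; i++)
            a[i] += b[i];
    }

    /* Strip-mining: break an n-element operation into chunks of at most MAX_VL. */
    void vector_add(double *a, const double *b, size_t n) {
        for (size_t i = 0; i < n; i += MAX_VL) {
            size_t vl = (n - i < MAX_VL) ? (n - i) : MAX_VL;  /* last strip may be short */
            vector_add_vl(a + i, b + i, vl);
        }
    }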
41. What is the principal limitation of in-order pipelines for achieving high ILP?
a) Too many registers
b) Dependence on static program order which prevents issuing independent later instructions early
c) Lack of caches
d) Excessive vector lengths
42. Which of the following is a true statement about pipeline flushing?
a) Flushing improves throughput
b) Flushing reduces the number of functional units required
c) Flushing discards partially executed instructions (e.g., after mispredicted branch), causing performance penalty
d) Flushing prevents hazards permanently
43. For chained vector functional units, the main hardware requirement is:
a) Separate branch predictors for each unit
b) Multiple caches per lane
c) Buffering and forwarding paths between vector units to pass partial results
d) Register renaming
44. Which mechanism allows out-of-order execution while preserving precise exceptions?
a) Scoreboarding without reordering
b) Reorder Buffer (ROB) to commit instructions in program order
c) Branch delay slots
d) Direct register writes from reservation stations
45. A five-stage pipeline runs at 1 GHz. If a non-pipelined design with the same stage delay requires five cycles per instruction, the ideal pipelined throughput improvement is approximately:
a) 1x
b) 5x
c) 10x
d) 0.2x
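Note (question 45): checking the arithmetic. At 1 GHz the non-pipelined design finishes one instruction every 5 cycles, i.e. 200 million instructions per second, while the ideal pipeline completes one instruction per cycle, i.e. 1000 million per second, so the throughput improvement is 1000/200 = 5x.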
46. Which of the following is NOT a vector instruction class typically found in vector ISAs?
a) Vector load/store
b) Vector arithmetic (add, multiply)
c) Vector process scheduling (OS-level)
d) Vector reduction (sum)
47. In a pipelined CPU, data forwarding cannot resolve which hazard?
a) RAW between EX and EX stages
b) A hazard where the result has not yet been produced at the time it is needed (e.g., a load result needed by the very next instruction), so at least one stall is still required
c) RAW between MEM and EX stages with forwarding path
d) None — forwarding resolves all RAW cases
48. Which of these best describes “throughput” in vector processors?
a) Number of branch predictions per second
b) Number of vector elements processed per unit time (or operations per second)
c) Total number of vector registers
d) Average memory latency
49. What is a common hardware feature of array processors (SIMD) that aids in masked operations?
a) Larger caches only
b) Per-lane enable/mask bits to selectively enable or disable operations for each lane
c) Single global mask only
d) No branching at all
50. If a pipeline stage takes longer than the clock period, what is the likely outcome?
a) Clock frequency increases automatically
b) The pipeline will not meet timing; clock must be slowed or stage split into more stages
c) Pipeline depth decreases automatically
d) Hazards disappear
51. Which of the following best defines a “balanced pipeline”?
a) Pipeline stages have equal silicon area
b) Pipeline has equal number of forwarding paths as stages
c) Stage latencies are roughly equal so that no stage is a significant bottleneck
d) The number of stages equals the number of registers
52. In vector processing, “gather” and “scatter” operations are used for:
a) Branch prediction only
b) Non-contiguous memory loads (gather) and stores (scatter)
c) Increasing cache associativity
d) Renaming registers
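Note (question 52): a scalar-C picture of gather and scatter, where idx holds arbitrary element positions (names invented for this note); a vector machine performs each loop as a single indexed load or store.

    #include <stddef.h>

    /* Gather: load non-contiguous elements selected by an index vector. */
    void gather(double *dst, const double *src, const size_t *idx, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[idx[i]];     /* dst is contiguous, src accesses are scattered */
    }

    /* Scatter: store contiguous values to non-contiguous locations. */
    void scatter(double *dst, const double *src, const size_t *idx, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[idx[i]] = src[i];
    }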
53. Which of the following pipeline optimization focuses on reducing control hazards specifically?
a) Forwarding
b) Branch target buffer (BTB) and prediction
c) Register allocation
d) Loop unrolling only
54. For arithmetic pipelining of integer multiply, pipelining is least effective when:
a) Multiplier is combinational and very slow
b) The operand size is small and the latency is already close to a single cycle, so the overhead of pipeline registers may outweigh the gains
c) Multiple multiplies are independent
d) There is abundant ILP
55. The benefit of speculative execution is:
a) It guarantees no mispredictions
b) It prevents data hazards
c) It allows the processor to continue executing along a predicted path to utilize pipeline slots; incorrect speculation may be rolled back
d) It reduces instruction cache misses
56. In a pipelined RISC architecture, the purpose of the IF stage is to:
a) Execute arithmetic
b) Fetch the instruction from memory and increment PC
c) Write results to registers
d) Resolve hazards
57. Which of the following is a typical characteristic of vector instructions compared to scalar ones?
a) Shorter encoding only
b) They specify a vector length and operate on multiple data elements per instruction
c) They never use memory
d) They always require complex branching
58. What does “lane” refer to in SIMD array processors?
a) The OS-level thread id
b) A single register in scalar CPU
c) An individual processing element that executes the same instruction on different data
d) The pipeline stage for branch resolution
59. Which is the main role of an instruction prefetch buffer in pipelined CPUs?
a) Handle data hazards
b) Reduce stalls in IF stage due to instruction cache misses
c) Manage register renaming
d) Increase WAR hazards
60. In vector processors, a “reduction” operation typically:
a) Increases vector length
b) Combines all elements of a vector into a single scalar result (e.g., sum, max)
c) Scatters vector elements to memory
d) Splits vectors into sub-vectors
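Note (question 60): a sum reduction in plain C, plus a restructured version with four independent partial sums, which is roughly how vector lanes accumulate before a final combine (names invented; the partial-sum version assumes n % 4 == 0 and may round slightly differently because floating-point addition is reassociated).

    #include <stddef.h>

    /* Sequential sum reduction: combine all elements into one scalar. */
    double reduce_sum(const double *v, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += v[i];
        return s;
    }

    /* Four independent partial sums, combined at the end. */
    double reduce_sum_partial(const double *v, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t i = 0; i < n; i += 4) {
            s0 += v[i];
            s1 += v[i + 1];
            s2 += v[i + 2];
            s3 += v[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }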
61. Which of the following is a downside of deeper pipelines?
a) Reduced clock skew
b) Increased branch misprediction penalty and pipeline flush cost
c) Fewer hazards
d) Lower clock frequency always
62. Which technique reduces false sharing and improves vector unit performance in multicore SIMD systems?
a) Increasing branch frequency
b) Aligning data and using cache line-aware partitioning to avoid cross-core false sharing
c) Removing caches
d) Using larger instruction words
63. In Tomasulo’s algorithm, reservation stations are used to:
a) Hold branch targets only
b) Hold operand values and track when operands become available for out-of-order issue
c) Manage memory addresses for vector loads only
d) Commit instructions in program order
64. The advantage of using pipelined memory access for vector loads is:
a) Decreasing vector length
b) Overlapping multiple memory accesses to increase effective bandwidth
c) Removing hazards entirely
d) Avoiding register file use
65. Which of the following best explains “vector chaining” benefit?
a) It compresses vectors on the fly
b) It allows the output of one vector functional unit to be fed directly to the next without waiting for full write-back
c) It reduces register width
d) It prevents masking
66. A dependency graph for instructions helps a compiler to:
a) Replace hardware predictors
b) Identify independent instructions to schedule for parallel execution
c) Reduce register file size
d) Increase memory stalls
67. What is the function of a Branch Target Buffer (BTB)?
a) Store vector lengths for instructions
b) Cache the target addresses of recently taken branches to speed up prediction/branch target fetch
c) Manage register renaming
d) Prevent structural hazards
68. Which metric most directly reflects pipeline efficiency under realistic conditions?
a) Clock period only
b) IPC (instructions per cycle) or CPI averaged over real workload
c) Vector stride
d) Number of registers
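Note (question 68): the relations behind the intended answer, for reference:

    CPI  = total cycles / instructions retired
    IPC  = 1 / CPI
    execution time = instruction count * CPI * clock period

so a CPI or IPC averaged over a real workload captures stalls, flushes, and cache misses in a way the clock period alone cannot.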
69. In an arithmetic pipeline performing normalization and rounding, which stage would typically handle rounding?
a) Fetch stage
b) Alignment stage
c) Final stage after normalization (close to write-back)
d) Exception stage only
70. Which of these is a major challenge in designing array processors for general-purpose computing?
a) Lack of vector registers
b) Handling irregular control flow and memory access patterns efficiently
c) Too many branch predictors
d) No instruction cache
71. Which technique allows a pipeline to recover precise state after an exception in out-of-order machines?
a) Scoreboarding without ROB
b) Commit via Reorder Buffer (ROB) so that architectural state is updated in program order
c) Branch delay slots
d) Static scheduling
72. Which of the following is true about vector length agnostic (VLEN-agnostic) ISAs?
a) They require fixed hardware vector length only
b) They allow software/compiler to operate on vectors longer than hardware vector registers by strip-mining
c) They remove need for mask bits
d) They cannot perform reductions
73. Which hardware element is essential to implement register renaming?
a) Branch predictor
b) Rename mapping table (physical register file and free-list)
c) Cache coherence directory
d) Vector mask register
74. In a pipelined CPU, what is the effect of increasing clock frequency without reducing stage latency?
a) No change to performance
b) Timing violations; the circuit may fail unless stage delays are reduced, which requires redesign or retiming
c) Decrease in pipeline hazards automatically
d) Increase in vector length
75. In vector processors, what is the main role of a “vector length register” (VL)?
a) Determine cache size
b) Specify the number of elements to process in a vector instruction
c) Hold predicate bits only
d) Control branch frequency
76. A pipeline that can issue multiple instructions per stage in the same cycle is termed:
a) Scalar pipeline
b) Superscalar pipeline
c) Microcoded pipeline
d) Unrolled pipeline
77. Which of the following is true about non-blocking caches for pipelined processors?
a) They always reduce power compared to blocking caches
b) They allow the processor to continue servicing other cache requests while one miss is being handled, improving pipeline utilization
c) They eliminate data hazards
d) They reduce instruction count
78. Which scenario most requires the use of register renaming?
a) To improve branch prediction
b) When the program has many false (name) dependencies causing WAW/WAR hazards
c) When vector length is short
d) When memory is highly interleaved
79. In a vector pipeline, what is “element-wise” operation?
a) Operation only on scalar elements of register file
b) Applying the same arithmetic/logical operation separately to each corresponding element of two vectors or vector and scalar
c) Operation that changes pipeline depth
d) Masking operation only
80. Which design choice reduces branch penalties in pipelines without changing program code?
a) Loop unrolling (requires code change)
b) Hardware-based global branch predictor with speculation and fast recovery
c) Removing instruction cache
d) Increasing register width
81. Which of the following is a cause of WAW (write-after-write) hazard?
a) True data dependence
b) Two instructions that write the same register may complete out of program order in an out-of-order machine unless this is prevented
c) Read of a value before it’s written
d) Branch misprediction
82. In vectorized code, the compiler’s role includes:
a) Reducing core count
b) Identifying loops that can be transformed into vector operations and aligning data
c) Increasing branch density
d) Removing functional units
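Note (question 82): a sketch of what the compiler looks for (function names invented for this note). The first loop has no loop-carried dependence, so it can be turned into vector loads, a vector multiply-add, and vector stores; the second carries a dependence from one iteration to the next and cannot be vectorized directly.

    #include <stddef.h>

    /* Vectorizable: every iteration is independent. */
    void axpy(float *y, const float *x, float a, size_t n) {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Not directly vectorizable: a loop-carried dependence (prefix sum),
       each iteration reads the value produced by the previous one. */
    void prefix_sum(float *y, const float *x, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++) {
            acc += x[i];
            y[i] = acc;
        }
    }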
83. Which of the following is a correct statement about pipelined floating-point adders?
a) They always have less latency than combinational ones
b) They can achieve higher throughput by breaking the operation into pipeline stages, though latency may increase
c) They cannot be chained
d) They eliminate need for normalization
84. Which is a common way to handle variable-latency operations in pipelines?
a) Always stall pipeline permanently
b) Use scoreboard/reservation stations and dynamic scheduling to allow other independent instructions to proceed
c) Remove the operations entirely
d) Force single-cycle model only
85. The term “packing” in vector SIMD refers to:
a) Cache compression algorithm
b) Combining multiple small data elements into a wider word so a single vector operation processes them together
c) Spreading vector elements across cores
d) Packing branch predictors
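Note (question 85): a tiny packing example in C (name invented for this note). Four 8-bit values are packed into one 32-bit word so that a single wide operation can handle all four lanes together.

    #include <stdint.h>

    /* Pack four 8-bit lanes into one 32-bit word (lowest lane in the low byte). */
    uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
        return (uint32_t)a | ((uint32_t)b << 8) |
               ((uint32_t)c << 16) | ((uint32_t)d << 24);
    }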
86. Which of the following increases effective ILP seen by a pipelined core?
a) Reducing the number of registers
b) Compiler reordering of independent instructions and loop unrolling
c) Increasing branch misprediction rate
d) Decreasing instruction cache size
87. In an instruction pipeline, a NOP instruction is often used to:
a) Improve branch prediction accuracy
b) Insert a harmless bubble to avoid hazards or align code
c) Increase arithmetic throughput directly
d) Rename registers
88. Which of the following best explains “vector scatter-gather” performance challenge?
a) They always use contiguous memory
b) Non-contiguous memory accesses break bandwidth and caching assumptions, making gather/scatter expensive
c) They reduce instruction cache misses
d) They eliminate need for vector masks
89. What is an advantage of hardware multithreading (SMT) for pipelines?
a) Removes branch hazards entirely
b) When one thread stalls (e.g., memory), another thread can use pipeline stages, improving utilization
c) Reduces register renaming needs
d) Always increases single-thread IPC
90. Which of the following components is critical for achieving high sustained vector performance?
a) Large register file only
b) High memory bandwidth and multiple memory banks (to feed vector lanes)
c) Many branch predictors
d) Small instruction cache
91. Which scheduling approach issues instructions only when all operands are ready and resources available, preventing hazards at issue time?
a) Static compile-time scheduling only
b) Dynamic scheduling (Tomasulo/scoreboard style)
c) Branch delay slot scheduling only
d) Vector strip-mining
92. In SIMD array processors, what is the effect of lane-to-lane variation in memory latency?
a) It speeds up all lanes equally
b) Some lanes may be idle waiting for memory, reducing effective SIMD utilization
c) It eliminates branch mispredictions
d) It reduces the need for masking
93. Which is a correct description of a “reservation station”?
a) A device that caches branch targets
b) A buffer that holds instructions waiting for operands and resource availability in Tomasulo-like schemes
c) A register used for vector lengths only
d) A hardware mechanism to flush pipelines on exceptions
94. Which of the following techniques can help reduce the effective memory latency for vector loads?
a) Decreasing vector lengths only
b) Prefetching and interleaving memory banks to overlap memory access with computation
c) Removing caches
d) Increasing branch frequency
95. Which pipeline stage typically performs memory address calculation for load/store in a RISC pipeline?
a) IF
b) WB
c) EX (Execute) — address computation, then MEM performs the access
d) ID
96. What is the effect of long dependency chains on pipelined throughput?
a) They increase throughput always
b) They reduce achievable ILP and cause stalls, lowering throughput
c) They have no effect if forwarding is present
d) They improve branch prediction
97. In vector processing, “unit-stride” access is most efficient because:
a) It uses branch prediction better
b) Consecutive elements are in successive memory locations allowing contiguous access and bank-parallelism
c) It reduces register renaming requirements
d) It increases mask complexity
98. Which of the following is a reason to split a long combinational arithmetic unit into pipeline stages?
a) To increase combinational delay
b) To raise clock frequency and enable higher throughput by reducing critical path per stage
c) To remove hazards automatically
d) To increase instruction count
99. Which statement about SIMD and vector processors is true?
a) SIMD cannot exploit data parallelism
b) Both exploit data parallelism: SIMD does so with multiple lanes executing the same instruction, while vector processors apply one pipelined instruction to many data elements
c) Vector processors cannot handle reductions
d) SIMD always has longer latency than scalar
100. During branch misprediction recovery in a pipelined processor, best practice is to:
a) Commit speculative results immediately
b) Flush speculative instructions and restart fetch from correct path, restoring state as necessary
c) Ignore the misprediction and continue
d) Disable branch prediction permanently
