Multiprocessors MCQ Questions and Answers

1. Which characteristic best defines a tightly-coupled multiprocessor system?
a) Each processor has its own local memory and no shared memory.
b) Processors communicate only by message passing across a network.
c) Processors share a global memory and communicate through it.
d) Processors work sequentially on a single pipeline stage.
Answer: c) Processors share a global memory and communicate through it.

2. A primary advantage of cache-coherent shared-memory multiprocessors is:
a) Elimination of the need for any synchronization primitives.
b) Automatic visibility of memory updates across processors without programmer-managed communication.
c) Reduced hardware complexity compared to distributed memory.
d) No requirement for network interconnects.
Answer: b) Automatic visibility of memory updates across processors without programmer-managed communication.

3. Which interconnection network is non-blocking, providing a dedicated path for every input-output pair without conflict?
a) Crossbar switch.
b) Bus-based network.
c) Linear array.
d) Ring network.
Answer: a) Crossbar switch.

4. In an N-processor shared-bus system, bus arbitration is usually handled by:
a) Each processor independently deciding when to drive the bus.
b) A centralized arbiter that grants bus access.
c) Random backoff timers only.
d) A token passed around processors.
Answer: b) A centralized arbiter that grants bus access.

5. Which interconnection network's hardware cost scales worst as the number of processors increases?
a) Multistage network.
b) Crossbar switch.
c) Mesh network.
d) Tree network.
Answer: b) Crossbar switch.

6. What is the main goal of directory-based cache coherence protocols?
a) To broadcast every write to all processors.
b) To keep a centralized log of all memory references.
c) To track which caches hold each block and avoid unnecessary broadcasts.
d) To eliminate caches entirely.
Answer: c) To track which caches hold each block and avoid unnecessary broadcasts.

7. Which MESI coherence state indicates that a cache holds the only, dirty copy of a line, so it can be read and written again without notifying memory?
a) Modified.
b) Exclusive.
c) Shared.
d) Invalid.
Answer: a) Modified.

8. A point-to-point interconnect where each node connects to a small set of neighbors in a 2D grid describes which topology?
a) Hypercube.
b) Mesh.
c) Star.
d) Crossbar.
Answer: b) Mesh.

9. In multiprocessor performance analysis, Amdahl’s Law primarily quantifies:
a) Memory latency effects.
b) The maximum speedup given a fixed sequential fraction of a program.
c) The maximum network throughput.
d) Cache hit rates.
Answer: b) The maximum speedup given a fixed sequential fraction of a program.
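
As a worked illustration, the minimal C sketch below (illustrative numbers; the helper name amdahl_speedup is hypothetical) evaluates the Amdahl bound speedup(N) = 1 / (s + (1 - s)/N) for a 10% sequential fraction. The speedup saturates near 1/s = 10 no matter how many processors are added.

    #include <stdio.h>

    /* Amdahl's Law: with sequential fraction s and N processors,
     * speedup(N) = 1 / (s + (1 - s) / N).  As N grows, the bound
     * approaches 1/s, so s alone caps the achievable speedup. */
    static double amdahl_speedup(double s, int n_procs) {
        return 1.0 / (s + (1.0 - s) / n_procs);
    }

    int main(void) {
        /* Example: a 10% sequential fraction caps speedup at 10x. */
        for (int n = 1; n <= 1024; n *= 4)
            printf("N=%4d  speedup=%.2f\n", n, amdahl_speedup(0.10, n));
        return 0;
    }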

10. Which arbitration scheme hands control of a shared resource to the requesting processor in a fixed rotating order?
a) Centralized priority.
b) Round-robin (token) arbitration.
c) Random arbitration.
d) First-come, first-served with indefinite priority.
Answer: b) Round-robin (token) arbitration.

11. In snooping cache-coherence protocols, what is required for correctness?
a) A directory to track sharers.
b) A broadcast-capable interconnect so caches can observe transactions.
c) A separate copy of memory per processor.
d) No memory ordering constraints.
Answer: b) A broadcast-capable interconnect so caches can observe transactions.

12. What is false sharing?
a) Sharing of a large object across threads intentionally.
b) Cache line contention caused by independent variables placed on the same cache line.
c) A directory-based coherence optimization.
d) Sharing only read-only data.
Answer: b) Cache line contention caused by independent variables placed on the same cache line.
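
A minimal C11 sketch of the fix implied by option b, assuming a 64-byte cache line (line size varies by machine; the struct names are hypothetical): the packed layout puts eight per-thread counters on one line, so independent threads still ping-pong that line, while padding each counter to its own line removes the false sharing.

    #include <stdalign.h>
    #include <stdint.h>
    #include <assert.h>

    /* Packed layout: 8 counters share one 64-byte cache line. */
    struct bad_counters { uint64_t count[8]; };

    /* Padded layout: each counter occupies its own 64-byte line. */
    struct padded_counter { alignas(64) uint64_t count; };
    struct good_counters { struct padded_counter slot[8]; };

    int main(void) {
        static_assert(sizeof(struct bad_counters) == 64,
                      "8 counters packed into one line");
        static_assert(sizeof(struct padded_counter) == 64,
                      "one counter per line");
        return 0;
    }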

13. Which interconnection topology has nodes labeled with binary addresses and links between nodes differing in exactly one bit?
a) Mesh.
b) Hypercube.
c) Bus.
d) Ring.
Answer: b) Hypercube.
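
A small illustrative C routine for the routing this labeling enables (the function name route is hypothetical): flipping one differing bit per hop walks from source to destination in Hamming-distance-many hops, at most d hops in a d-dimensional hypercube.

    #include <stdio.h>

    /* Hypercube routing: repeatedly flip one bit in which the current
     * node and the destination still differ.  Path length equals the
     * Hamming distance between the two labels. */
    static void route(unsigned src, unsigned dst) {
        unsigned node = src;
        printf("%u", node);
        while (node != dst) {
            unsigned diff = node ^ dst;
            node ^= diff & -diff;      /* flip the lowest differing bit */
            printf(" -> %u", node);
        }
        printf("\n");
    }

    int main(void) {
        route(0u, 5u);   /* in a 3-cube: 000 -> 001 -> 101 */
        return 0;
    }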

14. Which statement about multicore processors is correct?
a) Multicore chips always reduce memory bandwidth demands.
b) Multicore processors place multiple cores on a single die sharing some on-chip resources.
c) Each core in a multicore must have its own off-chip memory.
d) Multicore designs eliminate coherence problems.
Answer: b) Multicore processors place multiple cores on a single die sharing some on-chip resources.

15. Which synchronization primitive provides mutual exclusion by allowing only one thread to enter a critical section at a time?
a) Barrier.
b) Semaphore/mutex (lock).
c) Atomic fetch-and-add for counters only.
d) Prefetch instruction.
Answer: b) Semaphore/mutex (lock).
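
A minimal pthreads sketch of option b: the mutex admits one thread at a time into the critical section, so the shared counter reliably ends up at 400000 (compile with -lpthread).

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* enter critical section */
            counter++;
            pthread_mutex_unlock(&lock); /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);  /* always 400000 */
        return 0;
    }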

16. A barrier synchronization primitive ensures that:
a) Threads can skip other threads’ critical sections.
b) All participating threads reach the barrier point before any proceed.
c) Only one thread proceeds while others block indefinitely.
d) Memory is flushed to disk.
Answer: b) All participating threads reach the barrier point before any proceed.
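
A short pthreads illustration: pthread_barrier_wait blocks each thread until all NTHREADS participants arrive, so every "phase 1" line prints before any "phase 2" line (compile with -lpthread).

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    static pthread_barrier_t barrier;

    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: phase 1\n", id);
        pthread_barrier_wait(&barrier);   /* all must arrive first */
        printf("thread %ld: phase 2\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }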

17. Which cache-coherence problem occurs when writes are not immediately visible to other processors, leading to inconsistent (stale) reads?
a) False sharing.
b) Write propagation delay (staleness).
c) Cache pollution.
d) Thrashing.
Answer: b) Write propagation delay (staleness).

18. In a directory-based coherence scheme, the directory entry typically contains:
a) The entire cache contents of every processor.
b) A list or bitmap of which processors currently have a copy of the block.
c) The instruction stream for the block.
d) The physical location of the DRAM bank only.
Answer: b) A list or bitmap of which processors currently have a copy of the block.

19. Which of the following is a disadvantage of bus-based shared-memory multiprocessors?
a) Simpler coherence via snooping.
b) Bus contention limiting scalability.
c) Efficient multicast capabilities.
d) Natural support for non-uniform memory access.
Answer: b) Bus contention limiting scalability.

20. What does “memory consistency model” define?
a) The layout of caches.
b) The order in which memory operations appear to execute from the perspective of multiple processors.
c) The speed of DRAM.
d) The cache replacement policy.
Answer: b) The order in which memory operations appear to execute from the perspective of multiple processors.

21. Which scheduling policy best improves locality on shared-memory multicore systems?
a) Random thread placement.
b) Core pinning or affinity to keep threads on the same core/cache.
c) Frequent thread migration between cores.
d) Time-sliced swapping across NUMA nodes.
Answer: b) Core pinning or affinity to keep threads on the same core/cache.
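
A minimal sketch of core pinning on Linux/glibc; pthread_setaffinity_np is a GNU extension (other systems expose similar calls), and the helper name pin_to_core is hypothetical.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to one core so its working set stays
     * warm in that core's private caches. */
    static int pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void) {
        if (pin_to_core(0) == 0)
            printf("pinned to core 0\n");
        return 0;
    }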

22. In a Non-Uniform Memory Access (NUMA) architecture, the memory access time depends on:
a) The instruction cache size only.
b) Whether the requested memory is located on the local node or a remote node.
c) The number of levels in the TLB.
d) The compiler optimization level.
Answer: b) Whether the requested memory is located on the local node or a remote node.

23. Which of these is an advantage of message-passing multiprocessors over shared-memory ones?
a) Simpler programming model for all applications.
b) Eliminates need for any synchronization.
c) Easier scalability because communication is explicit and does not rely on shared-bus broadcasts.
d) Always faster on small-scale systems.
Answer: c) Easier scalability because communication is explicit and does not rely on shared-bus broadcasts.

24. The “directory bottleneck” in directory-based systems refers to:
a) The directory storing stale data forever.
b) The centralized directory becoming a performance and storage hotspot as the system grows.
c) The directory not tracking any sharers.
d) Directory entries being too small to hold addresses.
Answer: b) The centralized directory becoming a performance and storage hotspot as the system grows.

25. Which hardware instruction is commonly used to implement spinlocks efficiently?
a) Floating-point multiply.
b) Compare-and-swap (CAS) or test-and-set atomic operations.
c) Regular load followed by non-atomic store.
d) Branch prediction hints.
Answer: b) Compare-and-swap (CAS) or test-and-set atomic operations.
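
An illustrative C11 test-and-set spinlock: the atomic exchange returns the old flag value, so a thread acquires the lock only when it flips the flag from clear to set, and busy-waits otherwise.

    #include <stdatomic.h>
    #include <stdio.h>

    typedef struct { atomic_flag locked; } spinlock_t;

    static void spin_lock(spinlock_t *l) {
        /* test-and-set: returns previous value; spin while it was set */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
            ;  /* busy-wait */
    }

    static void spin_unlock(spinlock_t *l) {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }

    int main(void) {
        spinlock_t l = { ATOMIC_FLAG_INIT };
        spin_lock(&l);
        puts("in critical section");
        spin_unlock(&l);
        return 0;
    }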

26. In coherence protocols, the term “invalidate” means:
a) The cache line is marked as stale so subsequent reads must fetch a fresh copy.
b) The cache line is duplicated across caches.
c) The line is permanently removed from memory.
d) The line is marked as read-only but remains valid.
Answer: a) The cache line is marked as stale so subsequent reads must fetch a fresh copy.

27. Which interconnect characteristic is most critical for cache-coherent snooping protocols?
a) Low power consumption only.
b) Support for broadcast or multicast of coherence messages.
c) High disk I/O bandwidth.
d) Support for virtual memory translation at each switch.
Answer: b) Support for broadcast or multicast of coherence messages.

28. Which multicore cache organization reduces access latency by giving each core its own small, fast cache backed by a larger shared cache?
a) Off-chip memory only.
b) Private L1 caches per core with a shared L2 or L3 cache.
c) Single global cache for all cores.
d) No caches at all.
Answer: b) Private L1 caches per core with a shared L2 or L3 cache.

29. The primary role of cache coherence protocols in a multicore system is to:
a) Increase DRAM capacity.
b) Ensure correctness of memory semantics by making updates to shared data visible to other caches.
c) Schedule threads across cores.
d) Replace the OS memory manager.
Answer: b) Ensure correctness of memory semantics by making updates to shared data visible to other caches.

30. Which interconnect topology has logarithmic network diameter with respect to node count and uses binary labels to route?
a) Linear bus.
b) Hypercube.
c) Star.
d) Fully connected mesh.
Answer: b) Hypercube.

31. What is the read-for-ownership (RFO) operation used for in coherence protocols?
a) To read a value without any intent to modify.
b) To request exclusive ownership of a cache block in order to write to it.
c) To flush a cache line to disk.
d) To invalidate all other caches without reading.
Answer: b) To request exclusive ownership of a cache block in order to write to it.

32. Which problem arises when a processor repeatedly invalidates and re-loads the same cache line due to others frequently writing to it?
a) Thrashing.
b) Cache coherence deadlock.
c) Livelock.
d) False sharing.
Answer: a) Thrashing.

33. In a directory-based system, what optimization reduces directory storage by recording only a few sharers explicitly and using broadcast for large sharing sets?
a) Full-map directory.
b) Limited (or sparse) directory with broadcast fallback.
c) Snooping bus.
d) Cache bypassing.
Answer: b) Limited (or sparse) directory with broadcast fallback.

34. Which of the following is true about snooping vs. directory protocols?
a) Snooping scales better than directory for large numbers of processors.
b) Directory protocols avoid network-wide broadcasts and scale better for large systems.
c) Both are identical in scalability and implementation.
d) Directory protocols require a broadcast-capable bus.
Answer: b) Directory protocols avoid network-wide broadcasts and scale better for large systems.

35. A coherence protocol that allows multiple readers but only one writer at a time is called:
a) Exclusive-Read protocol.
b) Read-Write lock protocol.
c) Single-writer/multiple-reader coherence scheme (e.g., MESI).
d) Always-write protocol.
Answer: c) Single-writer/multiple-reader coherence scheme (e.g., MESI).

36. Which of the following is a benefit of hardware-supported atomic instructions for synchronization?
a) They are always slower than software locks.
b) They enable lock-free or wait-free algorithms and efficient spinlock implementation.
c) They remove the need for memory fences.
d) They guarantee determinism across threads.
Answer: b) They enable lock-free or wait-free algorithms and efficient spinlock implementation.

37. In multicore processors, cache coherence traffic can be reduced by:
a) Increasing the frequency of flushes.
b) Aligning data structures to avoid false sharing and increasing private data locality.
c) Decreasing cache sizes.
d) Always disabling write-back caches.
Answer: b) Aligning data structures to avoid false sharing and increasing private data locality.

38. What does a MESI protocol’s “Exclusive” state indicate?
a) The block is invalid.
b) The cache line exists only in this cache and is clean (matches main memory).
c) The block is shared among many caches and modified.
d) The line is being evicted.
Answer: b) The cache line exists only in this cache and is clean (matches main memory).

39. Which of these is NOT typically considered a hardware interconnection structure?
a) Bus.
b) Crossbar.
c) Mesh network.
d) Monolithic compiler pass.
Answer: d) Monolithic compiler pass.

40. When multiple cores share an on-chip L3 cache, which effect is likely?
a) Elimination of coherence misses.
b) Improved sharing of hot data and reduced off-chip memory traffic.
c) Necessity to remove all L1 caches.
d) Increased instruction latency by default.
Answer: b) Improved sharing of hot data and reduced off-chip memory traffic.

41. In a shared-memory multiprocessor, what is the purpose of memory barriers (fences)?
a) To improve branch prediction.
b) To enforce ordering constraints on memory operations to meet the consistency model.
c) To flush caches to disk.
d) To allocate virtual memory.
Answer: b) To enforce ordering constraints on memory operations to meet the consistency model.
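
A minimal C11 sketch of the ordering the question describes, using release/acquire orderings on a flag (the portable counterpart of explicit fences): the payload write is guaranteed visible before the flag, and the flag read is ordered before the payload read. The two functions run sequentially in main only so the example is self-contained; in practice they would run on different threads.

    #include <stdatomic.h>
    #include <stdio.h>

    static int data;
    static atomic_int ready;

    static void producer(void) {
        data = 42;                             /* payload written first */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    static void consumer(void) {
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;                                  /* wait until flag is set */
        printf("data = %d\n", data);           /* guaranteed to print 42 */
    }

    int main(void) {
        producer();   /* sequential here for brevity */
        consumer();
        return 0;
    }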

42. What is a “reduce” operation in parallel programming (e.g., sum across threads)?
a) A data transfer that reduces bandwidth.
b) A collective operation that combines values from multiple processors into one using an associative operation.
c) A cache coherence state.
d) A power-saving mode.
Answer: b) A collective operation that combines values from multiple processors into one using an associative operation.
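
An OpenMP illustration of option b: each thread privately accumulates a partial sum, and the runtime combines the partials with the associative + at the end, avoiding contention on a single shared accumulator (compile with -fopenmp).

    #include <stdio.h>

    int main(void) {
        long sum = 0;
        /* Each thread keeps a private partial sum; the runtime
         * reduces the partials into `sum` at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i <= 1000000; i++)
            sum += i;
        printf("sum = %ld\n", sum);   /* 500000500000 */
        return 0;
    }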

43. Which of the following best describes a wormhole routing technique in interconnection networks?
a) Entire packet is buffered at each hop before forwarding.
b) Packet is split into small flow-control units (flits) and forwarded as they arrive, reducing buffer requirements.
c) Packets are sent only at night.
d) Packets are never dropped.
Answer: b) Packet is split into small flow-control units (flits) and forwarded as they arrive, reducing buffer requirements.

44. A cache write-back policy implies:
a) Writes immediately update main memory.
b) Writes update only the cache; main memory is updated when the dirty line is evicted.
c) Writes are lost on power failure with no recovery.
d) Cache never marks lines dirty.
Answer: b) Writes update only the cache; main memory is updated when the dirty line is evicted.

45. Which property ensures that every write to a memory location by one processor will eventually be seen by others?
a) Deadlock.
b) Write propagation (every write eventually becomes visible to other processors).
c) Locality.
d) Thrashing.
Answer: b) Write propagation (every write eventually becomes visible to other processors).

46. Which interconnect has the simplest hardware but poorest scalability beyond a few processors?
a) Crossbar.
b) Bus.
c) Multi-stage network.
d) Torus.
Answer: b) Bus.

47. Which synchronization mechanism is usually preferable for short critical sections where contention is expected to be low?
a) Sleeping kernel mutex (heavyweight).
b) Spinlock (busy-wait).
c) Distributed barrier with checkpointing.
d) File-system lock.
Answer: b) Spinlock (busy-wait).

48. In coherence protocols, write-invalidate and write-update are two fundamental strategies. Which is true?
a) Write-update sends updated data to other caches; write-invalidate invalidates their copies.
b) Write-invalidate duplicates updated data to all caches.
c) Write-update invalidates main memory.
d) They are the same mechanism under different names.
Answer: a) Write-update sends updated data to other caches; write-invalidate invalidates their copies.

49. What is the main limitation of directory-based coherence when many processors read a block concurrently?
a) It cannot track readers.
b) Directory storage and update overhead can become large; updates to the directory may become a bottleneck.
c) It forces all reads to go to memory only.
d) It disables write-backs.
Answer: b) Directory storage and update overhead can become large; updates to the directory may become a bottleneck.

50. Which cache replacement policy tries to evict the block that has not been used for the longest time?
a) Random.
b) Least Recently Used (LRU).
c) First-In-First-Out (FIFO).
d) Most Recently Used (MRU).
Answer: b) Least Recently Used (LRU).
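
A toy C model of LRU for one 4-way fully associative set, illustrative only: each way records when it was last touched, and a miss evicts the way with the oldest timestamp, i.e., the block unused for the longest time.

    #include <stdio.h>

    #define WAYS 4

    static long tag[WAYS] = {-1, -1, -1, -1};
    static long last_use[WAYS];
    static long now;

    static void access_block(long t) {
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (tag[w] == t) {               /* hit: refresh recency */
                last_use[w] = ++now;
                printf("hit  %ld\n", t);
                return;
            }
            if (last_use[w] < last_use[victim])
                victim = w;                  /* least recently used so far */
        }
        if (tag[victim] >= 0)
            printf("miss %ld (evict LRU block %ld)\n", t, tag[victim]);
        else
            printf("miss %ld (cold)\n", t);
        tag[victim] = t;
        last_use[victim] = ++now;
    }

    int main(void) {
        long refs[] = {1, 2, 3, 4, 1, 5};
        for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
            access_block(refs[i]);           /* the final 5 evicts block 2 */
        return 0;
    }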

51. In a multicore with private L1 caches and a shared L2, a cache miss in L1 will typically:
a) Always fetch from main memory directly.
b) Probe the shared L2 before going to main memory.
c) Cause the L1 to be disabled.
d) Evict the entire L2 cache.
Answer: b) Probe the shared L2 before going to main memory.

52. Which of the following reduces coherence traffic by allowing certain writes to proceed locally and propagate lazily?
a) Write-through with immediate broadcast.
b) Lazy release consistency or relaxed coherence models.
c) Strong sequential consistency strictly enforced per instruction.
d) Removing caches.
Answer: b) Lazy release consistency or relaxed coherence models.

53. The typical function of a memory controller in a multicore system is to:
a) Translate high-level languages.
b) Manage DRAM access scheduling, refresh, and arbitration among requesters.
c) Implement instruction-level parallelism.
d) Store cache directories only.
Answer: b) Manage DRAM access scheduling, refresh, and arbitration among requesters.

54. Which concept describes running multiple hardware thread contexts simultaneously on a single core?
a) Chiplet.
b) Multithreading (Simultaneous Multithreading – SMT).
c) NUMA.
d) Directory caching.
Answer: b) Multithreading (Simultaneous Multithreading – SMT).

55. What is the principal goal of load balancing in multicore systems?
a) To maximize memory accesses to remote nodes.
b) To distribute work across cores to minimize idle time and maximize throughput.
c) To reduce the number of cores used.
d) To increase disk usage.
Answer: b) To distribute work across cores to minimize idle time and maximize throughput.

56. Which interconnect parameter most directly affects latency between two cores?
a) Number of cache levels only.
b) Network diameter (number of hops) and per-hop delay.
c) Size of L2 only.
d) Instruction set architecture.
Answer: b) Network diameter (number of hops) and per-hop delay.

57. Which feature is commonly shared among cores on a single multicore die?
a) Private L1 caches only.
b) Some levels of cache (e.g., L3), memory controllers, and interconnect fabric.
c) Separate DRAM chips per core always.
d) Independent operating systems per core with no shared resources.
Answer: b) Some levels of cache (e.g., L3), memory controllers, and interconnect fabric.

58. In the context of multiprocessor debugging, a Heisenbug is:
a) A bug fixed automatically by adding logging.
b) A bug that changes behavior when you try to observe it (non-deterministic under instrumentation).
c) A hardware failure unrelated to software.
d) A deterministic compile-time error.
Answer: b) A bug that changes behavior when you try to observe it (non-deterministic under instrumentation).

59. Which coherence protocol action must be taken when a processor wants to write to a cache line currently in other caches’ Shared state under MESI?
a) Do nothing; just write locally.
b) Issue an invalidate to other caches and gain exclusive/modified state.
c) Broadcast the write value and keep shared state everywhere.
d) Disable coherence.
Answer: b) Issue an invalidate to other caches and gain exclusive/modified state.

60. Which of these interconnect topologies is a direct network with wrap-around connections in both dimensions, suitable for 2D layouts?
a) Tree network.
b) Torus.
c) Crossbar.
d) Star network.
Answer: b) Torus.

61. What is the major advantage of implementing coherence at the hardware level versus software-managed coherence?
a) Hardware-level coherence can be transparent to the programmer and typically faster due to lower latency actions.
b) Hardware coherence removes the need for caches.
c) Software-managed coherence is always faster.
d) Hardware coherence is easier to scale infinitely.
Answer: a) Hardware-level coherence can be transparent to the programmer and typically faster due to lower latency actions.

62. Which metric quantifies how often cache requests are directed to the next memory level?
a) Hit latency.
b) Miss rate / Miss ratio.
c) Page fault frequency.
d) Instructions per cycle (IPC).
Answer: b) Miss rate / Miss ratio.
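
A worked example in C with illustrative numbers: the miss rate feeds directly into average memory access time, AMAT = hit_time + miss_rate x miss_penalty.

    #include <stdio.h>

    int main(void) {
        double accesses = 1000000.0, misses = 20000.0;
        double miss_rate = misses / accesses;              /* 0.02 = 2% */
        double hit_time = 1.0, miss_penalty = 100.0;       /* cycles */
        double amat = hit_time + miss_rate * miss_penalty; /* 3 cycles */
        printf("miss rate = %.2f%%, AMAT = %.1f cycles\n",
               miss_rate * 100.0, amat);
        return 0;
    }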

63. A read-modify-write atomic operation is necessary when implementing:
a) Read-only data sharing.
b) Locks and some concurrent data structure updates to avoid race conditions.
c) Floating-point arithmetic.
d) Cache replacement.
Answer: b) Locks and some concurrent data structure updates to avoid race conditions.
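
A minimal C11 sketch of a CAS-based read-modify-write, here an atomic maximum (the name update_max is hypothetical): a failed compare-and-swap refreshes expected with the value another thread installed, and the loop retries, avoiding the lost-update race of a plain load/store pair.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int shared_max;

    static void update_max(int v) {
        int expected = atomic_load(&shared_max);
        /* Install v only if it is still larger and nobody changed
         * the value in between; on failure, expected is refreshed. */
        while (v > expected &&
               !atomic_compare_exchange_weak(&shared_max, &expected, v))
            ;  /* retry with the latest value */
    }

    int main(void) {
        update_max(7);
        update_max(3);   /* no-op: 3 <= current max */
        printf("max = %d\n", atomic_load(&shared_max));  /* 7 */
        return 0;
    }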

64. In multiprocessor interconnects, virtual channels are used primarily to:
a) Encrypt packets.
b) Avoid deadlock and provide QoS by separating different traffic classes.
c) Replace physical links.
d) Increase clock speed.
Answer: b) Avoid deadlock and provide QoS by separating different traffic classes.

65. Which statement about write-through caches is true?
a) They never update main memory on writes.
b) They immediately write data to main memory on every cache write, simplifying coherence but increasing memory traffic.
c) They always reduce memory traffic compared to write-back.
d) They imply no cache at all.
Answer: b) They immediately write data to main memory on every cache write, simplifying coherence but increasing memory traffic.

66. Which of the following is a common technique to implement fast interprocessor interrupts or notifications on multicore chips?
a) Use of shared files on disk.
b) Message-signalled interrupts or doorbell registers in the interconnect.
c) Printing to console and polling it.
d) Global time-of-day signals.
Answer: b) Message-signalled interrupts or doorbell registers in the interconnect.

67. Which memory consistency model is the strongest (most intuitive) but hardest to implement efficiently?
a) Weak consistency.
b) Release consistency.
c) Sequential consistency (SC).
d) Relaxed consistency.
Answer: c) Sequential consistency (SC).

68. What technique allows different parts of a cache block to be treated and transferred separately to reduce false sharing effects?
a) Coarse-grain locking.
b) Sub-blocking or word-level coherence.
c) Always making cache lines larger.
d) Disabling coherence.
Answer: b) Sub-blocking or word-level coherence.

69. In a distributed shared memory (DSM) system, the illusion of a shared memory is created by:
a) Hardware only with no software.
b) Software mechanisms that map remote memory accesses to messages between nodes.
c) Using the same DRAM chip for all nodes.
d) Removing page tables.
Answer: b) Software mechanisms that map remote memory accesses to messages between nodes.

70. Which type of cache miss occurs because a block is used for the first time?
a) Conflict miss.
b) Capacity miss.
c) Compulsory (cold) miss.
d) Coherence miss.
Answer: c) Compulsory (cold) miss.

71. A barrier implementation that uses a centralized counter and spinning threads on it can suffer from:
a) Perfect scalability.
b) Contention on the counter variable causing poor scalability.
c) No wait time at all.
d) Infinite memory leaks.
Answer: b) Contention on the counter variable causing poor scalability.

72. The difference between UMA and NUMA systems is primarily:
a) UMA has uniform memory access time for all processors; NUMA has variable access times depending on locality.
b) UMA uses directory coherence only.
c) NUMA has a single shared bus only.
d) UMA is always distributed memory message-passing.
Answer: a) UMA has uniform memory access time for all processors; NUMA has variable access times depending on locality.

73. Which cache coherence scenario produces the highest network traffic?
a) Read-only shared data.
b) Frequent writes to the same shared block causing write-invalidate broadcasts.
c) Private data accessed only by one core.
d) Cold-start only.
Answer: b) Frequent writes to the same shared block causing write-invalidate broadcasts.

74. The primary role of a hardware memory fence (mfence) is to:
a) Flush the instruction cache.
b) Prevent certain memory reorderings by the compiler/hardware to ensure visibility semantics.
c) Stop thread creation.
d) Reboot the processor.
Answer: b) Prevent certain memory reorderings by the compiler/hardware to ensure visibility semantics.

75. In multiprocessor scheduling, work stealing is a strategy where:
a) Idle processors take tasks from busy processors’ queues to balance load.
b) Busy processors steal memory from idle ones.
c) Processors steal cache lines arbitrarily.
d) OS steals time from user processes to run background tasks.
Answer: a) Idle processors take tasks from busy processors’ queues to balance load.

76. Which of the following is an example of a coherence miss?
a) A miss caused by a cache line being evicted for capacity reasons.
b) A miss caused because another processor modified the line and invalidated this cache’s copy.
c) A miss when bringing in a new block never used before.
d) A miss due to pipeline stall.
Answer: b) A miss caused because another processor modified the line and invalidated this cache’s copy.

77. Which topology is most likely to require virtual channels and advanced routing to avoid deadlock in high-performance routers?
a) Single shared bus.
b) Mesh or torus networks with multiple flows.
c) A simple star with no contention.
d) Point-to-point dedicated links with no routing.
Answer: b) Mesh or torus networks with multiple flows.

78. Which of the following best describes “lock-free” data structures?
a) They never use atomic operations.
b) They guarantee system-wide progress: some thread always completes its operation, even if individual threads are delayed.
c) They provide stronger mutual exclusion than locks.
d) They use heavy OS-level blocking primitives.
Answer: b) They guarantee system-wide progress: some thread always completes its operation, even if individual threads are delayed.
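
An illustrative C11 sketch of a lock-free push on a Treiber stack: a failed CAS simply means another thread's push succeeded first, so the system as a whole always makes progress. A complete implementation must also handle pop's ABA problem, which is omitted here.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct node { int value; struct node *next; };
    static _Atomic(struct node *) head;

    static void push(int v) {
        struct node *n = malloc(sizeof *n);
        n->value = v;
        n->next = atomic_load(&head);
        /* On failure, n->next is refreshed to the current head
         * and we retry; some thread's CAS always succeeds. */
        while (!atomic_compare_exchange_weak(&head, &n->next, n))
            ;
    }

    int main(void) {
        push(1); push(2); push(3);
        for (struct node *p = atomic_load(&head); p; p = p->next)
            printf("%d ", p->value);     /* prints: 3 2 1 */
        printf("\n");
        return 0;
    }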

79. What is the typical consequence of increasing cache associativity (e.g., from direct-mapped to 8-way)?
a) Increased conflict misses but lower hit time always.
b) Reduced conflict misses at the cost of slightly higher access time and complexity.
c) No change in performance.
d) Eliminates compulsory misses.
Answer: b) Reduced conflict misses at the cost of slightly higher access time and complexity.

80. In interprocessor arbitration, a daisy-chain priority scheme:
a) Gives static priority to processors closer to the arbiter in the chain.
b) Ensures perfect fairness.
c) Is equivalent to round-robin.
d) Requires no wiring.
Answer: a) Gives static priority to processors closer to the arbiter in the chain.

81. Which is a benefit of splitting the last-level cache (LLC) into per-core slices with an on-chip network?
a) Reduced coherence complexity only.
b) Scalable capacity and bandwidth, and a better physical layout for large core counts.
c) Elimination of the need for L1 caches.
d) Always lower latency for every access.
Answer: b) Scalable capacity and bandwidth, and a better physical layout for large core counts.

82. When evaluating multiprocessor performance, “speedup” is defined as:
a) Absolute number of instructions per second only.
b) The ratio of execution time on one processor to execution time on multiple processors for the same workload.
c) The inverse of IPC.
d) Network throughput divided by memory bandwidth.
Answer: b) The ratio of execution time on one processor to execution time on multiple processors for the same workload.

83. Which hardware feature helps reduce cache coherence overhead for producer-consumer patterns?
a) Hardware transactional memory (HTM) or producer-consumer aware queues (e.g., write-combining buffers).
b) Disabling caches entirely.
c) Using write-through only.
d) Always broadcasting on every read.
Answer: a) Hardware transactional memory (HTM) or producer-consumer aware queues (e.g., write-combining buffers).

84. Which of the following best reduces memory contention on a multicore NUMA system?
a) Randomly allocating pages across nodes.
b) Using first-touch allocation and thread affinity to allocate memory local to the accessing core.
c) Disabling locality optimizations.
d) Moving OS data structures off-chip.
Answer: b) Using first-touch allocation and thread affinity to allocate memory local to the accessing core.
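
A minimal OpenMP sketch of first-touch placement: under Linux's default policy a page lands on the NUMA node of the thread that first writes it, so initializing with the same static partitioning used by the compute loop keeps each thread's pages node-local (assuming threads are pinned via affinity; compile with -fopenmp).

    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    int main(void) {
        double *a = malloc(N * sizeof *a);

        /* First touch: each thread faults in the pages it will use. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* Same static partitioning: threads access mostly local pages. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] += 1.0;

        printf("a[0] = %f\n", a[0]);
        free(a);
        return 0;
    }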

85. Which technique can be used to avoid priority inversion with locks?
a) Ignoring the problem and hoping it disappears.
b) Priority inheritance, where a lower-priority thread holding a lock temporarily inherits a higher priority.
c) Disabling preemption system-wide.
d) Using only spinlocks in all contexts.
Answer: b) Priority inheritance, where a lower-priority thread holding a lock temporarily inherits a higher priority.

86. The difference between snooping and centralized-directory-based coherence is mainly about:
a) Whether caches exist at all.
b) The method of detecting and communicating sharing information — snooping watches bus broadcasts, directory maintains sharer lists.
c) The number of processors being always two.
d) The instruction set used.
Answer: b) The method of detecting and communicating sharing information — snooping watches bus broadcasts, directory maintains sharer lists.

87. Which of the following is true of hardware transactional memory (HTM)?
a) HTM guarantees transactions never abort.
b) HTM aims to execute critical sections speculatively and only serialize on conflicts, simplifying synchronization.
c) HTM replaces caches entirely.
d) HTM makes write-backs immediate.
Answer: b) HTM aims to execute critical sections speculatively and only serialize on conflicts, simplifying synchronization.

88. What is a common cause of scalability bottlenecks in multicore systems executing a shared mutable data structure?
a) Too much read-only data.
b) Contention on shared locks and frequent coherence operations on hot cache lines.
c) Excessively large private caches.
d) Low DRAM latency.
Answer: b) Contention on shared locks and frequent coherence operations on hot cache lines.

89. Which of the following best describes “directory replication” as an optimization?
a) Replicating directory entries across multiple nodes to reduce lookup latency and avoid a single hotspot.
b) Copying caches to disk.
c) Removing directories and using broadcast instead.
d) Duplicating main memory contents.
Answer: a) Replicating directory entries across multiple nodes to reduce lookup latency and avoid a single hotspot.

90. Which memory access pattern tends to perform best on a cache-coherent shared-memory multicore?
a) Random writes to a single shared location by many threads.
b) Partitioned data where each thread works on its private subset with occasional read-only sharing.
c) Constant mutual updates to the same variable by all threads.
d) Constant allocation and deallocation of huge objects across all threads.
Answer: b) Partitioned data where each thread works on its private subset with occasional read-only sharing.

91. In multistage interconnection networks (e.g., Omega network), blocking means:
a) No packets are ever delivered.
b) Two connections may contend for the same internal link causing one to be blocked.
c) All packets have deterministic latency.
d) The network never needs buffering.
Answer: b) Two connections may contend for the same internal link causing one to be blocked.

92. Which is an advantage of hardware prefetching in multicore systems?
a) It always reduces memory bandwidth usage.
b) It can hide memory latency by bringing data into caches before it is requested.
c) It eliminates the need for caches.
d) It guarantees no cache pollution.
Answer: b) It can hide memory latency by bringing data into caches before it is requested.

93. Which approach often reduces coherence overhead for read-mostly shared data?
a) Intensive write-backs for every read.
b) Using read-only replication or making copies per core and updating only rarely with synchronization.
c) Forcing all reads to go to main memory.
d) Disabling caches for those data.
Answer: b) Using read-only replication or making copies per core and updating only rarely with synchronization.

94. The “home node” in a NUMA-directory system typically refers to:
a) The core that created the object only.
b) The memory location or node responsible for storing the canonical copy of a memory block and handling directory requests.
c) A backup disk.
d) The network switch connecting nodes.
Answer: b) The memory location or node responsible for storing the canonical copy of a memory block and handling directory requests.

95. Read-copy-update (RCU) is a synchronization technique that:
a) Provides immediate write-through to all caches.
b) Allows readers to run concurrently with writers by deferring reclamation and using versioned updates.
c) Replaces all locks with barriers.
d) Forces mutual exclusion for all readers.
Answer: b) Allows readers to run concurrently with writers by deferring reclamation and using versioned updates.

96. Which of the following is a downside of aggressive SMT (Simultaneous Multithreading) on a core?
a) It always increases single-thread performance.
b) Resource contention among threads sharing the same core’s execution units and caches may reduce per-thread performance.
c) It eliminates the need for caches.
d) It reduces throughput in all cases.
Answer: b) Resource contention among threads sharing the same core’s execution units and caches may reduce per-thread performance.

97. In interconnection networks, flit-level flow control is used to:
a) Increase packet sizes only.
b) Manage buffer availability and backpressure by splitting packets into flow-control digits (flits).
c) Guarantee infinite buffering.
d) Provide encryption for each packet.
Answer: b) Manage buffer availability and backpressure by splitting packets into flow-control digits (flits).

98. Which technique reduces the frequency of coherence invalidations for writer-dominated sharing?
a) Use a write-update protocol or combine writes at a designated writer (writer-owner) to reduce invalidation storms.
b) Always use read-only replication.
c) Remove caches entirely.
d) Use direct-mapped caches only.
Answer: a) Use a write-update protocol or combine writes at a designated writer (writer-owner) to reduce invalidation storms.

99. Which is often a first step in optimizing an application for multicore execution?
a) Making everything global and shared.
b) Identifying and partitioning independent work (data or task parallelism) to minimize shared-state interactions.
c) Increasing the number of locks arbitrarily.
d) Removing all synchronization.
Answer: b) Identifying and partitioning independent work (data or task parallelism) to minimize shared-state interactions.

100. Which of the following correctly pairs a coherence action with its effect?
a) Invalidate — write data to all caches.
b) Grant-exclusive — allow a processor to become the sole owner for writing a block.
c) Broadcast-read — remove data from memory permanently.
d) Flush — create a copy in all caches simultaneously.
Answer: b) Grant-exclusive — allow a processor to become the sole owner for writing a block.