1. What is the primary purpose of HDFS in the Hadoop ecosystem?
A) To process data in real-time
B) To store large datasets
C) To manage cluster resources
D) To provide a user interface
2. What does MapReduce do in the Hadoop ecosystem?
A) It stores data in memory
B) It provides a SQL interface for data processing
C) It manages the network of nodes
D) It processes large data sets with a distributed algorithm
3. What is the role of YARN in the Hadoop ecosystem?
A) To store data
B) To analyze data
C) To manage and allocate resources
D) To visualize data
4. How does Spark improve data processing compared to Hadoop's MapReduce?
A) By using disk storage exclusively
B) By enabling in-memory data processing
C) By providing a single-threaded model
D) By simplifying data ingestion
5. What is a characteristic of NoSQL databases such as HBase?
A) They can handle unstructured data
B) They use fixed schemas
C) They support only SQL queries
D) They require complex joins
6. What is Apache Kafka primarily used for?
A) Batch processing of data
B) Data storage only
C) Running machine learning algorithms
D) Real-time data streaming
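Kafka's streaming model can be illustrated with a toy in-memory topic: an append-only log that producers write to, with each consumer tracking its own read offset. This is a sketch of the concept only, not the real Kafka client API (actual clients such as kafka-python or confluent-kafka speak to a broker over the network):

```python
class Topic:
    """Toy append-only log illustrating Kafka's core idea.
    Not the real Kafka API; for illustration only."""
    def __init__(self):
        self.log = []       # messages are appended, never mutated
        self.offsets = {}   # each consumer tracks its own position

    def produce(self, message):
        self.log.append(message)

    def consume(self, consumer_id):
        # Deliver every message this consumer has not yet seen,
        # then advance its offset to the end of the log.
        offset = self.offsets.get(consumer_id, 0)
        messages = self.log[offset:]
        self.offsets[consumer_id] = len(self.log)
        return messages

clicks = Topic()
clicks.produce({"user": "a", "page": "/home"})
clicks.produce({"user": "b", "page": "/cart"})
print(clicks.consume("analytics"))   # both events on the first read
clicks.produce({"user": "a", "page": "/checkout"})
print(clicks.consume("analytics"))   # only the newly produced event
```

Because consumers keep independent offsets, many downstream systems can read the same stream at their own pace — the property that makes Kafka useful for real-time pipelines.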
7. What is Apache Storm designed for?
A) Batch processing
B) Data visualization
C) Real-time stream processing
D) Data storage
8. What type of database is Neo4j?
A) Graph database
B) Relational database
C) Document database
D) Key-value store
9. What is a limitation of HDFS for real-time applications?
A) It is not designed for low-latency access
B) It cannot store large files
C) It requires a complex setup
D) It only supports text files
10. What is the core abstraction used by Apache Spark?
A) Tables
B) DataFrames
C) Queues
D) RDDs (Resilient Distributed Datasets)
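A defining property of RDDs is lazy evaluation: transformations only describe a pipeline, and nothing executes until an action forces it. Python generators give a rough stdlib-only analogy (Spark's real API is PySpark's `parallelize` / `map` / `filter` / `collect`):

```python
# Rough analogy for RDD laziness using generators: the two
# "transformations" below build a pipeline but compute nothing
# until the "action" (list) forces evaluation.
data = range(1, 6)

squared = (x * x for x in data)                # transformation: deferred
evens = (x for x in squared if x % 2 == 0)     # chained transformation

result = list(evens)                           # action: pipeline runs now
print(result)  # [4, 16]
```

In Spark this laziness lets the engine fuse transformations and keep intermediate data in memory, which is also the answer to question 4 above.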
11. What is a key feature of HBase?
A) It is scalable and can handle large amounts of data
B) It is a purely in-memory database
C) It requires a fixed schema
D) It supports complex joins natively
12. What is a characteristic of Cassandra?
A) It is designed for strict consistency
B) It provides high availability with no single point of failure
C) It requires complex SQL queries
D) It is a document-oriented database
13. What is the main advantage of using graph analytics with Neo4j?
A) It can handle only structured data
B) It is suitable for large batch processes
C) It is limited to time-series data
D) It excels in analyzing relationships between entities
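The kind of relationship query Neo4j excels at — e.g. "who are the friends of my friends?" — can be sketched over a plain adjacency dict. In Neo4j the same question would be a declarative Cypher `MATCH` pattern over nodes and relationships; the data below is invented for illustration:

```python
# Toy social graph as an adjacency dict (made-up data).
follows = {
    "alice": {"bob", "carol"},
    "bob":   {"dave"},
    "carol": {"dave", "erin"},
    "dave":  set(),
    "erin":  set(),
}

def friends_of_friends(graph, person):
    # People reachable in exactly two hops, excluding the person
    # themselves and their direct connections.
    direct = graph[person]
    two_hop = set()
    for friend in direct:
        two_hop |= graph[friend]
    return two_hop - direct - {person}

print(friends_of_friends(follows, "alice"))  # {'dave', 'erin'}
```

In a relational database this query needs a self-join per hop; a graph database traverses the relationships directly, which is why it scales better as the hop count grows.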
14. What are the two main phases of the MapReduce programming model?
A) Map and Reduce
B) Input and Output
C) Filter and Aggregate
D) Load and Store
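The two phases from question 14 can be sketched in plain Python as a word count — the classic MapReduce example. This is a toy illustration of the model, not Hadoop's Java API; the shuffle step between the phases is normally performed by the framework:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: sum all counts grouped under the same key.
    return (word, sum(counts))

documents = ["Hadoop stores data", "Spark processes data"]

# Shuffle: group intermediate pairs by key (done by the framework).
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # 'data' appears in both documents, so its count is 2
```

Because each map call sees only one input split and each reduce call sees only one key's values, both phases parallelize across a cluster with no shared state.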
15. Which programming languages are supported by Apache Spark?
A) Only Java
B) Only Python
C) Java, Scala, Python, and R
D) Only R
16. How do the components of the Hadoop ecosystem work together?
A) They are all independent and do not interact
B) They provide a single point of failure
C) They only support batch processing
D) They are integrated to support distributed storage and processing
17. What types of processing can Apache Spark handle?
A) Only batch processing
B) Only real-time processing
C) Both batch and real-time processing
D) Only stream processing
18. What is a common challenge in real-time data processing?
A) Balancing low latency with high throughput
B) Lack of data sources
C) High cost of storage
D) Limited analysis capabilities
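The latency/throughput tension from question 18 can be made concrete with a back-of-the-envelope model: if each processing call carries a fixed overhead (say, a network round trip), sending events one at a time minimizes per-event latency but pays the overhead on every event, while batching amortizes it at the cost of waiting for a batch to fill. All the numbers below are assumed, purely for illustration:

```python
# Assumed costs, in milliseconds, chosen only to illustrate the trade-off.
PER_CALL_OVERHEAD_MS = 5   # fixed cost per call, e.g. a network round trip
PER_EVENT_COST_MS = 1      # marginal cost of processing one event

def total_time_ms(num_events, batch_size):
    num_calls = -(-num_events // batch_size)  # ceiling division
    return num_calls * PER_CALL_OVERHEAD_MS + num_events * PER_EVENT_COST_MS

events = 1000
print(total_time_ms(events, 1))    # 6000 ms: lowest per-event latency
print(total_time_ms(events, 100))  # 1050 ms: far higher throughput
```

Real streaming systems tune exactly this knob (e.g. micro-batch intervals or producer batching) to balance the two goals.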
19. Why is Kafka widely used in big data technologies?
A) It is a relational database
B) It efficiently handles large volumes of real-time data streams
C) It is a file storage system
D) It is only used for batch processing
20. Which statement best describes YARN's function in a Hadoop cluster?
A) It manages resources across the Hadoop cluster
B) It stores data on disk
C) It processes data using MapReduce
D) It is a user interface for Hadoop