1. Why is data quality important in data preprocessing?
A) It affects the accuracy of analyses and decisions
B) It reduces the size of datasets
C) It increases the complexity of data models
D) It simplifies data visualization
2. What is data cleaning?
A) Analyzing data to find insights
B) The process of correcting or removing errors in data
C) Merging multiple datasets
D) Formatting data for visualization
3. What is data integration?
A) Splitting data into smaller sets
B) Analyzing data trends
C) Combining data from different sources
D) Filtering data for specific analyses
4. What does data reduction aim to achieve?
A) Reducing dataset size while maintaining information
B) Increasing dataset complexity
C) Visualizing data trends
D) Ensuring data accuracy
5. What is the purpose of data transformation?
A) To increase data volume
B) To remove data points
C) To merge datasets
D) To convert data into a suitable format for analysis
6. What is data discretization?
A) Analyzing data distributions
B) Converting continuous data into discrete categories
C) Cleaning erroneous data points
D) Merging datasets from different sources
7. Why is handling missing data important in data preprocessing?
A) It prevents biased results in analyses
B) It increases dataset size
C) It simplifies data visualization
D) It makes data storage easier
8. What is normalization in data preprocessing?
A) Filtering out noise from data
B) Reducing dataset size
C) Scaling numeric data to a specific range
D) Converting categorical data into numerical values
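Worked example: scaling numeric data to a specific range is often done with min-max scaling. The sketch below is only one way to do it; the function name `min_max_scale` and its parameters are hypothetical and chosen for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```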
9. What is binning in the context of data preprocessing?
A) Removing outliers from data
B) Scaling data to a specific range
C) Transforming categorical data into numerical values
D) Grouping numerical values into bins
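Worked example: grouping numerical values into bins can be sketched with the standard-library `bisect` module. The helper `assign_bins`, the bin edges, and the labels are assumptions made up for this illustration.

```python
from bisect import bisect_right


def assign_bins(values, edges, labels):
    """Map each value to a label; edges are sorted cut points, each bin includes its lower edge."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


ages = [5, 17, 18, 42, 65, 80]
groups = assign_bins(ages, edges=[18, 65], labels=["minor", "adult", "senior"])
print(groups)  # ['minor', 'minor', 'adult', 'adult', 'senior', 'senior']
```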
10. Why is outlier removal important in data preprocessing?
A) It increases dataset size
B) It can improve model performance by removing distorting extreme values
C) It helps to combine datasets
D) It simplifies data storage
11. Which of the following techniques can be part of data transformation?
A) Data cleaning only
B) Data integration only
C) Data discretization only
D) Normalization and encoding
12. Which technique is commonly used to handle missing data?
A) Deleting all rows with missing data
B) Filling with the median only
C) Filling with the mean or mode
D) Randomly generating values
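Worked example: a minimal sketch of filling missing entries with the mean (for numeric data) or the mode (for categorical data). It assumes missing values are represented as `None`; the function name `impute` and the `strategy` parameter are hypothetical.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


numeric = impute([1.0, None, 3.0])
categorical = impute(["red", None, "blue", "red"], strategy="mode")
print(numeric)      # [1.0, 2.0, 3.0]
print(categorical)  # ['red', 'red', 'blue', 'red']
```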
13. What is data encoding?
A) Converting categorical data into numerical values
B) Cleaning erroneous data
C) Reducing data dimensions
D) Merging datasets from different sources
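Worked example: one common way to turn categorical data into numerical values is one-hot encoding. The sketch below assumes a small in-memory list; the function name `one_hot_encode` is hypothetical, and sorting the categories is just a choice made here to keep the column order deterministic.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column that matches this value's category
        rows.append(row)
    return categories, rows


categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```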
14. What is the main goal of discretization in data preprocessing?
A) To increase data complexity
B) To improve data storage
C) To remove duplicate data points
D) To convert continuous variables into categorical ones
15. What is standardization in data preprocessing?
A) Removing duplicates
B) Transforming data to have a mean of zero and a standard deviation of one
C) Filtering irrelevant data
D) Merging datasets
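Worked example: standardization (z-scoring) subtracts the mean and divides by the standard deviation. This is a minimal sketch that assumes the population standard deviation is wanted; the function name `standardize` is hypothetical.

```python
import statistics


def standardize(values):
    """Shift and scale values so they have mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant data has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, population std 2
print(z)
```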
16. What is the purpose of outlier detection methods in data preprocessing?
A) To identify and handle extreme values in data
B) To increase data size
C) To visualize data trends
D) To merge datasets
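Worked example: one widely used way to identify extreme values is the interquartile-range (IQR) rule, which flags points far outside the middle 50% of the data. The sketch below is an illustration under assumptions: the function name `iqr_outliers` is hypothetical, and the multiplier `k=1.5` is a conventional default rather than a requirement.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
print(outliers)  # [95]
```

Whether flagged values are removed, capped, or kept depends on the analysis; the rule only identifies candidates.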
17. Why is removing duplicate data important in data preprocessing?
A) It simplifies data storage
B) It increases dataset size
C) It ensures data accuracy and integrity
D) It enhances visualization
18. Which of the following is NOT a data transformation technique?
A) Normalization
B) Encoding
C) Binning
D) Data collection
19. Which technique is commonly used for data integration?
A) Data discretization
B) Data cleaning
C) Data transformation
D) Merging datasets
20. What is the main objective of data reduction in preprocessing?
A) To increase dataset size
B) To simplify datasets while retaining important information
C) To enhance data accuracy
D) To merge datasets