1. What is the main difference between classification and clustering?
A) Both predict numerical outcomes
B) Classification predicts labels, clustering groups data
C) Clustering uses labeled data
D) Classification is unsupervised
2. Which of the following is a classification technique?
A) k-Means Clustering
B) Hierarchical Clustering
C) Decision Trees
D) Density-Based Clustering
3. How does k-Nearest Neighbors (k-NN) determine the class of a data point?
A) By averaging feature values
B) By maximizing feature differences
C) By calculating cluster centers
D) By using majority voting from nearest neighbors
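As an illustration of the voting idea behind k-NN, here is a minimal sketch in plain Python. The function name `knn_predict`, the toy points, and the choice of Euclidean distance are assumptions made for this example; they are not taken from the quiz.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors wins the vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three points near (1, 1) and two near (5, 5).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]
prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
```

An odd k is a common choice for two-class problems because it avoids tied votes.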
4. What is a key assumption of the Naive Bayes classifier?
A) All features are correlated
B) Features are independent given the class
C) It does not use prior probabilities
D) It requires labeled data for clustering
5. Which metrics are commonly used for model evaluation in classification?
A) Silhouette score
B) Inertia
C) Cluster count
D) Accuracy, precision, and recall
6. What is the primary goal of k-Means Clustering?
A) To partition data into k clusters
B) To predict categorical outcomes
C) To evaluate model performance
D) To assess clustering quality
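To make the partitioning idea concrete, the following is a rough sketch of the usual assign-then-update loop (Lloyd's algorithm) on one-dimensional data. The function name `k_means_1d`, the fixed iteration count, and the sample values are assumptions for illustration only.

```python
def k_means_1d(values, centers, iterations=10):
    """Run a fixed number of k-means iterations on 1-D data from given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data with two obvious groups; k = 2 starting centers.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
```

Because the result depends on the starting centers, practical implementations typically rerun the algorithm from several initializations and keep the best outcome.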
7. How does Hierarchical Clustering differ from k-Means Clustering?
A) It requires the number of clusters beforehand
B) It creates a dendrogram showing relationships between clusters
C) It only works with numeric data
D) It does not evaluate cluster quality
8. What is the main characteristic of Density-Based Clustering?
A) Fixed number of clusters
B) Requires labeled data
C) Only works with spherical clusters
D) Identifies clusters based on density
9. What is the purpose of cluster evaluation?
A) To create new clusters
B) To reduce data dimensionality
C) To assess the quality of clustering
D) To visualize data relationships
10. What does the silhouette score indicate in clustering?
A) The quality of clustering
B) The dimensionality of data
C) The speed of clustering algorithm
D) The number of clusters used
11. What is a limitation of k-Means Clustering?
A) It can only cluster binary data
B) It does not require labeled data
C) It is sensitive to the initial choice of centroids
D) It guarantees optimal clustering
12. What is the purpose of model validation?
A) To reduce data dimensionality
B) To assess the model's generalization ability
C) To increase the model's complexity
D) To visualize data distributions
13. How does the Random Forest algorithm improve classification accuracy?
A) By using a single decision tree
B) By selecting only one feature
C) By reducing the size of the training set
D) By combining predictions from multiple trees
14. What does a confusion matrix represent in classification?
A) The distribution of clusters
B) Counts of true positives, false positives, true negatives, and false negatives
C) The accuracy of clustering
D) The size of training and test sets
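For a binary problem, the four cells of a confusion matrix and the metrics derived from them can be computed by hand. The sketch below assumes labels of 0 and 1 with 1 as the positive class; the helper name `confusion_counts` and the sample labels are made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
```

Note that precision and recall are undefined when their denominators are zero, which this sketch does not guard against.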
15. Why is clustering important in data analysis?
A) It requires labeled data
B) It simplifies all data to one cluster
C) It uncovers hidden patterns in data
D) It guarantees accurate predictions
16. What does the elbow method help determine in clustering?
A) The optimal number of clusters
B) The maximum distance between clusters
C) The average size of clusters
D) The dimensionality of data
17. How can A/B testing be applied in classification model evaluation?
A) It is only applicable to clustering
B) It compares visualizations
C) It tests two models on the same data
D) It reduces the size of datasets
18. What is cross-validation used for in model evaluation?
A) To increase model complexity
B) To assess the model's performance on different data splits
C) To create more training data
D) To visualize model predictions
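The "different data splits" in cross-validation are often produced with k-fold splitting, where each sample lands in the test set exactly once. The following is a minimal sketch of that splitting step only; the generator name `k_fold_indices` is an assumption, and no shuffling or model training is included.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0)
        for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train` list and scored on the matching `test` list, and the scores averaged. In practice the indices are usually shuffled first so that the folds do not depend on the original ordering of the data.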
19. What is a key feature of hierarchical clustering?
A) It requires labeled data
B) It only works with numerical data
C) It produces a single partition of the dataset
D) It creates a dendrogram to visualize cluster relationships
20. What is the main principle of density-based clustering?
A) Groups points based on density
B) Requires predefined number of clusters
C) Only works with spherical clusters
D) Uses distance metrics exclusively