This article summarizes what was learned from the following sources:
Classification is a supervised learning technique where the goal is to categorize data into predefined classes or categories.
Types of classification include:
The algorithm learns from a labeled training dataset and predicts the classes of new, unseen data. Examples of classification include:
Algorithms used for classification include Logistic Regression, Decision Trees, Support Vector Machines (SVMs), K-Nearest Neighbors (kNN), and Neural Networks.
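To make the "learn from labeled data, predict unseen data" idea concrete, here is a minimal sketch of one of the listed algorithms, K-Nearest Neighbors, written in plain Python. The function name `knn_predict`, the toy points, and the labels "A" and "B" are illustrative assumptions, not taken from the sources; a real project would more likely use a library implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the labeled training examples by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest neighbours and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups with known labels.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Classify a new, unseen point.
prediction = knn_predict(train_points, train_labels, (1.1, 0.9), k=3)
```

Here the "training" step is simply storing the labeled points; the prediction for a new point depends entirely on which labeled examples lie closest to it.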
Evaluation metrics for classification:
Clustering is an unsupervised learning technique that groups a set of objects or data points into clusters, so that objects within the same cluster are more similar to each other than to objects in other clusters. Unlike classification, clustering works on unlabeled data and aims to uncover inherent structures or patterns. It can be applied to various data types, such as numerical, categorical, text, and image data.
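As an illustration of grouping unlabeled points by similarity, the sketch below implements a basic k-means loop in plain Python. K-means is chosen here as one common clustering algorithm and is an assumption of this example, as are the function name `kmeans`, the fixed starting centroids, and the toy data.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate an assignment step and a centroid-update step."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Starting centroids are fixed here so the run is reproducible.
assignments, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are supplied at any point; the cluster indices in `assignments` come only from distances between points, which is the key difference from the supervised setting described earlier.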
Core Components:
Evaluation Metrics: Metrics like silhouette score, Davies–Bouldin index, and within-cluster sum of squares (WCSS) assess clustering performance.
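Two of these metrics can be sketched directly from their definitions. The code below is a plain-Python illustration, assuming Euclidean distance, at least two clusters, and no duplicate points; the function names `wcss` and `silhouette_score` and the toy data are choices made for this example. Lower WCSS means tighter clusters, while a silhouette score closer to 1 means points sit well inside their own cluster.

```python
import math


def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette_score(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the other members of each cluster.
        mean_dist = {}
        for cluster in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == cluster and j != i]
            if others:
                mean_dist[cluster] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]  # cohesion: mean distance within the point's own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: two tight pairs far apart, with a sensible and a poor labelling.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

good_wcss = wcss(points, good_labels)
good_silhouette = silhouette_score(points, good_labels)
bad_silhouette = silhouette_score(points, bad_labels)
```

Comparing the two labellings of the same points shows how the metrics separate a sensible clustering from a poor one without needing ground-truth labels.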
Applications: Customer segmentation, market analysis, image segmentation, document clustering, anomaly detection, and recommendation systems.
Challenges: Handling the curse of dimensionality, noisy data, and the subjective nature of defining similarity metrics.