K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a simple and widely used machine learning algorithm for classification and regression tasks. It is a lazy learning algorithm, meaning it does not learn a model during training but rather memorizes the dataset and makes predictions based on similarity.

How KNN Works

  1. Choose K: Select the number of nearest neighbors (K).
  2. Calculate Distance: Compute the distance between the new data point and all points in the dataset (commonly using Euclidean distance).
  3. Find Nearest Neighbors: Select the K closest data points.
  4. Make Prediction:
    • Classification: Assign the most common class among the K neighbors.
    • Regression: Compute the average of the K neighbors' values.
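
The four steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are assumptions made up for this example, and it uses a brute-force search over every stored point.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Step 2: distance from the query to every stored point.
    distances = [(euclidean(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 3: keep the k smallest distances.
    distances.sort(key=lambda pair: pair[0])
    return [y for _, y in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Step 4 (classification): majority vote among the k neighbors."""
    votes = Counter(k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Step 4 (regression): mean of the k neighbors' target values."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical toy data: two well-separated clusters in 2-D.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(X, labels, (1.2, 1.4), k=3))  # a
print(knn_regress(X, values, (6.4, 6.6), k=3))   # 11.0
```

Step 1 (choosing K) appears here only as the `k` argument; nothing is learned up front, which is what makes the method "lazy".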

Distance Metrics Used

  • Euclidean Distance: \( d = \sqrt{\sum_i (x_i - y_i)^2} \)
  • Manhattan Distance: \( d = \sum_i |x_i - y_i| \)
  • Minkowski Distance: \( d = \left( \sum_i |x_i - y_i|^p \right)^{1/p} \), a generalized form of both: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
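
To make the relationship between the three metrics concrete, here is a small sketch in plain Python. The helper names (`minkowski`, `manhattan`, `euclidean`) are illustrative choices, not part of any library; the Manhattan and Euclidean distances are written as the order-1 and order-2 cases of the Minkowski distance.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between equal-length vectors x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Manhattan distance: the Minkowski distance with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)


x, y = (0.0, 0.0), (3.0, 4.0)
print(manhattan(x, y))  # 7.0
print(euclidean(x, y))  # 5.0
```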

Pros and Cons of KNN

Advantages:

  • Simple and easy to implement.
  • No explicit training phase (instance-based learning); the model simply stores the data.
  • Works well with small datasets.

Disadvantages:

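  • Computationally expensive for large datasets, since a prediction may require computing the distance to every stored point.
  • Performance depends on the choice of K.
  • Sensitive to irrelevant or redundant features, because every feature contributes to the distance.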

Choosing the Best K

  • A small K (e.g., 1 or 3) makes the model sensitive to noise.
  • A large K smooths decision boundaries but may blur genuine local patterns (underfitting).
  • Cross-validation helps find the optimal K.
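
One way to apply cross-validation to the choice of K is sketched below with scikit-learn's `cross_val_score`. The candidate range (odd values from 1 to 21), the 5 folds, and the use of the Iris dataset are arbitrary assumptions for illustration; on another dataset a different range or scoring metric may be more appropriate.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate odd values of K (an arbitrary range chosen for illustration).
candidate_ks = range(1, 22, 2)

mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this K.
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the K with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print("Best K:", best_k, "mean accuracy:", round(mean_scores[best_k], 3))
```

Odd values of K are a common convention because they reduce tied votes in two-class problems; they do not rule out ties when there are more classes.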

Example in Python (Using Scikit-Learn)

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset and hold out 20% of it for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fit the KNN model with K = 5 (fitting stores the training data)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict the class of each test sample
y_pred = knn.predict(X_test)

# Evaluate: fraction of test samples classified correctly
print("Accuracy:", accuracy_score(y_test, y_pred))

Citation

K-Nearest Neighbor(KNN) Algorithm

K-Nearest Neighbors (KNN) is a simple way to classify things by looking at what’s nearby. Imagine a streaming service wants to predict whether a new user is likely to cancel their subscription (churn) based on their age. It checks the ages of its existing users and whether they churned or stayed. If most of the “K” existing users closest in age to the new user canceled their subscriptions, KNN predicts that the new user might churn too. The key idea is that users with similar ages tend to behave similarly, and KNN uses this closeness to make decisions.

https://www.geeksforgeeks.org/k-nearest-neighbours/
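
The churn scenario quoted above can be sketched with scikit-learn's `KNeighborsClassifier` using age as the only feature. The ages and churn labels below are invented purely for illustration and are not taken from the cited article; K = 3 is likewise an arbitrary choice.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical existing users: age, and whether they churned (1) or stayed (0).
ages = [[18], [21], [23], [25], [40], [45], [52], [60]]
churned = [1, 1, 1, 0, 0, 0, 0, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(ages, churned)

# A new 22-year-old user: the three closest ages are 21, 23 and 25,
# of which two churned, so the majority vote is "churn".
new_user = [[22]]
print(knn.predict(new_user))  # [1]
```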