TransWikia.com

Using KNN to categorise inventory (physical stock items) - is it the best way?

Data Science Asked by tristar8 on September 4, 2021

I’m working on a machine learning problem involving inventory (i.e. physical retail stock). During cleaning (outlier removal), some items will be removed because their corresponding transactions are dropped. I therefore thought of using KNN to group similar items into categories.

There are 1245 items

The info for each item is

  1. Average Weighted Price
  2. Total Quantity Sold
  3. Total Revenue Achieved
  4. Min Sold per Transaction
  5. Max Sold per Transaction
  6. Min Sell Price
  7. Max Sell Price
  8. Number of Unique Transactions

Am I right in thinking that KNN is a good option – and if so, how do I decide on the number of clusters?

2 Answers

Training: You can use a distance metric to compute the distance between all observations along the dimensions of your observed variables (Avg. Weight. Price, Tot. Quant. Sold, etc.). For each observation or row or sample i, the point with the smallest distance from that observation is the nearest neighbor. The point with the second smallest distance is the 2nd nearest neighbor, and so on.
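The neighbor ranking described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the array `X` is hypothetical toy data (5 items, 2 features) rather than the asker's 1245 x 8 matrix, and Euclidean distance is assumed as the metric.

```python
import numpy as np

# Hypothetical toy data: 5 items x 2 features (e.g. price, quantity sold).
X = np.array([
    [1.0, 10.0],
    [1.2, 11.0],
    [5.0, 50.0],
    [5.1, 49.0],
    [9.0, 90.0],
])

# Pairwise Euclidean distances between all rows (shape: 5 x 5).
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Every point is at distance 0 from itself, so exclude the diagonal.
np.fill_diagonal(dist, np.inf)

# For each row i, neighbor indices ordered from nearest to farthest.
neighbor_order = np.argsort(dist, axis=1)
nearest = neighbor_order[:, 0]         # index of the nearest neighbor
second_nearest = neighbor_order[:, 1]  # index of the 2nd nearest neighbor

print(nearest)
print(second_nearest)
```

Because the features listed in the question sit on very different scales (revenue versus unit price, for instance), the largest-valued feature will dominate a raw Euclidean distance, so it is common to standardise the columns before computing distances.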

Prediction: You can find the nearest neighbors for new data by calculating their distances to each point in the training data as above. A predicted label is then assigned to each new point, usually by taking the most common label among its k nearest neighbors in the training data. Hence k-NN classification, which requires the training data to be labelled:

from sklearn.neighbors import KNeighborsClassifier

# train_data, train_labels and testset_data are placeholders for your own arrays.
knn = KNeighborsClassifier(algorithm='auto',
                           metric='minkowski',  # pick a distance metric
                           metric_params=None,
                           n_neighbors=5,  # take the majority label from the 5 nearest neighbors
                           p=2,  # Minkowski power parameter; p=2 gives Euclidean distance
                           weights='uniform')  # every neighbor's vote counts equally

# Store the labelled training data:
knn.fit(train_data, train_labels)

# Find the predicted class of the test data:
knn.predict(testset_data)
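The snippet above leaves the data arrays undefined. A self-contained sketch follows, assuming synthetic data: two hypothetical, well-separated groups of items with 8 features each, and made-up labels 0 and 1. It also repeats the majority vote by hand to show what `predict` does under these default settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical labelled training data: two well-separated groups, 8 features each.
train_a = rng.normal(loc=0.0, scale=1.0, size=(50, 8))
train_b = rng.normal(loc=10.0, scale=1.0, size=(50, 8))
train_data = np.vstack([train_a, train_b])
train_labels = np.array([0] * 50 + [1] * 50)

# Two new, unlabelled items, one near each group.
testset_data = np.array([np.zeros(8), np.full(8, 10.0)])

knn = KNeighborsClassifier(n_neighbors=5)  # defaults: Euclidean distance, uniform weights
knn.fit(train_data, train_labels)
predicted = knn.predict(testset_data)

# The same vote by hand: find the 5 closest training rows, take the most common label.
dist = np.linalg.norm(testset_data[:, None, :] - train_data[None, :, :], axis=-1)
nearest_idx = np.argsort(dist, axis=1)[:, :5]
manual = np.array([np.bincount(train_labels[idx]).argmax() for idx in nearest_idx])

print(predicted, manual)
```

The point of the sketch is that labels are needed up front: without `train_labels` there is nothing for the neighbors to vote on.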

Answered by Dij on September 4, 2021

So your question is about how effective KNN is at categorising items based on the features you listed above.

Note that KNN (k-nearest neighbours) is a supervised method: it needs labelled examples to vote on. The unsupervised clustering algorithm that creates K clusters with minimal intra-cluster variation is k-means, which is probably what you have in mind. K-means is particularly useful when you know the number of groups K you need, and it is handy if you do not have category labels for your examples.

At the same time, k-means is typically initialised at random, which means the groupings can vary between executions unless you fix the random seed.

From this information, you should have a better idea of whether k-means (clustering without labels) or KNN (classification with labels) suits this task.
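The clustering behaviour described in this answer (K groups with minimal intra-cluster variation, results that can change between runs) corresponds to scikit-learn's `KMeans`. A minimal sketch, assuming synthetic stand-in data (four hypothetical blobs of items with 8 features) and an arbitrary choice of K = 4:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical unlabelled data: 4 blobs of 100 items, 8 features each.
centers = [0.0, 10.0, 20.0, 30.0]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 8)) for c in centers])

def cluster(seed):
    # n_init=1 runs a single random initialisation, so the seed fully
    # determines the starting centroids.
    km = KMeans(n_clusters=4, n_init=1, random_state=seed)
    return km.fit_predict(X)

labels_a = cluster(0)
labels_b = cluster(0)  # same seed: same grouping
labels_c = cluster(1)  # different seed: cluster numbering (and possibly grouping) may differ

print("same seed, identical labels:", np.array_equal(labels_a, labels_b))
print("different seed, identical labels:", np.array_equal(labels_a, labels_c))
```

Fixing `random_state` makes a run repeatable; it does not say which K is appropriate, which still has to be chosen separately.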

Answered by shepan6 on September 4, 2021
