TransWikia.com

K-Means Clustered Golf

Code Golf Asked by Magic Octopus Urn on October 27, 2021

K-Means Clustering (Wikipedia)

The task here is rather simple, perform a single iteration of a k-means clustering algorithm on a binary matrix. This is essentially the setup task for the main k-means algorithm, I felt the setup might be easier and entice golfing languages to give it a shot as well. The matrix passed to you will have the following format:

 0 0 0 0 0 0 0 0 1
 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 1
 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 1 0 0 1
 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0
 0 1 0 0 0 1 0 0 0
 0 1 0 0 0 0 0 0 1
 0 0 0 0 0 0 0 0 0

1 represents a point, while 0 represents a lack of points. Your job will be to randomly generate k-1 centroids and perform the initial clustering of the data around the centroids you’ve generated. k is defined as min(grid#Width, grid#Height)-1. The labeling of each centroid should go from 2 until k. For instance, in this scenarios you could generate the following centroids:

Centroid 2 was generated at: (1.0, 4.0)
Centroid 3 was generated at: (1.0, 5.0)
Centroid 4 was generated at: (5.0, 1.0)
Centroid 5 was generated at: (3.0, 3.0)
Centroid 6 was generated at: (0.0, 2.0)
Centroid 7 was generated at: (6.0, 6.0)
Centroid 8 was generated at: (2.0, 6.0)

After generating the centroids, you must loop through each and every point that is marked with a 1, as we can treat the ones marked with 0 as empty space. For each centroid, you must decide which centroid is the closest to the point in question. Here’s the distances for the example:

(0,8) distance from centroid 2 is 4.123105625617661
(0,8) distance from centroid 3 is 3.1622776601683795
(0,8) distance from centroid 4 is 8.602325267042627
(0,8) distance from centroid 5 is 5.830951894845301
(0,8) distance from centroid 6 is 6.0
(0,8) distance from centroid 7 is 6.324555320336759
(0,8) distance from centroid 8 is 2.8284271247461903
(1,1) distance from centroid 2 is 3.0
(1,1) distance from centroid 3 is 4.0
(1,1) distance from centroid 4 is 4.0
(1,1) distance from centroid 5 is 2.8284271247461903
(1,1) distance from centroid 6 is 1.4142135623730951
(1,1) distance from centroid 7 is 7.0710678118654755
(1,1) distance from centroid 8 is 5.0990195135927845
(2,8) distance from centroid 2 is 4.123105625617661
(2,8) distance from centroid 3 is 3.1622776601683795
(2,8) distance from centroid 4 is 7.615773105863909
(2,8) distance from centroid 5 is 5.0990195135927845
(2,8) distance from centroid 6 is 6.324555320336759
(2,8) distance from centroid 7 is 4.47213595499958
(2,8) distance from centroid 8 is 2.0
(3,2) distance from centroid 2 is 2.8284271247461903
(3,2) distance from centroid 3 is 3.605551275463989
(3,2) distance from centroid 4 is 2.23606797749979
(3,2) distance from centroid 5 is 1.0
(3,2) distance from centroid 6 is 3.0
(3,2) distance from centroid 7 is 5.0
(3,2) distance from centroid 8 is 4.123105625617661
(4,5) distance from centroid 2 is 3.1622776601683795
(4,5) distance from centroid 3 is 3.0
(4,5) distance from centroid 4 is 4.123105625617661
(4,5) distance from centroid 5 is 2.23606797749979
(4,5) distance from centroid 6 is 5.0
(4,5) distance from centroid 7 is 2.23606797749979
(4,5) distance from centroid 8 is 2.23606797749979
(4,8) distance from centroid 2 is 5.0
(4,8) distance from centroid 3 is 4.242640687119285
(4,8) distance from centroid 4 is 7.0710678118654755
(4,8) distance from centroid 5 is 5.0990195135927845
(4,8) distance from centroid 6 is 7.211102550927978
(4,8) distance from centroid 7 is 2.8284271247461903
(4,8) distance from centroid 8 is 2.8284271247461903
(6,1) distance from centroid 2 is 5.830951894845301
(6,1) distance from centroid 3 is 6.4031242374328485
(6,1) distance from centroid 4 is 1.0
(6,1) distance from centroid 5 is 3.605551275463989
(6,1) distance from centroid 6 is 6.082762530298219
(6,1) distance from centroid 7 is 5.0
(6,1) distance from centroid 8 is 6.4031242374328485
(7,1) distance from centroid 2 is 6.708203932499369
(7,1) distance from centroid 3 is 7.211102550927978
(7,1) distance from centroid 4 is 2.0
(7,1) distance from centroid 5 is 4.47213595499958
(7,1) distance from centroid 6 is 7.0710678118654755
(7,1) distance from centroid 7 is 5.0990195135927845
(7,1) distance from centroid 8 is 7.0710678118654755
(7,5) distance from centroid 2 is 6.082762530298219
(7,5) distance from centroid 3 is 6.0
(7,5) distance from centroid 4 is 4.47213595499958
(7,5) distance from centroid 5 is 4.47213595499958
(7,5) distance from centroid 6 is 7.615773105863909
(7,5) distance from centroid 7 is 1.4142135623730951
(7,5) distance from centroid 8 is 5.0990195135927845
(8,1) distance from centroid 2 is 7.615773105863909
(8,1) distance from centroid 3 is 8.06225774829855
(8,1) distance from centroid 4 is 3.0
(8,1) distance from centroid 5 is 5.385164807134504
(8,1) distance from centroid 6 is 8.06225774829855
(8,1) distance from centroid 7 is 5.385164807134504
(8,1) distance from centroid 8 is 7.810249675906654
(8,8) distance from centroid 2 is 8.06225774829855
(8,8) distance from centroid 3 is 7.615773105863909
(8,8) distance from centroid 4 is 7.615773105863909
(8,8) distance from centroid 5 is 7.0710678118654755
(8,8) distance from centroid 6 is 10.0
(8,8) distance from centroid 7 is 2.8284271247461903
(8,8) distance from centroid 8 is 6.324555320336759

The final result of the clustering algorithm is that there should be no 1s remaining in the matrix, only centroid numbers. This is why it was important to label the centroids from 2-k+1, allowing us to replace them, as follows:

 0 0 0 0 0 0 0 0 8
 0 6 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 8
 0 0 5 0 0 0 0 0 0
 0 0 0 0 0 5 0 0 7
 0 0 0 0 0 0 0 0 0
 0 4 0 0 0 0 0 0 0
 0 4 0 0 0 7 0 0 0
 0 4 0 0 0 0 0 0 7
 0 0 0 0 0 0 0 0 0

This is the initial clustering layout for 7 centroids on the provided grid, given randomly generated centroids. Your job is to output the clustered version of the binary input grid.

Rules

  • The k-1 centroids must be randomly generated and should be anywhere from (0,0) to (grid#Width, grid#Height).
    • The value of k is min(grid#Width, grid#Height)-1.
    • The centroids generated MUST be numbered from 2 to k.
  • The input format MUST be a grid of 0s and 1s, where 0s represent empty space and 1s represent points.
    • A grid is either a string using 1-char per cell and n as row delimiters or a 2D array.
    • The grid passed is not guaranteed to be square, but it is guaranteed to not be empty.
  • The final output can either use arrays or a delimited string.
  • Shortest code wins, this is .

2 Answers

APL (Dyalog Unicode), 61 50 bytes

⍴⍴,{⍺1+{⊃⍸⍵=⌊/⍵}¨↓⍵}⍸∘.(+.×⍨-)⌊/∘⍴{⍵[1↓⍺?⍴⍵]}∘⍸=⍨

Try it online!

-11 bytes thanks to Bubbler

How it Works

          ⍝ Implicitly take binary matrix G as input
⍸=⍨       ⍝ Yield all (1-indexed) coordinates from (1,1) to (w,h)
⌊/∘⍴      ⍝ k+1=min(w,h)
{...}     ⍝ Choose k centroids:
 ?          ⍝ Randomly choose
 ⍺          ⍝ k+1 distinct indices
 ⍴⍵         ⍝ from the array of w×h possible indices
 1↓         ⍝ Drop the first so there are only k indices
 ⍵[...]     ⍝ index into the array of w×h coordinates
∘.        ⍝ Outer product of
            ⍝ the k centroids with
 ⍳          ⍝ the coordinates of all of the "1"s in G
 (+.×⍨-)    ⍝ with operator: distance squared
{...}¨↓   ⍝ For each row (point)
 ⊃⍸⍵=⌊/⍵    ⍝ Find the index of the centroid with smallest distance
            ⍝ (breaks ties by choosing lower numbers → small bias toward lower centroid numbers)
1+        ⍝ Add 1 to shift centroid numbers to [2,k]
          ⍝ Now we have a list of the centroid numbers for the "1"s, in reading order
⍺         ⍝ Expand to prototype array:
,         ⍝ Prototype array is flattened G (Expand  throws RANK ERROR for non-vector left argument)
⍴     ⍝ Reshape back to
⍴     the shape of G

Answered by fireflame241 on October 27, 2021

Mathematica 109 Bytes

If I understand the question correctly, something like this ought to work. "KMeans" is one of the built in methods to FindClusters and ClusteringComponents.

SparseArray[Thread[(p=Position[#,1])->1+ClusteringComponents[p,Min[d=Dimensions@#]-1,1,Method->"KMeans"]],d]&

Usage: %@in//MatrixForm to get visualization, in is an integer array.

Answered by Kelly Lowder on October 27, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP