Profile Lucas Sta Maria

I'm a final year Computer Science and Mathematics student at Northeastern University interested in programming languages and developer tooling.

(λx.x x) (λx.x x)

Similarity!

Similarity is a multiplayer web-game that groups players based on their song preferences. When players join a room, they simultaneously rate a set of several songs. The game mathematically groups the players based on the ratings they assigned. It was created as my final project for my high school Linear Algebra class.

Functionality

  • Creating rooms
  • Joining rooms with codes
  • Starting the game if admin
  • Rating songs simultaneously with other players
  • Seeing similarity values to other players
  • Seeing clusters/groups of other players with similar song preferences

Technology

I used Vue and the standard HTML/CSS/JS on the frontend. On the backend, I used Node.js with Express.js to statically serve the files. To add the game functionality with players and rooms, I connected the backend and frontend with Socket.io. When I presented this to my class, it was hosted on a Raspberry Pi.

The Math

The distance between two points in \(R^2\) on an (\(x\), \(y\)) coordinate plane can be calculated with

\[ \sqrt{(x1 - x2)^2 + (y1 - y2)^2} \]

The distance between two points in R3 in an (x, y, z) coordinate space can be calculated with

\[ \sqrt{(x1 - x2)^2 + (y1 - y2)^2 + (z1 - z2)^2} \]

This can naturally be extended to an \(R^n\) space. Just continue performing \((d1 - d2)^2\), summing it with the other dimensions, and square rooting the sum. This is how the similarity score is calculated; the euclidian distance is measured between two points in an \(R^10\) space since there are 10 questions. For each dimension, the minumum and maximum values are 1 and 5 respectively, which match the scores that players can rate a song with. The euclidian distance is calculated for each combination of players.

When the server has the euclidian distances between the players, it begins performing the clustering computations. It represents each player as a node, and the distances between them as weighted edges. Since a smaller value means a stronger connection, we must invert the value. To achieve this, we define a threshold \(T\). If the edge is greater than \(T\), we represent the edge as 0. Else, the edge is represented with \(T - value\). We can then perform clustering on the graph.

We represent the relationships between the players with an \(N^2\) sized adjacency matrix \(A\), representing the weighted graph. From \(A\), we create the diagonal matrix \(D\) by summing the number of entries in each row of \(A\) and placing that on the diagonal entry in the corresponding row of \(D\). We then create the Laplacian matrix \(L\) by performing \(L=D-A\). We then solve for the eigenvalues of \(L\), and their corresponding eigenvectors. We retrieve the second-smallest eigenvalue and its corresponding eigenvector, then discretize the values of the eigenvector to positive or negative. Each entry in the discretized eigenvector corresponds to the ith player. We group players of the same sign together, giving us two groups, or clusters.

If we want to further partition our clusters, we go to the next largest eigenvalue and discretize its corresponding eigenvector. We then combine the eigenvectors and partition players based on their sets of signs. In this game, at most two levels of clustering take place, giving us a maximum of four clusters.