I continue my little excursion into network science. In the last post, I gave a little introduction to simulating and visualizing undirected networks with community structure in R. In this post I want to explore a method to infer the community structure of a network from its adjacency matrix. That is, given that I know which nodes are connected to each other, I want to infer which nodes belong to the same community. I will focus on the case of two communities, but the method can be extended to three or more.

This post requires basic knowledge of matrix calculus, in particular eigen-decomposition. Understanding how the k-means clustering algorithm works is also useful. The R-code to reproduce all the figures is given at the bottom of the post.

Network science is potentially useful for certain problems in data analysis, and I know close to nothing about it.

In this short post I present my first attempt at network analysis: A minimal example to construct and visualize an artificial undirected network with community structure in R. No network libraries are loaded. Only basic R-functions are used.

The final product of this post is this plot: I am going to walk you through the code to produce the above plots. The complete code is given at the bottom of the post.

As a command-line junky, I find most bibliographic managment software (such as mendeley) too bloated. All I want such software to be capable of is

1. to add a new entry to my bibliography (bibtex), and
2. search through the articles.

Here’s my minimal approach to implementing these two features:

Autocorrelation of a time series can be useful for prediction because the most recent observation of the prediction target contains information about future values. At the same time autocorrelation can play tricks on you because many standard statistical methods implicitely assume independence of measurements at different times.

Imagine you perform a statistical analysis on a time series of stock market data. After some transformation, averaging, and “renormalization” you find that the resulting quantity, let’s call it $x$, behaves as a function of time like $x(t)\propto t^{-\beta}$. Since you are a physicist you get excited because you have just discovered a power law. Physicists love power laws.

Now you analyze some more financial time series using the same technique and find similar behavior. Power laws all over the place. You get even more excited. In the paper about the analysis (which you submit to a physics journal) you may throw buzzwords like “scale-free”, “critical phase transition”, and “universality”. You can also add to your CV that you contributed to the understanding of market dynamics.

I was once told that the reason that such a shape was so commonly used for aeroplane wings was merely that then one could study it mathemtically by just employing the Zhoukowski transformation. I hope that this is not true!

(R. Penrose, “The Road to Reality”, p.150)

Penrose here talks about a complex holomorphic mapping also known as the aerofoil transformation.

The Monty Hall problem goes like this: You are at a game show in front of 3 doors. There is a car behind one door and goats behind the other two doors. After you have made your choice the show master opens one of the remaining two doors, namely one with a goat. You now have the chance to change your initial choice. Do you stick to your door? Do you change? Does it make any difference at all?

The thing is, it does make a difference. If you stick, you have a probability of 1/3 to win the car, but if you change, your chances go up to 2/3. The idea is that the showmaster conveys information about where the car is by opening a wrong door. There are mathematical arguments as to why you should change. If you find that unintuitive, you are in good company. Why should you be smarter after the showmaster has opened a wrong door than before? An intuitive explanation of why the showmaster conveys information is this:

Imagine there were not 3 but 1000 doors to begin with. There is still one car but now there are 999 goats. You get to choose a single door initially. Now the showmaster does not open 1 door but 998 doors, all with goats, and you are again left with your initial choice and one more door. Do you stick to your choice now or do you change to the door the showmaster did not open? He clearly gave you a hint by not opening that particular door.

From Feynman Lectures II 1-1:

If you were standing at arm’s length from somenone and each of you had one percent more electrons than protons, the repelling force would be incredible. How great? Enough to lift the Empire State Building? No! To lift Mount Everest? No! The repulsion would be enough to lift a “weight” equal to that of the entire Earth!

Really? Where is that back of the envelope? Alright here we go:

In the idealized physical world, a rotor is simply a mass $m$ attached to an axis of length $r$, free to move in the plane. Gravity and friction are absent. Such a rotor becomes a kicked rotor if it is periodically hit with a hammer. Every kick transfers momentum $p_k$ to the rotor and the time between successive kicks is $T$.

The current state of the rotor is completely described by the current angle of rotation $\theta$ and the current angular momentum $L$. We shall be interested only in the state of the rotor right after the kick. All quantities evaluated right after the kick are indicated by a prime.

If the rotor rotates at an angular velocity $\omega$, the angle of rotation changes between two kicks by $\omega T$. Only the component of $p_k$ in the direction of motion is transferred. Therefor the angular momentum increases by $p_k r \sin\theta$ as a result of the kick. We have $L' = L + p_k r \sin\theta$ $\theta' = \theta + \omega' T$,

where $\omega'$ is the angular velocity after the kick. We can make the above equations dimensionless (unitless) by dividing the angular momentum $L$ by a typical angular momentum of the system. This typical angular momentum is given by $\hat{L}=rp=rmv=mr^2/T$ where $v=r/T$ is the typical velocity of the system. We divide the $L$ – equation by $\hat{L}$ and get $l' = l + \frac{p_k T}{mr}\sin\theta \equiv l + K\sin\theta$.

The dimensionless angular momentum $l$ is the rotors angular momentum measured in units of the typical angular momentum $\hat{L}$. We have summarized some parameters into the constant $K$.

Now, the (already dimensionless) angle equation can be manipulated a little. We notice that $\omega' T = v' T / r = p'T/mr = L'T/mr^2$ which is the same as $L'/\hat{L}$ which is simply the dimensionless angular momentum after the kick $l'$. So we now have $l' = l + K\sin\theta$ $\theta' = \theta + l'$,

where we can, without loss of generality, apply $\mod 2\pi$ to the angle equation and thus also to the angular momentum equation.

This is the famous Chirikov standard map which describes the behavior of the kicked rotor right after the kicks. It maps a phase space point $(\theta,l)$ before the kick to its corresponding point $(\theta',l')$ after the kick.

What follows is an R-script that chooses 1000 random points on the square $[0,2\pi]\times[0,2\pi]$ and uses each point as the starting point to draw 1000 iterations of the standard map:

smap<-function(t,l) {
K<- 1
for ( i in seq(1e3) ) {
t[i+1]<- (t[i]+l[i]) %% (2*pi)
l[i+1]<- (l[i]+K*sin(t[i+1])) %% (2*pi)
}
cbind(t,l)
}

par(mar=rep(0,4))
plot(NULL,xlim=c(0,2*pi),ylim=c(0,2*pi),axes=F,xlab="",ylab="")
for (i in seq(1000) ) {
t<-c(runif(1)*2*pi,runif(1)*2*pi)
m<-smap( t,t )
points(m[,1],m[,2],col=rgb(runif(1),runif(1),runif(1)) ,pch=".",cex=2)
}


And this is the output: 