I assume that readers already know Bayes' rule. If you are not familiar with it, read any textbook on probability, data science, or machine learning. I recommend the book from which I learned Bayes' rule. Bayesians say that you cannot do inference without making assumptions, and they use probabilities to describe inferences, not just frequencies of outcomes. The author introduces some rules of probability theory in Chapter 2 and says more about assumptions in inference in Chapter 3.
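As a quick reminder of the rule itself, $P(h \mid D) = P(D \mid h)\,P(h) / P(D)$ with $P(D) = \sum_h P(D \mid h)\,P(h)$, here is a minimal Python sketch for a discrete set of hypotheses. The function name `bayes_posterior` and the test numbers (base rate, sensitivity, false-positive rate) are hypothetical, chosen only for illustration.

```python
def bayes_posterior(prior, likelihood):
    """Return the posterior P(h | data) for each hypothesis h.

    prior[h]      = P(h)
    likelihood[h] = P(data | h)
    """
    # Numerator of Bayes' rule: P(data | h) P(h) for each hypothesis.
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence P(data): the sum of the numerators over all hypotheses.
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical test: 1% base rate, 95% sensitivity, 5% false-positive rate.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood_positive = {"disease": 0.95, "healthy": 0.05}
posterior = bayes_posterior(prior, likelihood_positive)
print(round(posterior["disease"], 3))  # 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior probability of disease after a positive result at about 16%.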
Efficient Monte Carlo sampling
This post extends the post about the Hamiltonian Monte Carlo method, so I assume readers have already read it. Overrelaxation also reduces the random-walk behavior of Monte Carlo sampling and speeds up the convergence of the Markov chain.
Gibbs sampling
Before studying overrelaxation, we study Gibbs sampling. In the general case of a system with $K$ variables, a single iteration involves sampling one parameter at a time, each from its conditional distribution given the current values of all the other parameters.
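To make this concrete, here is a minimal NumPy sketch, assuming a zero-mean bivariate Gaussian target with unit variances and correlation `rho`, whose conditionals are $x_i \mid x_{-i} \sim \mathcal{N}(\rho\, x_{-i},\, 1-\rho^2)$. With `alpha=0` each update is an ordinary Gibbs draw; with $-1 < \alpha < 0$ it becomes Adler's overrelaxation, $x_i' = \mu + \alpha(x_i - \mu) + \sqrt{1-\alpha^2}\,\sigma\,\nu$ with $\nu \sim \mathcal{N}(0, 1)$, where $\mu$ and $\sigma$ are the conditional mean and standard deviation. The function name `gibbs_bivariate_gaussian` and its parameters are hypothetical.

```python
import numpy as np


def gibbs_bivariate_gaussian(rho, n_iter, alpha=0.0, seed=0):
    """Sample a zero-mean bivariate Gaussian with unit variances and correlation rho.

    alpha = 0 gives ordinary Gibbs sampling; -1 < alpha < 0 gives Adler's
    overrelaxation, which also leaves the target distribution invariant.
    """
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho**2)  # std of x_i given the other variable
    x = np.zeros(2)
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        # One iteration updates each variable in turn, conditioned on the other.
        for i in range(2):
            mu = rho * x[1 - i]  # conditional mean of x_i given x_{-i}
            nu = rng.standard_normal()
            x[i] = mu + alpha * (x[i] - mu) + np.sqrt(1.0 - alpha**2) * cond_std * nu
        samples[t] = x  # copies the current state into the output array
    return samples


gibbs = gibbs_bivariate_gaussian(rho=0.998, n_iter=20000)
over = gibbs_bivariate_gaussian(rho=0.998, n_iter=20000, alpha=-0.98)
print(np.corrcoef(gibbs.T)[0, 1], np.corrcoef(over.T)[0, 1])
```

With a strongly correlated target such as `rho=0.998`, the ordinary Gibbs chain creeps along the long axis in small steps, while a value of `alpha` close to $-1$ lets successive updates keep moving in the same direction along that axis.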
Hard K-means and responsibilities
If you have not read the first part of the clustering series, please check it out. I use the same data points, and this post starts by troubleshooting the hard K-means algorithm from the previous post.
In the previous post, we defined the assignment step. An equivalent representation of this assignment of points to clusters is given by the responsibilities $r^{(n)}_k$. In the assignment step, we set $r^{(n)}_k$ to one if mean $k$ is the closest mean to data point $\mathbf{x}^{(n)}$; otherwise, $r^{(n)}_k$ is zero.
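The hard responsibilities can be computed in a few lines. This is a minimal NumPy sketch assuming squared Euclidean distance; the function name `hard_responsibilities` and the four data points are hypothetical, not the data set used in the series.

```python
import numpy as np


def hard_responsibilities(X, means):
    """Return an (N, K) array with r[n, k] = 1 if mean k is closest to x^(n), else 0.

    X: (N, D) data points; means: (K, D) cluster means.
    """
    # Squared Euclidean distance between every point and every mean, shape (N, K).
    dist2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest mean for each point (ties go to the lowest index).
    closest = dist2.argmin(axis=1)
    r = np.zeros_like(dist2)
    r[np.arange(len(X)), closest] = 1.0
    return r


X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.3]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
r = hard_responsibilities(X, means)
print(r)
```

Each row of `r` contains exactly one 1, which is what makes these responsibilities "hard".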
Single neuron is amazing
One lesson I learned during my physics program is that, however complicated the system you want to understand, you should start by understanding a small piece of it deeply. Not only is this easier, but it also helps a lot in understanding the more complex parts.
Neural networks are often compared to black magic. We do not fully understand how or why they are so effective, but they make excellent predictions for certain tasks.
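The single neuron itself fits in a few lines. This is a minimal NumPy sketch assuming a sigmoid neuron, with activation $a = \sum_i w_i x_i$ and output $y = 1/(1 + e^{-a})$, trained by batch gradient descent on the cross-entropy error. The function name `neuron_output`, the learning rate, and the toy data set (logical AND, with a constant input acting as the bias) are hypothetical choices for illustration.

```python
import numpy as np


def neuron_output(w, x):
    """Single sigmoid neuron: activation a = w . x, output y = 1 / (1 + exp(-a))."""
    a = x @ w
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical toy data: logical AND; the constant first input acts as a bias.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
t = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(3)
eta = 0.5  # learning rate
for _ in range(5000):
    y = neuron_output(w, X)
    # Gradient of the cross-entropy error is -sum_n (t_n - y_n) x_n, so step against it.
    w += eta * X.T @ (t - y)

print(np.round(neuron_output(w, X), 2))
```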
Generative model
A generative model is a model that generates all variables, including the outputs. I will give a very simple example with strong assumptions.
The data $\boldsymbol{x}^{(n)}$ are generated from source variables by an unknown matrix $\boldsymbol{G}$:
$$ \boldsymbol{x} = \boldsymbol{G}~\boldsymbol{s} $$
The goal is to find the source variable $\boldsymbol{s}$.
We make three assumptions:

- The number of sources is equal to the number of observations, so $\boldsymbol{G}$ is a square matrix.
- The latent variables are independently distributed, so their joint distribution is the product of their marginal distributions.
- For simplicity, the vector $\boldsymbol{x}$ is generated without noise.
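Sampling from this generative model takes only a few lines. This minimal NumPy sketch assumes two sources drawn independently from a Laplace distribution (an arbitrary choice for illustration) and a hypothetical square mixing matrix `G`. It also shows why the assumptions matter: with a square, invertible $\boldsymbol{G}$ and no noise, $\boldsymbol{s} = \boldsymbol{G}^{-1}\boldsymbol{x}$ exactly, so the real difficulty is that $\boldsymbol{G}$ is unknown and must be inferred from $\boldsymbol{x}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_samples = 2, 1000

# Independent sources, one row per source (Laplace is an arbitrary choice here).
S = rng.laplace(size=(n_sources, n_samples))

# Hypothetical square mixing matrix G (unknown in the real problem).
G = np.array([[1.0, 0.5],
              [0.3, 2.0]])

# Noise-free generative model x^(n) = G s^(n), one column per data point.
X = G @ S

# If G were known, solving G s = x would recover the sources exactly.
S_hat = np.linalg.solve(G, X)
print(np.allclose(S_hat, S))
```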