Figure 3.4

Transform a distribution

In [1]:
%pylab inline 
Populating the interactive namespace from numpy and matplotlib

In [2]:
%load http://www.astroml.org/_downloads/fig_transform_distribution.py
In [5]:
r"""
Transformation of Distribution
------------------------------
Figure 3.4.

An example of transforming a uniform distribution. In the left panel, x
is sampled from a uniform distribution of unit width centered on x = 0.5
(:math:`\mu` = 0 and W = 1; see Section 3.3.1). In the right panel,
the distribution is transformed via y = exp(x). The form of the resulting
pdf is computed from eq. 3.20.
"""
# Author: Jake VanderPlas
# License: BSD
#   The figure produced by this code is published in the textbook
#   "Statistics, Data Mining, and Machine Learning in Astronomy" (2013)
#   For more information, see http://astroML.github.com
#   To report a bug or issue, use the following forum:
#    https://groups.google.com/forum/#!forum/astroml-general
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

#----------------------------------------------------------------------
# This function adjusts matplotlib settings for a uniform feel in the textbook.
# Note that with usetex=True, fonts are rendered with LaTeX.  This may
# result in an error if LaTeX is not installed on your system.  In that case,
# you can set usetex to False.
from astroML.plotting import setup_text_plots
setup_text_plots(fontsize=16, usetex=True)

#------------------------------------------------------------
# Set up the data
np.random.seed(0)

# create a uniform distribution
uniform_dist = stats.uniform(0, 1)
x_sample = uniform_dist.rvs(1000)
x = np.linspace(-0.5, 1.5, 1000)
Px = uniform_dist.pdf(x)

# transform the data
y_sample = np.exp(x_sample)
y = np.exp(x)
Py = Px / y

#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(10, 5))
fig.subplots_adjust(left=0.11, right=0.95, wspace=0.3, bottom=0.17, top=0.9)

ax = fig.add_subplot(121)
ax.hist(x_sample, 20, histtype='stepfilled', fc='#CCCCCC', normed=True)
ax.plot(x, Px, '-k')
ax.set_xlim(-0.2, 1.2)
ax.set_ylim(0, 1.4001)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.95, 0.95, r'$p_x(x) = {\rm Uniform}(x)$',
        va='top', ha='right',
        transform=ax.transAxes)
ax.set_xlabel('$x$')
ax.set_ylabel('$p_x(x)$')


ax = fig.add_subplot(122)
ax.hist(y_sample, 20, histtype='stepfilled', fc='#CCCCCC', normed=True)
ax.plot(y, Py, '-k')
ax.set_xlim(0.85, 2.9)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.95, 0.95, '$y=\exp(x)$\n$p_y(y)=p_x(\ln y) / y$',
        va='top', ha='right',
        transform=ax.transAxes)
ax.set_xlabel('$y$')
ax.set_ylabel('$p_y(y)$')

plt.show()

I have some questions about the concepts behind this figure:

  • Is Equation 3.20 the same as $p(y) = p(x)\frac{dx}{dy}$ ? --> yes
  • What is the meaning of the subscripts, is it merely to remind us what the variable is?
In [7]:
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
from astroML.plotting import setup_text_plots
setup_text_plots(fontsize=16, usetex=True)

#------------------------------------------------------------
# Set up the data
np.random.seed(0)

# create a uniform distribution
uniform_dist = stats.uniform(0, 1)
x_sample = uniform_dist.rvs(1000)
x = np.linspace(-0.5, 1.5, 1000)
Px = uniform_dist.pdf(x)
In [12]:
# transform the data
#y_sample = np.exp(x_sample)
#y = np.exp(x)
#Py = Px / y  # This is the same as Py = Px / (dy/dx) 

#Let's say the transformation is y(x) = x^2
y_sample = x_sample**2
y = x**2
dy_dx= 2.0*x
Py = Px / (dy_dx)  # This is the same as Py = Px / (dy/dx) 
In [61]:
#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(10,5), facecolor = '#f3f3f3')
fig.subplots_adjust(left=0.10, right=0.95, wspace=0.3, bottom=0.20, top=0.9)
ax = fig.add_subplot(121, axis_bgcolor='#f3f3f3')

ax.hist(x_sample, 20, histtype='stepfilled', fc='#00FF00',alpha=0.2, normed=True)
ax.plot(x, Px, '-k')
ax.set_xlim(-0.2, 1.2)
ax.set_ylim(0, 1.4001)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.30, 0.95, r'$p_x(x) = {\rm Uniform}(x)$',
        va='top', ha='center',
        transform=ax.transAxes)
ax.set_xlabel('$x$')
ax.set_ylabel('$p_x(x)$')
ax = fig.add_subplot(122, axis_bgcolor='#f3f3f3')
ax.hist(y_sample, 20, histtype='stepfilled', fc='#00FF00', alpha=0.2, normed=True)
ax.plot(y, Py, '-k')
ax.set_xlim(0.85, 2.9)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.95, 0.95, '$y=\exp(x)$\n$p(y)dy = p(x)dx $',
        va='top', ha='right',
        transform=ax.transAxes)
ax.set_xlabel('$y$')
ax.set_ylabel('$p_y(y)$')

plt.show()

Recap

So I learned some things. The first, and most important thing, is that my intuition was correct all along. Equation 3.20 is overly complicated. Why do we need to bring $\phi^{-1}(y)$ into this, when we could easily just say $x$? Here's a simpler way to say the same thing: $$ p(y) = p(x) (\frac{dy}{dx})^{-1} $$ or: $$ p(y)dy = p(x)dx $$ I guess this is the "easy to show part", but it took me a little pencil and paper work, and some examples to convince myself that Equation 3.20 is the same, and that there were no hidden Jacobians somewhere. In fact, Equation 3.20 really is just a Jacobian, I think. This example just happens to be in one dimension, so the Jacobian is sort of simple.

What else did I learn?

That's mainly it. Except that the names for background colors are inconsistent in Matplotlib- facecolor, axis_bgcolor, and fc, are examples, all of which appeared in this short text figure.

Addendum- Notice that the units work out. If $p(x)$ is a probability density, then $p(x)dx$ is the probability (dimensionless) of a draw between $x_0$ and $x_0+dx$. This makes sense, since $p(y)$ will have units of $\frac{1}{y}$. So $p(y)dy$ is dimensionless, too.