%pylab inline
Here is the original code:
%load http://www.astroml.org/_downloads/fig_simple_naivebayes.py
"""
Simple Gaussian Naive Bayes Classification
------------------------------------------
Figure 9.2
A decision boundary computed for a simple data set using Gaussian naive Bayes
classification. The line shows the decision boundary, which corresponds to the
curve where a new point has equal posterior probability of being part of each
class. In such a simple case, it is possible to find a classification with
perfect completeness and contamination. This is rarely the case in the real
world.
"""
# Author: Jake VanderPlas
# License: BSD
# The figure produced by this code is published in the textbook
# "Statistics, Data Mining, and Machine Learning in Astronomy" (2013)
# For more information, see http://astroML.github.com
# To report a bug or issue, use the following forum:
# https://groups.google.com/forum/#!forum/astroml-general
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors
from sklearn.naive_bayes import GaussianNB
#----------------------------------------------------------------------
# This function adjusts matplotlib settings for a uniform feel in the textbook.
# Note that with usetex=True, fonts are rendered with LaTeX. This may
# result in an error if LaTeX is not installed on your system. In that case,
# you can set usetex to False.
from astroML.plotting import setup_text_plots
setup_text_plots(fontsize=8, usetex=True)
#------------------------------------------------------------
# Simulate some data
np.random.seed(0)
mu1 = [1, 1]
cov1 = 0.3 * np.eye(2)
mu2 = [5, 3]
cov2 = np.eye(2) * np.array([0.4, 0.1])
X = np.concatenate([np.random.multivariate_normal(mu1, cov1, 100),
np.random.multivariate_normal(mu2, cov2, 100)])
y = np.zeros(200)
y[100:] = 1
#------------------------------------------------------------
# Fit the Naive Bayes classifier
clf = GaussianNB()
clf.fit(X, y)
# predict the classification probabilities on a grid
xlim = (-1, 8)
ylim = (-1, 5)
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 71),
np.linspace(ylim[0], ylim[1], 81))
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])
Z = Z[:, 1].reshape(xx.shape)
#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(5, 3.75))
ax = fig.add_subplot(111)
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.binary, zorder=2)
ax.contour(xx, yy, Z, [0.5], colors='k')
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.show()
Preamble: import the modules
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors
from sklearn.naive_bayes import GaussianNB
from astroML.plotting import setup_text_plots
setup_text_plots(fontsize=16, usetex=True)
np.eye(d) is the identity matrix of dimension $d$:
print(np.eye(4))
$$\boldsymbol{\Sigma_1} = \left( \begin{array}{cc} 0.4 & 0 \\ 0 & 0.1 \\ \end{array} \right)$$
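As a quick check (the subscript 1 here is the class label, i.e. the second population, whose covariance is cov2 in the code), the broadcast product np.eye(2) * np.array([0.4, 0.1]) reproduces exactly this diagonal matrix:
print(np.eye(2) * np.array([0.4, 0.1]))
# [[ 0.4  0. ]
#  [ 0.   0.1]]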
#------------------------------------------------------------
# Simulate some data
np.random.seed(0)
mu1 = [1, 1]
cov1 = 0.3 * np.eye(2)
mu2 = [5, 3]
cov2 = np.eye(2) * np.array([0.4, 0.1])
Npts = 100
mvn0 = np.random.multivariate_normal(mu1, cov1, Npts)
mvn1 = np.random.multivariate_normal(mu2, cov2, Npts)
np.concatenate just sticks the arrays end-to-end, so that they form one dataset composed of two populations which were originally distinct but are now only distinguishable from their aggregate properties.
#np.concatenate?
print(np.shape(mvn0))
print(np.shape([mvn0]))
print(np.shape([mvn0, mvn1]))
print(np.shape(np.concatenate([mvn0, mvn1])))
X = np.concatenate([mvn0, mvn1], axis=0)
y = np.zeros(200)
y[100:] = 1
print(y)
Here is the actual machine learning:
'''
class GaussianNB(BaseNB)
| Gaussian Naive Bayes (GaussianNB)
|
| Parameters
| ----------
| X : array-like, shape = [n_samples, n_features]
|     Training vector, where n_samples is the number of samples and
| n_features is the number of features.
|
| y : array, shape = [n_samples]
| Target vector relative to X
'''
clf1 = GaussianNB()
clf1.fit(X, y)
print(np.shape(X))
print(np.shape(y))
With a help(clf1) call, you can see all the Parameters (X, y); Attributes (class_prior_, theta_, sigma_); and Methods (fit, predict, predict_log_proba, predict_proba, get_params, set_params, score).
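As a quick peek at those fitted attributes (using the attribute names from scikit-learn releases of this vintage; sigma_ was later renamed var_), something like this shows the learned per-class priors, means, and variances:
print(clf1.class_prior_)  # prior probability of each class; here [0.5, 0.5]
print(clf1.theta_)        # per-class feature means, close to mu1 and mu2
print(clf1.sigma_)        # per-class feature variances (the naive-Bayes diagonal covariances)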
The next section of code is all a clever way of making a list of $(x, y, P)$, where $x, y$ are coordinates and $P$ is the probability of classification. Ultimately we simply use this list to make a contour plot with a single contour demarcating $P=0.5$, which is the break-even point for naive Bayes classification.
help(clf.predict_proba)
'''
predict_proba(self, X) method of sklearn.naive_bayes.GaussianNB instance
    Return probability estimates for the test vector X.

    Parameters
    ----------
    X : array-like, shape = [n_samples, n_features]
'''
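Before building the whole grid, a quick illustrative call on two hand-picked points (chosen here only for illustration), one near each cluster center, shows what predict_proba returns:
test_points = np.array([[1.0, 1.0],   # near the class-0 cluster (mu1)
                        [5.0, 3.0]])  # near the class-1 cluster (mu2)
print(clf1.predict_proba(test_points))
# each row is [P(class 0), P(class 1)]; the first point strongly favors class 0,
# the second strongly favors class 1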
# predict the classification probabilities on a grid
xlim = (-1, 8)
ylim = (-1, 5)
#help(np.meshgrid):
# Return coordinate matrices from two or more coordinate vectors.
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 71),
np.linspace(ylim[0], ylim[1], 81))
#help(np.c_):
# Translates slice objects to concatenation along the second axis.
#help(ravel):
# Return a flattened array.
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])
Zorig = Z
Z = Z[:, 1].reshape(xx.shape)
print(np.shape(xx))
print(np.shape(xx)[0] * np.shape(xx)[1])
print(np.shape(xx.ravel()))
print(np.shape(np.c_[xx.ravel(), yy.ravel()]))
print(np.shape(Zorig))
print(np.shape(Z))
#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(10, 7.5))
ax = fig.add_subplot(111)
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.binary, zorder=2)
ax.contour(xx, yy, Z, [0.5], colors='k')
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.show()
xbins = xx[0, :]  # the x grid values (every row of xx repeats them)
ybins = yy[:, 0]  # the y grid values (every column of yy repeats them)
# plot the result
from matplotlib.ticker import NullFormatter  # explicit import; NullFormatter is used below
fig = plt.figure(figsize=(12, 6))
# define axes
ax_Pxy = plt.axes((0.2, 0.34, 0.27, 0.52))
ax_Px = plt.axes((0.2, 0.14, 0.27, 0.2))
ax_Py = plt.axes((0.1, 0.34, 0.1, 0.52))
ax_cb = plt.axes((0.48, 0.34, 0.01, 0.52))
ax_Px_y = [plt.axes((0.65, 0.62, 0.32, 0.23)),
plt.axes((0.65, 0.38, 0.32, 0.23)),
plt.axes((0.65, 0.14, 0.32, 0.23))]
# set axis label formatters
ax_Px_y[0].xaxis.set_major_formatter(NullFormatter())
ax_Px_y[1].xaxis.set_major_formatter(NullFormatter())
ax_Pxy.xaxis.set_major_formatter(NullFormatter())
ax_Pxy.yaxis.set_major_formatter(NullFormatter())
ax_Px.yaxis.set_major_formatter(NullFormatter())
ax_Py.xaxis.set_major_formatter(NullFormatter())
# draw the joint probability
plt.axes(ax_Pxy)
Z *= 1000
plt.imshow(Z, interpolation='nearest', origin='lower', aspect='auto',
           extent=[-1, 8, -1, 5],  # match the prediction grid so the image aligns with the overlays
           cmap=plt.cm.binary)
cb = plt.colorbar(cax=ax_cb)
cb.set_label('$p(x, y)$')
plt.text(0, 1.02, r'$\times 10^{-3}$',
transform=ax_cb.transAxes)
''' This transform argument is sort of sophisticated: transform=ax_cb.transAxes
means the (0, 1.02) coordinates are fractions of the colorbar axes (0 to 1
across and up), not data coordinates, so the label sits just above the colorbar.
-gully'''
# draw p(x) distribution
ax_Px.plot(xbins, Z.sum(0), '-k', drawstyle='steps')
# draw p(y) distribution
ax_Py.plot(Z.sum(1), ybins, '-k', drawstyle='steps')
''' Summing along an axis is as easy as Z.sum(axis).
The argument is the axis that gets summed over: Z has shape (81, 71) = (n_y, n_x),
so axis 0 runs over the y bins (rows) and axis 1 over the x bins (columns).
Z.sum(0) is therefore a function of x, and Z.sum(1) a function of y.
-gully'''
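# quick sanity check of that axis convention: Z.sum(0) should have length 71
# (a function of x) and Z.sum(1) length 81 (a function of y)
print(Z.shape, Z.sum(0).shape, Z.sum(1).shape)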
# define axis limits
ax_Pxy.set_xlim(-1, 8)
ax_Pxy.set_ylim(-1, 5)
ax_Px.set_xlim(-1, 8)
ax_Py.set_ylim(-1, 5)
# label axes
ax_Pxy.set_xlabel('$x$')
ax_Pxy.set_ylabel('$y$')
ax_Px.set_xlabel('$x$')
ax_Px.set_ylabel('$p(x)$')
ax_Px.yaxis.set_label_position('right')
ax_Py.set_ylabel('$y$')
ax_Py.set_xlabel('$p(y)$')
ax_Py.xaxis.set_label_position('top')
# draw marginal probabilities
Ngrid = 71
iy = [3 * Ngrid // 4, Ngrid // 2, Ngrid // 4]  # integer indices into the y bins
''' which slices of the p(x,y) to take.
Here we pick indices about 0.75, 0.50, and 0.25 of the way through the grid;
on this y grid (81 points from -1 to 5) those slices land near
p(x | y = 3.0), p(x | y = 1.6), and p(x | y = 0.3).
-gully '''
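# sanity check: print the actual y values these slice indices select
print(ybins[iy])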
colors = 'rgc'  # note: this shadows the matplotlib.colors module imported above
#print('type(colors)', type(colors))
''' apparently a string can be indexed just like a list of characters,
so colors[i] picks out 'r', 'g', or 'c'.
-gully'''
axis = ax_Pxy.axis()
for i in range(3):
# overplot range on joint probability
ax_Pxy.plot([-1, 9, -1, 9],
[ybins[iy[i] + 1], ybins[iy[i] + 1],
ybins[iy[i]], ybins[iy[i]]], c=colors[i], lw=1)
Px_y = Z[iy[i]] / Z[iy[i]].sum()
''' Px_y is the same as P( x | y = something )
-gully '''
ax_Px_y[i].plot(xbins, Px_y, drawstyle='steps', c=colors[i])
ax_Px_y[i].yaxis.set_major_formatter(NullFormatter())
''' a-ha, this is why the intermediate plots showed labels on
the y-axis for the right-hand-side plot.
-gully'''
ax_Px_y[i].set_ylabel('$p(x | %.1f)$' % ybins[iy[i]])
ax_Pxy.axis(axis)
''' What does the above axis call do? It is just a reset: axis = ax_Pxy.axis()
saved the x/y limits before the loop, and this call restores them after the
overplotted bands nudged the autoscaled limits. '''
ax_Px_y[2].set_xlabel('$x$')
ax_Pxy.set_title('Joint Probability')
ax_Px_y[0].set_title('Conditional Probability')
plt.show()
Hmm... this can't be quite right, since the figure will change depending on which $x$- and $y$-limits I select. The conditional probability plots are fine, but the marginal probabilities are not normalizable (the area under the curve is infinite). That makes sense because the quantity we're plotting is not a probability density $p(x, y)$; it's the posterior classification probability $P(\mathrm{class}\ 1 \mid x, y)$, which approaches 0 or 1 far from the decision boundary instead of decaying to zero the way a density would. Anyways, it's not integrable, so it's not normalizable. The end.
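As a quick numerical sketch of that point (reusing the clf1 fit from above; the grid sizes are arbitrary), the "integral" of the class-1 probability keeps growing as the grid is enlarged, which a proper density could not do:
for xmax, ymax in [(8, 5), (80, 50)]:
    gx, gy = np.meshgrid(np.linspace(-1, xmax, 71), np.linspace(-1, ymax, 81))
    P1 = clf1.predict_proba(np.c_[gx.ravel(), gy.ravel()])[:, 1].reshape(gx.shape)
    dx = gx[0, 1] - gx[0, 0]
    dy = gy[1, 0] - gy[0, 0]
    print((xmax, ymax), P1.sum() * dx * dy)  # grows with the grid extent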