Response surface models

Response surface models, also known as surrogate models or metamodels, can be used to develop a fast-running approximation to an expensive simulation. para-atm provides the ResponseSurface base class to define general functionality that can be shared across different response surface implementations. A Gaussian process regression capability, powered by scikit-learn, is provided by the SklearnGPRegressor class. Additional response surface model types may be added in the future.

To get started immediately, refer to the Examples section.

Response surface base class

class paraatm.rsm.base.ResponseSurface(X, Y)

Base class for response surface models

Derived classes implement model construction and prediction; the plotting methods below are provided by the base class.

plot(ax=None, lb=None, ub=None, range_factor=1.1, ci=True, show_data=True, input_vals=None, ivar=None, color=None, legend=True, ncol_leg=3)

Create 1D response surface plots

By default, 1D curves for all variables are plotted on the same graph; individual variables or subsets can also be plotted.

Parameters:
  • ax (matplotlib.axes or None) – Axis to plot on. If None, create new axis.
  • lb (float) – Lower bound; only supported for single-variable plots
  • ub (float) – Upper bound; only supported for single-variable plots
  • range_factor (float) – Factor used to determine default plotting range. The plotting range is determined as a multiple of the data range.
  • ci (bool) – Whether or not to plot confidence intervals for 1-D plots
  • show_data (bool) – Whether to plot the training data. For slice plots, the training data are projected onto the slice
  • input_vals (array) – Values to use for non-varying inputs (default to mean of training data)
  • ivar (int or list of ints) – Variable index to plot. Can be scalar or a list. Default is to plot all variables.
  • color (str) – Plotting color, for single-variable plots only
  • legend (bool) – Whether to include legend
  • ncol_leg (int) – Number of columns to use for formatting legend
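Conceptually, each 1D curve drawn by plot() is a slice of the model: one input varies over an expanded data range while the others are held at input_vals (the training mean by default). A minimal sketch of that slice evaluation, using scikit-learn directly rather than the para-atm class (the variable names and mesh construction here are illustrative, not the para-atm implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical 2-variable training data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))
Y = X[:, 0] + np.sin(X[:, 1])

gp = GaussianProcessRegressor(kernel=RBF([1.0, 1.0])).fit(X, Y)

ivar = 0            # variable to vary
range_factor = 1.1  # widen the data range, as plot() does by default
center = X.mean(axis=0)  # non-varying inputs default to the training mean
lo, hi = X[:, ivar].min(), X[:, ivar].max()
mid, half = (lo + hi) / 2, (hi - lo) / 2 * range_factor
grid = np.linspace(mid - half, mid + half, 100)

# Build the slice points: vary ivar, hold the other input at its mean
pts = np.tile(center, (len(grid), 1))
pts[:, ivar] = grid
mean, std = gp.predict(pts, return_std=True)  # curve and confidence band
```

The mean array gives the plotted curve, and the standard deviation provides the confidence band when ci=True.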
surface_plot(ax=None, lb=None, ub=None, range_factor=1.1, ngrid=40, show_data=True, input_vals=None, slice_vars=None, surf_args=None)

Create 2D surface plot

Parameters:
  • ax (matplotlib.axes or None) – Axis to plot on. If None, create new axis
  • lb (array) – Lower bound to use for each variable
  • ub (array) – Upper bound to use for each variable
  • range_factor (float) – Factor used to determine default plotting range. The plotting range is determined as a multiple of the data range.
  • ngrid (int) – Grid density used in each dimension
  • show_data (bool) – Whether to plot training data
  • input_vals (array) – Array of input values, required when plotting a slice of a surface with more than two input variables.
  • slice_vars (tuple) – Length-2 tuple with indices of the variables to plot. Required when the response surface has more than two input variables.
  • surf_args (dict) – Additional keyword arguments to send to Axes3D.plot_surface
Returns:

The axes object containing the plot

Return type:

matplotlib.axes
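Under the hood, a surface plot amounts to evaluating the model on an ngrid-by-ngrid mesh over the plotting range. A rough sketch of that grid evaluation, using scikit-learn directly (the mesh construction is illustrative and may differ from para-atm's implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 2))
Y = X[:, 0] * np.exp(-X[:, 0]**2 - X[:, 1]**2)
gp = GaussianProcessRegressor(kernel=RBF([1.0, 1.0])).fit(X, Y)

ngrid = 40  # grid density in each dimension, as in surface_plot()
x0 = np.linspace(X[:, 0].min(), X[:, 0].max(), ngrid)
x1 = np.linspace(X[:, 1].min(), X[:, 1].max(), ngrid)
G0, G1 = np.meshgrid(x0, x1)

# Stack the grid into a (ngrid*ngrid, 2) array of input points
pts = np.column_stack([G0.ravel(), G1.ravel()])
Z = gp.predict(pts).reshape(G0.shape)  # surface heights on the grid
```

The arrays G0, G1, and Z are then what a call such as Axes3D.plot_surface consumes.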

Gaussian process regression

class paraatm.rsm.gp.SklearnGPRegressor(X, Y, noise=False, n_restarts_optimizer=10, alpha=1e-10, kernel=None, optimizer='fmin_l_bfgs_b')

Gaussian Process regression using scikit-learn

__init__(X, Y, noise=False, n_restarts_optimizer=10, alpha=1e-10, kernel=None, optimizer='fmin_l_bfgs_b')

Create GP regression instance and fit to training data

Parameters:
  • X (array) – 2d array of input values, with each row representing a case and each column a variable
  • Y (array) – 1d array of output values
  • noise (bool) – Whether the outputs are observed with noise. If True, the noise variance is estimated as part of the fitting. Additionally, scikit-learn includes the noise in the prediction standard deviation.
  • n_restarts_optimizer (int) – Number of times optimizer is restarted using randomly selected initial points
  • alpha (float) – Value added to diagonal of covariance matrix during fitting. Unlike the noise variance that can be fit through the WhiteKernel, the alpha value is not included in the predicted standard deviation.
  • kernel (kernel object) – Kernel specifying the covariance function of the GP. If None, RBF (a.k.a., squared exponential) is used with unique length parameters for each dimension.
  • optimizer (str, callable, or None) – Passed on to sklearn. In particular, None can be used to fix the kernel parameters instead of optimizing them.
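As a rough sketch of how these options map onto scikit-learn (the exact kernel construction inside SklearnGPRegressor may differ), an anisotropic RBF gives one length scale per input, and a WhiteKernel plays the role of noise=True:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 2))
Y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.1, size=40)

# Anisotropic RBF (a unique length scale per input dimension), roughly
# what kernel=None selects; WhiteKernel estimates the noise variance,
# corresponding to noise=True.
kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()

gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-10,
                              n_restarts_optimizer=2).fit(X, Y)
```

Passing optimizer=None instead would keep the kernel parameters at their initial values rather than optimizing them.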
__call__(X, return_stdev=False)

Compute the predicted mean (and, optionally, standard deviation) at the specified input(s)

Parameters:
  • X (array, 1d or 2d) – Input values, either as a single point (1d array) or a 2d array containing a set of points
  • return_stdev (bool) – Whether to return standard deviation of the prediction, in addition to the mean.
Returns:

The return value is either a tuple containing the mean and standard deviation (when return_stdev is True) or the mean alone. If X is 2d, the mean and standard deviation are returned as arrays; otherwise they are floats.

Return type:

tuple, float, or array
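The underlying scikit-learn call looks roughly like this (SklearnGPRegressor wraps predict(); the 1d-to-2d promotion shown in the last line is an assumption about how the wrapper handles a single point):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.], [3.], [5.], [6.], [7.], [8.]])
Y = X[:, 0] * np.sin(X[:, 0])
gp = GaussianProcessRegressor(kernel=RBF(1.0)).fit(X, Y)

# A set of points: mean and standard deviation come back as arrays
Xnew = np.array([[2.0], [4.0]])
mean, std = gp.predict(Xnew, return_std=True)

# A single 1d point is promoted to a 2d row before prediction;
# __call__ would then unwrap the length-1 result to a float
m1 = gp.predict(np.atleast_2d([2.0]))
```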

fit_rsquared()

Get fit R-squared value

loo_rsquared()

Get leave-one-out R-squared value

Kernel parameters and normalization constants are held fixed. The intent is to test the effect of holding out observations without re-fitting the kernel parameters.
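One way to read that description: refit the GP on each leave-one-out subset with the optimized kernel parameters frozen, then score the held-out predictions. A sketch in plain scikit-learn (this mirrors the stated intent; para-atm's implementation may differ, e.g. by using a closed-form leave-one-out identity):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(25, 1))
Y = np.sin(X[:, 0])

# Fit once, including kernel hyperparameter optimization
gp = GaussianProcessRegressor(kernel=RBF(1.0)).fit(X, Y)
fitted_kernel = gp.kernel_  # optimized kernel, frozen below

# Leave-one-out predictions with the kernel parameters held fixed
# (optimizer=None skips re-optimization on each subset)
preds = np.empty(len(Y))
for i in range(len(Y)):
    mask = np.arange(len(Y)) != i
    gp_i = GaussianProcessRegressor(kernel=fitted_kernel, optimizer=None)
    gp_i.fit(X[mask], Y[mask])
    preds[i] = gp_i.predict(X[i:i+1])[0]

# Standard R-squared over the held-out predictions
ss_res = np.sum((Y - preds)**2)
ss_tot = np.sum((Y - Y.mean())**2)
loo_r2 = 1 - ss_res / ss_tot
```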

Examples

Here is an example showing a GP fit to a 1D function, with confidence bounds:

import matplotlib.pyplot as plt
import numpy as np
from paraatm.rsm.gp import SklearnGPRegressor

X = np.array([1., 3., 5., 6., 7., 8.])
Y = X * np.sin(X)
X = X[:,np.newaxis] # Make input array 2d

# Use n_restarts_optimizer=0 to get reproducible behavior
gp = SklearnGPRegressor(X, Y, n_restarts_optimizer=0)

gp.plot()

plt.show()

[Figure: ../_images/rsm-1.png]

The Gaussian process model can also accommodate noisy data, as shown in the example below:

import matplotlib.pyplot as plt
import numpy as np
from paraatm.rsm.gp import SklearnGPRegressor

ndata = 150
np.random.seed(1)
X = np.linspace(0,10,ndata)
Y = np.sin(X) + np.random.normal(0,.2, size=ndata)
X = X[:,np.newaxis]

gp = SklearnGPRegressor(X, Y, n_restarts_optimizer=0, noise=True)

gp.plot()

plt.show()

[Figure: ../_images/rsm-2.png]

And an example of a 2D function, using surface_plot():

import matplotlib.pyplot as plt
import numpy as np
from paraatm.rsm.gp import SklearnGPRegressor

np.random.seed(1)
X = np.random.uniform(-2, 2, size=(30,2))
Y = X[:,0] * np.exp(-X[:,0]**2 - X[:,1]**2)

gp = SklearnGPRegressor(X, Y, n_restarts_optimizer=0)

ax = gp.surface_plot()
ax.view_init(17, -70)

plt.show()

[Figure: ../_images/rsm-3.png]