API Reference

covariance_modification

experiment_design.covariance_modification.iman_connover_transformation(doe, target_correlation, means=None, standard_deviations=None)

Rearrange the values of doe to reduce the correlation error while adhering to any constraints on the marginal values, such as those of an LHS.

Parameters:

doe (ndarray) – Array with shape (n_sample, n_dim) representing the initial DoE with arbitrary correlation.
target_correlation (ndarray) – Symmetric positive definite correlation matrix with shape (n_dim, n_dim) representing the desired correlation between variables
means (Optional[ndarray]) – Array with shape (n_dim,) representing the means of the marginal distributions. If None, it will be inferred from doe
standard_deviations (Optional[ndarray]) – Array with shape (n_dim,) representing the standard deviations of the marginal distributions. If None, it will be inferred from doe

Return type:

ndarray

Returns:

New DoE with the same shape and the same values in each column as doe, reordered to have a smaller correlation error with respect to target_correlation.

References

R.L. Iman and W.J. Conover (1982). “A distribution-free approach to inducing rank correlation among input variables”

C. Bogoclu (2022). “Local Latin Hypercube Refinement for Uncertainty Quantification and Optimization” Chapter 4.3.2

Examples

>>> from experiment_design.covariance_modification import iman_connover_transformation
>>> import numpy as np
>>> from scipy import stats
>>> np.random.seed(1337)
>>> samples = stats.randint(0, 100).rvs((30, 2))
>>> correlation_error = np.max(np.abs(np.corrcoef(samples, rowvar=False) - np.eye(2)))
>>> new_samples = iman_connover_transformation(samples, np.eye(2))
>>> np.max(np.abs(np.corrcoef(new_samples, rowvar=False) - np.eye(2))) < correlation_error
True
>>> sorted(samples[:, 0]) == sorted(new_samples[:, 0])
True
>>> sorted(samples[:, 1]) == sorted(new_samples[:, 1])
True
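
The rank-reordering idea behind this transformation can be sketched with NumPy and the standard library. This is a minimal illustration of the method described by Iman and Conover, not the library's implementation; the name iman_conover_sketch, the seed argument and the choice of van der Waerden scores are assumptions made for this example.

```python
from statistics import NormalDist

import numpy as np


def iman_conover_sketch(doe, target_correlation, seed=0):
    """Reorder each column of doe so its ranks follow a correlated reference."""
    rng = np.random.default_rng(seed)
    n_sample, n_dim = doe.shape
    # Van der Waerden scores: normal quantiles at i / (n + 1).
    base = np.array(
        [NormalDist().inv_cdf(i / (n_sample + 1)) for i in range(1, n_sample + 1)]
    )
    scores = np.column_stack([rng.permutation(base) for _ in range(n_dim)])
    # Remove the spurious correlation of the scores, then impose the target.
    chol_scores = np.linalg.cholesky(np.corrcoef(scores, rowvar=False))
    chol_target = np.linalg.cholesky(target_correlation)
    reference = scores @ np.linalg.inv(chol_scores).T @ chol_target.T
    # Rank of every entry within its column of the reference matrix.
    ranks = np.argsort(np.argsort(reference, axis=0), axis=0)
    # Place the sorted values of doe according to those ranks.
    return np.take_along_axis(np.sort(doe, axis=0), ranks, axis=0)
```

Because only the order of the values changes, every column of the result is a permutation of the corresponding input column, which is why marginal constraints such as an LHS survive.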

experiment_design.covariance_modification.second_moment_transformation(doe, target_correlation, means=None, standard_deviations=None, jitter=1e-06)

Second-moment transformation for achieving the target covariance

Parameters:

doe (ndarray) – Array with shape (n_sample, n_dim) representing the initial design of experiment with arbitrary correlation.
target_correlation (ndarray) – Symmetric positive definite correlation matrix with shape (n_dim, n_dim) representing the desired correlation between variables
means (Optional[ndarray]) – Array with shape (n_dim,) representing the means of the marginal distributions. If None, it will be inferred from doe
standard_deviations (Optional[ndarray]) – Array with shape (n_dim,) representing the standard deviations of the marginal distributions. If None, it will be inferred from doe
jitter (float) – A small positive constant that will be added to the diagonal of the covariance matrix in case it is only positive semi-definite, to enable the Cholesky decomposition.

Return type:

ndarray

Returns:

New DoE with the same shape as doe but different values, which matches the target_correlation exactly.

Examples

>>> from experiment_design.covariance_modification import second_moment_transformation
>>> import numpy as np
>>> from scipy import stats
>>> np.random.seed(1337)
>>> samples = stats.norm.rvs(size=(50, 2))
>>> new_samples = second_moment_transformation(samples, np.eye(2))
>>> correlation_error = np.max(np.abs(np.corrcoef(new_samples, rowvar=False) - np.eye(2)))
>>> bool(np.isclose(correlation_error, 0, atol=1e-6))
True
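
A second-moment transformation can be sketched as whitening followed by coloring with Cholesky factors. The function below is a minimal sketch under stated assumptions: the name second_moment_sketch is hypothetical, and the means and standard deviations are always inferred from doe rather than passed in.

```python
import numpy as np


def second_moment_sketch(doe, target_correlation, jitter=1e-6):
    """Linearly transform doe so its correlation equals target_correlation."""
    means = doe.mean(axis=0)
    stds = doe.std(axis=0, ddof=1)
    covariance = np.cov(doe, rowvar=False)
    target_covariance = target_correlation * np.outer(stds, stds)
    try:
        chol_current = np.linalg.cholesky(covariance)
    except np.linalg.LinAlgError:
        # Only positive semi-definite: nudge the diagonal and retry.
        chol_current = np.linalg.cholesky(covariance + jitter * np.eye(len(stds)))
    chol_target = np.linalg.cholesky(target_covariance)
    # Whiten with the current factor, then color with the target factor.
    whitened = (doe - means) @ np.linalg.inv(chol_current).T
    return whitened @ chol_target.T + means
```

Unlike a rank reordering, this changes the values themselves, so samples from bounded marginals can leave their bounds.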

optimize

experiment_design.optimize.random_search(creator, scorer, steps)

Given a DoE creator and scorer, maximize the score by random search.

Parameters:

creator (Callable[[], ndarray]) – DoE creating function.
scorer (Scorer) – DoE scoring function.
steps (int) – Number of steps to search.

Return type:

ndarray

Returns:

The DoE matrix with the best score.
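
The search loop described above can be sketched in a few lines. This is an illustrative sketch, not the library's code; random_search_sketch, create_uniform_doe and negative_correlation are hypothetical names, and the scorer is assumed to be a plain callable that returns a float to maximize.

```python
import numpy as np


def random_search_sketch(creator, scorer, steps):
    """Keep the best-scoring design out of `steps` independently created ones."""
    best = creator()
    best_score = scorer(best)
    for _ in range(steps - 1):
        candidate = creator()
        score = scorer(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


rng = np.random.default_rng(0)


def create_uniform_doe():
    # Hypothetical creator: 10 uniform random points in the unit square.
    return rng.uniform(size=(10, 2))


def negative_correlation(doe):
    # Hypothetical scorer: larger is better, so negate the correlation magnitude.
    return -abs(np.corrcoef(doe, rowvar=False)[0, 1])


best_doe = random_search_sketch(create_uniform_doe, negative_correlation, steps=50)
```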

experiment_design.optimize.simulated_annealing_by_perturbation(doe, scorer, steps=1000, cooling_rate=0.95, temperature=25.0, max_steps_without_improvement=25)

Simulated annealing algorithm to maximize the score of a DoE by perturbing the rows of the design matrix along the columns. This kind of perturbation is used to avoid violating the LHS, i.e. to keep the number of filled bins the same.

Parameters:

doe (ndarray) – DoE matrix with shape (sample_size, len(variables)).
scorer (Scorer) – Scoring function for the doe. It will be maximized.
steps (int) – Number of steps for the annealing algorithm.
cooling_rate (float) – Annealing parameter to decay temperature.
temperature (float) – Annealing temperature.
max_steps_without_improvement (int) – Limit on the maximum steps to take for exploration before setting the reference matrix to the last best value.

Return type:

ndarray

Returns:

Optimized DoE matrix

References

R.V. Joseph and Y. Hung (2008). “Orthogonal-Maximin Latin Hypercube Designs”

C. Bogoclu (2022). “Local Latin Hypercube Refinement for Uncertainty Quantification and Optimization” Chapter 4.3.2.2
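
The column-wise perturbation can be illustrated by swapping two entries within a single column, which leaves the set of values in every column untouched. The following is a simplified sketch under stated assumptions: swap_in_column and anneal_sketch are hypothetical names, the acceptance rule is the textbook exponential criterion, and the max_steps_without_improvement reset is omitted.

```python
import numpy as np


def swap_in_column(doe, rng):
    """Swap the values of two random rows within one random column."""
    perturbed = doe.copy()
    row_a, row_b = rng.choice(doe.shape[0], size=2, replace=False)
    column = rng.integers(doe.shape[1])
    perturbed[[row_a, row_b], column] = perturbed[[row_b, row_a], column]
    return perturbed


def anneal_sketch(doe, scorer, steps=1000, cooling_rate=0.95, temperature=25.0, seed=0):
    rng = np.random.default_rng(seed)
    current, current_score = doe, scorer(doe)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = swap_in_column(current, rng)
        score = scorer(candidate)
        # Improvements are always accepted; worse candidates are accepted with
        # a probability that shrinks as the temperature decays.
        accept_probability = np.exp(
            min(0.0, score - current_score) / max(temperature, 1e-12)
        )
        if rng.random() < accept_probability:
            current, current_score = candidate, score
        if current_score > best_score:
            best, best_score = current, current_score
        temperature *= cooling_rate
    return best
```

Since the best design seen so far is tracked separately, the returned design never scores worse than the input.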

orthogonal_sampling

class experiment_design.orthogonal_sampling.OrthogonalSamplingDesigner(inter_bin_randomness=0.8, non_occupied_bins=False, scorer_factory=None)

Create or extend an orthogonal sampling design. Orthogonal sampling design partitions the design space into bins of equal marginal probability and places samples such that each bin is only filled once for each dimension. If all variables are uniform, orthogonal sampling becomes an LHS.

Parameters:

inter_bin_randomness (float) – Controls the randomness of placed points between the bin bounds. Specifically, 0 means the points are placed at the center of each bin, whereas 1 leads to a random point placement within the bounds. Any other fraction leads to a random placement within that fraction of the bin bounds in each dimension.
non_occupied_bins (bool) – Only relevant for extending the design, i.e. if old points are provided, and if the constraint regarding the number of occupation of each bin has to be violated. True means that each bin is occupied at least once for each dimension, although some bins might be occupied more often. Otherwise, each bin is occupied once or less often, leading to empty bins in some cases.
scorer_factory (Optional[ScorerFactory]) – A factory that creates scorers for the given variables, sample_size and, in the case of an extension, old sampling points. If not passed, a default one will be created that evaluates the maximum correlation error and minimum pairwise distance. See experiment_design.scorers.create_default_scorer_factory for more details.
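
The effect of inter_bin_randomness can be sketched in probability space, where each dimension is split into sample_size bins of equal probability. This is a minimal illustration under stated assumptions, not the designer's implementation; lhs_probabilities and its seed argument are hypothetical.

```python
import numpy as np


def lhs_probabilities(sample_size, dimensions, inter_bin_randomness=0.8, seed=0):
    """Place one point per bin and dimension, jittered around the bin center."""
    rng = np.random.default_rng(seed)
    bin_width = 1.0 / sample_size
    centers = (np.arange(sample_size) + 0.5) * bin_width
    # Each column visits every bin exactly once, in random order.
    probabilities = np.column_stack(
        [rng.permutation(centers) for _ in range(dimensions)]
    )
    # 0 keeps the centers; 1 spreads the points over the full bin width.
    offsets = rng.uniform(-0.5, 0.5, size=probabilities.shape)
    return probabilities + offsets * inter_bin_randomness * bin_width
```

Mapping these probabilities through the inverse CDF of each marginal would then yield the sample values, which reduces to an affine scaling for uniform variables.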

References

M.D. McKay, W.J. Conover and R.J. Beckmann (1979). “A comparison of three methods for selecting values of input variables in the analysis of output from a computer code”

A.B. Owen (1992). “Orthogonal arrays for computer experiments, integration and visualization”

C. Bogoclu (2022). “Local Latin Hypercube Refinement for Uncertainty Quantification and Optimization” Chapters 4.3.1 and 5

Examples

>>> from experiment_design import create_continuous_uniform_space, OrthogonalSamplingDesigner
>>> space = create_continuous_uniform_space([-2., -2.], [2., 2.])
>>> designer = OrthogonalSamplingDesigner()
>>> doe1 = designer.design(space, sample_size=20)
>>> doe1.shape
(20, 2)
>>> doe2 = designer.design(space, sample_size=4, old_sample=doe1)
>>> doe2.shape
(4, 2)

design(space, sample_size, old_sample=None, steps=None, initial_optimization_proportion=0.1)

Create or extend a DoE.

Parameters:

space (ParameterSpace) – Determines the dimensions of the resulting sample.
sample_size (int) – The number of points to be created.
old_sample (Optional[ndarray]) – Old DoE matrix with shape (old_sample_size, space.dimensions). If provided, it will be extended with sample_size new points, otherwise a new DoE will be created. In both cases, only the new points will be returned.
steps (Optional[int]) – Number of search steps for improving the DoE quality with respect to self.scorer_factory.
initial_optimization_proportion (float) – Proportion of steps that will be used to create an initial DoE with a good score. The rest of the steps will be used to optimize the candidate points.

Return type:

ndarray

Returns:

DoE matrix with shape (sample_size, space.dimensions)

random_sampling

class experiment_design.random_sampling.RandomSamplingDesigner(exact_correlation=False, scorer_factory=None)

Create or extend a DoE by randomly sampling from the variable distributions.

Parameters:

exact_correlation (bool) – If True, the correlation matrix of the resulting design will match the target correlation exactly using a second moment transformation. This may produce out-of-bounds values for variables with finite bounds. Otherwise, the Iman-Conover method will be used, which keeps the values generated from each marginal distribution as they are and only reorders them. This may lead to some imprecision in the correlation matrix.
scorer_factory (Optional[ScorerFactory]) – A factory that creates scorers for the given variables, sample_size and, in the case of an extension, old sampling points. If not passed, a default one will be created that evaluates the maximum correlation error and minimum pairwise distance. See experiment_design.scorers.create_default_scorer_factory for more details.

Examples

>>> from experiment_design import create_continuous_uniform_space, RandomSamplingDesigner
>>> space = create_continuous_uniform_space([-2., -2.], [2., 2.])
>>> designer = RandomSamplingDesigner()
>>> doe1 = designer.design(space, sample_size=20)
>>> doe1.shape
(20, 2)
>>> doe2 = designer.design(space, sample_size=4, old_sample=doe1)
>>> doe2.shape
(4, 2)

design(space, sample_size, old_sample=None, steps=None, initial_optimization_proportion=0.1)

Create or extend a DoE.

Parameters:

space (ParameterSpace) – Determines the dimensions of the resulting sample.
sample_size (int) – The number of points to be created.
old_sample (Optional[ndarray]) – Old DoE matrix with shape (old_sample_size, space.dimensions). If provided, it will be extended with sample_size new points, otherwise a new DoE will be created. In both cases, only the new points will be returned.
steps (Optional[int]) – Number of search steps for improving the DoE quality with respect to self.scorer_factory.
initial_optimization_proportion (float) – Proportion of steps that will be used to create an initial DoE with a good score. The rest of the steps will be used to optimize the candidate points.

Return type:

ndarray

Returns:

DoE matrix with shape (sample_size, space.dimensions)

scorers

class experiment_design.scorers.MaxCorrelationScorerFactory(local=True, eps=0.01)

A scorer factory for the maximum absolute correlation error between sampling points.

Parameters:

local (bool) – If True, any points in old_sample that fall outside the finite bounds of the provided variables will be ignored. Has no effect if old_sample is None.
eps (float) – A small positive value to improve the stability of the log operation.
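
The quantity being scored can be sketched as follows. The error computation mirrors the expression used in the examples of this reference; the negative-log form of the score and the role of eps in it are assumptions made for illustration, and both function names are hypothetical.

```python
import numpy as np


def max_correlation_error(doe, target_correlation):
    """Largest absolute deviation between sample and target correlation."""
    sample_correlation = np.corrcoef(doe, rowvar=False)
    return float(np.max(np.abs(sample_correlation - target_correlation)))


def correlation_score_sketch(doe, target_correlation, eps=0.01):
    # Assumed form: the negative log turns a small error into a large score,
    # and eps keeps the logarithm finite when the error is exactly zero.
    return -np.log(max_correlation_error(doe, target_correlation) + eps)
```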

class experiment_design.scorers.PairwiseDistanceScorerFactory(local=False)

A scorer factory for the minimum pairwise distance between sampling points.

Warning

Currently, all pair-wise distances are computed greedily. Although this works faster for small sample sizes thanks to the C++ implementation used in scipy.spatial.distance.pdist, it may be memory-inefficient for large sample sizes. Using algorithms like KDTrees could solve this issue. However, we prefer omitting such implementation for the sake of reducing the number of dependencies. You can implement a custom ScorerFactory to circumvent this issue.

Parameters: local (bool) – If True, any points in old_sample that fall outside the finite bounds of the provided variables will be ignored. Has no effect if old_sample is None.
Returns: A scorer that returns the log minimum pairwise distance divided by the log max distance.
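
The raw quantity behind this score, the minimum pairwise distance, can be sketched in pure NumPy. The function name is hypothetical and the log normalization described above is left out. Like an all-pairs computation with pdist, this sketch holds every pairwise distance in memory at once, so the memory caveat from the warning applies to it as well.

```python
import numpy as np


def min_pairwise_distance(doe):
    """Smallest Euclidean distance between any two rows of doe."""
    differences = doe[:, None, :] - doe[None, :, :]
    distances = np.sqrt((differences**2).sum(axis=-1))
    # Only the strict upper triangle holds distinct pairs.
    upper = np.triu_indices(doe.shape[0], k=1)
    return float(distances[upper].min())
```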

class experiment_design.scorers.WeightedSumScorerFactory(scorer_factories, weights)

A factory that creates a weighted sum of multiple scorers

Parameters:

scorer_factories (list[ScorerFactory]) – These are combined by adding the scores their scorers provide.
weights (Iterable[float]) – Weights to use for combining the scorers. If not passed, the scores will not be weighted.
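
The combination rule can be sketched with plain callables. This is an illustration only: weighted_sum_scorer is a hypothetical helper that works on scorers directly, whereas the class above works on scorer factories.

```python
def weighted_sum_scorer(scorers, weights):
    """Combine scorers into one that returns the weighted sum of their scores."""
    scorers = list(scorers)
    weights = list(weights)
    if len(scorers) != len(weights):
        raise ValueError("scorers and weights must have the same length")

    def combined(doe):
        return sum(weight * scorer(doe) for scorer, weight in zip(scorers, weights))

    return combined
```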

experiment_design.scorers.create_default_scorer_factory(distance_score_weight=0.9, correlation_score_weight=0.1, local_correlation=True, local_pairwise_distance=False)

Create a scorer factory, which is a weighted sum of maximum correlation error and minimum pairwise distance scorers

Parameters:

distance_score_weight (float) – Weight of the minimum pairwise distance score.
correlation_score_weight (float) – Weight of the maximum correlation error score.
local_correlation (bool) – Controls the local attribute of the MaxCorrelationScorerFactory.
local_pairwise_distance (bool) – Controls the local attribute of the PairwiseDistanceScorerFactory

Return type:

ScorerFactory

Returns:

WeightedSumScorerFactory instance.

References

R.V. Joseph and Y. Hung (2008). “Orthogonal-Maximin Latin Hypercube Designs”

variable

class experiment_design.variable.ContinuousVariable(distribution=None, lower_bound=None, upper_bound=None)

A variable with continuous distribution

Parameters:

distribution (Optional[rv_frozen]) – rv_frozen instance representing the distribution. If None (default), it will be set to uniform between the passed lower_bound and upper_bound
lower_bound (Optional[float]) – Lower bound for the variable. If None (default), left support boundary of the distribution will be used in case the distribution is bounded. Otherwise, distribution.ppf(infinite_bound_probability_tolerance) will be used.
upper_bound (Optional[float]) – Upper bound for the variable. If None (default), right support boundary of the distribution will be used in case the distribution is bounded. Otherwise, distribution.ppf(1 - infinite_bound_probability_tolerance) will be used.
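
The fallback for unbounded distributions can be sketched with the standard library. Here statistics.NormalDist stands in for a frozen SciPy distribution, with inv_cdf playing the role of ppf; finite_bounds_sketch and its support argument are hypothetical names used only for this illustration.

```python
import math
from statistics import NormalDist


def finite_bounds_sketch(inverse_cdf, support=(-math.inf, math.inf), tolerance=1e-6):
    """Replace infinite support boundaries by extreme quantiles."""
    lower, upper = support
    if not math.isfinite(lower):
        lower = inverse_cdf(tolerance)
    if not math.isfinite(upper):
        upper = inverse_cdf(1 - tolerance)
    return lower, upper


# Standard normal: both boundaries are infinite, so both are replaced.
lower, upper = finite_bounds_sketch(NormalDist().inv_cdf)
```

With the default tolerance, the bounds of a standard normal variable land roughly 4.75 standard deviations from the mean.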

cdf_of(value)

Given a value or an array of values, return the probability using the CDF.

Return type: float | ndarray

finite_lower_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite lower bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit lower_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

finite_upper_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite upper bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit upper_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

value_of(probability)

Given a probability or an array of probabilities, return the corresponding value(s) using the inverse CDF.

Return type: float | ndarray

class experiment_design.variable.DiscreteVariable(distribution, value_mapper=<function DiscreteVariable.<lambda>>, inverse_value_mapper=<function DiscreteVariable.<lambda>>)

A variable with discrete distribution

Parameters:

distribution (rv_frozen) – rv_frozen instance representing the distribution.
value_mapper (Callable[[float], float | int]) – Given an integer, i.e. an ordinal encoding, this is expected to return the corresponding discrete value of the underlying set of possible values. (Default: lambda x: x)
inverse_value_mapper (Callable[[float, int], float]) – Given a discrete value, this is expected to return the corresponding integer value, i.e. ordinal encoding. (Default: lambda x: x)
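
A pair of mappers for a small set of admissible values might look as follows. This is a sketch under stated assumptions: possible_values is a made-up set, and the mappers are presumably combined with an integer-valued distribution over the indices 0, 1 and 2.

```python
possible_values = [0.1, 0.5, 2.0]


def value_mapper(index):
    # Ordinal encoding -> discrete value; int() tolerates float indices such as 1.0.
    return possible_values[int(index)]


def inverse_value_mapper(value):
    # Discrete value -> ordinal encoding.
    return possible_values.index(value)
```

The two functions need to be inverses of each other on the admissible set, otherwise values and their encodings would drift apart.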

cdf_of(values)

Given a value or an array of values, return the probability using the CDF.

Return type: float | ndarray

finite_lower_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite lower bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit lower_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

finite_upper_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite upper bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit upper_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

value_of(probability)

Given a probability or an array of probabilities, return the corresponding value(s) using the inverse CDF.

Return type: float | ndarray

class experiment_design.variable.ParameterSpace(variables, correlation=None, infinite_bound_probability_tolerance=1e-06)

A container of multiple variables defining a parameter space.

Parameters:

variables (list[Variable | ContinuousVariable | DiscreteVariable] | list[rv_frozen]) – List of variables or marginal distributions that define the marginal parameters
correlation (Union[float, ndarray, None]) – A float or a symmetric matrix with shape (len(variables), len(variables)), representing the linear dependency between the dimensions. If a float is passed, all non-diagonal entries of the unit matrix will be set to this value.
infinite_bound_probability_tolerance (float) – If the variable is unbounded, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
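
How a scalar correlation expands into a full matrix can be sketched as follows. The helper expand_correlation is hypothetical and only illustrates the rule stated above; the symmetry check on matrix input is an added assumption.

```python
import numpy as np


def expand_correlation(correlation, dimensions):
    """Return a (dimensions, dimensions) correlation matrix."""
    if np.isscalar(correlation):
        # Unit matrix with every off-diagonal entry set to the scalar.
        matrix = np.full((dimensions, dimensions), float(correlation))
        np.fill_diagonal(matrix, 1.0)
        return matrix
    matrix = np.asarray(correlation, dtype=float)
    if matrix.shape != (dimensions, dimensions) or not np.allclose(matrix, matrix.T):
        raise ValueError("correlation must be a symmetric (dimensions, dimensions) matrix")
    return matrix
```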

cdf_of(values)

Given an array of marginal values, return the marginal probabilities with shape values.shape using the CDF.

Since it operates on the marginal variables, correlation does not have an effect.

Return type: ndarray

property dimensions: int

Size of the space, i.e. the number of variables.

property lower_bound: ndarray

Lower bound values of the space with shape (self.dimensions, ).

variable.finite_lower_bound is used to provide finite bounds even for unbounded variables

property upper_bound: ndarray

Upper bound of the space with shape (self.dimensions, ).

variable.finite_upper_bound is used to provide finite bounds even for unbounded variables

value_of(probabilities)

Given an array of marginal probabilities, return the corresponding values with shape probabilities.shape using the inverse marginal CDF.

Since it operates on the marginal variables, correlation does not have an effect.

Return type: ndarray

class experiment_design.variable.Variable(*args, **kwargs)

A protocol to represent the expected methods of valid Variable objects

cdf_of(value)

Given a value or an array of values, return the probability using the CDF.

Return type: float | ndarray

property distribution: rv_frozen

Distribution of the variable.

finite_lower_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite lower bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit lower_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

finite_upper_bound(infinite_bound_probability_tolerance=1e-06)

Provide a finite upper bound of the variable even if it was not provided by the user.

Parameters: infinite_bound_probability_tolerance (float) – If the variable is unbounded and no explicit upper_bound was passed, this will be used to extract finite bounds as described in lower_bound and upper_bound descriptions. (Default: 1e-6)
Return type: float

value_of(probability)

Given a probability or an array of probabilities, return the corresponding value(s) using the inverse CDF.

Return type: float | ndarray

experiment_design.variable.create_continuous_uniform_space(lower_bounds, upper_bounds)

Given lower and upper bounds, create uniformly distributed variables.

Parameters:

lower_bounds (Sequence[float]) – Array with shape (n_dim,) representing the lower bounds of the uniform variables
upper_bounds (Sequence[float]) – Array with shape (n_dim,) representing the upper bounds of the uniform variables

Return type:

ParameterSpace

Returns:

Parameter space consisting of continuous uniform variables with the same size as the passed bounds
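
For uniform marginals, the mapping between probabilities and values is affine, which the following sketch makes explicit. The functions uniform_value_of and uniform_cdf_of are hypothetical stand-ins that only mirror what value_of and cdf_of would compute for such a space; they are not part of the library.

```python
import numpy as np

lower_bounds = np.array([-2.0, -2.0])
upper_bounds = np.array([2.0, 2.0])


def uniform_value_of(probabilities):
    # Inverse CDF of a uniform variable: scale and shift each column.
    return lower_bounds + probabilities * (upper_bounds - lower_bounds)


def uniform_cdf_of(values):
    # CDF of a uniform variable: relative position between the bounds.
    return (values - lower_bounds) / (upper_bounds - lower_bounds)


probabilities = np.array([[0.0, 0.5], [1.0, 0.25]])
values = uniform_value_of(probabilities)
```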

experiment_design.variable.create_discrete_uniform_space(discrete_sets)

Given sets of possible values, create corresponding discrete variables with equal probability of each value.

Parameters: discrete_sets (list[list[int | float]]) – List of possible values for each variable
Return type: ParameterSpace
Returns: Parameter space consisting of discrete uniform variables with the same size as the discrete_sets

experiment_design.variable.create_variables_from_distributions(distributions)

Given a list of distributions, create the corresponding continuous or discrete variables.

Parameters: distributions (list[rv_frozen]) – Frozen scipy distributions each representing a marginal variable
Return type: list[ContinuousVariable | DiscreteVariable]
Returns: List of variables according to the passed distributions