Code Documentation

Evolutionary Optimization Algorithm

class eoa.EOA(population, fitness, **kwargs)[source]

This is a base class acting as an umbrella to perform an evolutionary optimization algorithm.

Parameters:
  • population – The whole possible population as a list
  • fitness – The fitness evaluation. Accepts an OrderedDict of individuals with their corresponding fitness and updates their fitness
  • init_pop – default=`UniformRand`; The python class that initiates the initial population
  • recomb – default=`UniformCrossover`; The python class that defines how to combine parents to produce children
  • mutation – default=`Mutation`; The python class that performs mutation on offspring population
  • termination – default=`MaxGenTermination`; The python class that determines the termination criterion
  • elitism – default=`Elites`; The python class that decides how to handle elitism
  • num_parents – The size of initial parents population
  • parents_porp – default=0.1; The size of initial parents population given as a portion of the whole population (only used if num_parents is not given)
  • elits_porp – default=0.2; The proportion of offspring to be replaced by elite parents
  • mutation_prob – The probability that a component will be mutated (default: 0.05)
  • kwargs
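
The roles of the components listed above can be illustrated with a minimal, self-contained sketch. It is not the eoa.EOA implementation: the function name `evolve`, the `max_gen` and `seed` arguments, and the use of a plain callable as fitness (instead of an OrderedDict updater) are assumptions made for illustration only.

```python
import random
from itertools import product


def evolve(population, fitness, num_parents=4, elites_prop=0.2,
           mutation_prob=0.05, max_gen=30, seed=0):
    """Toy evolutionary loop over fixed-length tuples (illustrative only)."""
    rng = random.Random(seed)
    # Uniform random initialization of the first parents.
    parents = rng.sample(population, num_parents)
    genes = sorted({g for ind in population for g in ind})
    for _ in range(max_gen):  # maximum-generation termination
        ranked = sorted(parents, key=fitness, reverse=True)
        n_elites = max(1, int(elites_prop * num_parents))
        children = []
        while len(children) < num_parents - n_elites:
            a, b = rng.sample(ranked, 2)
            # Uniform crossover: each component comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutation: each component is replaced with probability mutation_prob.
            child = [rng.choice(genes) if rng.random() < mutation_prob else c
                     for c in child]
            children.append(tuple(child))
        # Elitism: the best parents survive unchanged.
        parents = ranked[:n_elites] + children
    return max(parents, key=fitness)


# Hypothetical problem: maximize the number of ones in a 6-bit tuple.
population = list(product([0, 1], repeat=6))
best = evolve(population, fitness=sum)
```

Because the elites are copied unchanged into the next generation, the best fitness in this sketch never decreases from one generation to the next.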
class eoa.MaxGenTermination(**kwargs)[source]

Termination condition: Whether the maximum number of generations has been reached or not

class eoa.UniformCrossover(**kwargs)[source]

Recombination procedure.

class eoa.UniformRand(**kwargs)[source]

Initial population initiation.

Hilbert Space based regression

exception NpyProximation.Error(*args)[source]

Generic errors that may occur in the course of a run.

class NpyProximation.FunctionBasis[source]

This class generates two typical bases of functions: polynomial and trigonometric.

static Fourier(n, deg, l=1.0)[source]

Returns the Fourier basis of degree deg in n variables with period l

Parameters:
  • n – number of variables
  • deg – the maximum degree of trigonometric combinations in the basis
  • l – the period
Returns:

the raw basis consisting of trigonometric functions of degrees up to deg

static Poly(n, deg)[source]

Returns a basis consisting of polynomials in n variables of degree at most deg.

Parameters:
  • n – number of variables
  • deg – highest degree of polynomials in the basis
Returns:

the raw basis consisting of polynomials of degrees up to deg
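
A polynomial basis of this kind can be sketched by enumerating monomials. The helper names `monomial_exponents` and `monomial` below are hypothetical and are not part of NpyProximation; the sketch only shows what "polynomials in n variables of degree at most deg" means.

```python
from itertools import product


def monomial_exponents(n, deg):
    """All exponent tuples (e_1, ..., e_n) with e_1 + ... + e_n <= deg."""
    return [e for e in product(range(deg + 1), repeat=n) if sum(e) <= deg]


def monomial(exponents):
    """Return the callable x -> x_1**e_1 * ... * x_n**e_n."""
    def f(x):
        value = 1.0
        for xi, ei in zip(x, exponents):
            value *= xi ** ei
        return value
    return f


# Monomials in 2 variables of degree at most 2: 1, y, y^2, x, xy, x^2.
basis = [monomial(e) for e in monomial_exponents(2, 2)]
```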

class NpyProximation.FunctionSpace(dim=1, measure=None, basis=None)[source]

A class that facilitates a few types of computations over function spaces of type \(L_2(X, \mu)\)

Parameters:
  • dim – the dimension of ‘X’ (default: 1)
  • measure – an object of type Measure representing \(\mu\)
  • basis – a finite basis of functions to construct a subset of \(L_2(X, \mu)\)
FormBasis()[source]

Call this method to generate the orthogonal basis corresponding to the given basis. The result is stored in a property called OrthBase, which is a list of functions that are orthogonal to each other with respect to the measure measure over the given domain.
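
One standard way to obtain such an orthogonal basis is the Gram-Schmidt process. The sketch below assumes a discrete measure given as a `{point: weight}` dictionary and uses hypothetical helper names; whether FormBasis uses exactly this procedure is not stated here.

```python
def inner(f, g, weights):
    """Inner product with respect to a discrete measure {point: weight}."""
    return sum(w * f(x) * g(x) for x, w in weights.items())


def gram_schmidt(basis, weights):
    """Orthogonalize `basis` with respect to the discrete measure `weights`."""
    orth = []
    for f in basis:
        # Coefficients of the projections of f on the functions found so far.
        coeffs = [(inner(f, b, weights) / inner(b, b, weights), b) for b in orth]

        def residual(x, f=f, coeffs=coeffs):
            # Subtract the projections so the result is orthogonal to `orth`.
            return f(x) - sum(c * b(x) for c, b in coeffs)

        orth.append(residual)
    return orth


# Uniform weights on three points; raw basis 1, x, x^2.
weights = {-1.0: 1 / 3, 0.0: 1 / 3, 1.0: 1 / 3}
raw = [lambda x: 1.0, lambda x: x, lambda x: x * x]
orth = gram_schmidt(raw, weights)
```

The sketch assumes the raw functions are linearly independent on the support of the measure; otherwise a residual vanishes and the division by its squared norm fails.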

Series(f)[source]

Given a function f, this method finds and returns the coefficients of the series that approximates f as a linear combination of the elements of the orthogonal basis \(B\). In symbols \(\sum_{b\in B}\langle f, b\rangle b\).

Returns: the list of coefficients \(\langle f, b\rangle\) for \(b\in B\)
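
A minimal sketch of the coefficient computation follows. It assumes a discrete measure and a basis that is already orthonormal (each element has norm 1), which is what makes \(\sum_{b\in B}\langle f, b\rangle b\) the best approximation; the names `inner` and `series` are illustrative stand-ins, not the library's code.

```python
from math import sqrt


def inner(f, g, weights):
    """Inner product with respect to a discrete measure {point: weight}."""
    return sum(w * f(x) * g(x) for x, w in weights.items())


def series(f, orthonormal_basis, weights):
    """Coefficients <f, b> for every b in an orthonormal basis."""
    return [inner(f, b, weights) for b in orthonormal_basis]


weights = {-1.0: 1 / 3, 0.0: 1 / 3, 1.0: 1 / 3}
# 1 and sqrt(3/2) * x are orthonormal for this particular measure.
basis = [lambda x: 1.0, lambda x: sqrt(1.5) * x]
coeffs = series(lambda x: 2.0 + 3.0 * x, basis, weights)


def approx(x):
    """Evaluate the series sum_b <f, b> b at x."""
    return sum(c * b(x) for c, b in zip(coeffs, basis))
```

Since the target function here lies in the span of the basis, the series reproduces it exactly up to rounding.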
inner(f, g)[source]

Computes the inner product of the two parameters with respect to the measure measure, i.e., \(\int_Xf\cdot g d\mu\).

Parameters:
  • f – callable
  • g – callable
Returns:

the quantity of \(\int_Xf\cdot g d\mu\)

project(f, g)[source]

Finds the projection of f on g with respect to the inner product induced by the measure measure.

Parameters:
  • f – callable
  • g – callable
Returns:

the function \(\frac{\langle f, g\rangle}{\|g\|_2^2}g\)
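
As a worked illustration of the projection (a sketch assuming \(X=[-1,1]\) with the Lebesgue measure, \(f(x)=x^2\) and \(g(x)=1\), and normalizing by \(\langle g, g\rangle\)):

\[
\langle f, g\rangle = \int_{-1}^{1} x^2\,dx = \frac{2}{3},\qquad
\langle g, g\rangle = \int_{-1}^{1} 1\,dx = 2,\qquad
\frac{\langle f, g\rangle}{\langle g, g\rangle}\,g = \frac{1}{3}.
\]

The residual \(f - \frac{1}{3}\) is then orthogonal to \(g\), since \(\int_{-1}^{1}\left(x^2-\frac{1}{3}\right)dx = 0\).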

class NpyProximation.HilbertRegressor(deg=3, base=None, meas=None, fspace=None)[source]

Regression using Hilbert space techniques, scikit-learn style.

Parameters:
  • deg – int, default=3; The degree of polynomial regression. Only used if base is None
  • base – list, default=None; A list of functions to form an orthogonal function basis
  • meas – NpyProximation.Measure, default=None; The measure to form the \(L_2(\mu)\) space. If None, a discrete measure will be constructed based on fit inputs
  • fspace – NpyProximation.FunctionBasis, default=None; The function subspace of \(L_2(\mu)\). If None, it will be initiated according to self.meas
fit(X, y)[source]
Parameters:
  • X – Training data
  • y – Target values
Returns:

self

predict(X)[source]

Predict using the Hilbert regression method

Parameters: X – Samples
Returns: predicted values
class NpyProximation.Measure(density=None, domain=None)[source]

Constructs a measure \(\mu\) based on density and domain.

Parameters:
  • density – the density over the domain:

    • if None is given, a uniform distribution is assumed
    • if a callable h is given, then \(d\mu=h(x)dx\)
    • if a dictionary is given, then \(\mu=\sum w_x\delta_x\), a discrete measure. The points \(x\) are the keys of the dictionary (tuples) and the weights \(w_x\) are the values.
  • domain – if density is a dictionary, it will be set by its keys. If callable, then domain must be a list of tuples defining the domain’s box. If None is given, it will be set to \([-1, 1]^n\)
integral(f)[source]

Calculates \(\int_{domain} fd\mu\).

Parameters: f – the integrand
Returns: the value of the integral
norm(p, f)[source]

Computes the p-norm of f with respect to the current measure, i.e., \((\int_{domain}|f|^p d\mu)^{1/p}\).

Parameters:
  • p – a positive real number
  • f – the function whose norm is desired.
Returns:

\(\|f\|_{p, \mu}\)
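
For the discrete case, the integral and the p-norm reduce to weighted sums. The class `DiscreteMeasure` below is a hypothetical stand-in written for illustration; it mirrors the dictionary form of density described above but is not the NpyProximation.Measure implementation.

```python
class DiscreteMeasure:
    """Illustrative discrete measure mu = sum_x w_x * delta_x."""

    def __init__(self, weights):
        # Keys are points (tuples), values are the weights w_x.
        self.weights = dict(weights)

    def integral(self, f):
        """Integral of f with respect to mu: sum of w_x * f(x)."""
        return sum(w * f(x) for x, w in self.weights.items())

    def norm(self, p, f):
        """p-norm of f: (integral of |f|^p) ** (1 / p)."""
        return self.integral(lambda x: abs(f(x)) ** p) ** (1.0 / p)


mu = DiscreteMeasure({(0.0,): 0.5, (2.0,): 0.5})
first_coord = lambda x: x[0]
mean_value = mu.integral(first_coord)
l2_norm = mu.norm(2, first_coord)
```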

class NpyProximation.Regression(points, dim=None)[source]

Given a set of points, i.e., a list P of tuples of equal length, this class computes the best approximation of a function that fits the data, in the following sense:

  • if no extra parameter is provided, meaning that an object is initiated like R = Regression(P), then calling R.fit() returns the linear regression that fits the data.
  • if at initiation the parameter deg=n is set, then R.fit() returns the polynomial regression of degree n.
  • if a basis of functions is provided by means of an OrthSystem object (R.SetOrthSys(orth)), then calling R.fit() returns the best approximation that can be found using the basis functions of the orth object.
Parameters:
  • points – a list of points to be fitted or a callable to be approximated
  • dim – dimension of the domain
SetFuncSpc(sys)[source]

Sets the basis of the orthogonal system

Parameters: sys – orthsys.OrthSystem object.
Returns: None

Note

For technical reasons, the measure needs to be given via the SetMeasure method. Otherwise, the Lebesgue measure on \([-1, 1]^n\) is assumed.

SetMeasure(meas)[source]

Sets the default measure for approximation.

Parameters: meas – a measure.Measure object
Returns: None
fit()[source]

Fits the best curve based on the optionally provided orthogonal basis. If no basis is provided, it fits a polynomial of the degree given at initiation.

Returns: the fit.
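
For the simplest case (a linear fit to points in the plane), the result can be sketched with the closed-form least-squares solution. The function `fit_line` is hypothetical, and the convention that the last coordinate of each tuple is the value to be fitted is an assumption of this sketch, not a statement about the Regression class.

```python
def fit_line(points):
    """Least-squares line through (x, y) pairs; returns a callable."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums of squares and cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# The points lie exactly on y = 2x + 1, so the fit recovers that line.
line = fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```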

Sensitivity Analysis

Sensitivity analysis of a dataset based on a fit, sklearn style. The core functionality is provided by SALib.

class sensapprx.CorrelationThreshold(threshold=0.7)[source]

Selects a minimal set of features based on a given (Pearson) correlation threshold. The transformer omits the maximum number of highly correlated features and makes sure that the remaining features are not correlated beyond the given threshold.

Parameters: threshold – the threshold for selecting correlated pairs.
fit(X, y=None)[source]

Finds the Pearson correlation among all features, selects the pairs with absolute value of correlation above the given threshold and selects a minimal set of features with low correlation

Parameters:
  • X – Training data
  • y – Target values (default: None)
Returns:

self
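
The idea of threshold-based selection can be sketched with a simple greedy pass over the columns. This is an illustration under stated assumptions: the helpers `pearson` and `select_uncorrelated` are hypothetical, and a greedy pass does not guarantee the smallest or largest possible feature set, so it may differ from what CorrelationThreshold actually selects.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def select_uncorrelated(columns, threshold=0.7):
    """Keep a column only if it is not highly correlated with any kept one."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],  # almost a multiple of the first column
    [4.0, 1.0, 3.0, 2.0],
]
kept = select_uncorrelated(columns, threshold=0.7)
```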

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X – numpy array of shape [n_samples, n_features]; Training set.
  • y – numpy array of shape [n_samples]; Target values.
Returns:

Transformed array

class sensapprx.SensAprx(n_features_to_select=10, regressor=None, method='sobol', margin=0.2, num_smpl=512, num_levels=6, grid_jump=1, num_resmpl=8, reduce=False, domain=None, probs=None)[source]

Transform data to select the most sensitive factors according to a regressor that fits the data.

Parameters:
  • n_features_to_select – int; number of top features to be selected
  • regressor – a sklearn style regressor to fit the data for sensitivity analysis
  • method – str; the sensitivity analysis method; default ‘sobol’, other options are ‘morris’ and ‘delta-mmnt’
  • margin – domain margin, default: .2
  • num_smpl – number of samples to perform the analysis, default: 512
  • num_levels – number of levels for morris analysis, default: 6
  • grid_jump – grid jump for morris analysis, default: 1
  • num_resmpl – number of resamples for moment independent analysis, default: 8
  • reduce – whether to reduce the data points to uniques and calculate the averages of the target or not, default: False
  • domain – pre-calculated unique points, if none, and reduce is True then unique points will be found
  • probs – pre-calculated values associated to domain points
fit(X, y)[source]

Fits the regressor to the data (X, y) and performs a sensitivity analysis on the result of the regression.

Parameters:
  • X – Training data
  • y – Target values
Returns:

self

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X – numpy array of shape [n_samples, n_features]; Training set.
  • y – numpy array of shape [n_samples]; Target values.
Returns:

Transformed array

Optimized Pipeline Detector

class aml.AML(config=None, length=5, scoring='accuracy', cat_cols=None, surrogates=None, min_random_evals=15, cv=None, check_point='./', stack_res=True, stack_probs=True, stack_decision=True, verbose=1, n_jobs=-1)[source]

A class that accepts a nested dictionary with machine learning libraries as its keys and a dictionary of their parameters and their ranges as value of each key and finds an optimum combination based on training data.

Parameters:
  • config – A dictionary whose keys are scikit-learn-style objects (as strings) and its corresponding values are dictionaries of the parameters and their acceptable ranges/values
  • length – default=5; Maximum number of objects in generated pipelines
  • scoring – default=’accuracy’; The scoring method to be optimized. Must follow the sklearn scoring signature
  • cat_cols – default=None; The list of indices of categorical columns
  • surrogates – default=None; A list of 4-tuples determining surrogates. The first entity of each tuple is a scikit-learn regressor and the 2nd entity is the number of iterations that this surrogate needs to be estimated and optimized. The 3rd is the sampling strategy and the 4th is the scipy.optimize solver
  • min_random_evals – default=15; Number of randomly sampled initial values for hyper parameters
  • cv – default=`ShuffleSplit(n_splits=3, test_size=.25)`; The cross validation method
  • check_point – default=’./’; The path where the optimization results will be stored
  • stack_res – default=True; The res parameter of StackingEstimator
  • stack_probs – default=True; The probs parameter of StackingEstimator
  • stack_decision – default=True; The decision parameter of StackingEstimator
  • verbose – default=1; Level of output details
  • n_jobs – int, default=-1; number of processes to run in parallel
add_surrogate(estimator, itrs, sampling=None, optim='L-BFGS-B')[source]

Adding a regressor for surrogate optimization procedure.

Parameters:
  • estimator – A scikit-learn style regressor
  • itrs – Number of iterations the estimator needs to be fitted and optimized
  • sampling – default= BoxSample; The sampling strategy (CompactSample, BoxSample or SphereSample)
  • optim – default=’L-BFGS-B’; The `scipy.optimize` solver
Returns:

None

eoa_fit(X, y, **kwargs)[source]

Applies evolutionary optimization methods to find an optimum pipeline

Parameters:
  • X – Training data
  • y – Corresponding observations
  • kwargs – EOA parameters
Returns:

self

fit(X, y)[source]

Generates and optimizes all legitimate pipelines. The best pipeline can be retrieved from self.best_estimator_

Parameters:
  • X – Training data
  • y – Corresponding observations
Returns:

self

get_top(num=5)[source]

Finds the top num pipelines

Parameters: num – Number of pipelines to be returned
Returns: An OrderedDict of top models
optimize_pipeline(seq, X, y)[source]

Constructs and optimizes a pipeline according to the steps passed through seq which is a tuple of estimators and transformers.

Parameters:
  • seq – the tuple of steps of the pipeline to be optimized
  • X – numpy array of training features
  • y – numpy array of training values
Returns:

the optimized pipeline and its score

types()[source]

Recognizes the type of each estimator to determine proper placement of each

Returns: None
class aml.StackingEstimator(estimator, res=True, probs=True, decision=True)[source]

Meta-transformer for adding predictions and/or class probabilities as synthetic feature(s).

Parameters:
  • estimator – object with fit, predict, and predict_proba methods. The estimator to generate synthetic features from.
  • res – True (default), stacks the final result of estimator
  • probs – True (default), stacks probabilities calculated by estimator
  • decision – True (default), stacks the result of decision function of the estimator
fit(X, y=None, **fit_params)[source]

Fit the StackingEstimator meta-transformer.

Parameters:
  • X – array-like of shape (n_samples, n_features). The training input samples.
  • y – array-like, shape (n_samples,). The target values (integers that correspond to classes in classification, real numbers in regression).
  • fit_params – Other estimator-specific parameters.
Returns:

self, object. Returns a copy of the estimator

set_params(**params)[source]

Sets the sklearn related parameters for the estimator

Parameters: params – parameters to be passed to the estimator
Returns: self
transform(X)[source]

Transform data by adding synthetic feature(s).

Parameters: X – numpy ndarray, {n_samples, n_components}. New data, where n_samples is the number of samples and n_components is the number of components.
Returns: X_transformed: array-like, shape (n_samples, n_features + 1) or (n_samples, n_features + 1 + n_classes) for classifier with predict_proba attribute; The transformed feature set.
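
The stacking idea (appending an estimator's output to the input features) can be sketched without any library. Both `MeanThreshold` and `stack_predictions` below are hypothetical names used only for illustration; the sketch covers the prediction column alone, not probabilities or decision values.

```python
class MeanThreshold:
    """Hypothetical stand-in estimator: predicts 1 when a row's mean
    exceeds the average row mean seen during fit."""

    def fit(self, X, y=None):
        means = [sum(row) / len(row) for row in X]
        self.cut_ = sum(means) / len(means)
        return self

    def predict(self, X):
        return [1 if sum(row) / len(row) > self.cut_ else 0 for row in X]


def stack_predictions(estimator, X):
    """Append the estimator's prediction to each row as one synthetic feature."""
    preds = estimator.predict(X)
    return [list(row) + [p] for row, p in zip(X, preds)]


X = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]]
est = MeanThreshold().fit(X)
X_stacked = stack_predictions(est, X)
```

Each output row has one more column than the input, matching the (n_samples, n_features + 1) shape described above for the prediction-only case.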
class aml.Words(letters, last=None, first=None, repeat=False)[source]

This class takes a set as alphabet and generates words of a given length accordingly. A Words instance accepts the following parameters:

Parameters:
  • letters – is a set of letters (symbols) to make up the words
  • last – a subset of letters that are allowed to appear at the end of a word
  • first – a set of letters that can only appear at the beginning of a word
  • repeat – whether consecutive occurrence of a letter is allowed
Generate(l)[source]

Generates the set of legitimate words of length l

Parameters: l – int, the length of words
Returns: set of all legitimate words of length l
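
The constraints described for Words can be sketched as a brute-force filter over all letter sequences. The function `generate_words` is a hypothetical illustration and not the aml.Words code; in particular, it assumes that `first` lists letters allowed only in the first position and that words are represented as tuples.

```python
from itertools import product


def generate_words(letters, length, last=None, first=None, repeat=False):
    """Enumerate the words of `length` that satisfy the constraints."""
    if length < 1:
        return set()
    letters = sorted(letters)
    last = set(letters) if last is None else set(last)
    first = set() if first is None else set(first)
    words = set()
    for word in product(letters, repeat=length):
        # The final letter must be one of the allowed endings.
        if word[-1] not in last:
            continue
        # Letters in `first` may appear only in position 0.
        if any(ch in first for ch in word[1:]):
            continue
        # Optionally forbid the same letter twice in a row.
        if not repeat and any(a == b for a, b in zip(word, word[1:])):
            continue
        words.add(word)
    return words


words = generate_words({"a", "b", "c"}, 2, last={"c"}, first={"a"})
```

Enumerating every sequence grows exponentially with the word length, so this form is only suitable for small alphabets and short words.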