aepsych.benchmark

Submodules

aepsych.benchmark.benchmark module

class aepsych.benchmark.benchmark.Benchmark(problems, configs, seed=None, n_reps=1, log_every=10)

Bases: object

Benchmark base class.

This class wraps standard functionality for benchmarking models, including generating Cartesian products of run configurations, running the simulated experiment loop, and logging results.

Initialize benchmark.

Parameters
  • problems (List[Problem]) – Problem objects containing the test function to evaluate.

  • configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a Cartesian product of configurations.

  • seed (int, optional) – Random seed to use for reproducible benchmarks. Defaults to randomized seeds.

  • n_reps (int, optional) – Number of repetitions to run of each configuration. Defaults to 1.

  • log_every (int, optional) – Logging interval during an experiment. Defaults to logging every 10 trials.

Return type

None

make_benchmark_list(**bench_config)

Generate a list of benchmarks to run from configuration.

This constructs a Cartesian product of config dicts using lists at the leaves of the base config.

Returns

List of dictionaries, each of which can be passed to aepsych.config.Config.

Return type

List[dict[str, float]]
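To make the expansion rule concrete, here is a minimal sketch of how lists at the leaves of a two-level {section: {option: value}} dict can be turned into one dict per combination. expand_config is a hypothetical helper written for illustration, not the aepsych implementation, and it ignores DerivedValue leaves and problems.

```python
import itertools
from typing import Any, Dict, List


def expand_config(base: Dict[str, Dict[str, Any]]) -> List[Dict[str, Dict[str, Any]]]:
    """Hypothetical helper: expand list-valued leaves into a Cartesian product."""
    # Collect the (section, option) address of every list-valued leaf.
    list_leaves = [
        (section, key)
        for section, options in base.items()
        for key, value in options.items()
        if isinstance(value, list)
    ]
    choices = [base[s][k] for s, k in list_leaves]
    expanded = []
    for combo in itertools.product(*choices):
        # Copy each section so the runs never share mutable dicts.
        config = {section: dict(options) for section, options in base.items()}
        for (section, key), value in zip(list_leaves, combo):
            config[section][key] = value
        expanded.append(config)
    return expanded


base = {
    "common": {"model": ["GPClassificationModel", "OtherModel"], "acqf": "MCLevelSetEstimation"},
    "init_strat": {"min_asks": [10, 20], "generator": "SobolGenerator"},
}
runs = expand_config(base)
print(len(runs))  # 4
```

A config with no list-valued leaves yields exactly one run, because itertools.product() with no arguments produces a single empty combination.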

materialize_config(config_dict)
property num_benchmarks: int

Return the total number of runs in this benchmark.

Returns

Total number of runs in this benchmark.

Return type

int

make_strat_and_flatconfig(config_dict)

From a config dict, generate a strategy (for running) and a flattened config (for logging).

Parameters

config_dict (Mapping[str, str]) – A run configuration dictionary.

Returns

A tuple containing a strategy object and a flat config.

Return type

Tuple[SequentialStrategy, Dict[str,str]]

run_experiment(problem, config_dict, seed, rep)

Run one simulated experiment.

Parameters
  • problem (aepsych.benchmark.problem.Problem) – Problem to simulate.

  • config_dict (Dict[str, str]) – AEPsych configuration to use.

  • seed (int) – Random seed for this run.

  • rep (int) – Index of this repetition.

Returns

A tuple containing a log of the results and the strategy as of the end of the simulated experiment. The strategy is ignored in large-scale benchmarks but is useful for one-off visualization.

Return type

Tuple[List[Dict[str, object]], SequentialStrategy]

run_benchmarks()

Run all the benchmarks, sequentially.

flatten_config(config)

Flatten a config object for logging.

Parameters

config (Config) – AEPsych config object.

Returns

A flat dictionary (that can be used to build a flat pandas data frame).

Return type

Dict[str,str]
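A minimal sketch of flattening, using the standard-library configparser as a stand-in for an AEPsych Config. The flatten helper and its section_option key format are assumptions made for illustration; the real flatten_config may name keys differently.

```python
import configparser
from typing import Dict


def flatten(config: configparser.ConfigParser, sep: str = "_") -> Dict[str, str]:
    """Hypothetical helper: one flat dict keyed by section and option."""
    flat = {}
    for section in config.sections():
        # items() on a section proxy also includes any [DEFAULT] options.
        for option, value in config[section].items():
            flat[f"{section}{sep}{option}"] = value  # assumed key format
    return flat


config = configparser.ConfigParser()
config.read_string(
    """
[common]
model = GPClassificationModel
[init_strat]
min_asks = 10
"""
)
row = flatten(config)
print(row)  # {'common_model': 'GPClassificationModel', 'init_strat_min_asks': '10'}
```

A list of such dicts can be passed directly to pandas.DataFrame, giving one column per config option.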

log_at(i)

Check if we should log on this trial index.

Parameters

i (int) – Trial index to (maybe) log at.

Returns

True if this trial should be logged.

Return type

bool
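The interaction between log_every and the final trial, which is always logged, can be sketched as follows. Both helpers are hypothetical, and the rule that periodic logging starts at trial 0 (with None or 0 meaning "log only at the end") is an assumption, not something this page confirms.

```python
from typing import List, Optional


def log_at(i: int, log_every: Optional[int]) -> bool:
    # Assumed rule: log on multiples of log_every; None or 0 disables periodic logs.
    if not log_every:
        return False
    return i % log_every == 0


def logged_trials(n_trials: int, log_every: Optional[int]) -> List[int]:
    # Periodic log points, plus the final trial if it was not already included.
    trials = [i for i in range(n_trials) if log_at(i, log_every)]
    if n_trials > 0 and (not trials or trials[-1] != n_trials - 1):
        trials.append(n_trials - 1)
    return trials


print(logged_trials(25, 10))  # [0, 10, 20, 24]
```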

log(problem, flatconfig, metrics, trial_id, fit_time, gen_time, rep, seed, final=False)

Log trial data.

Parameters
  • problem (aepsych.benchmark.problem.Problem) – Problem being benchmarked.

  • flatconfig (Mapping[str, object]) – Flattened configuration for this benchmark.

  • metrics (Mapping[str, object]) – Metrics to log.

  • trial_id (int) – Current trial index.

  • fit_time (float) – Model fitting duration.

  • gen_time (float) – Candidate selection duration.

  • rep (int) – Repetition index of this trial.

  • seed (int) – Random seed for this run.

  • final (bool, optional) – Whether this is the final trial in a run. Defaults to False.

Return type

Dict[str, object]

pandas()

Return the logged results as a pandas DataFrame.

Return type

pandas.core.frame.DataFrame

class aepsych.benchmark.benchmark.DerivedValue(args, func)

Bases: object

A class for dynamically generating config values from other config values during benchmarking.

Initialize DerivedValue.

Parameters
  • args (List[Tuple[str]]) – Each tuple in this list is a pair of strings that refer to keys in a nested dictionary.

  • func (Callable) – A function that receives the values referenced by args, in order, as positional arguments.

Return type

None

For example, consider the following:

    benchmark_config = {
        "common": {
            "model": ["GPClassificationModel", "FancyNewModelToBenchmark"],
            "acqf": "MCLevelSetEstimation",
        },
        "init_strat": {
            "min_asks": [10, 20],
            "generator": "SobolGenerator",
        },
        "opt_strat": {
            "generator": "OptimizeAcqfGenerator",
            "min_asks": DerivedValue(
                [("init_strat", "min_asks"), ("common", "model")],
                lambda x, y: 100 - x if y == "GPClassificationModel" else 50 - x,
            ),
        },
    }

Four separate benchmarks would be generated from benchmark_config:
  1. model = GPClassificationModel; init trials = 10; opt trials = 90

  2. model = GPClassificationModel; init trials = 20; opt trials = 80

  3. model = FancyNewModelToBenchmark; init trials = 10; opt trials = 40

  4. model = FancyNewModelToBenchmark; init trials = 20; opt trials = 30

Note that you can also pass the problem name into func by including ("problem", "name") in args.
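The following sketch shows how a single expanded run could be materialized once its list-valued leaves have been fixed. Derived and materialize are hypothetical stand-ins for DerivedValue and materialize_config; the sketch assumes every referenced value is already concrete (not itself derived) and does not handle the ("problem", "name") key.

```python
from typing import Any, Callable, Dict, List, Tuple


class Derived:
    """Hypothetical stand-in for DerivedValue: keys to read plus a function."""

    def __init__(self, args: List[Tuple[str, str]], func: Callable[..., Any]):
        self.args = args
        self.func = func


def materialize(config: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Replace every Derived leaf with func applied to the referenced values."""
    resolved = {section: dict(options) for section, options in config.items()}
    for options in resolved.values():
        for key, value in list(options.items()):
            if isinstance(value, Derived):
                # Read inputs from the original config, in the order given by args.
                inputs = [config[s][k] for s, k in value.args]
                options[key] = value.func(*inputs)
    return resolved


def make_run(model: str, init_asks: int) -> Dict[str, Dict[str, Any]]:
    return {
        "common": {"model": model},
        "init_strat": {"min_asks": init_asks},
        "opt_strat": {
            "min_asks": Derived(
                [("init_strat", "min_asks"), ("common", "model")],
                lambda x, y: 100 - x if y == "GPClassificationModel" else 50 - x,
            )
        },
    }


print(materialize(make_run("GPClassificationModel", 20))["opt_strat"]["min_asks"])  # 80
```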

aepsych.benchmark.pathos_benchmark module

class aepsych.benchmark.pathos_benchmark.PathosBenchmark(nproc=1, *args, **kwargs)

Bases: aepsych.benchmark.benchmark.Benchmark

Benchmarking class for parallelized benchmarks using pathos.

Initialize pathos benchmark.

Parameters

nproc (int, optional) – Number of cores to use. Defaults to 1.

run_experiment(problem, config_dict, seed, rep)

Run one simulated experiment.

Parameters
  • problem (aepsych.benchmark.problem.Problem) – Problem to simulate.

  • config_dict (Dict[str, Any]) – AEPsych configuration to use.

  • seed (int) – Random seed for this run.

  • rep (int) – Index of this repetition.

Returns

A tuple containing a log of the results and the strategy as of the end of the simulated experiment. The strategy is ignored in large-scale benchmarks but is useful for one-off visualization.

Return type

Tuple[List[Dict[str, Any]], SequentialStrategy]

run_benchmarks()

Run all the benchmarks in parallel.

Note that this blocks while waiting for benchmarks to complete. If you would like to start benchmarks and periodically collect partial results, use start_benchmarks() and then call collate_benchmarks(wait=False) at regular intervals.

start_benchmarks()

Start benchmark run.

This does not block: after running it, self.futures holds the status of benchmarks running in parallel.

property is_done: bool

Check if the benchmark is done.

Returns

True if all futures are cleared and benchmark is done.

Return type

bool

collate_benchmarks(wait=False)

Collect benchmark results from completed futures.

Parameters
  • wait (bool, optional) – If True, this method blocks and waits on all futures to complete. Defaults to False.

Return type

None
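The start/poll/collate pattern described for start_benchmarks, is_done, and collate_benchmarks can be sketched with the standard library's concurrent.futures. This is an analogy rather than PathosBenchmark itself: it uses threads instead of pathos processes, and simulate_run and collate are hypothetical helpers.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def simulate_run(rep: int) -> Dict[str, int]:
    # Hypothetical stand-in for one simulated experiment.
    time.sleep(0.01 * rep)
    return {"rep": rep, "n_trials": 10}


def collate(futures: List[Future], results: List[dict], wait: bool = False) -> List[Future]:
    """Move finished results into `results`; return the futures still pending."""
    pending = []
    for fut in futures:
        if wait or fut.done():
            results.append(fut.result())  # blocks only when wait=True
        else:
            pending.append(fut)
    return pending


results: List[dict] = []
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(simulate_run, rep) for rep in range(4)]  # non-blocking start
    while futures:  # analogous to polling is_done
        futures = collate(futures, results, wait=False)
        time.sleep(0.005)  # other work, such as saving partial results, goes here
print(sorted(r["rep"] for r in results))  # [0, 1, 2, 3]
```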

aepsych.benchmark.pathos_benchmark.run_benchmarks_with_checkpoints(out_path, benchmark_name, problems, configs, global_seed=None, n_chunks=1, n_reps_per_chunk=1, log_every=None, checkpoint_every=60, n_proc=1, serial_debug=False)

Run a series of benchmarks, saving both final and intermediate results to .csv files. Benchmarks are run in sequential chunks, each of which runs all combinations of problems/configs/reps in parallel. Always call this function from within an "if __name__ == '__main__':" block, since parallel worker processes may re-import the calling module.

Parameters
  • out_path (str) – The directory to save the results in.

  • benchmark_name (str) – A name given to this set of benchmarks. Results will be saved in files named like "out_path/benchmark_name_chunk{chunk_number}_out.csv".

  • problems (List[Problem]) – Problem objects containing the test function to evaluate.

  • configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a Cartesian product of configurations.

  • global_seed (int, optional) – Global seed to use for reproducible benchmarks. Defaults to randomized seeds.

  • n_chunks (int, optional) – The number of chunks to break the results into. Each chunk will contain at least one run of every combination of problem and config. Defaults to 1.

  • n_reps_per_chunk (int, optional) – Number of repetitions to run of each problem/config combination in each chunk. Defaults to 1.

  • log_every (int, optional) – Logging interval during an experiment. Defaults to only logging at the end.

  • checkpoint_every (int, optional) – Save intermediate results every checkpoint_every seconds. Defaults to 60.

  • n_proc (int, optional) – Number of processors to use. Defaults to 1.

  • serial_debug (bool, optional) – If True, run the benchmarks serially instead of in parallel, which makes debugging easier. Defaults to False.

Return type

None
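A sketch of a script layout that respects the main-module guard, together with the output file names implied by the benchmark_name description. chunk_paths and main are hypothetical, numbering chunks from 0 is an assumption, and the call to run_benchmarks_with_checkpoints is left as a comment because problems and configs depend on your setup.

```python
import os
from typing import List


def chunk_paths(out_path: str, benchmark_name: str, n_chunks: int) -> List[str]:
    # File-name pattern from the benchmark_name description; 0-based chunks are assumed.
    return [
        os.path.join(out_path, f"{benchmark_name}_chunk{chunk}_out.csv")
        for chunk in range(n_chunks)
    ]


def main() -> None:
    # Hypothetical: build problems and configs here, then call
    # run_benchmarks_with_checkpoints(out_path, benchmark_name, problems, configs, ...).
    for path in chunk_paths("results", "lse_bench", n_chunks=2):
        print(path)


if __name__ == "__main__":
    # Keeps worker processes that re-import this module from launching the run again.
    main()
```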

aepsych.benchmark.problem module

class aepsych.benchmark.problem.Problem

Bases: object

Wrapper for a problem or test function. Subclass from this and override f() to define your test function.

n_eval_points = 1000
property eval_grid
property name: str
f(x)
property lb
property ub
property bounds
property metadata: Dict[str, Any]

A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

p(x)

Evaluate response probability from test function.

Parameters

x (np.ndarray) – Points at which to evaluate.

Returns

Response probability at the queried points.

Return type

np.ndarray

sample_y(x)

Sample a response from test function.

Parameters

x (np.ndarray) – Points at which to sample.

Returns

A single Bernoulli sample at each point.

Return type

np.ndarray
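A minimal sketch of how p and sample_y relate to f, assuming p is the standard normal CDF of f (the probit link suggested by the phi(f(x)) notation in LSEProblem.evaluate). The linear f below is a hypothetical test function, and the sketch uses plain Python floats instead of NumPy arrays.

```python
import math
import random
from typing import List


def f(x: float) -> float:
    # Hypothetical latent test function (what a Problem subclass would override).
    return 2.0 * x - 1.0


def p(x: float) -> float:
    # Assumed probit link: response probability is the standard normal CDF of f(x).
    return 0.5 * (1.0 + math.erf(f(x) / math.sqrt(2.0)))


def sample_y(xs: List[float], rng: random.Random) -> List[int]:
    # One Bernoulli draw per point, with success probability p(x).
    return [int(rng.random() < p(x)) for x in xs]


rng = random.Random(0)
print(round(p(0.5), 3))  # 0.5, since f(0.5) = 0
print(sample_y([0.0, 0.5, 1.0], rng))
```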

f_hat(model)

Generate mean predictions from the model over the evaluation grid.

Parameters

model (aepsych.models.base.ModelProtocol) – Model to evaluate.

Returns

Posterior mean from underlying model over the evaluation grid.

Return type

torch.Tensor

property f_true: numpy.ndarray

Evaluate true test function over evaluation grid.

Returns

Values of true test function over evaluation grid.

Return type

numpy.ndarray

property p_true: torch.Tensor

Evaluate true response probability over evaluation grid.

Returns

Values of true response probability over evaluation grid.

Return type

torch.Tensor

p_hat(model)

Generate mean predictions of the response probability from the model over the evaluation grid.

Parameters

model (aepsych.models.base.ModelProtocol) – Model to evaluate.

Returns

Posterior mean of the response probability from the underlying model over the evaluation grid.

Return type

torch.Tensor

evaluate(strat)

Evaluate the strategy with respect to this problem.

Extend this in subclasses to add additional metrics. Metrics include:
  • mae (mean absolute error), mse (mean squared error), max_abs_err (max absolute error), and Pearson correlation. All of these are computed over the latent variable f and the outcome probability p, with respect to the posterior mean. Absolute and squared errors (miae, mise) are also computed in expectation over the posterior, by sampling.

  • Brier score, which measures how well-calibrated the outcome probability is, both at the posterior mean (plain brier) and in expectation over the posterior (expected_brier).

Parameters

strat (aepsych.strategy.Strategy) – Strategy to evaluate.

Returns

A dictionary containing metrics and their values.

Return type

Dict[str, float]
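The point-estimate metrics and a Brier score can be sketched on plain lists as below. The helper names are hypothetical, and the Brier variant shown (the expected squared error of p_hat when outcomes are Bernoulli(p_true)) is one standard formulation; aepsych's exact formulas, the sampling-based miae and mise, and expected_brier are not reproduced here.

```python
import math
from statistics import fmean
from typing import Dict, List


def point_metrics(true: List[float], est: List[float]) -> Dict[str, float]:
    """Errors between true values and posterior-mean estimates on a grid."""
    errs = [e - t for t, e in zip(true, est)]
    mt, me = fmean(true), fmean(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    sd_t = math.sqrt(sum((t - mt) ** 2 for t in true))
    sd_e = math.sqrt(sum((e - me) ** 2 for e in est))
    return {
        "mae": fmean(abs(d) for d in errs),
        "mse": fmean(d * d for d in errs),
        "max_abs_err": max(abs(d) for d in errs),
        "pearson": cov / (sd_t * sd_e),  # undefined if either input is constant
    }


def brier(p_true: List[float], p_hat: List[float]) -> float:
    # E[(y - p_hat)^2] with y ~ Bernoulli(p_true), averaged over the grid.
    return fmean(pt * (1 - ph) ** 2 + (1 - pt) * ph ** 2 for pt, ph in zip(p_true, p_hat))


m = point_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
print(m["mae"], m["mse"], m["max_abs_err"])  # 0.25 0.25 1.0
```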

class aepsych.benchmark.problem.LSEProblem

Bases: aepsych.benchmark.problem.Problem

Level set estimation problem.

This extends the base problem class to evaluate the LSE/threshold estimate in addition to the function estimate.

threshold = 0.75
property metadata: Dict[str, Any]

A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

property f_threshold
property true_below_threshold: numpy.ndarray

Evaluate whether the true function is below threshold over the evaluation grid (used for proper scoring and the threshold misclassification metric).

evaluate(strat)

Evaluate the strategy with respect to this problem.

For level set estimation, we add metrics with respect to the true threshold:
  • brier_p_below_{thresh}, the Brier score with respect to p(f(x) < thresh), in contrast to the regular brier, which is the Brier score for p(phi(f(x)) = 1). A threshold-based misclassification error is added in the same way.

Parameters

strat (aepsych.strategy.Strategy) – Strategy to evaluate.

Returns

A dictionary containing metrics and their values, including parent class metrics.

Return type

Dict[str, float]
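A sketch of the threshold-based metrics, assuming f_threshold is the LSE probability threshold mapped to the latent scale through the inverse standard normal CDF. All names below are hypothetical; p_below_hat stands for a model's predicted probability that f(x) lies below the threshold.

```python
from statistics import NormalDist
from typing import List

THRESHOLD = 0.75
# Assumption: the probability threshold maps to the latent scale via the inverse probit.
F_THRESHOLD = NormalDist().inv_cdf(THRESHOLD)  # roughly 0.674


def below_threshold(f_values: List[float]) -> List[bool]:
    # Hypothetical analogue of true_below_threshold on a grid of latent values.
    return [v < F_THRESHOLD for v in f_values]


def misclassification(f_true: List[float], f_hat: List[float]) -> float:
    # Fraction of grid points where the estimate falls on the wrong side of the threshold.
    truth, est = below_threshold(f_true), below_threshold(f_hat)
    return sum(t != e for t, e in zip(truth, est)) / len(truth)


def brier_p_below(f_true: List[float], p_below_hat: List[float]) -> float:
    # Brier score of predicted P(f(x) < threshold) against the true 0/1 indicator.
    truth = below_threshold(f_true)
    return sum((float(t) - q) ** 2 for t, q in zip(truth, p_below_hat)) / len(truth)
```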

aepsych.benchmark.test_functions module

aepsych.benchmark.test_functions.make_songetal_threshfun(x, y)

Generate a synthetic threshold function by interpolation of real data.

Real data is from Dubno et al. 2013, and the procedure follows Song et al. 2017, 2018. See make_songetal_testfun for more detail.

Parameters
  • x (np.ndarray) – Frequencies.

  • y (np.ndarray) – Thresholds at those frequencies.

Returns

Function that interpolates the given frequencies and thresholds and returns threshold as a function of frequency.

Return type

Callable[[float], float]
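A minimal sketch of an interpolating threshold function with linear extrapolation beyond the data range, written with the standard library. make_threshfun is hypothetical, the sample frequencies and thresholds are made up for illustration (they are not the Dubno et al. data), and aepsych may interpolate differently, for example over log frequency.

```python
from bisect import bisect_right
from typing import Callable, List


def make_threshfun(x: List[float], y: List[float]) -> Callable[[float], float]:
    """Piecewise-linear interpolation of (x, y), extrapolating linearly past the ends.

    Requires at least two points with distinct x values.
    """
    pairs = sorted(zip(x, y))
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]

    def threshfun(freq: float) -> float:
        # Choose the segment containing freq; clamp to the end segments to extrapolate.
        i = min(max(bisect_right(xs, freq), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (freq - x0) / (x1 - x0)

    return threshfun


thresh = make_threshfun([250.0, 1000.0, 4000.0], [10.0, 20.0, 50.0])
print(thresh(1000.0), thresh(8000.0))  # 20.0 90.0
```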

aepsych.benchmark.test_functions.make_songetal_testfun(phenotype='Metabolic', beta=1)

Make an audiometric test function following Song et al. 2017.

To do so, we first compute a threshold by interpolation/extrapolation from real data, then assume a linear psychometric function in intensity with slope beta.

Parameters
  • phenotype (str, optional) – Audiometric phenotype from Dubno et al. 2013. Specifically, one of “Metabolic”, “Sensory”, “Metabolic+Sensory”, or “Older-normal”. Defaults to “Metabolic”.

  • beta (float, optional) – Psychometric function slope. Defaults to 1.

Returns

A test function taking a [b x 2] array of points and returning the psychometric function value at those points.

Return type

Callable[[np.ndarray, bool], np.ndarray]

Raises

AssertionError – if an invalid phenotype is passed.

References

Song, X. D., Garnett, R., & Barbour, D. L. (2017). Psychometric function estimation by probabilistic classification. The Journal of the Acoustical Society of America, 141(4), 2513–2525. https://doi.org/10.1121/1.4979594

aepsych.benchmark.test_functions.novel_discrimination_testfun(x)

Evaluate novel discrimination test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold. Adding to the difficulty is the fact that the function is minimized at f=0 (or p=0.5), corresponding to discrimination being at chance at zero stimulus intensity.

Parameters

x (np.ndarray) – Points at which to evaluate.

Returns

Value of function at these points.

Return type

np.ndarray

aepsych.benchmark.test_functions.novel_detection_testfun(x)

Evaluate novel detection test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold.

Parameters

x (np.ndarray) – Points at which to evaluate.

Returns

Value of function at these points.

Return type

np.ndarray

aepsych.benchmark.test_functions.discrim_highdim(x)

Parameters

x (numpy.ndarray) –

Return type

numpy.ndarray

aepsych.benchmark.test_functions.modified_hartmann6(X)

The modified Hartmann6 function used in Lyu et al.

Module contents

The following names are re-exported at the package level. They are documented in full under their submodules above.

  • class aepsych.benchmark.Benchmark(problems, configs, seed=None, n_reps=1, log_every=10) – see aepsych.benchmark.benchmark.Benchmark.

  • class aepsych.benchmark.DerivedValue(args, func) – see aepsych.benchmark.benchmark.DerivedValue.

  • class aepsych.benchmark.PathosBenchmark(nproc=1, *args, **kwargs) – see aepsych.benchmark.pathos_benchmark.PathosBenchmark.

  • class aepsych.benchmark.Problem – see aepsych.benchmark.problem.Problem.

  • class aepsych.benchmark.LSEProblem – see aepsych.benchmark.problem.LSEProblem.

  • aepsych.benchmark.make_songetal_testfun(phenotype='Metabolic', beta=1) – see aepsych.benchmark.test_functions.make_songetal_testfun.

  • aepsych.benchmark.novel_detection_testfun(x) – see aepsych.benchmark.test_functions.novel_detection_testfun.

  • aepsych.benchmark.novel_discrimination_testfun(x) – see aepsych.benchmark.test_functions.novel_discrimination_testfun.

  • aepsych.benchmark.modified_hartmann6(X) – see aepsych.benchmark.test_functions.modified_hartmann6.

  • aepsych.benchmark.discrim_highdim(x) – see aepsych.benchmark.test_functions.discrim_highdim.

  • aepsych.benchmark.run_benchmarks_with_checkpoints(out_path, benchmark_name, problems, configs, global_seed=None, n_chunks=1, n_reps_per_chunk=1, log_every=None, checkpoint_every=60, n_proc=1, serial_debug=False) – see aepsych.benchmark.pathos_benchmark.run_benchmarks_with_checkpoints.