aepsych.benchmark¶

Submodules¶

aepsych.benchmark.benchmark module¶

class aepsych.benchmark.benchmark.Benchmark(problems, configs, seed=None, n_reps=1, log_every=10)[source]¶

Bases: object

Benchmark base class.

This class wraps standard functionality for benchmarking models including generating cartesian products of run configurations, running the simulated experiment loop, and logging results.

TODO make a benchmarking tutorial and link/refer to it here.

Initialize benchmark.

Parameters

problems (List[Problem]) – Problem objects containing the test function to evaluate.
configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a cartesian product of configurations.
seed (int, optional) – Random seed to use for reproducible benchmarks. Defaults to randomized seeds.
n_reps (int, optional) – Number of repetitions to run of each configuration. Defaults to 1.
log_every (int, optional) – Logging interval during an experiment. Defaults to logging every 10 trials.

Return type

None

make_benchmark_list(**bench_config)[source]¶

Generate a list of benchmarks to run from configuration.

This constructs a cartesian product of config dicts using lists at the leaves of the base config.

Returns

List of dictionaries, each of which can be passed to aepsych.config.Config.

Return type

List[dict[str, float]]
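The expansion described above can be sketched with the standard library alone. This is an illustrative sketch, not the AEPsych implementation: the helper name expand_config and the restriction to two-level dictionaries are assumptions made for the example.

```python
from itertools import product


def expand_config(config):
    """Expand a two-level config dict into one dict per combination of leaf lists."""
    # Collect (section, key, options) triples; scalar leaves become one-element lists.
    leaves = []
    for section, params in config.items():
        for key, value in params.items():
            options = value if isinstance(value, list) else [value]
            leaves.append((section, key, options))

    expanded = []
    for combo in product(*(options for _, _, options in leaves)):
        run = {}
        for (section, key, _), choice in zip(leaves, combo):
            run.setdefault(section, {})[key] = choice
        expanded.append(run)
    return expanded


base = {
    "common": {"model": ["GPClassificationModel", "FancyNewModelToBenchmark"]},
    "init_strat": {"min_asks": [10, 20], "generator": "SobolGenerator"},
}
runs = expand_config(base)
print(len(runs))  # 2 models x 2 min_asks values = 4
```

Each element of runs is a plain nested dictionary with no lists left at the leaves, which is the shape a single run configuration needs.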

materialize_config(config_dict)[source]¶

property num_benchmarks: int¶

Return the total number of runs in this benchmark.

Returns: Total number of runs in this benchmark.
Return type: int

make_strat_and_flatconfig(config_dict)[source]¶

From a config dict, generate a strategy (for running) and a flattened config (for logging).

Parameters

config_dict (Mapping[str, str]) – A run configuration dictionary.

Returns

A tuple containing a strategy object and a flat config.

Return type

Tuple[SequentialStrategy, Dict[str,str]]

run_experiment(problem, config_dict, seed, rep)[source]¶

Run one simulated experiment.

Parameters

config_dict (Dict[str, str]) – AEPsych configuration to use.
seed (int) – Random seed for this run.
rep (int) – Index of this repetition.
problem (aepsych.benchmark.problem.Problem) –

Returns

A tuple containing a log of the results and the strategy as of the end of the simulated experiment. This is ignored in large-scale benchmarks but useful for one-off visualization.

Return type

Tuple[List[Dict[str, object]], SequentialStrategy]

run_benchmarks()[source]¶: Run all the benchmarks, sequentially.

flatten_config(config)[source]¶

Flatten a config object for logging.

Parameters: config (Config) – AEPsych config object.
Returns: A flat dictionary (that can be used to build a flat pandas data frame).
Return type: Dict[str,str]
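As a rough illustration of what flattening for logging means, the sketch below turns a nested dictionary into a single-level one whose keys can serve as data frame columns. The function name flatten and the section_key naming scheme are assumptions for this example; the keys AEPsych actually emits may differ.

```python
def flatten(nested, sep="_"):
    """Flatten a two-level dict into a single-level dict of strings."""
    flat = {}
    for section, params in nested.items():
        for key, value in params.items():
            # Assumed naming scheme: "<section><sep><key>".
            flat[f"{section}{sep}{key}"] = str(value)
    return flat


nested = {
    "common": {"model": "GPClassificationModel"},
    "init_strat": {"min_asks": 10},
}
flat = flatten(nested)
print(flat)
```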

log_at(i)[source]¶

Check if we should log on this trial index.

Parameters: i (int) – Trial index to (maybe) log at.
Returns: True if this trial should be logged.
Return type: bool
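One plausible reading of the logging interval is a simple modulo check on the trial index. The sketch below assumes that rule, assumes that trial 0 counts as a logging trial, and assumes that a missing interval disables intermediate logging; none of these details are confirmed by this page.

```python
def should_log(i, log_every):
    """Return True when trial index i falls on the logging interval."""
    if log_every is None:
        # Assumption: no interval means no intermediate logging.
        return False
    return i % log_every == 0


logged = [i for i in range(35) if should_log(i, 10)]
print(logged)  # [0, 10, 20, 30]
```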

log(problem, flatconfig, metrics, trial_id, fit_time, gen_time, rep, seed, final=False)[source]¶

Log trial data.

Parameters

flatconfig (Mapping[str, object]) – Flattened configuration for this benchmark.
metrics (Mapping[str, object]) – Metrics to log.
trial_id (int) – Current trial index.
fit_time (float) – Model fitting duration.
gen_time (float) – Candidate selection duration.
rep (int) – Repetition index of this trial.
final (bool, optional) – Mark this as the final trial in a run? Defaults to False.
problem (aepsych.benchmark.problem.Problem) –
seed (int) –

Return type

Dict[str, object]

pandas()[source]¶

Return type: pandas.core.frame.DataFrame

class aepsych.benchmark.benchmark.DerivedValue(args, func)[source]¶

Bases: object

A class for dynamically generating config values from other config values during benchmarking.

Initialize DerivedValue.

Parameters

args (List[Tuple[str]]) – Each tuple in this list is a pair of strings that refer to keys in a nested dictionary.
func (Callable) – A function that accepts args as input.

Return type

None

For example, consider the following:

    benchmark_config = {
        "common": {
            "model": ["GPClassificationModel", "FancyNewModelToBenchmark"],
            "acqf": "MCLevelSetEstimation",
        },
        "init_strat": {
            "min_asks": [10, 20],
            "generator": "SobolGenerator",
        },
        "opt_strat": {
            "generator": "OptimizeAcqfGenerator",
            "min_asks": DerivedValue(
                [("init_strat", "min_asks"), ("common", "model")],
                lambda x, y: 100 - x if y == "GPClassificationModel" else 50 - x,
            ),
        },
    }

Four separate benchmarks would be generated from benchmark_config:

model = GPClassificationModel; init trials = 10; opt trials = 90
model = GPClassificationModel; init trials = 20; opt trials = 80
model = FancyNewModelToBenchmark; init trials = 10; opt trials = 40
model = FancyNewModelToBenchmark; init trials = 20; opt trials = 30

Note that you can also pass problem names to func by including ("problem", "name") in args.
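To make the resolution step concrete, here is a minimal sketch of how a derived value could be computed from other entries of a run configuration. DerivedValueSketch and resolve are hypothetical stand-ins written for this example, not the library's classes; only the args/func contract is taken from the description above.

```python
from itertools import product


class DerivedValueSketch:
    """Hypothetical stand-in: holds key paths and a function of the values found there."""

    def __init__(self, args, func):
        self.args = args
        self.func = func


def resolve(config):
    """Replace each DerivedValueSketch leaf with func applied to the referenced values."""
    resolved = {section: dict(params) for section, params in config.items()}
    for section, params in config.items():
        for key, value in params.items():
            if isinstance(value, DerivedValueSketch):
                inputs = [config[s][k] for s, k in value.args]
                resolved[section][key] = value.func(*inputs)
    return resolved


derived = DerivedValueSketch(
    [("init_strat", "min_asks"), ("common", "model")],
    lambda x, y: 100 - x if y == "GPClassificationModel" else 50 - x,
)

opt_trials = []
models = ["GPClassificationModel", "FancyNewModelToBenchmark"]
for model, init in product(models, [10, 20]):
    config = {
        "common": {"model": model},
        "init_strat": {"min_asks": init},
        "opt_strat": {"min_asks": derived},
    }
    opt_trials.append(resolve(config)["opt_strat"]["min_asks"])

print(opt_trials)  # [90, 80, 40, 30]
```

The four values match the four benchmarks listed above: the derived entry is evaluated once per concrete configuration, after the lists have been expanded.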

aepsych.benchmark.pathos_benchmark module¶

class aepsych.benchmark.pathos_benchmark.PathosBenchmark(nproc=1, *args, **kwargs)[source]¶

Bases: aepsych.benchmark.benchmark.Benchmark

Benchmarking class for parallelized benchmarks using pathos.

Initialize pathos benchmark.

Parameters: nproc (int, optional) – Number of cores to use. Defaults to 1.

run_experiment(problem, config_dict, seed, rep)[source]¶

Run one simulated experiment.

Parameters

config_dict (Dict[str, Any]) – AEPsych configuration to use.
seed (int) – Random seed for this run.
rep (int) – Index of this repetition.
problem (aepsych.benchmark.problem.Problem) –

Returns

A tuple containing a log of the results and the strategy as: of the end of the simulated experiment. This is ignored in large-scale benchmarks but useful for one-off visualization.

Return type

Tuple[List[Dict[str, Any]], SequentialStrategy]

run_benchmarks()[source]¶

Run all the benchmarks.

Note that this blocks while waiting for benchmarks to complete. If you would like to start benchmarks and periodically collect partial results, use start_benchmarks and then call collate_benchmarks(wait=False) on some interval.

start_benchmarks()[source]¶

Start benchmark run.

This does not block: after running it, self.futures holds the status of benchmarks running in parallel.

property is_done: bool¶

Check if the benchmark is done.

Returns: True if all futures are cleared and benchmark is done.
Return type: bool

collate_benchmarks(wait=False)[source]¶

Collect benchmark results from completed futures.

Parameters

wait (bool, optional) – If true, this method blocks and waits on all futures to complete. Defaults to False.

Return type

None
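The start-then-poll pattern described for start_benchmarks, is_done and collate_benchmarks can be sketched as follows. This example deliberately uses concurrent.futures from the standard library as a stand-in for pathos, and fake_experiment and collate are hypothetical helpers; it shows the control flow only, not the class internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_experiment(rep):
    """Placeholder for one simulated experiment; returns a one-row log."""
    time.sleep(0.01)
    return [{"rep": rep, "final": True}]


def collate(futures, results, wait=False):
    """Move results of finished futures into results; return the unfinished ones."""
    pending = []
    for future in futures:
        if wait or future.done():
            results.extend(future.result())  # result() blocks until completion
        else:
            pending.append(future)
    return pending


results = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # Non-blocking start: submit everything and keep the futures.
    futures = [pool.submit(fake_experiment, rep) for rep in range(4)]
    # Poll on an interval; an empty list plays the role of is_done.
    while futures:
        futures = collate(futures, results, wait=False)
        time.sleep(0.005)

print(len(results))
```

Calling collate with wait=True instead would block on every future in turn, which corresponds to the blocking behaviour described for run_benchmarks.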

aepsych.benchmark.pathos_benchmark.run_benchmarks_with_checkpoints(out_path, benchmark_name, problems, configs, global_seed=None, n_chunks=1, n_reps_per_chunk=1, log_every=None, checkpoint_every=60, n_proc=1, serial_debug=False)[source]¶

Runs a series of benchmarks, saving both final and intermediate results to .csv files. Benchmarks are run in sequential chunks, each of which runs all combinations of problems/configs/reps in parallel. This function should always be called under the "if __name__ == '__main__': …" idiom.

Parameters

out_path (str) – The path to save the results to.
benchmark_name (str) – A name given to this set of benchmarks. Results will be saved in files named like "out_path/benchmark_name_chunk{chunk_number}_out.csv".
problems (List[Problem]) – Problem objects containing the test function to evaluate.
configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a cartesian product of configurations.
global_seed (int, optional) – Global seed to use for reproducible benchmarks. Defaults to randomized seeds.
n_chunks (int) – The number of chunks to break the results into. Each chunk will contain at least 1 run of every combination of problem and config.
n_reps_per_chunk (int, optional) – Number of repetitions to run each problem/config in each chunk.
log_every (int, optional) – Logging interval during an experiment. Defaults to only logging at the end.
checkpoint_every (int) – Save intermediate results every checkpoint_every seconds.
n_proc (int) – Number of processors to use.
serial_debug (bool) – Whether to run the benchmarks serially for debugging. Defaults to False.

Return type

None
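The sketch below illustrates two points from the description: the per-chunk file naming pattern and the main-guard idiom. The guard matters for multiprocessing code in general, because worker processes may re-import the main module on some platforms. The helper chunk_output_path is hypothetical, and numbering chunks from 0 is an assumption.

```python
import os


def chunk_output_path(out_path, benchmark_name, chunk_number):
    """Build the results path for one chunk, following the naming pattern above."""
    return os.path.join(out_path, f"{benchmark_name}_chunk{chunk_number}_out.csv")


def main():
    n_chunks = 3
    # Assumption: chunks are numbered starting from 0.
    return [chunk_output_path("results", "demo", c) for c in range(n_chunks)]


if __name__ == "__main__":
    # Anything that launches parallel work belongs under this guard.
    for path in main():
        print(path)
```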

aepsych.benchmark.problem module¶

class aepsych.benchmark.problem.Problem[source]¶

Bases: object

Wrapper for a problem or test function. Subclass from this and override f() to define your test function.

n_eval_points = 1000¶

property eval_grid¶

property name: str¶

f(x)[source]¶

property lb¶

property ub¶

property bounds¶

property metadata: Dict[str, Any]¶: A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

p(x)[source]¶

Evaluate response probability from test function.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Response probability at queried points.
Return type: np.ndarray

sample_y(x)[source]¶

Sample a response from test function.

Parameters: x (np.ndarray) – Points at which to sample.
Returns: A single (bernoulli) sample at points.
Return type: np.ndarray
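The relationship between f, p and sample_y can be sketched without any dependencies. This example assumes a probit link, that is, p(x) is the standard normal CDF applied to f(x), which the phi notation used elsewhere on this page suggests but does not confirm. The latent function f here is a made-up linear function.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f(x):
    """Hypothetical latent test function: linear in a 1-d stimulus intensity."""
    return 2.0 * x - 1.0


def p(x):
    """Response probability under the assumed probit link."""
    return normal_cdf(f(x))


def sample_y(xs, rng):
    """Draw one Bernoulli response per point."""
    return [1 if rng.random() < p(x) else 0 for x in xs]


rng = random.Random(0)
xs = [0.0, 0.5, 1.0]
probs = [p(x) for x in xs]
ys = sample_y(xs, rng)
print(probs, ys)
```

At x = 0.5 the latent value is 0, so the response probability is exactly 0.5, which is the chance level for a binary response.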

f_hat(model)[source]¶

Generate mean predictions from the model over the evaluation grid.

Parameters: model (aepsych.models.base.ModelProtocol) – Model to evaluate.
Returns: Posterior mean from underlying model over the evaluation grid.
Return type: torch.Tensor

property f_true: numpy.ndarray¶

Evaluate true test function over evaluation grid.

Returns: Values of true test function over evaluation grid.
Return type: torch.Tensor

property p_true: torch.Tensor¶

Evaluate true response probability over evaluation grid.

Returns: Values of true response probability over evaluation grid.
Return type: torch.Tensor

p_hat(model)[source]¶

Generate mean predictions from the model over the evaluation grid.

Parameters: model (aepsych.models.base.ModelProtocol) – Model to evaluate.
Returns: Posterior mean from underlying model over the evaluation grid.
Return type: torch.Tensor

evaluate(strat)[source]¶

Evaluate the strategy with respect to this problem.

Extend this in subclasses to add additional metrics. Metrics include:

- mae (mean absolute error), mse (mean squared error), max_abs_err (max absolute error), and Pearson correlation. All of these are computed over the latent variable f and the outcome probability p, w.r.t. the posterior mean. Squared and absolute errors (miae, mise) are also computed in expectation over the posterior, by sampling.
- Brier score, which measures how well-calibrated the outcome probability is, both at the posterior mean (plain brier) and in expectation over the posterior (expected_brier).

Parameters: strat (aepsych.strategy.Strategy) – Strategy to evaluate.
Returns: A dictionary containing metrics and their values.
Return type: Dict[str, float]
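The point-estimate metrics named above are standard and can be written out directly. The sketch below compares a true function with a posterior-mean estimate on a shared grid; the function name point_metrics, the dictionary key names and the sample values are chosen for this example and may not match the keys AEPsych logs.

```python
import math


def point_metrics(truth, estimate):
    """Compare an estimate with the truth, both evaluated on the same grid."""
    n = len(truth)
    errors = [e - t for t, e in zip(truth, estimate)]
    mean_t = sum(truth) / n
    mean_e = sum(estimate) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    return {
        "mae": sum(abs(err) for err in errors) / n,
        "mse": sum(err ** 2 for err in errors) / n,
        "max_abs_err": max(abs(err) for err in errors),
        "corr": cov / math.sqrt(var_t * var_e),  # Pearson correlation
    }


f_true = [0.0, 1.0, 2.0, 3.0]
f_hat = [0.5, 1.0, 2.0, 3.5]
metrics = point_metrics(f_true, f_hat)
print(metrics)
```

The same function can be applied to the probability scale by passing true and estimated response probabilities instead of latent values.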

class aepsych.benchmark.problem.LSEProblem[source]¶

Bases: aepsych.benchmark.problem.Problem

Level set estimation problem.

This extends the base problem class to evaluate the LSE/threshold estimate in addition to the function estimate.

threshold = 0.75¶

property metadata: Dict[str, Any]¶: A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

property f_threshold¶

property true_below_threshold: numpy.ndarray¶: Evaluate whether the true function is below threshold over the eval grid (used for proper scoring and threshold misclassification metric).

evaluate(strat)[source]¶

Evaluate the model with respect to this problem.

For level set estimation, we add metrics w.r.t. the true threshold:

- brier_p_below_{thresh}, the Brier score w.r.t. p(f(x)<thresh), in contrast to the regular brier, which is the Brier score for p(phi(f(x))=1), and the same for misclassification error.

Parameters: strat (aepsych.strategy.Strategy) – Strategy to evaluate.
Returns: A dictionary containing metrics and their values, including parent class metrics.
Return type: Dict[str, float]
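A threshold-oriented Brier score can be sketched as follows: at each grid point, estimate p(f(x) < thresh) as the fraction of posterior samples below the threshold, then score it against the true indicator. The sketch uses the textbook mean-squared-difference form of the Brier score and maps the 0.75 probability threshold to latent space through the inverse normal CDF; both choices are assumptions, and the library may scale or compute these differently. The sample values are made up.

```python
from statistics import NormalDist


def brier_p_below(samples, f_true, f_thresh):
    """Brier score of the estimated p(f(x) < thresh) against the true indicator."""
    total = 0.0
    for point_samples, truth in zip(samples, f_true):
        p_below = sum(s < f_thresh for s in point_samples) / len(point_samples)
        true_below = 1.0 if truth < f_thresh else 0.0
        total += (p_below - true_below) ** 2
    return total / len(f_true)


# Assumption: the probability threshold maps to f through the inverse normal CDF.
f_thresh = NormalDist().inv_cdf(0.75)

f_true = [0.0, 1.0]  # true latent values at two grid points
samples = [[-0.2, 0.1, 0.3, 0.9], [0.5, 0.8, 1.1, 1.4]]  # posterior samples per point
score = brier_p_below(samples, f_true, f_thresh)
print(round(f_thresh, 4), score)
```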

aepsych.benchmark.test_functions module¶

aepsych.benchmark.test_functions.make_songetal_threshfun(x, y)[source]¶

Generate a synthetic threshold function by interpolation of real data.

Real data is from Dubno et al. 2013, and procedure follows Song et al. 2017, 2018. See make_songetal_testfun for more detail.

Parameters

x (np.ndarray) – Frequency
y (np.ndarray) – Threshold

Returns

Function that interpolates the given frequencies and thresholds and returns threshold as a function of frequency.

Return type

Callable[[float], float]
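The idea of building a threshold function by interpolation can be sketched with piecewise-linear interpolation. This is a simplified illustration: make_threshfun is a hypothetical helper, the numbers are made-up values rather than the Dubno et al. data, and interpolating linearly in raw frequency (instead of, say, log frequency) is an assumption.

```python
from bisect import bisect_right


def make_threshfun(freqs, thresholds):
    """Return a function that linearly interpolates threshold over frequency."""

    def threshfun(freq):
        # Pick the segment containing freq; the end segments also serve for extrapolation.
        i = min(max(bisect_right(freqs, freq), 1), len(freqs) - 1)
        x0, x1 = freqs[i - 1], freqs[i]
        y0, y1 = thresholds[i - 1], thresholds[i]
        return y0 + (y1 - y0) * (freq - x0) / (x1 - x0)

    return threshfun


# Made-up illustrative values, not real audiometric data.
freqs = [250.0, 1000.0, 4000.0]
thresholds = [10.0, 20.0, 50.0]
threshfun = make_threshfun(freqs, thresholds)
print(threshfun(625.0), threshfun(1000.0))
```

Returning a closure keeps the data bundled with the interpolation rule, so the result can be passed around as a plain function of frequency.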

aepsych.benchmark.test_functions.make_songetal_testfun(phenotype='Metabolic', beta=1)[source]¶

Make an audiometric test function following Song et al. 2017.

To do so, we first compute a threshold by interpolation/extrapolation from real data, then assume a linear psychometric function in intensity with slope beta.

Parameters

phenotype (str, optional) – Audiometric phenotype from Dubno et al. 2013. Specifically, one of “Metabolic”, “Sensory”, “Metabolic+Sensory”, or “Older-normal”. Defaults to “Metabolic”.
beta (float, optional) – Psychometric function slope. Defaults to 1.

Returns

A test function taking a [b x 2] array of points and returning the psychometric function value at those points.

Return type

Callable[[np.ndarray, bool], np.ndarray]

Raises

AssertionError – if an invalid phenotype is passed.

References

Song, X. D., Garnett, R., & Barbour, D. L. (2017). Psychometric function estimation by probabilistic classification. The Journal of the Acoustical Society of America, 141(4), 2513–2525. https://doi.org/10.1121/1.4979594

aepsych.benchmark.test_functions.novel_discrimination_testfun(x)[source]¶

Evaluate novel discrimination test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold. Adding to the difficulty is the fact that the function is minimized at f=0 (or p=0.5), corresponding to discrimination being at chance at zero stimulus intensity.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Value of function at these points.
Return type: np.ndarray

aepsych.benchmark.test_functions.novel_detection_testfun(x)[source]¶

Evaluate novel detection test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Value of function at these points.
Return type: np.ndarray

aepsych.benchmark.test_functions.discrim_highdim(x)[source]¶

Parameters: x (numpy.ndarray) –
Return type: numpy.ndarray

aepsych.benchmark.test_functions.modified_hartmann6(X)[source]¶: The modified Hartmann6 function used in Lyu et al.

Module contents¶

class aepsych.benchmark.Benchmark(problems, configs, seed=None, n_reps=1, log_every=10)[source]¶

Bases: object

Benchmark base class.

This class wraps standard functionality for benchmarking models including generating cartesian products of run configurations, running the simulated experiment loop, and logging results.

TODO make a benchmarking tutorial and link/refer to it here.

Initialize benchmark.

Parameters

problems (List[Problem]) – Problem objects containing the test function to evaluate.
configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a cartesian product of configurations.
seed (int, optional) – Random seed to use for reproducible benchmarks. Defaults to randomized seeds.
n_reps (int, optional) – Number of repetitions to run of each configuration. Defaults to 1.
log_every (int, optional) – Logging interval during an experiment. Defaults to logging every 10 trials.

Return type

None

make_benchmark_list(**bench_config)[source]¶

Generate a list of benchmarks to run from configuration.

This constructs a cartesian product of config dicts using lists at the leaves of the base config.

Returns

List of dictionaries, each of which can be passed to aepsych.config.Config.

Return type

List[dict[str, float]]

materialize_config(config_dict)[source]¶

property num_benchmarks: int¶

Return the total number of runs in this benchmark.

Returns: Total number of runs in this benchmark.
Return type: int

make_strat_and_flatconfig(config_dict)[source]¶

From a config dict, generate a strategy (for running) and a flattened config (for logging).

Parameters

config_dict (Mapping[str, str]) – A run configuration dictionary.

Returns

A tuple containing a strategy object and a flat config.

Return type

Tuple[SequentialStrategy, Dict[str,str]]

run_experiment(problem, config_dict, seed, rep)[source]¶

Run one simulated experiment.

Parameters

config_dict (Dict[str, str]) – AEPsych configuration to use.
seed (int) – Random seed for this run.
rep (int) – Index of this repetition.
problem (aepsych.benchmark.problem.Problem) –

Returns

A tuple containing a log of the results and the strategy as of the end of the simulated experiment. This is ignored in large-scale benchmarks but useful for one-off visualization.

Return type

Tuple[List[Dict[str, object]], SequentialStrategy]

run_benchmarks()[source]¶: Run all the benchmarks, sequentially.

flatten_config(config)[source]¶

Flatten a config object for logging.

Parameters: config (Config) – AEPsych config object.
Returns: A flat dictionary (that can be used to build a flat pandas data frame).
Return type: Dict[str,str]

log_at(i)[source]¶

Check if we should log on this trial index.

Parameters: i (int) – Trial index to (maybe) log at.
Returns: True if this trial should be logged.
Return type: bool

log(problem, flatconfig, metrics, trial_id, fit_time, gen_time, rep, seed, final=False)[source]¶

Log trial data.

Parameters

flatconfig (Mapping[str, object]) – Flattened configuration for this benchmark.
metrics (Mapping[str, object]) – Metrics to log.
trial_id (int) – Current trial index.
fit_time (float) – Model fitting duration.
gen_time (float) – Candidate selection duration.
rep (int) – Repetition index of this trial.
final (bool, optional) – Mark this as the final trial in a run? Defaults to False.
problem (aepsych.benchmark.problem.Problem) –
seed (int) –

Return type

Dict[str, object]

pandas()[source]¶

Return type: pandas.core.frame.DataFrame

class aepsych.benchmark.DerivedValue(args, func)[source]¶

Bases: object

A class for dynamically generating config values from other config values during benchmarking.

Initialize DerivedValue.

Parameters

args (List[Tuple[str]]) – Each tuple in this list is a pair of strings that refer to keys in a nested dictionary.
func (Callable) – A function that accepts args as input.

Return type

None

For example, consider the following:

    benchmark_config = {
        "common": {
            "model": ["GPClassificationModel", "FancyNewModelToBenchmark"],
            "acqf": "MCLevelSetEstimation",
        },
        "init_strat": {
            "min_asks": [10, 20],
            "generator": "SobolGenerator",
        },
        "opt_strat": {
            "generator": "OptimizeAcqfGenerator",
            "min_asks": DerivedValue(
                [("init_strat", "min_asks"), ("common", "model")],
                lambda x, y: 100 - x if y == "GPClassificationModel" else 50 - x,
            ),
        },
    }

Four separate benchmarks would be generated from benchmark_config:

model = GPClassificationModel; init trials = 10; opt trials = 90
model = GPClassificationModel; init trials = 20; opt trials = 80
model = FancyNewModelToBenchmark; init trials = 10; opt trials = 40
model = FancyNewModelToBenchmark; init trials = 20; opt trials = 30

Note that you can also pass problem names to func by including ("problem", "name") in args.

class aepsych.benchmark.PathosBenchmark(nproc=1, *args, **kwargs)[source]¶

Bases: aepsych.benchmark.benchmark.Benchmark

Benchmarking class for parallelized benchmarks using pathos.

Initialize pathos benchmark.

Parameters: nproc (int, optional) – Number of cores to use. Defaults to 1.

run_experiment(problem, config_dict, seed, rep)[source]¶

Run one simulated experiment.

Parameters

config_dict (Dict[str, Any]) – AEPsych configuration to use.
seed (int) – Random seed for this run.
rep (int) – Index of this repetition.
problem (aepsych.benchmark.problem.Problem) –

Returns

A tuple containing a log of the results and the strategy as: of the end of the simulated experiment. This is ignored in large-scale benchmarks but useful for one-off visualization.

Return type

Tuple[List[Dict[str, Any]], SequentialStrategy]

run_benchmarks()[source]¶

Run all the benchmarks.

Note that this blocks while waiting for benchmarks to complete. If you would like to start benchmarks and periodically collect partial results, use start_benchmarks and then call collate_benchmarks(wait=False) on some interval.

start_benchmarks()[source]¶

Start benchmark run.

This does not block: after running it, self.futures holds the status of benchmarks running in parallel.

property is_done: bool¶

Check if the benchmark is done.

Returns: True if all futures are cleared and benchmark is done.
Return type: bool

collate_benchmarks(wait=False)[source]¶

Collect benchmark results from completed futures.

Parameters

wait (bool, optional) – If true, this method blocks and waits on all futures to complete. Defaults to False.

Return type

None

class aepsych.benchmark.Problem[source]¶

Bases: object

Wrapper for a problem or test function. Subclass from this and override f() to define your test function.

n_eval_points = 1000¶

property eval_grid¶

property name: str¶

f(x)[source]¶

property lb¶

property ub¶

property bounds¶

property metadata: Dict[str, Any]¶: A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

p(x)[source]¶

Evaluate response probability from test function.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Response probability at queried points.
Return type: np.ndarray

sample_y(x)[source]¶

Sample a response from test function.

Parameters: x (np.ndarray) – Points at which to sample.
Returns: A single (bernoulli) sample at points.
Return type: np.ndarray

f_hat(model)[source]¶

Generate mean predictions from the model over the evaluation grid.

Parameters: model (aepsych.models.base.ModelProtocol) – Model to evaluate.
Returns: Posterior mean from underlying model over the evaluation grid.
Return type: torch.Tensor

property f_true: numpy.ndarray¶

Evaluate true test function over evaluation grid.

Returns: Values of true test function over evaluation grid.
Return type: torch.Tensor

property p_true: torch.Tensor¶

Evaluate true response probability over evaluation grid.

Returns: Values of true response probability over evaluation grid.
Return type: torch.Tensor

p_hat(model)[source]¶

Generate mean predictions from the model over the evaluation grid.

Parameters: model (aepsych.models.base.ModelProtocol) – Model to evaluate.
Returns: Posterior mean from underlying model over the evaluation grid.
Return type: torch.Tensor

evaluate(strat)[source]¶

Evaluate the strategy with respect to this problem.

Extend this in subclasses to add additional metrics. Metrics include:

- mae (mean absolute error), mse (mean squared error), max_abs_err (max absolute error), and Pearson correlation. All of these are computed over the latent variable f and the outcome probability p, w.r.t. the posterior mean. Squared and absolute errors (miae, mise) are also computed in expectation over the posterior, by sampling.
- Brier score, which measures how well-calibrated the outcome probability is, both at the posterior mean (plain brier) and in expectation over the posterior (expected_brier).

Parameters: strat (aepsych.strategy.Strategy) – Strategy to evaluate.
Returns: A dictionary containing metrics and their values.
Return type: Dict[str, float]

class aepsych.benchmark.LSEProblem[source]¶

Bases: aepsych.benchmark.problem.Problem

Level set estimation problem.

This extends the base problem class to evaluate the LSE/threshold estimate in addition to the function estimate.

threshold = 0.75¶

property metadata: Dict[str, Any]¶: A dictionary of metadata passed to the Benchmark to be logged. Each key will become a column in the Benchmark’s output dataframe, with its associated value stored in each row.

property f_threshold¶

property true_below_threshold: numpy.ndarray¶: Evaluate whether the true function is below threshold over the eval grid (used for proper scoring and threshold misclassification metric).

evaluate(strat)[source]¶

Evaluate the model with respect to this problem.

For level set estimation, we add metrics w.r.t. the true threshold:

- brier_p_below_{thresh}, the Brier score w.r.t. p(f(x)<thresh), in contrast to the regular brier, which is the Brier score for p(phi(f(x))=1), and the same for misclassification error.

Parameters: strat (aepsych.strategy.Strategy) – Strategy to evaluate.
Returns: A dictionary containing metrics and their values, including parent class metrics.
Return type: Dict[str, float]

aepsych.benchmark.make_songetal_testfun(phenotype='Metabolic', beta=1)[source]¶

Make an audiometric test function following Song et al. 2017.

To do so, we first compute a threshold by interpolation/extrapolation from real data, then assume a linear psychometric function in intensity with slope beta.

Parameters

phenotype (str, optional) – Audiometric phenotype from Dubno et al. 2013. Specifically, one of “Metabolic”, “Sensory”, “Metabolic+Sensory”, or “Older-normal”. Defaults to “Metabolic”.
beta (float, optional) – Psychometric function slope. Defaults to 1.

Returns

A test function taking a [b x 2] array of points and returning the psychometric function value at those points.

Return type

Callable[[np.ndarray, bool], np.ndarray]

Raises

AssertionError – if an invalid phenotype is passed.

References

Song, X. D., Garnett, R., & Barbour, D. L. (2017). Psychometric function estimation by probabilistic classification. The Journal of the Acoustical Society of America, 141(4), 2513–2525. https://doi.org/10.1121/1.4979594

aepsych.benchmark.novel_detection_testfun(x)[source]¶

Evaluate novel detection test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Value of function at these points.
Return type: np.ndarray

aepsych.benchmark.novel_discrimination_testfun(x)[source]¶

Evaluate novel discrimination test function from Owen et al.

The threshold is roughly parabolic with context, and the slope varies with the threshold. Adding to the difficulty is the fact that the function is minimized at f=0 (or p=0.5), corresponding to discrimination being at chance at zero stimulus intensity.

Parameters: x (np.ndarray) – Points at which to evaluate.
Returns: Value of function at these points.
Return type: np.ndarray

aepsych.benchmark.modified_hartmann6(X)[source]¶: The modified Hartmann6 function used in Lyu et al.

aepsych.benchmark.discrim_highdim(x)[source]¶

Parameters: x (numpy.ndarray) –
Return type: numpy.ndarray

aepsych.benchmark.run_benchmarks_with_checkpoints(out_path, benchmark_name, problems, configs, global_seed=None, n_chunks=1, n_reps_per_chunk=1, log_every=None, checkpoint_every=60, n_proc=1, serial_debug=False)[source]¶

Runs a series of benchmarks, saving both final and intermediate results to .csv files. Benchmarks are run in sequential chunks, each of which runs all combinations of problems/configs/reps in parallel. This function should always be called under the "if __name__ == '__main__': …" idiom.

Parameters

out_path (str) – The path to save the results to.
benchmark_name (str) – A name given to this set of benchmarks. Results will be saved in files named like "out_path/benchmark_name_chunk{chunk_number}_out.csv".
problems (List[Problem]) – Problem objects containing the test function to evaluate.
configs (Mapping[str, Union[str, list]]) – Dictionary of configs to run. Lists at leaves are used to construct a cartesian product of configurations.
global_seed (int, optional) – Global seed to use for reproducible benchmarks. Defaults to randomized seeds.
n_chunks (int) – The number of chunks to break the results into. Each chunk will contain at least 1 run of every combination of problem and config.
n_reps_per_chunk (int, optional) – Number of repetitions to run each problem/config in each chunk.
log_every (int, optional) – Logging interval during an experiment. Defaults to only logging at the end.
checkpoint_every (int) – Save intermediate results every checkpoint_every seconds.
n_proc (int) – Number of processors to use.
serial_debug (bool) – Whether to run the benchmarks serially for debugging. Defaults to False.

Return type

None
