SuperSonic.TasksDefinition

SuperSonic uses a SuperOptimizer class to define policy strategies, an RLAlgorithms class to support RL algorithms, an action_functions class to support user-defined action transition functions, an observation_function to support observation transition functions, and a reward_function to support reward transition functions.

SuperOptimizer

class SuperSonic.policy_definition.policy_define.SuperOptimizer(StateFunctions=['Word2vec', 'Doc2vec', 'Bert'], RewardFunctions=['relative_measure', 'tan', 'func', 'weight'], RLAlgorithms=['MCTS', 'PPO', 'APPO', 'A2C', 'DQN', 'QLearning', 'MARWIL', 'PG', 'SimpleQ', 'A3C', 'ARS', 'ES', 'BC'], ActionFunctions=['init'], datapath='')[source]
Class

SuperOptimizer includes candidate functions (or models) for representing the environment state, objective functions for computing the reward, and the set of possible actions that can be taken from a given state. The compiler developer first defines the optimization problem by creating an RL policy interface. The definition includes a list of client RL components for the meta-optimizer to search over.
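
A minimal construction sketch, assuming the constructor signature shown above; the component names come from the built-in candidate lists, and the datapath value is a placeholder:

    from SuperSonic.policy_definition.policy_define import SuperOptimizer

    # Restrict the meta-optimizer's search to a subset of the built-in components.
    optimizer = SuperOptimizer(
        StateFunctions=['Word2vec', 'Doc2vec'],
        RewardFunctions=['relative_measure', 'tan'],
        RLAlgorithms=['PPO', 'DQN', 'MCTS'],
        ActionFunctions=['init'],
        datapath='tasks/data',   # placeholder path
    )
    # The resulting policy search space is enumerated by PolicyDefined(), below.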

PolicyDefined()[source]

Each of the components can be chosen from a pool of SuperSonic built-in candidate methods, and the combination of these components can result in a large policy search space.

Returns
  • policy_all – All candidate policy strategies.

  • policy_amount – A list of indices, one per policy strategy.
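
To make the size of that search space concrete, here is an illustrative enumeration over the default candidate lists; the real PolicyDefined() may represent strategies differently:

    from itertools import product

    state_functions  = ['Word2vec', 'Doc2vec', 'Bert']
    reward_functions = ['relative_measure', 'tan', 'func', 'weight']
    rl_algorithms    = ['MCTS', 'PPO', 'APPO', 'A2C', 'DQN', 'QLearning',
                        'MARWIL', 'PG', 'SimpleQ', 'A3C', 'ARS', 'ES', 'BC']
    action_functions = ['init']

    # One policy strategy = one (state, reward, algorithm, action) combination.
    policy_all = list(product(state_functions, reward_functions,
                              rl_algorithms, action_functions))
    policy_amount = list(range(len(policy_all)))  # one index per strategy

    print(len(policy_all))  # 3 * 4 * 13 * 1 = 156 candidate strategies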

cross_valid()[source]

Split the dataset into training and validation sets; by default, 3-fold cross validation is used.
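
A sketch of the kind of split this performs, shown here with scikit-learn's KFold; the actual implementation may differ:

    import numpy as np
    from sklearn.model_selection import KFold

    benchmarks = np.arange(12)          # placeholder benchmark identifiers
    kf = KFold(n_splits=3, shuffle=True, random_state=0)
    for train_idx, valid_idx in kf.split(benchmarks):
        train_set = benchmarks[train_idx]
        valid_set = benchmarks[valid_idx]
        # candidate policies are tuned on train_set and evaluated on valid_set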

RLAlgorithms

class SuperSonic.policy_definition.Algorithm.RLAlgorithms[source]
Class

SuperSonic currently supports 23 RL algorithms from RLlib, covering a wide range of established methods.

A2C(task_config, environment_path)[source]

A2C, an interface to start an RL agent with the A2C algorithm, a synchronous, deterministic variant of A3C (Advantage Actor-Critic).

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

A3C(task_config, environment_path)[source]

A3C, an interface to start an RL agent with the A3C (Asynchronous Advantage Actor-Critic) algorithm, in which multiple workers compute gradients asynchronously against a shared policy. Paper (https://arxiv.org/abs/1602.01783)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

APPO(task_config, environment_path)[source]

APPO, an interface to start an RL agent with the APPO (Asynchronous PPO) algorithm, an IMPALA-style variant of PPO that decouples sampling from learning via asynchronous rollout workers. Paper (https://arxiv.org/abs/1707.06347)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

ARS(task_config, environment_path)[source]

ARS, an interface to start an RL agent with the ARS (Augmented Random Search) algorithm, a derivative-free method that perturbs policy parameters with random noise and updates them from the best-performing perturbations. Paper (https://arxiv.org/abs/1803.07055)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

Algorithms(policy_algorithm, task_config, environment_path)[source]

Algorithms, a dispatcher used to call the selected RL algorithm by name.

Parameters
  • policy_algorithm – Name of the RL algorithm to launch (e.g. 'PPO', 'DQN', 'MCTS').

  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
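
A hedged usage sketch of the dispatcher; whether it is called on an instance as shown, the task_config contents, and the environment path are assumptions, since the exact keys the agent expects are not documented in this section:

    from SuperSonic.policy_definition.Algorithm import RLAlgorithms

    task_config = {"stop": {"training_iteration": 5}}   # placeholder settings
    environment_path = "tasks/example/environment"      # placeholder path

    # Dispatch by name to one of the supported algorithms.
    RLAlgorithms().Algorithms("PPO", task_config, environment_path)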

DQN(task_config, environment_path)[source]

DQN, an interface to start an RL agent with the DQN algorithm, a deep learning model that learns control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. Paper (https://arxiv.org/abs/1312.5602)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

ES(task_config, environment_path)[source]

ES, an interface to start an RL agent with the ES (Evolution Strategies) algorithm, a scalable derivative-free alternative to gradient-based RL that estimates policy updates from populations of randomly perturbed parameters. Paper (https://arxiv.org/abs/1703.03864)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

MARWIL(task_config, environment_path)[source]

MARWIL, an interface to start an RL agent with the MARWIL (Monotonic Advantage Re-Weighted Imitation Learning) algorithm, a hybrid imitation-learning and policy-gradient method that can learn from off-policy (batch) data.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

MCTS(task_config, environment_path)[source]

MCTS, an interface to start an RL agent with the MCTS (Monte Carlo Tree Search) algorithm. MCTS was originally designed for two-player games; this version adapts it to single-player settings. The implementation scales to any number of workers and uses the ranked-rewards (R2) strategy to enable self-play-style training in the one-player setting. It is mainly intended for combinatorial optimization.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
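
The ranked-rewards (R2) strategy mentioned above converts a single-player score into a self-play-style binary signal by comparing each episode's return with a percentile of recent returns. A minimal illustrative sketch; the buffer size and percentile are arbitrary here, not SuperSonic's settings:

    from collections import deque
    import numpy as np

    recent_returns = deque(maxlen=100)        # buffer of recent episode returns

    def ranked_reward(episode_return, percentile=75):
        """+1 if the episode beats the recent percentile threshold, else -1."""
        if recent_returns:
            threshold = np.percentile(recent_returns, percentile)
            reward = 1.0 if episode_return > threshold else -1.0
        else:
            reward = 1.0                      # no history yet
        recent_returns.append(episode_return)
        return reward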

PG(task_config, environment_path)[source]

PG, an interface to start an RL agent with the vanilla policy gradient (PG) algorithm, a basic on-policy method that optimizes the policy directly from sampled episode returns.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

PPO(task_config, environment_path)[source]

PPO, an interface to start an RL agent with the PPO (Proximal Policy Optimization) algorithm. PPO's clipped objective supports multiple SGD passes over the same batch of experiences. Paper (https://arxiv.org/abs/1707.06347)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
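
These per-algorithm interfaces start agents from RLlib. A hedged sketch of what launching a PPO agent typically looks like through ray.tune; the environment name and config values are placeholders rather than SuperSonic's actual settings:

    import ray
    from ray import tune

    ray.init(ignore_reinit_error=True)

    tune.run(
        "PPO",
        stop={"training_iteration": 10},
        config={
            "env": "SuperSonic-v0",   # placeholder; SuperSonic registers its own task envs
            "num_workers": 1,
        },
    )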

QLearning(task_config, environment_path)[source]

QLearning, an interface to start an RL agent with a Q-learning algorithm that uses two Q-networks (instead of one) for action-value estimation; each Q-network has its own target network.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

SimpleQ(task_config, environment_path)[source]

SimpleQ, an interface to start an RL agent with the SimpleQ algorithm, a simplified DQN variant without enhancements such as dueling networks, double Q-learning, distributional heads, or prioritized replay.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

action_functions

class SuperSonic.policy_definition.action.action_functions[source]
Class

action_functions defines the action space and the observation space by inheriting a default Action class.

init_actions(interleave_action_length, obsv_low, obsv_high, obsv_size, method)[source]

Construct and initialize action and observation space of different tasks.

Parameters
  • interleave_action_length – Action space. This must be defined for single-agent envs.

  • obsv_low – Lower boundary of the observation space.

  • obsv_high – Upper boundary of the observation space.

  • obsv_size – Observation space. This must be defined for single-agent envs.

  • method – Action method; different parameter values map to different definition approaches.
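
A sketch of the kind of spaces these parameters describe, using OpenAI Gym spaces; the mapping shown here (Discrete actions, a Box observation vector) is an assumption about one possible method, not necessarily how init_actions builds them:

    import numpy as np
    from gym import spaces

    interleave_action_length = 15     # e.g. number of discrete actions
    obsv_low, obsv_high = 0.0, 1.0    # observation value range
    obsv_size = 100                   # observation vector length

    action_space = spaces.Discrete(interleave_action_length)
    observation_space = spaces.Box(
        low=obsv_low, high=obsv_high, shape=(obsv_size,), dtype=np.float32
    )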

observation_function

reward_function

class SuperSonic.policy_definition.reward.reward_function[source]
Class

A reward function that reports the quality of the actions taken so far. It provides candidate reward functions like RelativeMeasure and tanh to compute the reward based on the metric given by the measurement interface.

get_rew(input, baseline=1, weight=1, reward_function='usr_define')[source]

Compute the reward using the specified reward-transition function.

Parameters
  • input – Input to the reward-transition function, e.g. runtime, speedup, or Hamming distance.

  • baseline – Baseline value used to compute relative metrics such as speedup.

  • weight – Weight used to scale the importance of a specific action.

  • reward_function – The reward-transition method to apply.
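
A hedged sketch of what the built-in reward shapes could look like; these formulas are illustrative, not SuperSonic's exact definitions, and 'usr_define' stands in for a user-supplied transition:

    import math

    def get_rew(input, baseline=1, weight=1, reward_function="relative_measure"):
        """Illustrative reward transitions; not the exact SuperSonic formulas."""
        if reward_function == "relative_measure":
            # improvement relative to a baseline measurement, e.g. speedup
            return (baseline - input) / baseline
        if reward_function == "tan":
            # squash the measurement difference into (-1, 1)
            return math.tanh(baseline - input)
        if reward_function == "weight":
            return weight * (baseline - input)
        # 'usr_define': a user-supplied reward transition would go here
        return input

    print(get_rew(input=0.8, baseline=1.0))   # 0.2: 20% improvement over the baseline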