SuperSonic.TasksDefinition

SuperSonic uses a SuperOptimizer class to define policy strategies, an RLAlgorithms class to support RL algorithms, an action_functions class to support user-defined action transition functions, an observation_function to support observation transition functions, and a reward_function to support reward transition functions.

SuperOptimizer

class SuperSonic.policy_definition.policy_define.SuperOptimizer(StateFunctions=['Word2vec', 'Doc2vec', 'Bert'], RewardFunctions=['relative_measure', 'tan', 'func', 'weight'], RLAlgorithms=['MCTS', 'PPO', 'APPO', 'A2C', 'DQN', 'QLearning', 'MARWIL', 'PG', 'SimpleQ', 'A3C', 'ARS', 'ES', 'BC'], ActionFunctions=['init'], datapath='')[source]
Class

SuperOptimizer includes candidate functions (or models) for representing the environment state, objective functions for computing the reward, and the set of possible actions that can be taken from a given state. The compiler developer first defines the optimization problem by creating an RL policy interface. The definition includes a list of client RL components for the meta-optimizer to search over.
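
A minimal construction sketch, assuming the constructor signature shown above; the component names come from the built-in candidate lists, and the datapath value is a placeholder:

    from SuperSonic.policy_definition.policy_define import SuperOptimizer

    # Restrict the meta-optimizer's search to a subset of the built-in components.
    optimizer = SuperOptimizer(
        StateFunctions=['Word2vec', 'Doc2vec'],
        RewardFunctions=['relative_measure', 'tan'],
        RLAlgorithms=['PPO', 'DQN', 'MCTS'],
        ActionFunctions=['init'],
        datapath='tasks/data',   # placeholder path
    )
    # The resulting policy search space is enumerated by PolicyDefined(), below.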

PolicyDefined()[source]

Each of the components can be chosen from a pool of SuperSonic built-in candidate methods, and the combination of these components can result in a large policy search space.

Returns
  • policy_all – All candidate policy strategies.

  • policy_amount – A list of indices, one per policy strategy.
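
To make the size of that search space concrete, here is an illustrative enumeration over the default candidate lists; the real PolicyDefined() may represent strategies differently:

    from itertools import product

    state_functions  = ['Word2vec', 'Doc2vec', 'Bert']
    reward_functions = ['relative_measure', 'tan', 'func', 'weight']
    rl_algorithms    = ['MCTS', 'PPO', 'APPO', 'A2C', 'DQN', 'QLearning',
                        'MARWIL', 'PG', 'SimpleQ', 'A3C', 'ARS', 'ES', 'BC']
    action_functions = ['init']

    # One policy strategy = one (state, reward, algorithm, action) combination.
    policy_all = list(product(state_functions, reward_functions,
                              rl_algorithms, action_functions))
    policy_amount = list(range(len(policy_all)))  # one index per strategy

    print(len(policy_all))  # 3 * 4 * 13 * 1 = 156 candidate strategies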

cross_valid()[source]

Split the dataset into training and validation sets; by default, 3-fold cross validation is used.
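
A sketch of the kind of split this performs, shown here with scikit-learn's KFold; the actual implementation may differ:

    import numpy as np
    from sklearn.model_selection import KFold

    benchmarks = np.arange(12)          # placeholder benchmark identifiers
    kf = KFold(n_splits=3, shuffle=True, random_state=0)
    for train_idx, valid_idx in kf.split(benchmarks):
        train_set = benchmarks[train_idx]
        valid_set = benchmarks[valid_idx]
        # candidate policies are tuned on train_set and evaluated on valid_set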

RLAlgorithms

class SuperSonic.policy_definition.Algorithm.RLAlgorithms[source]
Class

SuperSonic currently supports 23 RL algorithms from RLlib, covering a wide range of established methods.

A2C(task_config, environment_path)[source]

A2C, an interface to start an RL agent with the A2C algorithm, a synchronous, deterministic variant of A3C (Advantage Actor-Critic).

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

A3C(task_config, environment_path)[source]

A3C, an interface to start an RL agent with the A3C (Asynchronous Advantage Actor-Critic) algorithm, in which multiple workers compute gradients asynchronously against a shared policy. Paper (https://arxiv.org/abs/1602.01783)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

APPO(task_config, environment_path)[source]

APPO, an interface to start an RL agent with the APPO (Asynchronous PPO) algorithm, an IMPALA-style variant of PPO that decouples sampling from learning via asynchronous rollout workers. Paper (https://arxiv.org/abs/1707.06347)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

ARS(task_config, environment_path)[source]

ARS, an interface to start an RL agent with the ARS (Augmented Random Search) algorithm, a derivative-free method that perturbs policy parameters with random noise and updates them from the best-performing perturbations. Paper (https://arxiv.org/abs/1803.07055)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

Algorithms(policy_algorithm, task_config, environment_path)[source]

Algorithms, a dispatcher used to call the selected RL algorithm by name.

Parameters
  • policy_algorithm – Name of the RL algorithm to launch (e.g. 'PPO', 'DQN', 'MCTS').

  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
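
A hedged usage sketch of the dispatcher; whether it is called on an instance as shown, the task_config contents, and the environment path are assumptions, since the exact keys the agent expects are not documented in this section:

    from SuperSonic.policy_definition.Algorithm import RLAlgorithms

    task_config = {"stop": {"training_iteration": 5}}   # placeholder settings
    environment_path = "tasks/example/environment"      # placeholder path

    # Dispatch by name to one of the supported algorithms.
    RLAlgorithms().Algorithms("PPO", task_config, environment_path)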

DQN(task_config, environment_path)[source]

DQN, an interface to start an RL agent with the DQN algorithm, a deep learning model that learns control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. Paper (https://arxiv.org/abs/1312.5602)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

ES(task_config, environment_path)[source]

ES, an interface to start an RL agent with the ES (Evolution Strategies) algorithm, a scalable derivative-free alternative to gradient-based RL that estimates policy updates from populations of randomly perturbed parameters. Paper (https://arxiv.org/abs/1703.03864)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

MARWIL(task_config, environment_path)[source]

MARWIL, an interface to start an RL agent with the MARWIL (Monotonic Advantage Re-Weighted Imitation Learning) algorithm, a hybrid imitation-learning and policy-gradient method that can learn from off-policy (batch) data.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

MCTS(task_config, environment_path)[source]

MCTS, an interface to start an RL agent with the MCTS (Monte Carlo Tree Search) algorithm. MCTS was originally designed for two-player games; this version adapts it to single-player settings. The implementation scales to any number of workers and uses the ranked-rewards (R2) strategy to enable self-play-style training in the one-player setting. It is mainly intended for combinatorial optimization.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
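
The ranked-rewards (R2) strategy mentioned above converts a single-player score into a self-play-style binary signal by comparing each episode's return with a percentile of recent returns. A minimal illustrative sketch; the buffer size and percentile are arbitrary here, not SuperSonic's settings:

    from collections import deque
    import numpy as np

    recent_returns = deque(maxlen=100)        # buffer of recent episode returns

    def ranked_reward(episode_return, percentile=75):
        """+1 if the episode beats the recent percentile threshold, else -1."""
        if recent_returns:
            threshold = np.percentile(recent_returns, percentile)
            reward = 1.0 if episode_return > threshold else -1.0
        else:
            reward = 1.0                      # no history yet
        recent_returns.append(episode_return)
        return reward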

PG(task_config, environment_path)[source]

PG, an interface to start an RL agent with the vanilla policy gradient (PG) algorithm, a basic on-policy method that optimizes the policy directly from sampled episode returns.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

PPO(task_config, environment_path)[source]

PPO, an interface to start an RL agent with the PPO (Proximal Policy Optimization) algorithm. PPO's clipped objective supports multiple SGD passes over the same batch of experiences. Paper (https://arxiv.org/abs/1707.06347)

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.
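
These per-algorithm interfaces start agents from RLlib. A hedged sketch of what launching a PPO agent typically looks like through ray.tune; the environment name and config values are placeholders rather than SuperSonic's actual settings:

    import ray
    from ray import tune

    ray.init(ignore_reinit_error=True)

    tune.run(
        "PPO",
        stop={"training_iteration": 10},
        config={
            "env": "SuperSonic-v0",   # placeholder; SuperSonic registers its own task envs
            "num_workers": 1,
        },
    )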

QLearning(task_config, environment_path)[source]

QLearning, an interface to start an RL agent with a Q-learning algorithm that uses two Q-networks (instead of one) for action-value estimation; each Q-network has its own target network.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

SimpleQ(task_config, environment_path)[source]

SimpleQ, an interface to start an RL agent with the SimpleQ algorithm, a simplified DQN variant without enhancements such as dueling networks, double Q-learning, distributional heads, or prioritized replay.

Parameters
  • task_config – Parameters passed to the RL agent.

  • environment_path – Path to the task environment that the RL agent calls.

action_functions

class SuperSonic.policy_definition.action.action_functions[source]
Class

action_functions defines the action space and the observation space by inheriting a default Action class.

init_actions(interleave_action_length, obsv_low, obsv_high, obsv_size, method)[source]

Construct and initialize action and observation space of different tasks.

Parameters
  • interleave_action_length – Action space. This must be defined for single-agent envs.

  • obsv_low – Lower boundary of the observation space.

  • obsv_high – Upper boundary of the observation space.

  • obsv_size – Observation space. This must be defined for single-agent envs.

  • method – Action method; different parameter values map to different definition approaches.
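
A sketch of the kind of spaces these parameters describe, using OpenAI Gym spaces; the mapping shown here (Discrete actions, a Box observation vector) is an assumption about one possible method, not necessarily how init_actions builds them:

    import numpy as np
    from gym import spaces

    interleave_action_length = 15     # e.g. number of discrete actions
    obsv_low, obsv_high = 0.0, 1.0    # observation value range
    obsv_size = 100                   # observation vector length

    action_space = spaces.Discrete(interleave_action_length)
    observation_space = spaces.Box(
        low=obsv_low, high=obsv_high, shape=(obsv_size,), dtype=np.float32
    )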

observation_function

reward_function

class SuperSonic.policy_definition.reward.reward_function[source]
Class

A reward function that reports the quality of the actions taken so far. It provides candidate reward functions like RelativeMeasure and tanh to compute the reward based on the metric given by the measurement interface.

get_rew(input, baseline=1, weight=1, reward_function='usr_define')[source]

Compute the reward using the specified reward-transition function.

Parameters
  • input – Input to the reward-transition function, e.g. runtime, speedup, or Hamming distance.

  • baseline – Baseline value used to compute relative metrics such as speedup.

  • weight – Weight used to scale the importance of a specific action.

  • reward_function – The reward-transition method to apply.
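
A hedged sketch of what the built-in reward shapes could look like; these formulas are illustrative, not SuperSonic's exact definitions, and 'usr_define' stands in for a user-supplied transition:

    import math

    def get_rew(input, baseline=1, weight=1, reward_function="relative_measure"):
        """Illustrative reward transitions; not the exact SuperSonic formulas."""
        if reward_function == "relative_measure":
            # improvement relative to a baseline measurement, e.g. speedup
            return (baseline - input) / baseline
        if reward_function == "tan":
            # squash the measurement difference into (-1, 1)
            return math.tanh(baseline - input)
        if reward_function == "weight":
            return weight * (baseline - input)
        # 'usr_define': a user-supplied reward transition would go here
        return input

    print(get_rew(input=0.8, baseline=1.0))   # 0.2: 20% improvement over the baseline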