# How can I convert a simple CLI RPG to a compatible environment for training an RL agent via stable-baselines?

Artificial Intelligence Asked by SeunOsiko on August 24, 2021

What would be the good choice of algorithm to use for character action selection in an RPG, implemented in Python?

I had previously asked this question in the hope of making headway on the AI portion of a project I have been working on, only to realize that I first had to convert the text-based game I had created into a custom gym environment before I could consider algorithm selection. I have found few papers relating to this task, hence I have come for advice on how to get started.

As my code for that particular task is extensive and rather messy, I have created a minimized version that contains most of the core features. From my limited understanding, an environment built around it should transfer to the larger program, which explores the possibility of an RL agent acting as a second player in a game akin to a simple Pokemon-esque RPG.

In this example, the player (or the agent) chooses a class before the battle starts, and on each turn selects an action from a fixed set. For simplicity, I did not give each character class its own action set, and the last action, 'Special', requires a specific amount of MP, which cannot be replenished.

I assume that if this game is converted to a compatible environment, an agent can learn how to optimally play this game.
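For concreteness, here is a minimal, dependency-free sketch of the `reset()`/`step()` interface that stable-baselines expects from an environment, built around this battle game. The class name `BattleEnvSketch`, the 4-action encoding, and all the damage/heal numbers are my assumptions drawn loosely from the code below; a real environment would subclass `gym.Env` and declare `action_space = spaces.Discrete(4)` and an `observation_space`:

```python
# Hypothetical sketch of the gym-style interface stable-baselines expects.
# A real environment would subclass gym.Env and declare
#   action_space = spaces.Discrete(4)
#   observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,))
# Kept dependency-free here so the shape of reset()/step() is visible.
class BattleEnvSketch:
    MAX_HP, MAX_MP = 1200, 100
    ENEMY_MAX_HP = 1700

    def reset(self):
        # Start of an episode: fresh stats for both combatants
        self.hp, self.mp = self.MAX_HP, self.MAX_MP
        self.enemy_hp = self.ENEMY_MAX_HP
        return self._obs()

    def step(self, action):
        # action: 0=Attack, 1=Defend, 2=Heal, 3=Special (assumed encoding)
        if action == 0:
            self.enemy_hp = max(0, self.enemy_hp - 275)
        elif action == 2:
            self.hp = min(self.MAX_HP, self.hp + 150)
        elif action == 3 and self.mp >= 50:
            self.mp -= 50
            self.enemy_hp = max(0, self.enemy_hp - 450)
        # The enemy replies with a plain attack in this sketch;
        # defending (action 1) reduces the incoming damage to 75%
        if self.enemy_hp > 0:
            self.hp = max(0, self.hp - (188 if action == 1 else 250))
        done = self.hp == 0 or self.enemy_hp == 0
        reward = 1.0 if done and self.enemy_hp == 0 else (-1.0 if done else 0.0)
        return self._obs(), reward, done, {}

    def _obs(self):
        # Normalized state vector the agent sees each turn
        return [self.hp / self.MAX_HP, self.mp / self.MAX_MP,
                self.enemy_hp / self.ENEMY_MAX_HP, 1.0]
```

The key change from the interactive loop is that `input()` disappears: the agent's choice arrives as the `step()` argument, and the enemy's move becomes part of the environment's transition.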

I have a few questions aside from the major 'how-to', both because I lack understanding in this area and to gather information that transfers more directly to the larger project:

• How do I prevent the agent from selecting a particular action when its condition has not been met (e.g. not enough MP)?

• Can I define the agent's action set after its character class selection, or do I have to give the complete list of actions usable in the game and then set probabilities based on the chosen character class?

• How do I handle the agent being unable to act while its HP is 0 but the game has not ended, e.g. because an allied player in the party is still fighting?

• As a follow-up, how do I allow action selection again once its HP has been restored? (Not currently applicable to the example shown below.)

• Can the agent learn to interact with another player in the game? Can I reward the agent for using particular types of actions on a particular type of player? (Not currently applicable to the example shown below.)

• If a particular condition is applied to the game environment in one episode but not another, such as healing harming instead, can the agent deduce that the condition holds in the current episode and choose not to use healing-based actions?
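On the first bullet, the standard approach is action masking: the environment exposes a boolean mask of currently legal actions and the policy only samples among those (sb3-contrib's `MaskablePPO` consumes such masks). A minimal sketch, where the action encoding and the MP cost are my assumptions:

```python
import random

# Assumed action encoding: 0=Attack, 1=Defend, 2=Heal, 3=Special.
SPECIAL_MP_COST = 50  # hypothetical cost; use whatever the game defines


def action_mask(mp, hp):
    """Boolean mask of currently legal actions for one character."""
    if hp == 0:
        # A downed character can take no action at all
        return [False, False, False, False]
    return [True, True, True, mp >= SPECIAL_MP_COST]


def sample_valid(mask, rng=random):
    """Stand-in for what a masked policy does: sample only legal actions."""
    legal = [a for a, ok in enumerate(mask) if ok]
    return rng.choice(legal)
```

An alternative is to leave every action always available and give a small negative reward (and a wasted turn) for illegal choices, but masking usually converges faster because the agent never has to learn which actions are invalid.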

If I need to make extensions or amendments to the code, just let me know.

The minimized code is as follows:

```python
# Based on original source code by users Citrus-Code and AlexV on Code Review Stack Exchange

import random


def classSelector(enemySelection=False):
    if enemySelection:
        classSelection = random.randint(1, 3)
        if classSelection == 1:
            return 1700
        elif classSelection == 2:
            return 1750
        elif classSelection == 3:
            return 1300
    else:
        print("Choose a class:\n1. Thief\n2. Warrior\n3. Mage")
        classSelection = int(input())
        print("\nPlayer Class selection complete!\n")
        if classSelection == 1:
            return 1000
        elif classSelection == 2:
            return 1200
        elif classSelection == 3:
            return 900


def battle_simulation():
    """Run a simple interactive RPGChar battle simulation."""

    class RPGChar:
        def __init__(self, health):
            self.hp = health
            self.mp = 100
            self.maxHealth = health
            # Used to denote whether a character is defending this turn
            self.defenseState = 0

        def getDefenseModifier(self):
            # Defending reduces incoming damage to 75%
            return 1 if self.defenseState == 0 else 0.75

        def heal(self, heal_amount):
            self.hp += heal_amount
            if self.hp > self.maxHealth:
                self.hp = self.maxHealth
            return heal_amount

        def cast(self, mpCost):
            self.mp -= mpCost
            if self.mp < 0:
                self.mp = 0

        def attack(self, target, damage):
            # Scale damage by the target's defense modifier
            dealt = int(damage * target.getDefenseModifier())
            target.hp -= dealt
            if target.hp < 0:
                target.hp = 0
            return dealt

        def defend(self):
            self.defenseState = 1

        def defendReset(self):
            self.defenseState = 0

    enemy = RPGChar(classSelector(enemySelection=True))
    player = RPGChar(classSelector())
    while True:
        print("\nATTACK CHOICES\n1. Attack\n2. Defend\n3. Heal\n4. Special")
        attack_choice = int(input("\nSelect an attack: "))

        # The enemy selects an action at random, but at full health
        # it will only attack or defend
        enemy_choice = random.randint(1, 2 if enemy.hp == enemy.maxHealth else 4)

        if attack_choice == 2:
            print("You defend yourself from incoming attacks!")
            player.defend()

        if enemy_choice == 2:
            print("Enemy defends from incoming attacks!")
            enemy.defend()

        if attack_choice == 1:
            print(f"You dealt {player.attack(enemy, 275)} damage.")

        if enemy_choice == 1:
            print(f"Mew dealt {enemy.attack(player, 250)} damage.")

        if attack_choice == 3:
            print(
                f"You healed {player.heal(random.randint(int(player.maxHealth * 0.1), int(player.maxHealth * 0.2)))} health points."
            )

        if enemy_choice == 3:
            print(
                f"Mew healed {enemy.heal(random.randint(int(enemy.maxHealth * 0.1), int(enemy.maxHealth * 0.15)))} health points."
            )

        if attack_choice == 4:
            if player.mp >= 50:
                player.cast(50)  # Special consumes MP, which cannot be restored
                print(f"You dealt {player.attack(enemy, 450)} damage.")
            else:
                print("You do not have enough MP to use this action!")
                print("You do nothing on this turn.")

        if enemy_choice == 4:
            if enemy.mp >= 50:
                enemy.cast(50)
                print(f"Mew dealt {enemy.attack(player, 450)} damage.")
            else:
                print("Enemy does not have enough MP to use this action!")
                print("Enemy does nothing on this turn.")

        if enemy.hp == 0 or player.hp == 0:
            break

        print(f"Mew's current health is {enemy.hp}")

        enemy.defendReset()
        player.defendReset()

        print("\nNext Turn!")

    print(f"Mew's final health is {enemy.hp}")

    if player.hp < enemy.hp:
        print("\nYou lost! Better luck next time!")
    else:
        print("\nYou won against Mew!")


def Main():
    battle_simulation()


if __name__ == "__main__":
    Main()
```
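Regarding the last bullet, one way to make a hidden per-episode rule (such as inverted healing) learnable is to include the previous action and the resulting HP change in the observation, so a recurrent policy can infer the rule from its own experience within an episode. A sketch of such an observation builder; the normalization constants and the vector layout are my assumptions:

```python
# Sketch of turning raw battle state into an observation vector.
# Including the previous action and the resulting HP change lets a
# recurrent policy infer hidden per-episode rules (e.g. inverted healing)
# from its own experience; the normalization constants are assumptions.
def build_obs(player_hp, player_mp, enemy_hp, last_action, last_hp_delta,
              max_hp=1200, max_mp=100, enemy_max_hp=1750):
    # One-hot encode the previous action (0..3); all zeros on the first turn
    action_onehot = [1.0 if last_action == a else 0.0 for a in range(4)]
    return [
        player_hp / max_hp,
        player_mp / max_mp,
        enemy_hp / enemy_max_hp,
        *action_onehot,
        last_hp_delta / max_hp,  # negative right after a "heal" reveals the twist
    ]
```

If instead you want the agent told the rule outright, append an explicit condition flag to this vector; inference-from-experience is only needed when the flag is meant to stay hidden.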


