Anthony Hevia

Genetic Programming System in Clojure

Brief Summary

For my senior year at Hamilton College, I took a seminar in genetic programming, culminating in my senior seminar project: an implementation of the PushGP system in the Clojure programming language. This process involved writing an interpreter in Clojure for the Push programming language in order to run the genetic programming algorithm. After implementing the base system, I implemented original changes and ran experiments to see how those changes affected the system's ability to produce solution programs.
Check out the GitHub repository for the full implementation!

What is Genetic Programming?

Genetic programming (GP) is a subfield of evolutionary algorithms within artificial intelligence. Inspired by biological evolutionary theory, the algorithm entails evolving a population of programs to fit for a given task or problem in order to arrive at a solution program. The algorithm can be broken down into a few stages: initialization, evaluation, selection, and variation.
First, a population of random programs (called individuals) is initialized. Then, the population is evaluated on a set of training data, wherein each individual is assigned a certain error based on its performance on the training data. Using the calculated errors, individuals from the population are then selected using a set of selection methods, under the idea that those with the lowest error are more likely to be selected. This is where the biological concept of "survival of the fittest" is mimicked in GP. Finally, selected individuals are varied through a combination of mutation and crossover techniques to create the individuals for the next generation.
The newly created population then repeates the cycle of evaluation, selection, and variation until a program is found that perfectly solves the training data, or some other stopping condition is met.

PushGP

PushGP is a particular implementation of the GP algorithm, which utilizes the Push programming langauge. Push is a stack-based programming langauge designed entirely to run the genetic programming language. Each data type in Push has its own corresponding stack (integer, string, float, etc.). The stack responsible for executing code is called the executation (or simply exec) stack. A Push program will run by popping instructions from the exec stack, and arguments from the appropriate data stacks, and pushing the results onto the appropriate return stack.
The individuals in PushGP are Push programs, that process inputs from the training set and push the resulting output onto the return type stack. However, Push is made specifially for genetic programming, and is not human-readable. So, the GP system is implemented in another programming language, and programs are run via a Push interpreter implemented in the chosen language. Conventionally, PushGP is implemented in Clojure, a functional LISP language.

My Project

For my senior year at Hamilton College, the final project for my genetic programming seminar entailed a full implementation of the PushGP system in Clojure, from scratch. As I was implementing the Push interpreter, a suite of utility functions, and the skeleton for the PushGP system, I wrote several files of test files to ensure each function was working as expected. To test the full system, I implemented a simple symbolic regression function, with a set of training data generated by mapping the target function onto a generated set of inputs. After confirming that the system was working as intended, it was time to approach the second main objective of the project.
Beyond a PushGP implementation, the other goal of the project was to implement an original change to the GP system up to my own discretion, and run experiments to gauge the degree to which the change(s) were benefitial in getting the system to produce solution programs. To summarize the changes I implemented to the system, I implemented new variation and parent selection methods to see if I could improve both the way the programs are chosen, and the way that those parent programs are changed to make a (hopefully) better child program.
The driving principle of my new variation technique was to be able to inject multiple new instructions sequentially into the genome of a parent program. Push is a programming language with a lot of neutral code, meaning that sometimes code in a Push program is inconsequential to the produced output. While this can lead to large Push programs, it is thought that this could be to the benefit of GP, as there is so much randomness in how programs are produced and varied. But, this lead me to the hypothesis that small changes in a Push program may be unlikely to lead to significant change in program behavior, which is the primary goal of variation in GP.
My second change was in how certain programs are selected to crossover their genetic material. Genetic crossover is a staple in most GP systems, and involves selecting 2 parent programs to crossover. Ideally, we take 2 programs that excel in different parts of the training data to take the best components of both, creating a child program that is an improvement over both its parents. To tackle this goal, I implemented what I call semantics-based selection. In short, when the first parent is selected for crossover, we take into account that individual's error on each individual training case, and prioritize selecting a second parent that excels where the first parent is weak.