Research Interests
February 2009

Gravity Trade Models

Gravity models have long been used in the social sciences to characterize the pull exerted by large economic entities, such as countries and cities, on products or people. Both their name and their basic equations have obvious Newtonian connotations.

The gravity trade model (GTM) is one of the most conventional gravity models. It attempts to describe the volume of trade between two countries as a function of their sizes (expressed in terms of population and Gross Domestic Product, GDP) and the distance between them, typically measured between capital cities. To account for additional factors that may influence trade, it is customary to add dummy variables that denote geographical adjacency, regional integration, cultural similarities, or common membership in trade agreements.
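
As an illustration of the description above, the following is a minimal sketch of how such a model might be estimated. It assumes the common log-linear form, in which the logarithm of trade is regressed on the logarithms of the two GDP values and of the distance, plus one adjacency dummy. The function name fit_gravity, the choice of regressors, and the synthetic data are assumptions made for this example only and are not taken from the text.

```python
import numpy as np

def fit_gravity(trade, gdp_i, gdp_j, distance, adjacent):
    """Ordinary least squares fit of an assumed log-linear gravity equation:
    log(trade) = b0 + b1*log(gdp_i) + b2*log(gdp_j) + b3*log(distance) + b4*adjacent
    """
    X = np.column_stack([
        np.ones(len(trade)),   # intercept
        np.log(gdp_i),         # size of exporter
        np.log(gdp_j),         # size of importer
        np.log(distance),      # distance between capitals
        adjacent,              # dummy: 1 if the countries share a border
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(trade), rcond=None)
    return coef

# Synthetic country pairs (illustrative data only, not real trade figures).
rng = np.random.default_rng(0)
n = 500
gdp_i = rng.uniform(10, 1000, n)
gdp_j = rng.uniform(10, 1000, n)
distance = rng.uniform(100, 10000, n)
adjacent = rng.integers(0, 2, n).astype(float)

true_coef = np.array([1.0, 0.8, 0.7, -1.1, 0.5])
log_trade = (true_coef[0]
             + true_coef[1] * np.log(gdp_i)
             + true_coef[2] * np.log(gdp_j)
             + true_coef[3] * np.log(distance)
             + true_coef[4] * adjacent
             + rng.normal(0, 0.05, n))   # small noise term
trade = np.exp(log_trade)

coef = fit_gravity(trade, gdp_i, gdp_j, distance, adjacent)
print(coef)
```

In this sketch a negative coefficient on log(distance) corresponds to the "pull" weakening with distance, in analogy with the Newtonian law.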

Economic Modeling by Genetic Algorithms

Genetic Algorithms (GAs) are search and optimisation tools inspired by the principles of natural evolution. Since the publication of Goldberg’s fundamental work on GAs in 1989, these tools have enjoyed tremendous popularity due to their ability to produce good solutions for a variety of mathematically intractable or computationally prohibitive problems.

In addition, GAs have some degree of immunity to problems that plague classic search algorithms such as gradient descent. Perhaps the most important such problem is premature convergence, in which the algorithm becomes trapped in a local minimum while following a search direction that reduces some predefined error function.

On the other hand, GAs do not guarantee the optimality of their solutions, although for reasonably small search spaces the optimum is generally reached within a reasonable number of iterations. Broadly speaking, GAs may not find the best solution, but they are able to provide an acceptable one when other approaches are computationally prohibitive or simply non-existent.

When solving a problem within a GA-based framework, one must first map the solution space of the problem into a genetic space whose elements are strings of numbers (chromosomes); the individual numbers are referred to as genes. In the genetic space, a GA evolves a set of solutions (population) for a number of iterations (generations). In each generation, the most successful (fittest) solutions share building blocks (groups of genes) in an attempt to produce new solutions (offspring) that outperform their “parents”. This mechanism imitates natural reproduction and is typically implemented via an operator called crossover. The participants in crossover are selected with probabilities proportional to their fitness, i.e., their degree of success in solving the problem. This concept is the counterpart of natural selection and ensures that the best solutions from a given generation contribute to the next generation by combining their genetic material. In addition, an operator called mutation is employed to introduce further diversity into the gene pool, thereby contributing to the robustness of the search and helping to avoid premature convergence.
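
The loop described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes binary genes, one-point crossover, per-gene bit-flip mutation, and a toy fitness function (the number of ones in the chromosome). The names evolve and onemax and all parameter values are hypothetical.

```python
import random

def evolve(fitness, n_genes, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Evolve binary chromosomes and return the fittest one seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection weights proportional to fitness (tiny offset avoids all-zero weights).
        weights = [fitness(c) + 1e-9 for c in population]
        offspring = []
        while len(offspring) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randint(1, n_genes - 1)        # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        population = offspring
        # Only record the best chromosome seen; it is not re-inserted (no elitism).
        best = max(population + [best], key=fitness)
    return best

def onemax(chromosome):
    """Toy fitness: count of genes equal to 1."""
    return sum(chromosome)

best = evolve(onemax, n_genes=20)
print(best, onemax(best))
```

Tracking the best chromosome seen so far does not alter the search itself; it only keeps a good solution from being lost if a later generation happens to be weaker.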

In a data modelling application, a solution is a succession of operators, each applied to one, two, or three operands that may be model variables or constants. In our approach, the available operators and operands are assigned numerical codes. By listing these codes in the order in which they appear in the model, we build the genetic representation of the solution.
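
One possible encoding of this kind is sketched below. The code tables, the operator set, the variable names, and the flat layout (an operator code followed by as many operand codes as its arity) are assumptions chosen for illustration; the text does not specify the actual tables or layout.

```python
import math

# Hypothetical code tables: operator code -> (name, arity, function).
OPERATORS = {
    0: ("log", 1, lambda a: math.log(abs(a) + 1e-12)),
    1: ("add", 2, lambda a, b: a + b),
    2: ("max", 2, max),
    3: ("if_pos", 3, lambda c, a, b: a if c > 0 else b),  # if c > 0 then a else b
}
# Hypothetical operand codes: variable names (str) or constants (float).
OPERANDS = {0: "gdp", 1: "population", 2: "distance", 3: 1.0}

def decode(genes):
    """Split a flat list of codes into (operator code, operand codes) terms."""
    terms, i = [], 0
    while i < len(genes):
        op = genes[i]
        arity = OPERATORS[op][1]
        terms.append((op, genes[i + 1:i + 1 + arity]))
        i += 1 + arity
    return terms

def evaluate(term, row):
    """Compute one term's value for a single data record."""
    op, operand_codes = term
    values = []
    for code in operand_codes:
        operand = OPERANDS[code]
        values.append(row[operand] if isinstance(operand, str) else operand)
    return OPERATORS[op][2](*values)

# add(gdp, population), max(gdp, distance), if_pos(distance, gdp, 1.0)
genes = [1, 0, 1, 2, 0, 2, 3, 2, 0, 3]
row = {"gdp": 5.0, "population": 2.0, "distance": 7.0}
terms = decode(genes)
meta = [evaluate(t, row) for t in terms]
print(terms)
print(meta)
```

A real implementation would also need crossover and mutation to respect operator arities so that every offspring decodes to a valid model; that detail is omitted here.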

Concretely, a model consists of a set of randomly selected operators applied to variables extracted from the historical data; the result of each operator is referred to as a meta-variable. A linear regression is then applied to the set of meta-variables, and the adjusted R² coefficient resulting from this regression is employed as the measure of fitness. Once the entire population of solutions has been created, crossover is applied to selected pairs of solutions, thereby generating new models. Through random exchanges of information (genes, in GA terminology), fitter specimens may be created. The success of crossover is assessed during the next generation, when the fitness of the newly created chromosomes is estimated.
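
The fitness evaluation described above might look like the following sketch, which fits an ordinary least squares regression on a matrix of meta-variables and returns the adjusted R² coefficient. The function name adjusted_r2 and the synthetic data are assumptions for illustration; the target is deliberately built from max(x1, x2) so that a model containing that meta-variable scores higher than one using x1 alone.

```python
import numpy as np

def adjusted_r2(meta, y):
    """Adjusted R^2 of an OLS fit of y on the meta-variables (columns of meta)."""
    n, k = meta.shape
    X = np.column_stack([np.ones(n), meta])          # add intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalise the number of regressors k relative to the sample size n.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic data (illustrative only): the target depends on max(x1, x2).
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 3.0 * np.maximum(x1, x2) + rng.normal(0, 0.1, 200)

good = adjusted_r2(np.maximum(x1, x2)[:, None], y)   # model with a max() meta-variable
poor = adjusted_r2(x1[:, None], y)                   # model using x1 only
print(good, poor)
```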

The advantages of genetic modeling over methods based on linear regression alone are evident. First, no prior knowledge about the system to be modeled is necessary. Second, if the unsupervised learning engine discovers nonlinear dependencies within the data, it is able to capture them in an analytic expression using a wide variety of operators, such as max() or if-then-else.