# Current Students

The following are postgraduate students currently under my supervision or co-supervision.

# Project Students

## Computer Science Honours

Student | Project Title | Abstract |
---|---|---|

## Engineering Skripsies

Student | Project Title | Abstract |
---|---|---|

# Masters Students

## Computer Science

Student | Thesis Title | Abstract |
---|---|---|

Albertus Aribeb aaribeb@gmail.com Systems Engineer, Skorpion Zinc Mine (Vedanta Zinc International) | Time Series Data Clustering | A time series is a sequence of observations on a variable measured at successive points in time. Time series clustering is the task of grouping the observations in a time series so that measured values that are similar, or that exhibit the same behaviour, are grouped together. This research will conduct a review of time series clustering. We will look at both stationary and non-stationary time series. The different approaches to characterize time series will be reviewed with a focus on feature-based clustering. The applications of time series clustering will be analyzed, and the approaches to quantify the performance of time series clustering algorithms will be investigated. |

Chelsea Barraball 19768125@sun.ac.za | Competitive Coevolutionary Particle Swarm Optimization for Dynamic Optimization Problems | This research will develop a competitive coevolutionary particle swarm optimization approach to solve dynamic optimization problems. Competitive coevolution models the arms race that is observed between populations of species that are in competition for survival. Predator-prey behaviors are one example of such competition. Due to the dynamic nature of this process in nature, it is believed that algorithmic models of such predator-prey behaviors will lend themselves naturally to solving dynamic optimization problems. The research will start by developing a competitive coevolutionary particle swarm algorithm for solving static optimization problems and investigating the impact on swarm diversity. Different approaches to the computation of the relative fitness function, and to the selection of the competition pool, will be evaluated. The approach will then be applied and evaluated on various types of dynamic optimization problems. |

Heinrich Cilliers 19035837@sun.ac.za | Adaptive Gaussian Mixture Models | A Gaussian mixture model (GMM) is used in unsupervised learning to represent clusters in a dataset as a mixture of Gaussian distributions. GMMs are usually fitted using the Expectation-Maximization (EM) algorithm, which is prone to yielding sub-optimal solutions. Additionally, the EM algorithm fits GMMs to stationary data and requires the number of clusters to be specified beforehand. This study aims to propose, evaluate and compare various approaches to fitting a GMM to stationary and non-stationary data, as well as dynamically determining the optimal number of Gaussians using particle swarm optimization. |

Rohan Chhipa rohan.chhipa@live.com | Community Detection using Set-based Particle Swarm Optimization | Community detection is an important tool in analyzing structural relationships within complex networks by enabling the detection of node clusters. Each node cluster is referred to as a community. Community detection has been applied to a wide variety of network-based problems such as the detection of criminal user groups in social networks, customer segmentation and smart advertising, and protein identification in biological networks. Using community-based quality indicators, such as the community modularity measure, it is possible to view community detection as an optimization problem, thus making it possible to use swarm intelligence algorithms such as the Particle Swarm Optimization (PSO) algorithm. This study formulates community detection as a set-based optimization problem which is then solved using the Set-based PSO (SBPSO) algorithm. The effectiveness of SBPSO is measured empirically by comparing the results produced by SBPSO to the current state-of-the-art PSO-based community detection algorithms on a benchmark set of problems. |

Kyle Erwin kyle.erwin24@gmail.com Alan Gray | Set-based Particle Swarm Optimization for Portfolio Optimization | Portfolio optimization is a complex real-world problem where assets are selected such that profits are maximized while the risk is simultaneously minimized. Traditional portfolio optimization approaches make use of quadratic programming to determine portfolios that represent a balance between return and risk. However, as the number of assets increases, the efficiency of quadratic programming deteriorates. In recent years, nature-inspired algorithms have become a popular choice for efficiently identifying optimal portfolios. This research develops such an algorithm that, unlike previous algorithms, uses a set-based approach to reduce the dimensionality of the problem as well as determine the appropriate budget allocation for each asset. The set-based particle swarm optimization algorithm is extended to solve multi-objective and constrained formulations of the portfolio optimization problem using set-based representations. |

Jordand Daubinet daubinet.jordan@gmail.com | Multi-Agent Reinforcement Learning for Financial Trading | Financial trading is an activity undertaken by a "financial trader", in which the trader buys and sells financial assets from a trading venue with the goal of making a profit between security exchanges. Reinforcement learning is a machine learning approach in which an algorithm is trained to learn the optimal actions to take for a specific environment state. The algorithm learns from experience, using positive and negative reinforcement based on the outcome of its actions. The recent performance improvements in modern reinforcement learning algorithms have brought about new opportunities for implementing these algorithms within the financial trading space. This research will implement a multi-agent reinforcement learning algorithm to act as an artificial financial trader with the objective of making a profit over a set time frame. Each individual agent will be fit on a unique data type, selected from either technical ticker information, stock fundamental information, sentiment values on social media or some other form of alternative data. Each trained agent’s action space will be used as an input into a final-layer reinforcement learning algorithm that decides whether to make a trade or not. |

Ignazio Ferreira | Neural Network Ensembles and Concept Drift | This research develops an approach to train a neural network ensemble in the presence of concept drift. Particle swarm optimization algorithms developed for solving dynamic optimization problems will be used to train each member of the ensemble and to adapt learned decision boundaries as concept drift is experienced. A multi-modal particle swarm optimization algorithm will be developed to ensure that ensemble members are situated on different local minima of the neural network landscape. Different mechanisms to ensure diversity in ensemble member decision making will also be investigated. |

Ryan Lang | Landscape-aware Hyper-heuristics | A hyper-heuristic employs a heuristic pool consisting of a wide variety of different heuristics. A heuristic selection operator is then used to guide the search to the optimal heuristic(s) to use. Fitness landscape analysis is a formal approach to characterize search landscapes. The purpose of this research project is to attempt to find a mapping between algorithm performance and the characteristics of optimization problems, in order to determine for which characteristics certain algorithms perform well, or poorly. From this, the first landscape-aware hyper-heuristic selection rules will be developed. |

Werner Mostert werner.mostert1@gmail.com Amazon Web Services | Insights into the feature selection problem through landscape analysis | TBC |

Muhammed Rahman oncmmr@gmail.com | Genetic Programming to Induce Classification Trees in Dynamic Environments | Genetic programming has been used successfully to evolve classification trees for stationary data. This research will develop genetic programming approaches to evolve classification trees for non-stationary data, where concept drift occurs. In addition, approaches will be developed to include dynamically changing boundaries that are parallel and non-parallel to the axes. The set operator will also be included in the genetic programming language. As part of the study, approaches will be developed to quantify the diversity of the tree-based individuals found in genetic programming populations. |

Cian Steenkamp ciansteenkamp@gmail.com CSIR | Multi-guided Particle Swarm Optimization for Many-objective Optimization Problems | My research proposes to investigate the scalability of the multi-guided particle swarm optimization (MGPSO) algorithm, specifically with regard to the number of objectives. Many-objective optimization problems (MaOPs) have four or more objectives to be optimized, whereas multi-objective problems (MOPs) have at most three objectives. The MGPSO has been shown to be suitable for effectively solving MOPs, but no research has been done on the scalability of the MGPSO to solve MaOPs. Therefore, my research will initially compare the performance of the MGPSO with that of other many-objective optimization (MaOO) algorithms on MaOPs consisting of 3, 5, 8, 10, and 15 objectives. Then mechanisms will be proposed and implemented to allow the MGPSO to scale and efficiently solve MaOPs. The benchmark problems used in my study include test functions from the Walking Fish Group (WFG) and Deb-Thiele-Laumanns-Zitzler (DTLZ) test suites. The empirical results will be compared by applying performance measures and statistical tests. |

Benjamin Strelitz benstrelitz@gmail.com | A Dynamic Multi-Modal Particle Swarm Optimization Algorithm for Dynamically Constrained Optimization Problems | Multi-modal optimization (MMO) PSOs exist for static, unconstrained environments. Additionally, there exist many PSO algorithms for solving statically constrained problems. The latter PSOs return only single solutions. There are also PSO algorithms designed to track a single solution in unconstrained, dynamic environments. However, there are very few MMO PSO algorithms developed for tracking multiple solutions in unconstrained, dynamic environments, and currently no MMO PSO algorithms capable of tracking multiple solutions in dynamically constrained, dynamic environments. The primary objective of this study is to develop an MMO PSO algorithm capable of solving dynamic optimization problems with dynamic constraints, with the ability to find all feasible solutions. |

Aksel Thele athele@mtc.com.na Mobile Telecommunications Limited, Namibia | Honey bee optimization for dynamic environment | Many real-life problems can be formulated as dynamic optimization problems (DOPs). In a DOP the environment changes over time, which challenges optimization algorithms because optima have to be found and then tracked as the environment changes. Efficient honey bee algorithms (HBAs) have been developed to find optima for static optimization problems. This thesis evaluates the performance of HBAs on DOPs. A number of modifications of HBAs are empirically evaluated on an extensive benchmark set of twenty-seven DOP classes. The thesis quantifies and compares the effectiveness of each modification strategy. In the end, recommendations are made on which modification strategies are to be considered state-of-the-art and to be included in future studies. |

JP van Zyl 20706413@sun.ac.za | TBC | TBC |

## Engineering

Student | Thesis Title | Abstract |
---|---|---|

James Faure jamesfaure@icloud.com | Image Classification and Recognition of X-rays Used to Label Teeth and Teeth Abnormalities in Dental Analysis | Analysis of an X-ray can be time consuming for any dentist and is subject to human error. This research will develop machine learning technologies to automate the analysis of dental X-rays in order to determine whether any of the teeth have abnormalities. The resulting model can be implemented on an app platform which can easily be accessed by orthodontic radiologists, and which would be especially useful to those in rural areas where there are no dentists available. |

Werner van der Merwe 20076223@sun.ac.za | Model Tree Forests | This research will develop a model tree ensemble for use on large data sets where the predicted target values are numerical. The performance of this model tree forest will be compared with a single induced model tree. Various aspects that influence the performance of the model tree forest will be investigated, including approaches to fuse the decisions of the individual model trees, to select a subset of features to construct model trees on, and how data subsampling should be done for each induced model tree. |

# Doctoral Students

## Computer Science

Student | Thesis Title | Abstract |
---|---|---|

Adekoya Adekunle | Multi-Objective Optimization For Dynamic Incremental Machine Learning Algorithms | Due to data streams becoming more prevalent, research to improve the understanding, analysis and processing of big data streams is very active. The main goal of this research is to improve prediction and decision-making based on data streams. However, many of these data streams are generated and processed in environments that are characterized by uncertainty, such as temporal changes to the statistical properties of the data stream. A number of research studies are ongoing on how to handle the uncertainty around data streams. As a result of the foregoing, this research aims to investigate the efficacy of evolutionary and swarm-based multi-objective optimization techniques to develop machine learning predictive models for data streams. An important consideration for developing these predictive models is the presence of concept drift, where the statistical distribution of data and/or target variables may change with time. The consequences of concept drift include degradation in performance, and changes in the optimality of the resulting model architecture. This research will formulate machine learning in the presence of concept drift as a dynamic multi-objective optimization problem, where the objectives are to optimize prediction accuracy and to optimize model architecture (in order to prevent overfitting and underfitting). Both objectives are dynamic, due to the consequences of concept drift. Multi-objective machine learning predictive models for data streams will be developed and extensively evaluated. These predictive models will then be combined into a heterogeneous ensemble model, and the performance of this ensemble will be evaluated in comparison with the individual machine learning models. |

Dave Bockus | High Dimensional Fitness Landscape Analysis | Fitness landscape analysis attempts to determine features of an error landscape defined by some function. Landscapes can be defined as having plateaus, gentle or severe gradients toward local or global optima, or by defining the ridges and barriers of the landscape. In essence, landscapes of high dimensional error surfaces are often likened to geological landscapes to give a visual reference to the feature. Thus, the error surface will affect how one traverses the landscape when searching for an optimal point on the landscape. Search methods (e.g. particle swarm optimization, genetic algorithms and gradient descent, amongst others) which respond to the features of the landscape in order to move towards optima, do so largely independent of a priori knowledge of the underlying error surface. Thus, any tuning of control parameters for any of the variety of algorithms traversing the error surface is done blindly, where dynamic alteration of those control parameters is independent of the local error surface features. Any dynamic tuning of control parameters to date is a result of the algorithm's behaviour with respect to the error surface, but not directly of characteristics of the error surface. What is needed is a way of extracting local (possibly non-local) error surface features, ideally during the search process, which are applied back to the algorithm to tune control parameters in order to enhance that algorithm's search capability. Two approaches can be followed: The first is to conduct tuning based on landscape characteristics prior to running the optimization algorithm, thereby using global landscape information. The second approach results in a self-adaptive approach where local landscape information is used to guide the control parameter tuning in real-time during the optimization process. 
Current methods of extracting error surface features encompass the use of random walks over the error surface in order to obtain a limited set of parameters which are used for tuning. Parameters which measure the neutrality of the surface or the slope have been used in attempts to link the error surface to the search algorithm. Unfortunately, the concept of random walks does not give sufficient spatial context to sampled points on the error surface. Thus only the generalized metrics mentioned above can be extracted. Research has shown that those metrics have been used with limited success in tuning search algorithms. What is needed is an extraction of useful features from the error surface which can expand and thus define one’s image of what the error surface looks like. This leads to a more traditional view of an error surface, one which parallels that of natural geology, and includes features such as hills, valleys, plateaus, etc. The one underlying issue is that error surfaces in which PSO, GA and Neural Networks operate are high dimensional and do not lend themselves to visualization. Extracting features from high dimensional surfaces has proven difficult in the realm of providing context between surface and algorithm. The objective of this study is to develop an approach where the search space is reduced to a smaller dimensional space, and the fitness landscape analysis is done in this smaller-dimensional space. |

Taiwo Omomule | Heterogeneous Mixtures of Experts | As is the case with human experts, machine learning algorithms have a learned bias, which results in different machine learning experts, created from the same dataset, producing different predictions. To address this problem, mixtures of experts have been developed. Mixtures of experts is an approach in machine learning to significantly improve the performance of predictive models by considering an aggregation of multiple machine learning algorithms such as neural network ensembles, random forests, k-nearest neighbour ensembles, amongst others. However, classical mixtures of experts are mostly homogeneous in that all the experts in the mixture model are multiple instances of the same machine learning algorithm. While such an approach is still efficient, the performance of mixtures of experts can be significantly improved if different types of machine learning algorithms are included, thus capitalizing on the strengths and inductive biases of a diverse set of experts, which will result in a good balance of the advantages of the different ML experts used in the mixture model. The rationale behind this approach to heterogeneous mixture of experts modeling is that no one machine learning algorithm performs best on all problems, and that different algorithms show different advantages and disadvantages based on the problem characteristics. |

Amani Saad adomad1983@hotmail.com | Differential Evolution and Optimal Population Sizes | Parameter control is a significant topic in the design of evolutionary algorithms (EAs). The performance of EAs is greatly affected by the selection of control parameters. Therefore, optimal selection of values for control parameters is a particularly noteworthy research field. One common control parameter among all EAs is the population size. Differential Evolution (DE) is sensitive to its control parameters, which are the crossover rate, the scale factor and the population size. Although population size is an important control parameter which significantly influences the performance of DE, the volume of work dedicated to the population size indicates that this aspect is still under-investigated. A number of empirical studies have advised that the population size should be related to the problem dimensionality. Based on these empirical studies, a general perception has prevailed within the DE research community that advocates setting the size of a DE population to 10 times the dimension of the problem. However, the conclusions derived from these studies were based on a very limited benchmark suite containing only a few benchmark functions and hence are not suitable for all problem instances. Also, the common method of increasing the population size gradually to achieve better performance is subjective. A clear incremental strategy was not defined. Instead, rules of thumb were suggested as a user guide. The main objective of this research is to empirically analyze DE with respect to optimal population sizes, and to derive correlations between optimal population size and fitness landscape characteristics. The impact of different population sizes on search behavior will also be investigated. |

## Engineering

Student | Thesis Title | Abstract |
---|---|---|

Olabanji Asekun asekun@sun.ac.za | Dynamic Passenger train scheduling for South Africa using Particle Swarm Optimisation | The train timetabling problem is complex because, in most cases, multiple dynamic objectives and dynamic constraints must be satisfied. Optimization methods are mostly used to address these problems because of their ability to find a feasible solution in a reasonable amount of time. This research aims to develop a particle swarm optimisation algorithm to solve the train timetabling problem in South Africa, in order to reduce delays caused by aging infrastructure and vandalism. |

Emmanuel Buabin | Noncommutative Time Series Feature Extraction with Banach Lie Algebra | In this thesis, the focus is on the conceptualization, design and implementation of an algebraic evolutionary time series feature extractor. Specifically, a mathematical theory that comprises 1) a specialized Banach/Hilbert space, 2) a specialized Banach Lie related algebra and 3) a specialized body of mechanics (quantum motivated), is motivated for the overarching goal of algebraic time series feature data production, machine learning framework modelling and other interactive concept modelling. The time series feature extractor, equipped with novel algebraic evolutionary (swarm) time series feature learning procedures, is adopted for feature extraction on the produced (algebraic) time series datasets, within a specific time series problem context. To ascertain performance levels, experiments are varied across different parameters. |

Timothy Carolus t.g.carolus@gmail.com | *Control Parameter Importance and Stability Analysis of Population-based Algorithms* | A common problem in the design of optimization algorithms is ensuring that the sequence of solutions converges. This problem becomes more difficult for population-based meta-heuristics. One such group of iterative algorithms is swarm intelligence based algorithms, such as particle swarm optimization (PSO). Stability conditions have been derived on the control parameters of a class of such population-based algorithms, where the position updates can be reformulated in a specific recurrence relation. This research will investigate a number of swarm intelligence based algorithms and work towards reformulation of their position updates in the standard recurrence relation. From this, stability conditions will be derived for these algorithms, to provide guidance on how values for control parameters should be initialized to guarantee that an equilibrium state will be reached. Furthermore, an analysis of the control parameter importance within this region is carried out using functional analysis of variance. This study is applied to both single objective and multi-objective optimization algorithms. |

Webster Gova webgster@yahoo.com Data Science Manager, Umuzi Academy | A novel machine learning approach to forecast production structure evolution | The product-space methodology (PSM) has emerged as a strong contender for the stochastic prediction of country level economic growth behaviours. Measures calculated from PSM provide a simplified way to identify a nation's global export positioning in an industry and the industries it must target for export growth. Application of PSM to understand economic development also makes it easier to infer from trade data the likelihood of different products being exported together. PSM has some shortcomings which have not been fully addressed to date. One shortcoming is the methodology’s static nature in only analyzing one year at a time. The PSM’s insistence on attributing exports only to domestic factors while dismissing contributions from global supply chains has also faced great criticism, as has the fact that it suffers from the limitations posed by trade classifications in reflecting production structures or skills embedded in exported products. Investigations of mathematical interpretations to understand and interpret PSM metrics for possible optimization through Machine Learning (ML) algorithms have shown great promise. There is limited evidence in the literature reviewed to date that ML approaches have been used to understand how changes in production structures over time have contributed to economic transformation through diffusion of knowledge and capabilities in the network of product relatedness. Our study is developing robust and efficient ML techniques capable of processing multiple time series of trade data, and of forecasting and inferring the stochastic dependency of future economic growth on historical trade data. 
The study will formalize multiple-step forecasting problems as supervised learning tasks that can be achieved in three major steps: (i) feature extraction to characterise each time series to reduce dimensionality, (ii) using extracted features as local learning approximators for clustering of multiple time series and (iii) forecasting based on salient features of each cluster. The performance of ML approaches in this study will be benchmarked against economic growth measures from PSM before multi-step forecasting of economic development and changes in production structure are performed. |

Chucknorris Madamombe chuckygari@gmail.com Afriadi Group (Pty) Ltd | Review and Analysis of Swarm Based Algorithms for Optimization | Due to their powerful and resourceful performance for solving difficult optimization problems, swarm-based algorithms have been of much interest to many researchers in the scientific domain. All these swarm-based algorithms have been inspired by the natural behavior of swarms of biological organisms, e.g. animals, birds, bacteria, insects, fish and amphibians. It has been shown that these organisms provide a unique set of characteristics that can be used to design new swarm algorithms. Thus, the fascinating activities that are observed on a day-to-day basis in nature have been used as the basis for the formulation of new techniques for solving sophisticated problems in real life. A surfeit of swarm-based algorithms have been proposed since 1992 when they were first published. These swarm-based algorithms have been successfully used to solve sophisticated real-life optimization problems. Even though each of these swarm-based algorithms is supported by an analogy from nature, based on some nature-inspired metaphor, their mathematical/algorithmic models are very similar or at least share significant overlap. The initial phase of the proposed study will be to conduct an extensive literature review on the available swarm-based algorithms. A total of 80 swarm-based algorithms will be listed. The study will be further narrowed down to review only the most popular algorithms based on Google Scholar citation counts. Only the 65 most popular swarm-based algorithms will be reviewed. The review will cover the background (source of inspiration) of each swarm-based algorithm, as well as its mathematical model and algorithmic model. The major focus of the proposed study is to examine the mathematical models of each algorithm and to draw out similarities and differences between these swarm-based algorithms. 
The descriptions of these swarm-based algorithms will be as extensive as possible. The main goal of this research is to identify and categorize swarm-based algorithms for optimization based on different views such as nature-inspired view, application class view, optimization problems class view, computational complexity view and mathematical/algorithmic model view. A critical review of swarm-based algorithms will be done with reference to these different views. The critical review will develop a taxonomization based on the different views. The second goal of this research is to conduct an extensive empirical analysis of these algorithms on a large benchmark suite of continuous-valued, single-objective, static, boundary constrained optimization problems. The goal of the empirical analysis is to conduct a control parameter sensitivity analysis from which best values of the control parameters can be derived. The other goal of the empirical analysis is to identify the best algorithm(s) for specific optimization problem classes based on different performance criteria. The computational complexity, i.e. the actual execution time as well as the asymptotic complexity analysis, of each algorithm will be examined. |

Gary Pampara gpampara@gmail.com CircuitHub Inc. | Particle Swarm Optimisation for Dynamically Constrained, Dynamic Optimisation Problems | Real-world problems are usually time-dependent. Due to the dynamic nature of real problems, it is important to be able to optimize problems that change over time. The natural extension to problems that change over time are problems that change over time but may or may not have valid solutions when considering other factors about the optimization problem. Relating this class of dynamic yet constrained optimization problems back to the real world allows the comparison of such dynamic constrained optimization problems to the real problem of the stock market. Stocks are bought and sold on the stock market every day and at far greater velocities than a few decades ago. When a prospector attempts to acquire some stocks, a number of constraints are immediately present. Firstly, the prospector may only have a fixed budget from which to buy stocks. Secondly, the stock market is a dynamic environment with the price of stocks fluctuating frequently. Lastly, constraints exist for the stock purchasing on the stock market itself, such as a limited supply of stocks. When the prospector attempts to purchase stocks, a balance must be struck between the volatility of the dynamic environment (influencing the number of stocks the prospector can afford to purchase) and the actual availability of stocks (which constrains the number of any stocks that may be purchased). Dynamic constrained optimization problems define the combination of problem space changes and possible constraint changes. |

Zander Wessels zander@nmrql.com NMRQL Research | A Walk-Forward Multi-Factor Machine Learning Investment Process | The investment management industry is going through a paradigm shift: from biased and expensive human-centric investment decision making, to unbiased, scalable, adaptive, and testable algorithmic investment decision making at lower costs. This shift is being driven by cutting-edge machine learning algorithms, large amounts of structured and unstructured data, and processing power. Thus, the goal of this thesis is to propose an online collective intelligence framework where online machine learning algorithms and fundamental financial models can develop different views on the securities and assets in question. After the algorithms have voted on which assets they believe will go up or down in the future, portfolios can be constructed using heuristic algorithms, e.g. PSO. Because these models are unbiased and behave reliably, they can be simulated robustly through time. These simulations account for survivorship bias, lookahead bias, transaction costs, market impacts, liquidity risk, and risk management. |