Using SmileFinder to investigate genetic diversity

A workflow from Guiblet et al. (2014), SmileFinder: a resampling-based approach to evaluate signatures of selection from genome-wide sets of matching allele frequency data in two or more diploid populations. GigaScience 4:1.

Introduction

SmileFinder is a tool that detects diversity and divergence patterns associated with selective sweeps by evaluating allele frequencies. This page shows how the collection of SmileFinder scripts, which have been deployed as tools here in GigaGalaxy, can be used within a workflow to identify differences in diversity between two African populations.

Workflow

The workflow above shows how the graph from Figure 1C in the SmileFinder paper can be produced by its tools. This workflow uses SmileFinder to investigate the diversity of two African populations, Biaka pygmies and Mbuti pygmies.

Tool execution

1. The two populations, Biaka pygmies and Mbuti pygmies, can be chosen for analysis using the Select populations tool, which is available from the SmileFinder collection of tools:

Select populations returns a list of identifiers for the individuals who were analysed in the Human Genome Diversity Project, for example:

2. The data for these individuals from the Human Genome Diversity Project are then processed by the Count tool:

This tool produces a file of data containing heterozygosity and FST scores which are in the format required by SmileFinder:

3. The mitochondrial and sex chromosomes can have different dynamics compared to the autosomes. For this reason, we can remove these data using the Filter tool with this condition: c2!='M' and c2!='X' and c2!='Y'.
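The Filter condition is a Python expression evaluated on each row, where c2 refers to the second tab-separated column. As a rough illustration of what this step does, the sketch below applies the same test to a few made-up rows outside Galaxy. The function name `keep_autosomes`, the sample rows, and the contents of the columns other than c2 are hypothetical; only the position of the chromosome in column 2 is taken from the condition itself.

```python
def keep_autosomes(lines):
    """Keep tab-delimited rows whose second column (c2) is not M, X or Y."""
    excluded = {"M", "X", "Y"}
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Equivalent to the Galaxy condition: c2!='M' and c2!='X' and c2!='Y'
        if len(fields) >= 2 and fields[1] not in excluded:
            kept.append(line)
    return kept


# Made-up example rows: identifier, chromosome, position (layout is assumed)
rows = [
    "rs0001\t1\t1000",
    "rs0002\tX\t2000",
    "rs0003\tM\t300",
    "rs0004\t22\t4000",
]
autosomal = keep_autosomes(rows)
```

Only the rows on chromosomes 1 and 22 remain in `autosomal`; the X and M rows are dropped.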

4. For the SmileFinder tool to work properly, its input data must be sorted by chromosome number and SNP position. We must first remove the column names at the top of the data set using the Remove beginning tool. Once this has been done, we can use the Sort tool as follows:
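For readers who want to see the effect of these two steps outside Galaxy, here is a minimal sketch that drops the header line and then sorts numerically by chromosome and position. The function name `drop_header_and_sort`, the sample table, and the column indices are assumptions for illustration: the chromosome is taken to be in column 2 (as in the Filter condition) and the SNP position is assumed to be in column 3, which may not match the real Count output.

```python
def drop_header_and_sort(lines, chrom_col=1, pos_col=2):
    """Remove the first (header) line, then sort rows by chromosome and position.

    Column indices are zero-based and are assumptions about the file layout.
    """
    body = [line.rstrip("\n").split("\t") for line in lines[1:]]
    # Sort numerically; a plain text sort would place "10" before "2".
    body.sort(key=lambda fields: (int(fields[chrom_col]), int(fields[pos_col])))
    return ["\t".join(fields) for fields in body]


# Made-up example table with a header row
table = [
    "snp\tchr\tpos",
    "rs0004\t22\t4000",
    "rs0005\t2\t500",
    "rs0001\t1\t1000",
    "rs0006\t2\t90",
]
sorted_rows = drop_header_and_sort(table)
```

The sort must be numerical rather than alphabetical, which is why the sketch converts both columns to integers. This only works once the non-numeric chromosomes (M, X, Y) have been filtered out in the previous step.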

5. Once the output of the Count tool has been filtered and sorted, the data are ready to be analysed by the SmileFinder tool:

SmileFinder produces a tab-delimited file containing the expected percentiles for extreme values of mean heterozygosity and FST. An example of the result generated by SmileFinder is shown here:

6. The results from SmileFinder can be plotted as a graph using the Grapher tool, for example to show the signature of selection around a specific gene such as CUL5: