SAMOVA 1.0
A program to define the genetic structure of populations by a simulated annealing approach
The program SAMOVA 1.0 implements an approach to define groups of populations
that are geographically homogeneous and maximally differentiated from each other.
As a by-product, it also leads to the identification of genetic barriers between
these groups. The method is based on a simulated annealing procedure that aims
at maximizing the proportion of total genetic variance due to differences
between groups of populations (SAMOVA, Spatial Analysis of MOlecular VAriance).
The method is described in Dupanloup, Schneider and Excoffier (2002).
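The proportion being maximized can be written in the usual AMOVA notation (Excoffier et al., 1992). The symbols below are not defined in this file and are used here only for illustration: \sigma_a^2 is the variance component among groups, \sigma_b^2 the component among populations within groups, and \sigma_c^2 the component within populations.

```latex
F_{CT} = \frac{\sigma_a^2}{\sigma_T^2},
\qquad
\sigma_T^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2
```

Under this reading, for a given number of groups SAMOVA looks for the grouping of populations that gives the largest value of this proportion.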
SAMOVA 1.0 runs on PC. The Apple version is not available yet.
SAMOVA 1.0 takes two input files.
The first one (*.geo) must contain the geographic coordinates
of the sampling localities of your populations. The second one (*.arp) is an Arlequin
project file containing the genetic data for your populations. The Arlequin
file must have the same base name as the geographic file, with the extension *.arp.
The order of the populations in the two input files MUST BE THE SAME!
The file containing the geographic coordinates of the sampling localities of your populations
must have the *.geo extension. Important notice: SAMOVA 1.0 does not work if two
sampling localities have the same geographical coordinates.
The geographical input file must be structured as follows.
Each line corresponds to a population.
Each line must contain five fields separated by a tab character:
- an integer corresponding to the line number in the file
- the name of your population within quotes
- the longitude of your sampling point
- the latitude of your sampling point
- an integer (for example, 1).
Example of a geographic file, inputdata.geo (fields are separated by tab characters):
1 "Egyptiens" 31.23 31.03 1
2 "Tunisiens" 10.13 36.5 1
3 "Albanais" 15.01 41.55 1
4 "Lithuaniens" 23.2 55.51 1
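The rules above can be checked before running SAMOVA. The following is a minimal sketch in Python, not part of SAMOVA itself; the function name parse_geo is hypothetical. It assumes the file content has already been read into a string and checks only what this document states: five tab-separated fields, a line index matching the line position, a quoted population name, and no two localities sharing the same coordinates.

```python
def parse_geo(text):
    """Parse the content of a *.geo file and check the rules stated above.

    Returns a list of (index, name, longitude, latitude, flag) tuples.
    Raises ValueError on the first problem found.
    """
    records = []
    seen_coordinates = {}
    # Blank lines are ignored (an assumption of this sketch).
    lines = [line for line in text.splitlines() if line.strip()]
    for expected_index, line in enumerate(lines, start=1):
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {expected_index}: expected 5 tab-separated fields, "
                f"found {len(fields)}"
            )
        index = int(fields[0])
        name = fields[1].strip()
        longitude = float(fields[2])
        latitude = float(fields[3])
        flag = int(fields[4])
        if index != expected_index:
            raise ValueError(
                f"line {expected_index}: first field is {index}, "
                f"expected {expected_index}"
            )
        if len(name) < 2 or name[0] != '"' or name[-1] != '"':
            raise ValueError(
                f"line {expected_index}: population name must be within quotes"
            )
        if (longitude, latitude) in seen_coordinates:
            other = seen_coordinates[(longitude, latitude)]
            raise ValueError(
                f"line {expected_index}: same coordinates as {other}"
            )
        seen_coordinates[(longitude, latitude)] = name
        records.append((index, name[1:-1], longitude, latitude, flag))
    return records
```

A file that passes this check can still be rejected by SAMOVA for reasons not covered here; the sketch only mirrors the constraints listed in this section.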
In the Arlequin project file, the order of the populations, that is, the order
in which the genetic data of your samples are defined, MUST BE THE SAME as in the file
containing the geographic coordinates of your sampling points.
For more information on Arlequin project files, you can download
the Arlequin program (Schneider et al., 2000) and
the Arlequin help file from the Arlequin web site.
Example of an Arlequin project file (for the same populations as listed in
the geographical input file), inputdata.arp:
#AMOVA analysis
[Profile]
Title="A New Sample File Designed To Compute AMOVA"
NbSamples=4
GenotypicData=0
DataType=DNA
LocusSeparator=WHITESPACE
MissingData='?'
[Data]
[[Samples]]
SampleName="Egyptiens"
SampleSize=2
SampleData= {
Egy1 1 AAAAAAAAAAAAAATTAAAA
Egy2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Tunisiens"
SampleSize=2
SampleData= {
Tun1 1 TTTTTTTAAAAAAATTAAAA
Tun2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Albanais"
SampleSize=2
SampleData= {
Alb1 1 AAAAAGGGAAAAAATTAAAA
Alb2 1 AAAAAACCAAAAGATTAAAA
}
SampleName="Lithuaniens"
SampleSize=2
SampleData= {
Lit1 1 AAAAAAAAAAGGGATTAAAA
Lit2 1 AAAAAACCAAAAAATTAAAA
}
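Because the two files must list the populations in the same order, a quick comparison can catch mistakes early. The sketch below is a hypothetical helper, not part of SAMOVA or Arlequin. It assumes that the quoted names in the *.geo file are identical to the SampleName values in the *.arp file, as in the examples above; this document only requires the order to match, so treat the name comparison as an assumption.

```python
import re


def geo_names(geo_text):
    """Return the population names (second tab-separated field) of a *.geo file."""
    names = []
    for line in geo_text.splitlines():
        if line.strip():
            names.append(line.split("\t")[1].strip().strip('"'))
    return names


def arp_sample_names(arp_text):
    """Return the SampleName values of an Arlequin project file, in file order."""
    pattern = r'^\s*SampleName\s*=\s*"([^"]*)"'
    return re.findall(pattern, arp_text, flags=re.MULTILINE)


def check_population_order(geo_text, arp_text):
    """Return a list of human-readable problems; an empty list means no mismatch."""
    in_geo = geo_names(geo_text)
    in_arp = arp_sample_names(arp_text)
    problems = []
    if len(in_geo) != len(in_arp):
        problems.append(
            f"{len(in_geo)} populations in *.geo but {len(in_arp)} samples in *.arp"
        )
    for position, (g, a) in enumerate(zip(in_geo, in_arp), start=1):
        if g != a:
            problems.append(f"position {position}: {g!r} in *.geo, {a!r} in *.arp")
    return problems
```

For the two example files shown above, check_population_order would be expected to return an empty list, since both list Egyptiens, Tunisiens, Albanais and Lithuaniens in that order.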
SAMOVA needs:
- the name of the input files (for example, inputdata; in this case, the directory
containing the program MUST contain the two input files used by SAMOVA, and
these files MUST be called inputdata.geo and inputdata.arp).
- the number K of groups of populations you wish to define (the final structure
defined by SAMOVA will contain K groups)
- the number of simulated annealing processes you wish to perform (100 seems a
good choice)
- the type of molecular distance between haplotypes you want to compute (SAMOVA,
like AMOVA, is based on a matrix of distances between the haplotypes observed in the whole
set of samples). You can choose between the number of pairwise differences
between haplotypes (for DNA data) and the sum of squared size differences between
haplotypes (for microsatellite data).
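The two distance options can be illustrated with a short Python sketch. It is not SAMOVA's or Arlequin's code, and the function names are hypothetical. It assumes that DNA haplotypes are aligned strings of equal length and that microsatellite haplotypes are lists of allele sizes (repeat numbers), one per locus. The missing-data symbol is not treated specially here.

```python
def pairwise_differences(seq_a, seq_b):
    """Number of positions at which two aligned DNA haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotypes must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def sum_squared_size_differences(alleles_a, alleles_b):
    """Sum over loci of the squared difference in allele size."""
    if len(alleles_a) != len(alleles_b):
        raise ValueError("haplotypes must have the same number of loci")
    return sum((a - b) ** 2 for a, b in zip(alleles_a, alleles_b))


def distance_matrix(haplotypes, distance):
    """Square matrix of distances between all pairs of haplotypes."""
    n = len(haplotypes)
    return [
        [distance(haplotypes[i], haplotypes[j]) for j in range(n)]
        for i in range(n)
    ]
```

For instance, the haplotypes Egy1 and Egy2 of the example project file differ at two positions, so their pairwise-difference distance is 2.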
When the SAMOVA window disappears from your screen, the computations
are finished. The running time depends on the number of populations you
have and on the number of simulated annealing processes you wish to perform.
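To give a feeling for why the running time grows with the number of simulated annealing processes, here is a toy sketch of repeated annealing runs in Python. It is NOT the SAMOVA algorithm: it ignores geography, works on one numeric value per population instead of molecular distances, and uses a simple between-group share of the sum of squares as a stand-in objective. All names, the cooling schedule and the default parameters are assumptions made for illustration.

```python
import math
import random


def between_group_fraction(values, labels):
    """Toy objective: share of the total sum of squares lying between groups."""
    grand_mean = sum(values) / len(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    between = 0.0
    for group in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == group]
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - grand_mean) ** 2
    return between / total


def anneal_once(values, k, rng, steps=2000, t0=1.0, cooling=0.995):
    """One annealing run: move single populations between k groups."""
    n = len(values)
    # Random start in which every group has at least one population.
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n - k)]
    rng.shuffle(labels)
    current = between_group_fraction(values, labels)
    best_score, best_labels = current, labels[:]
    temperature = t0
    for _ in range(steps):
        temperature *= cooling
        i = rng.randrange(n)
        old, new = labels[i], rng.randrange(k)
        if new == old or labels.count(old) == 1:
            continue  # no change, or the move would empty a group
        labels[i] = new
        candidate = between_group_fraction(values, labels)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature decreases.
        if candidate >= current or rng.random() < math.exp(
            (candidate - current) / temperature
        ):
            current = candidate
            if current > best_score:
                best_score, best_labels = current, labels[:]
        else:
            labels[i] = old  # reject the move
    return best_score, best_labels


def best_of_runs(values, k, n_runs, seed=0):
    """Repeat the annealing n_runs times and keep the best grouping found."""
    rng = random.Random(seed)
    runs = (anneal_once(values, k, rng) for _ in range(n_runs))
    return max(runs, key=lambda run: run[0])
```

In this sketch the cost is proportional to n_runs times the number of steps per run, and each step costs more as the number of populations grows, which matches the qualitative statement above about running time.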
A set of output files is created by SAMOVA:
- SAMOVA_results_arlequin.txt: the genetic structure
defined by SAMOVA as well as the fixation indices corresponding to this
group structure and their significance level evaluated by 1,000 permutations of
populations among groups.
- SAMOVA.log: this file contains all the steps done
by SAMOVA and, in case of problems, the location of the problems.
- SAMOVA_finalstructure.arp: an Arlequin project file
created by appending the genetic structure defined by SAMOVA
to the input Arlequin project file.
- SAMOVA_results.ps: this file (EPS format) can be read with
GSview for Windows or Adobe Illustrator 7.0; it contains
a map of the sampling points and the barriers between the groups of populations defined
by SAMOVA.
- Arlequin.log: this file is generated during
the computation of the fixation indices corresponding to the genetic structure
defined by SAMOVA. It contains all the run-time WARNINGS and ERRORS encountered during these computations.
- Dupanloup, I., Schneider, S., Excoffier, L. (2002)
A simulated annealing approach to define the genetic structure of populations.
Molecular Ecology 11(12):2571-81.
See also:
- Excoffier, L., Smouse, P., Quattro, J.M. (1992) Analysis of molecular variance inferred
from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.
Genetics 131: 479-491.
- Schneider, S., Roessli, D., Excoffier, L. (2000) Arlequin: A software for population genetic data.
Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Isabelle Dupanloup,
Dipartimento di Biologia,
University of Ferrara