BOUNDARIES

A program to test the impact of language-family boundaries on population differentiation and to evaluate the homogeneity of the genetic processes along these boundaries



Introduction

This program assesses the impact of language-family boundaries on population differentiation and evaluates the homogeneity of the genetic processes along these boundaries, as described in Dupanloup de Ceuninck et al. (2000). The first estimator of the impact of the boundary (delta-a) is based on an isolation by distance (IBD) model and measures the added genetic distance between populations located on different sides of the boundary. This estimator is compared to another estimator of group differentiation (FCT), computed under an analysis of variance framework that does not assume any particular spatial structure of the populations (Excoffier et al., 1992).

Although this method was originally developed to evaluate the genetic processes at work along and across linguistic boundaries, it can be used to assess the impact of any other cultural boundary or ecological barrier on population differentiation, at any geographical scale.


Input files

Boundaries takes two input files. The first one (*.geo) must contain the geographic coordinates of the sampling localities of your populations. The second one (*.arp) is an Arlequin input file (called an Arlequin project file) containing the genetic data for your populations. The Arlequin file must have the same base name as the geographic file, with the *.arp extension. The order of the populations in the two input files MUST BE THE SAME.

The file containing the geographic coordinates of the sampling localities of your populations must have the *.geo extension. Important notice: Boundaries does not work if two sampling localities have the same geographical coordinates.
The geographic input file must be structured as follows. Each line corresponds to one population and must contain five fields separated by a tab character:

  1. an integer number corresponding to the line in the file
  2. the name of your population within quotes
  3. the longitude of your sampling point
  4. the latitude of your sampling point
  5. the linguistic group to which your population belongs (1 or 2).

Example of a geographic file (inputdata.geo):
1 "Egyptiens" 31.23 31.03 1
2 "Tunisiens" 10.13 36.5 1
3 "Albanais" 15.01 41.55 2
4 "Lithuaniens" 23.2 55.51 2
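
The format rules above can be checked before launching the program. The following is only a minimal sketch, not part of Boundaries: the function name `read_geo` and the returned dictionary keys are hypothetical, and it assumes that names are enclosed in double quotes and that fields are separated by a single tab, as described above.

```python
import csv


def read_geo(lines):
    """Parse .geo records: line index, quoted name, longitude, latitude, group."""
    populations = []
    seen_coords = set()
    # Tab-separated fields; population names are enclosed in double quotes.
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 tab-separated fields, got {len(fields)}"
            )
        index, name = int(fields[0]), fields[1]
        lon, lat, group = float(fields[2]), float(fields[3]), int(fields[4])
        if group not in (1, 2):
            raise ValueError(f"line {line_no}: group must be 1 or 2, got {group}")
        # Boundaries does not work if two localities share the same coordinates.
        if (lon, lat) in seen_coords:
            raise ValueError(f"line {line_no}: duplicate coordinates {(lon, lat)}")
        seen_coords.add((lon, lat))
        populations.append(
            {"index": index, "name": name, "lon": lon, "lat": lat, "group": group}
        )
    return populations
```

A file can be checked with `read_geo(open("inputdata.geo", encoding="utf-8"))`; the function raises `ValueError` on the first line that breaks one of the rules.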

In the Arlequin project file, the order of the populations (that is, the order in which the genetic data of your samples are defined) must be the same as in the file containing the geographic coordinates of your sampling points. For more information on Arlequin project files, you can download the Arlequin program (Schneider et al., 2000) and the Arlequin help file from the Arlequin web site.

Example of an Arlequin project file, inputdata.arp (for the same populations as in the geographic input file):
#AMOVA analysis

[Profile]
Title="A New Sample File Designed To Compute AMOVA"
NbSamples=4
LocusListing=TABLE
GenotypicData=0
LocusSeparator=WHITESPACE
DataType=FREQUENCY
Frequency=REL

[Data]
[[Samples]]
SampleName="Egyptiens"
SampleSize=250
SampleData= {
1 0.64
2 0.36
}
SampleName="Tunisiens"
SampleSize=474
SampleData= {
1 0.80
2 0.20
}
SampleName="Albanais"
SampleSize=117
SampleData= {
1 0.86
2 0.14
}
SampleName="Lithuaniens"
SampleSize=202
SampleData= {
1 0.45
2 0.55
}
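
Because the two files must list the populations in the same order, it can be worth comparing them before a run. The sketch below is an illustration only: the helper names `arp_sample_names` and `order_mismatches` are hypothetical, and it assumes that each population carries exactly the same name in the .geo file and in the `SampleName` entries of the project file, as in the examples above.

```python
import re


def arp_sample_names(arp_text):
    """Return the SampleName values in the order they appear in the project file."""
    return re.findall(
        r'^\s*SampleName\s*=\s*"([^"]*)"', arp_text, flags=re.MULTILINE
    )


def order_mismatches(geo_names, arp_text):
    """List the differences between the .geo order and the .arp order."""
    arp_names = arp_sample_names(arp_text)
    problems = []
    if len(arp_names) != len(geo_names):
        problems.append(
            f"{len(geo_names)} populations in .geo but {len(arp_names)} in .arp"
        )
    # Compare the names position by position.
    for position, (geo, arp) in enumerate(zip(geo_names, arp_names), start=1):
        if geo != arp:
            problems.append(f"position {position}: {geo!r} in .geo but {arp!r} in .arp")
    return problems
```

An empty list means that the two files agree; any other result describes the positions at which the names differ.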


Running

When Boundaries is launched, it prompts you successively for:

  1. the name of the geographic input file (typed without the *.geo extension).
  2. the type of geographic distances you wish to compute between your sampling points (Manhattan or Euclidean distances).
  3. the mode of segmentation of your boundary (either by number of segments or by length of segments).
  4. the mode of diffusion of your populations (1 or 2 dimensions). In both cases, the test of IBD (see Rousset, 1997, for details) is a test of correlation between genetic and geographic distances between the populations, with the genetic distances computed as FST/(1-FST). With 1 dimension, the geographic distances themselves are used; with 2 dimensions, the logarithm of the geographic distances is used.
  5. the number of permutations to be performed to test the correlation between the geographic and genetic distances (test of IBD) and the delta-a statistic (the number of permutations used to test the FCT statistic is fixed at 1000).
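
The distance choices and the IBD test described in the list above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code used by Boundaries: all function names are hypothetical, coordinates are treated as plain planar values, the correlation is Pearson's r, and significance is assessed with a Mantel-style permutation of the populations, which is an assumption since the permutation scheme is not specified here.

```python
import math
import random


def geographic_distance(a, b, metric="euclidean"):
    """Distance between two (x, y) points, treated as planar coordinates."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    if metric == "manhattan":
        return abs(dx) + abs(dy)
    return math.hypot(dx, dy)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ibd_test(coords, fst, dimensions=2, metric="euclidean",
             permutations=1000, seed=0):
    """Correlate FST/(1-FST) with distance (1D) or log distance (2D).

    `fst` is a full symmetric matrix of pairwise FST values.
    Returns the observed correlation and a one-sided permutation p-value.
    """
    n = len(coords)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def geo(i, j):
        d = geographic_distance(coords[i], coords[j], metric)
        return math.log(d) if dimensions == 2 else d

    x = [geo(i, j) for i, j in pairs]

    def genetic(order):
        # Relabel the populations according to `order`, then transform FST.
        values = (fst[order[i]][order[j]] for i, j in pairs)
        return [f / (1.0 - f) for f in values]

    identity = list(range(n))
    observed = pearson(x, genetic(identity))

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(x, genetic(order)) >= observed:
            hits += 1
    # Count the observed arrangement itself among the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value
```

The only difference between the two modes of diffusion in this sketch is the `math.log` applied to the geographic distance, which mirrors the 1-dimension and 2-dimension cases described above.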

Output files

The most important output file of Boundaries is the file called results.txt. It contains the results of the computations done by Boundaries (test of IBD in the linguistic groups, values and p-values of the delta-a and FCT statistics).
A set of additional output files is created by Boundaries; they are listed below:

  1. Arlequin.log: this file is generated during the computation of the genetic distances and the FCT statistic. It contains all the run-time WARNINGS and ERRORS encountered during these computations.
  2. boundaries.log: this file contains all the steps done by Boundaries and, in case of problems, the location of the problems.
  3. boundarysegment_i.arp: for each segment i of the boundary, an Arlequin project file is created by appending to the input Arlequin project file a genetic structure defined as 2 groups of populations (located on either side of this segment).
  4. dgeo.txt: this file contains the geographic distance matrix between the sampling points
  5. boundaries.ps: this file (EPS) can be read with GSview for Windows or Adobe Illustrator 7.0; it contains a map of the sampling points and the boundary. On this map, the width of each segment of the boundary is drawn proportionally to the delta-a statistic computed for this segment.
  6. Fst.txt: this file contains the FST matrix between the populations
  7. Reg_i.xls: this file contains the list of geographic and genetic distances used to test IBD both within and between groups. This tab-delimited text file can be read directly by MS-Excel, for graphical output of the regressions.
  8. wholeboundary.arp: an Arlequin project file created by appending to the input Arlequin project file a genetic structure defined as 2 groups of populations (located on either side of the whole boundary).

References


Isabelle Dupanloup, Dipartimento di Biologia, University of Ferrara