Racial Genetic Similarity and Difference: The Witherspoon et al. Study
May 19, 2010
A scientific topic that has often been discussed here
and at other similar sites is the biological validity of the race concept.
This, unfortunately, has become necessary, because some people, perhaps
with political motivations, assert, contrary to the evidence, that “race does
not exist” and that race is a “social construct” with “no biological
and misinterpreted finding that has been eagerly grasped at by those who preach
that “race is not real” is derived from the work of Richard Lewontin, which
demonstrated that more genetic variation exists within than between groups.
In this journal, I have explained how Lewontin’s finding in no way discredits
the race concept. However, there
are “anti-racist” activists who still claim, based on their misinterpretations
of population genetics, that it is possible for individual Europeans (“Whites”)
to be more genetically similar to sub-Saharan Africans (“Blacks”) than to other
Europeans. Until now, there has
been no formal proof that this assertion is incorrect.
I am now pleased to say that a recent scientific paper has delved into
this very topic and that the findings of this paper clearly demonstrate that the
race deniers are wrong. First, let
me give a brief introduction for the sake of clarity.
A number of
scientific studies have shown that it is possible to genetically cluster
individuals to their self-identified race with near 100% accuracy.
Further, racial categories can be determined by the genetic data even
without any a priori information about the groups involved.
In other words, racial groups can be empirically observed through genetic
analysis without any prior assumptions about these groups by the researchers.
Does that imply that individual members of these races will always be more
genetically similar to members of their own racial group compared to members of
other groups? Or, are genetic clustering
and individual genetic similarity so different that this may not be always so?
Can individuals share more genetic similarity to members of other groups
rather than to members of their own group, even if everyone is properly
clustered with their self-identified race?
In other words, can there be significant genetic overlap between
individuals on the fringes of, say, the European and African clusters?
These are the questions asked, and answered, in the paper “Genetic Similarities Within and
Between Populations” by Witherspoon et al.
I will simplify the authors’ statements and analogies so as to make the
work more understandable to the broad readership; although this may mean that
certain detailed specifics are glossed over, the main “take home” points and
essential interpretations remain intact.
And, since the paper is available online at no cost, any reader
interested in delving into the scientific details can do so at their leisure.
The authors introduced the metric “w”, which they defined as “…the frequency with which a
pair of individuals from different populations is genetically more similar than
a pair from the same population.”
In other words, what is being determined with “w” is the frequency with which,
for example, individual Whites and individual Blacks may be more similar to each
other than to members of their own race. This measurement, which is based upon
gene by gene comparisons between individuals, is different from the two
measurements of clustering that the authors compare to “w.”
Unlike “w”, the clustering measurements incorporate population-level
genetic information, and thus consider the “aggregate” qualities of the
population’s genetic information.
To put it simply, and bypassing many details, “w” compares individuals to each
other, while clustering is, essentially, comparisons of individuals to the
“genetic average” (or “centroid”) of different populations.
By crude analogy, we could consider physical traits. “W” would be analogous
to how similar two individuals are to each other in height, weight, eye color,
skin color, hair color, facial features, etc.
Clustering, in contrast, is more analogous to how similar each individual
is to the average measurements of height, weight, eye color, etc. for any group.
Thus “w” can tell us how similar individuals are to each other, while
clustering tells us whether an individual is more similar to one group or
another. Clustering allows us to
“bin” (or “cluster”) individuals as belonging to one group or another.
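The distinction between the two kinds of measurement can be sketched in a few lines of Python. This is only an illustrative toy, not the method used in the paper: the genotype vectors are invented, the populations are given the neutral names `pop_a` and `pop_b`, the distance is a simple sum of absolute per-marker differences, and `overlap_w` is one plain reading of the quoted definition of “w.”

```python
from itertools import combinations, product


def distance(x, y):
    """Sum of absolute per-marker differences between two marker vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def overlap_w(pop_a, pop_b):
    """Fraction of (between-pair, within-pair) comparisons in which the pair
    drawn from different populations is the more similar of the two."""
    within = [distance(x, y)
              for pop in (pop_a, pop_b)
              for x, y in combinations(pop, 2)]
    between = [distance(x, y) for x, y in product(pop_a, pop_b)]
    closer = sum(1 for b in between for w in within if b < w)
    return closer / (len(between) * len(within))


def centroid(pop):
    """Per-marker average of a population (its 'genetic average')."""
    return [sum(column) / len(pop) for column in zip(*pop)]


def misclassified(pop_a, pop_b):
    """Fraction of individuals lying nearer the other population's centroid."""
    ca, cb = centroid(pop_a), centroid(pop_b)
    wrong = sum(1 for x in pop_a if distance(x, cb) < distance(x, ca))
    wrong += sum(1 for x in pop_b if distance(x, ca) < distance(x, cb))
    return wrong / (len(pop_a) + len(pop_b))


# Invented genotypes (0, 1 or 2 copies of an allele at four markers).
pop_a = [(0, 0, 0, 0), (1, 1, 1, 1)]
pop_b = [(1, 2, 1, 2), (2, 2, 2, 2)]

print("w =", overlap_w(pop_a, pop_b))
print("misclassified =", misclassified(pop_a, pop_b))
```

With these made-up values every individual sits nearest its own centroid, yet the individual-to-individual measure is not zero, which is the situation the following paragraphs ask about.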
Is it possible for individuals from different groups to be more genetically similar to
each other than to members of their own group? More
importantly, can this occur even if all of these individuals are correctly
“binned” by genetic cluster analysis to their correct racial group?
In other words, is it possible to correctly cluster everyone to their
self-identified race, even though members of different groups are more similar
to each other than to some members of their own group?
In theory, yes, and the authors provide an example of how this may occur.
For the sake of understanding, I will simplify their explanation and
example. Here, the measurement “q” represents the averaged gene frequencies for groups or for
individuals. The African genetic
average (or “centroid”) of “q” may be 0.46; the European “q”, 0.61.
This “q” measures the average frequency of different gene types at
various parts of the genome. Assume three individuals, two Africans and one
European, with their own individual “q” measurements of 0.4, 0.52, and 0.55
respectively. Consider the African
with q = 0.52. He is closer to the
African average of 0.46 than to the European average of 0.61.
Thus, he clusters with Africans; in fact all three individuals would
cluster with their identified group. Yet, at the individual level, the African
at 0.52 is closer to the European’s 0.55 value than to the other African’s 0.4
value. Thus, it would seem that
individual racial overlap can be possible even though clustering is absolutely
correct. Does this actually occur? Bamshad et
al. (“Deconstructing The Relationship Between Genetics and Race”, Bamshad et
al., Nat. Rev. Genet. 5, 598-609,
2004), using 377 DNA markers in 1,056 individuals, found that in 38% of the
cases, individual Europeans were more similar to individual Asians than to other
Europeans. So it would seem that
significant genetic overlap across broad racial lines exists, even if everyone
is correctly binned to their own racial group.
But, is this really true?
Will that hold true when more markers are used?
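The arithmetic of the three-individual “q” example above can be checked directly. The sketch below uses only the numbers given in the text; the function name `nearest_centroid` and the dictionary layout are illustrative choices, not anything taken from the paper.

```python
# Group averages ("centroids") and individual q values from the worked example.
centroids = {"African": 0.46, "European": 0.61}
individuals = [("African", 0.40), ("African", 0.52), ("European", 0.55)]


def nearest_centroid(q):
    """Return the group whose average q is closest to this individual's q."""
    return min(centroids, key=lambda name: abs(q - centroids[name]))


# Clustering: each individual is compared with the group averages.
for label, q in individuals:
    print(f"q={q:.2f} labelled {label}, clusters with {nearest_centroid(q)}")

# Individual similarity: the individual at 0.52 is compared with the other two.
gap_to_same_group = abs(0.52 - 0.40)
gap_to_other_group = abs(0.52 - 0.55)
print("closer to the individual from the other group:",
      gap_to_other_group < gap_to_same_group)
```

All three individuals fall nearest their own group's average, while the individual at 0.52 is nearer to the individual at 0.55 than to the one at 0.40. A single averaged value is of course a much cruder summary than the multi-marker comparisons discussed next.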
These are the questions that the Witherspoon et al. paper attempted to address. What were their basic findings? The authors first examined the amount of genetic overlap between individual Europeans and sub-Saharan Africans using 175 markers, comparing the “w” metric with two measurements of clustering. Since clustering is a less stringent measurement than is genetic similarity (“w”), it is not surprising that, with a given number of genetic markers, there is less overlap with clustering than with “w.” For example, in the case of Africans vs. Europeans and using 175 markers, the two measures of clustering gave overlaps of 4.9% and 1.9%; in contrast, the “w” measure of similarity has an overlap of 23%. This “w” means that, given these 175 markers, nearly one quarter of the time an individual European will be genetically more similar to an African than to another European. This tracks fairly well with the findings of Bamshad, discussed above. At the same time, 175 markers were sufficient to yield clustering at an accuracy of ~95–98%.
Thus, with a moderate number of markers, accurate racial clustering of individuals may not
coincide with individual members of a group always being more similar to members
of their own compared to individuals of another group.
Are the racial liberals then correct?
Is it possible for a Dane to be more similar, genetically, to a Nigerian
than to a fellow Dane, even if this happens less than 25% of the time?
The answer is, simply put, no.
This genetic overlap between individuals from the major racial groups is
an artifact of not using sufficient numbers of markers.
As the authors used more and more markers to compare the three major racial groups
(Europeans, East Asians, and sub-Saharan Africans), the less stringent
clustering measurements rapidly fell to a 0% overlap, as expected from previous
studies. What about the more
stringent measurement “w”, which looks at comparisons between individuals, and
does not consider group data?
Once the authors reached 1,000 (or more)
markers, the genetic overlap between these groups essentially reached zero.
It is useful at this point to quote the authors about this fundamentally important point:
This implies that, when enough loci are considered, individuals from these population groups will always be genetically more similar to members of their own group.
As to the question of whether individual members of one group may be genetically
more similar to members of another group, they write:
However, if genetic similarity is measured over many thousands of loci, the answer becomes ‘never’ when individuals are sampled from geographically separated populations.
The naïve “anti-racist” view, actually stated at times (e.g., the
NOVA program on race),
that it is possible for individual Europeans and Africans to be more genetically
similar to each other than to members of their own race, is simply false.
Any such “finding” is simply due to
insufficient numbers of DNA markers being used.
With adequate methodology, individual members of the major racial groups will always
be more similar to members of their own group than to members of other groups.
Some may not like this and deem it “racist”, but these are the scientific
results. For some reason, the authors were not satisfied with ending their study with these
findings and decided to repeat their data analysis incorporating populations
they term “intermediate” or “admixed.”
These included New Guineans, South Asians, Native Americans, African
Americans and “Hispano-Latino” groups.
Not unexpectedly, it became somewhat more difficult to distinguish
between groups, with a given number of markers, when these additional
“intermediate/admixed” populations were added.
Even with more than 10,000 markers, the “w” measurement and the
clustering measurements never quite reached zero with respect to overlap,
although the numbers were low. For
example the authors state that with 1,000 or more markers the “w” measurement
reached a value of 3.1%, meaning that even with the intermediate/admixed
populations, genetic overlap was at a frequency of less than 5%.
Do these latter findings mean that there will always be genetic overlap between members
of more closely related groups, especially when so-called “intermediate” and
“admixed” populations are considered?
Although some people may fervently wish that 100% accurate classification
will remain impossible, except for the most widely divergent groups, this may
well not be the case. We are
entering an era in which reasonably affordable whole genome sequencing will be
possible, and with the proper methodologies, it will be possible to compare a
number of markers considerably larger than what is used in the current paper.
While 10,000 markers may not be sufficient to eliminate overlap between
all groups completely – although it does reduce the overlap to very low levels –
it is possible that larger numbers of markers, or even whole genome comparisons,
could do so. With more data, it may
well be possible to distinguish, with near 100% accuracy, between groups that
still demonstrate a low level of “w” with current data.
We must also consider the issue of genetic structure, not directly addressed in this study. Although structure can include such genetic phenomena as inversions, deletions, and copy-number variation, the major component of genetic structure is the co-inheritance of specific genes. In other words, we must consider not only the frequencies of each gene taken in turn, but the frequencies of specific genes together. For example, there are genes that code for eye color, skin color, hair color, etc. One can examine the frequency of each gene on a one-by-one basis in an individual (or group) and do all the pairwise comparisons to another individual (or group) and determine “w.” But what are the frequencies of particular combinations of gene types inherited together? For example, what is the frequency of having genes for blue eyes and blonde hair and fair skin, etc. co-inherited, rather than measuring the frequencies of each of these genes in turn and averaging the results? Genetic structure superimposes further genetic differences on top of one-by-one consideration of genes; therefore, differences between groups are going to be larger when structure is considered compared to when only frequency differences of individual genes are measured and averaged.
To explain the difference between genetic similarity and genetic structure, I
present an analogy using colored marbles.
Assume that individuals of different races each have a set of marbles,
numbered from one to 100, with the marbles being of various colors.
Genetic similarity (the basis of the “w” metric) would be analogous to
comparing the marbles of two individuals one-by-one; first comparing the color
of marble #1, then #2, then #3, and so forth, on an individual basis and then
counting the total number of matches.
Genetic structure, on the other hand, would be analogous to asking if the
two individuals have similar, or even identical, combinations of colors for
specific marbles. For example,
person A may have red marbles for #1, #6, and #15; blue marbles for # 3, #10,
#33, and #95; green marbles for #7, #8, #22, and #84, and a yellow marble for
#38. If this particular, specific
combination of colored marbles is of importance, we can then ask if person B has
a similar combination. What is
important here is not the one-by-one counting of matches, but whether the whole
pattern is replicated, or almost replicated, between two individuals (or
groups). What about the relation between genetic ancestry and individual phenotype? The authors
state that: “Thus it may be possible to infer something about an individual’s
phenotype from knowledge of his or her ancestry.” However, since phenotypic
traits are coded for by a number of genes smaller than that required to yield
low genetic overlap, the authors assert that there may be significant phenotypic
overlap between people of different groups.
They give an example of a trait “determined by 12…loci”, which would
yield a 36% overlap of phenotypes between individuals of different groups.
Yet, racial groups show markedly different phenotypes.
How is this so, if what the authors state is true?
There are two points that the authors neglect to emphasize.
First, many phenotypic traits, including racially relevant ones, have
been selected for because of their adaptive value, or the populations commonly
exhibiting these traits have been subject to genetic drift isolated from other
populations. Thus, it is not
reasonable to assume that genes that code for a particular phenotype are going
to have the same “worldwide distributions” as markers used in this study.
For example, gene alleles coding for skin color show markedly higher
frequency differences between populations than do the neutral markers used in
this study. The second point is that racial phenotypes are the result of genetic structure, of many
types of traits co-inherited together. It is the sum total of all these
differences that allow for racial distinction at the phenotypic level.
Looking at individual phenotypic traits, just like looking at individual
gene frequencies, is going to provide a markedly incomplete picture of human
variation. These findings powerfully support Frank Salter’s concept of ethnic genetic interests.
After all, there is essentially zero genetic overlap between individual
members of different major racial groups; a member of one of these groups is
always going to be more similar to a member of their own group than to that of
another. Multiplying over the large
numbers of people that constitute racial groups yields a very substantial genetic interest.
Even if we
take at face value this paper’s findings concerning the intermediate/admixed
populations, the ethnic genetic interest concept holds as well. In the vast
majority of cases, individuals will be more similar to members of their own
group; overlap, while not zero, is low.
When one multiplies these differences over the large numbers of people
involved, then there are very large and crucial differences of genetic interests
regardless of which populations are considered.
But that is
not all. First, consider that with
sufficient numbers of genes assayed, the small degree of overlap observed with
the intermediate/admixed groups may disappear; it would almost certainly
disappear if genetic structure is considered.
Second, and perhaps most important, the ethnic genetic interest concept is not based on
overall genetic similarity/difference, but rather on differences in frequencies
of distinctive genes, above and beyond random gene sharing.
After all, those genes that do not differ in frequency between groups do
not contribute to differences in genetic interests, because their frequency
stays unchanged regardless of the outcome of competition.
Even if an entire racial group were to die out, the frequency of these
“shared genes” would remain unchanged.
Note that measurements of overall genetic similarity, such as “w”, will
as a matter of course also include genes that do not differ in frequency between
groups. Therefore, even when “w”
shows a low degree of overlap, there may well be no overlap at all with respect
to those genes that are distinctive, that vary in frequency between populations.
To further explain the importance of distinctive genes vs. “w”, I will go back to my colored marbles analogy. Imagine that the distribution of colors for marbles 1–80 was completely random, but the colors for marbles 81–100 were specific to a person’s race. Overall similarity in marble color (analogous to “w”) would consider all 100 marbles. However, if we were to ask how the color frequencies of the marbles were to change if people of one race were completely removed from the example, we would observe that only marbles 81-100 would be affected. For marbles 1–80, since the color distribution is completely random with respect to race, it doesn’t matter if one race or another is eliminated from this marble counting exercise. Only the “population-distinctive marbles” are at issue here.
Thus, when considering competition and conflicting genetic interests between human
groups, the gene frequencies that really matter are those that exhibit
differences in frequency between the groups, not those that are randomly
distributed between the groups.
Although the Witherspoon et al. paper strongly supports the concept of ethnic genetic
interests, we need to remember that ethnic genetic interests is a more stringent
and specific concept than simply measuring the degree of genetic similarity.
If we are not careful, we may otherwise conclude that a group of mice
constitute a greater genetic interest for a person than does another person,
since the group of mice would contain more copies of the person’s gene sequences
than would another single person! (By some measurements, mice and humans are
~90% genetically similar.)
But this is
not the case: Genetic interests are determined by the gene frequencies that are
distinctive between humans and mice (as well as differences in genetic structure
between the two species). They are not determined by overall genetic similarity,
and they are not determined by counting the numbers of gene sequences held in
common. In summary, this is a crucially important paper that demonstrates that individual members of
the major racial groups will always be more genetically similar to members of
their own group than to individuals of the other major races.
The paper demonstrates the importance of using sufficient numbers of
markers in these studies, and the findings also underscore the differences
between the concepts of clustering (“binning”) of individuals into groups vs.
measurements of the genetic similarity between individual members of these
groups. Although the inclusion of “intermediate” and “admixed” populations prevented the genetic
overlap of cross-racial individuals from reaching zero, with a sufficient number
of markers the overlap was at a very low level.
Further, it is quite possible that when utilizing a greater number of
markers, or even a whole genome analysis, this genetic overlap may vanish
entirely. Another important point to consider when evaluating this (and any other) genetic study
is that genetic structure is an important part of human genetic variation that
has not yet been carefully examined, but which will likely amplify the
differences in genetic variation between human population groups.
When considering the totality of genetic structure, individual overlap
between racial population groups, including “intermediate” and “admixed” groups,
will almost certainly be nil.
The data from this paper support Frank Salter’s conception of ethnic genetic
interests, although we must remember that genetic interests are properly thought
of as derived from differences in the frequencies of distinctive genes, rather
than counting total copies of genes shared in common.
In the final analysis, the primary findings of this paper are a devastating blow to politically motivated assertions of “no genetic differences between human races.”
As to the issue of clustering itself, there has been some controversy, which has
been laid to rest with a recent article “Geography and genography: prediction of
continental origin using randomly selected single nucleotide polymorphisms”,
Allocco et al.,
BMC Genomics 8:68, 2007.
Race deniers, as we know, claim that there are no genetic differences at all, of any
significance, between even the major continental racial groups.
When confronted with the ease by which people can be “binned” (or
“clustered”) into specific racial groups, the deniers bluster that such
clustering requires an enormous number of markers and/or requires the choice of
“biased” markers specifically picked because these markers are known, in
advance, to sharply vary in frequency between groups.
These assertions and accusations are incorrect.
Allocco et al. have demonstrated that only 50 randomly chosen markers
(with the emphasis on random) can cluster individuals into the major continental
racial groups (Europeans, sub-Saharan Africans, and East Asians) with 95%
accuracy. The “misclassifications”
resulting in the 5% “error” rate were of two African Americans, likely of
admixed racial heritage, who were observed to be in between the European and
African clusters. The authors also
demonstrated that as few as 5 completely random markers are sufficient to yield
a 63% accuracy rate in clustering individuals into racial groups.
The authors state that “differences between continentally defined groups
are sufficiently large that even a randomly selected, minute fraction of the
genetic variation in the human genome can be used to characterize ancestral
geographic origin in an accurate and reproducible manner”, and they conclude
that their findings “argue strongly against the contention that genetic
differences between groups are too small to have biomedical significance.”
The authors also assert that the clustering methodology can be “easily
extended” for distinguishing more closely related groups and those with mixed
origins, as long as more genetic data is obtained, sufficient to make these distinctions.
Much of this
type of work is freely available to the public. It would seem that the race
deniers are running out of excuses as to why they continue to promote what
amounts to fraudulent pseudo-science to an unsuspecting public.
Ted Sallis writes on scientific issues.