Doggie DNAPrint® 1.0 Background
DNAPrint Genomics, Inc. has established itself as
a pioneer in the development of innovative consumer genomics testing products.
Many of our consumer testing products, such as AncestrybyDNA 2.5, EurasianDNA 1.0
and European DNA 2.0 were enabled by the recent completion of the sequencing phase
of the human genome project. These products allow customers to measure their genomic
ancestry admixture, or BioGeographical Ancestry – the genetic component to “race” or
“ethnicity”. For example, with AncestrybyDNA 2.5 one customer may register as 90% European,
10% sub-Saharan African admixture and another may register with 100% East Asian genomic
ancestry. The human genome sequence provides test developers with a database of genetic
markers from which an admixture panel can be constructed and then validated. Until recently,
it has not been possible to make similar measurements for species other than Homo sapiens
because genome sequence drafts for these other species have not been available.
This scenario has recently changed. As of April 2007, 7.6X coverage of the dog genome
has been made publicly available. Positions along the dog genome that vary from dog to dog,
called Single Nucleotide Polymorphisms (SNPs), have been databased as well. In July 2004,
the CanFam1.0 draft of the dog SNP collection was released, and in May 2005, the CanFam2.0 draft of
the dog SNP collection was released. Both sets have been deposited in the publicly available
National Center for Biotechnology Information (NCBI) database.
As a result of the release of this data, and our own internal R&D efforts, we are pleased
to offer Doggie DNAPrint® 1.0 to the general public. The SNPs we target with Doggie DNAPrint® 1.0
were selected from the Canine Genome SNP database based on their information content for dog
ancestry and breed.
Breed Data
Scientists at UC Berkeley and Seattle’s Fred Hutchinson Cancer Research Center have carried out
analyses of various breeds using large SNP panels (called “chips”) as well as with sets of other
polymorphisms called microsatellites. For example, in 2004 Parker et al. sequenced 75 SNP and 96
microsatellite loci in 85 dog breeds and showed that modern dog breeds are distinct genetic
units and that breed can be accurately determined from dog DNA. Looking back relatively far in the
dog family tree, Parker et al. noted that there appear to be 4 main dog breed types:
- The wolf-like (yellow in the K=4 part of Figure 1)
- The Herders (green in the K=4 part of Figure 1)
- The Hunters (red in the K=4 part of Figure 1), and
- The Mastiff (blue in the K=4 part of Figure 1)
Figure 1 shows some of their data (Figure 3 of Parker et al., 2004), where dog
breeds are clustered based on their affinity with the founders or parental dog
populations for each of these 4 ancestral groups. Dog breeds such as the Akita, Chow Chow and Siberian
Husky fall into the Wolf-Like group; Collies, Greyhounds, Borzoi etc. fall into the Herder group;
Beagles, Pointers and Terriers (etc.) fall into the Hunter group; and Bulldogs, Mastiffs, Boxers (etc.)
fall into the Mastiff group. You will note that, due to the history of their origins, each breed is
characterized by a unique ratio of admixture among these families.

Figure 1. Results from Parker et al., 2004’s analysis of 85
dog breeds with 75 SNP and 96 microsatellite loci. The analysis involves the two
most (K=2), three most (K=3) and four most (K=4) basic elements of dog ancestry.
Each column represents an individual dog and each element of dog ancestry by a
unique color. The proportion of colors for each column represents the ancient
dog ancestry admixture for each dog. This assay is exceptionally powerful for
breed analysis when more complex population models are used, but would be
significantly more expensive than Doggie DNAPrint® 1.0 and is currently only run
for medical and academic research.
A downside of the Parker et al. assay is that it uses microsatellites, which are cumbersome
and expensive to analyze. Because of this expense,
and because the academic institutions that developed it are not in the business of consumer
genetic testing, the Parker et al. panel is not available for dog breed testing by lay-customers.
More recently, a US company has developed DNA chips containing large numbers of dog SNPs (two types,
one with 26,000 SNPs and another with 125,000 SNPs). These chips would seem to provide the highest
possible quality data for breed admixture assessment, but panels with such a large number of markers
are expensive to run and in most cases of dog ancestry, the extra expense is not justified. The type
of ancestry admixture we measure in modern-day dogs is quite old and old ancestry is distributed evenly
over chromosomes. With an even spread of ancestry, there is less of a concern about missing “chunks” of
ancestry that larger marker panels promise to capture. For example, if 20% of a dog's ancestry is
“Group III”, 20% of the dog's markers spread among its chromosomes should be derived from Group III,
whether 200 markers are measured (40) or 200,000 markers are measured (40,000). We gain precision with
the larger marker set, but as described in the textbook written by DNAPrint Genomics' Chief
Scientific Officer (Frudakis, 2008), the number of markers required to achieve reasonably small standard
deviations in admixture estimates is not large to begin with, and beyond a few tens of markers of high
information content, the reduction in statistical error from adding more markers falls off rapidly. As with the
microsatellite based panel of dog markers just discussed, these chips were developed not for
lay-customers but for medical researchers and the academic research community, for whom price is less of a
concern than for individual lay-customers. The costs would also be high for a company developing a
chip-based dog breed testing service. Any company developing a dog-breed assay needs to build a
database
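The marker-count argument above can be illustrated with a rough calculation. The sketch below is not the method described in Frudakis (2008); it assumes, for simplicity, that every marker is independent and perfectly informative, so that the ancestry proportion behaves like a simple binomial proportion. The function name binomial_se is ours. Real markers are less than perfectly informative, so actual errors would be larger, but the diminishing return from ever-larger panels has the same shape.

```python
import math


def binomial_se(p: float, n_markers: int) -> float:
    """Standard error of a proportion p estimated from n independent markers.

    Simplifying assumption: each marker is an independent, perfectly
    informative draw, which real ancestry markers are not.
    """
    return math.sqrt(p * (1.0 - p) / n_markers)


p = 0.20  # the 20% "Group III" example from the text
for n in (20, 200, 2_000, 200_000):
    expected = p * n  # expected number of markers derived from Group III
    print(f"{n:>7} markers: about {expected:,.0f} from Group III, "
          f"standard error about {binomial_se(p, n):.4f}")
```

Under these assumptions, a thousand-fold increase in markers shrinks the standard error only by a factor of about 32 (the square root of 1,000).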
The Test
Recognizing the need for an affordable, mass-production dog breed test, scientists at DNAPrint® scoured
the dog genome database and selected 204 especially informative SNPs. This small number of SNPs can be read
efficiently with high-throughput instrumentation and so the low cost to DNAPrint®, and hence the customer,
renders the analysis within reach of many dog owners. Each Doggie DNAPrint® 1.0 customer receives a breakdown
of their dog's deep, or ancient ancestry, which considers the 4 most basic branches of the canine family
tree, as well as a more detailed breakdown of their dog's more recent, breed-specific ancestry.
With respect to the basic and ancient breakdown of the dog family tree into Wolf/Herder/Hunter/Mastiff
elements, Doggie DNAPrint® 1.0 results are very similar to those reported by Parker et al., in 2004 (Figure 2).
In Figure 2, each dog is shown as a column of colors, and the colors indicate the % of ancestry for
each type of dog ancestry. We can call these elements of dog ancestry by colors, numbers or we could give them names as
Parker et al., 2004 did - though the names are fairly arbitrary. Note that each breed is characterized with
its own unique proportions. For the most part, breeds belonging to a given family in the Parker et al., 2004 analysis
belong to the same family in the 4-population Doggie DNAPrint® 1.0 analysis (for example, compare
Chow Chows, Siberian Huskies, Collies, Shetland Sheepdogs, Beagles, Spaniels, English Bulldogs and
Bulldogs between Figures 1 and 2). There are some differences between our Doggie DNAPrint® 1.0 analysis and
that described by Parker et al., 2004. For example, Doggie DNAPrint® 1.0 groups German Shepherds with Hunters
while Parker et al. group them with Mastiffs. Doggie DNAPrint® 1.0 groups Belgian Sheepdogs with Hunters while Parker et al.
group this breed with Herders. There are a few other differences like this, which the careful reader will be
able to find, but for the most part the results are largely concordant. The differences may be due to
inter-individual differences within breeds, and sampling effects (both analyses use fairly small samples
for each breed, to accommodate a large number of breeds), or may be due to the differences in
information content between SNPs (Doggie DNAPrint® 1.0) and microsatellite markers (Parker et al., 2004).

Figure 2. Doggie DNAPrint® 1.0 analysis of 163 dogs from 79 reference breeds. Each column of
stacked colors represents a different dog. Each of the four basic elements of ancient dog ancestry is shown
with a color, and the proportion of colors for each dog represents the ancestry mix for that dog. Dogs of
each breed are grouped together with black lines separating each breed (note that the sample size of each
breed is different). With this analysis, we are using Doggie DNAPrint® 1.0 to peer back farther in time to
more ancient ancestry affiliations than we do with more detailed analysis, as shown in Figure 4. Dogs of
the same breed show similar proportions, and most breeds are characterized by a unique ancestry composition,
though many of the breeds share similar ancient origins as indicated by similarity in proportion of colors.
Blue – Wolf-like ancestry, Yellow – Hunter ancestry, Red – Herder ancestry and Green – Mastiff ancestry.
Breeds are listed by number in the legends above and below the plot, and
lines connect each breed in the legend with its dogs in the plot.
Doggie DNAPrint® 1.0 assessments of canine ancestry with respect to this ancient view are robust, meaning
that dogs of a given breed yield consistent results from run to run. For example, consider the analysis in Figure 3.
On the left hand side are the same dogs and breeds shown in Figure 2, and on the right hand side we have
a set of customer results, as well as test dogs of various breeds (not present in the reference sample of
Figure 2 or the left hand side of Figure 3). The test Chow Chows for example, typed similarly, as did
the Collies, Rough Collies, Siberian Huskies etc. In fact, the results per breed are essentially identical
between the reference and the test samples.

Figure 3. Doggie DNAPrint® 1.0 analysis of test dogs using the ancient snapshot of canine
ancestry. Reference samples shown in the left hand side of the figure are those shown in Figure 2, and as in
this figure, each column of colors represents an individual dog with black lines separating breeds. On the
right hand side of the solid black line, we have a set of customer samples as well as a large number of test
dogs of known breed. The purpose of this figure is to compare the results for these test dogs to those of
the same breed among the reference set on the left. Many of the breeds that are the subject of this test are indicated
by boxed names, and lines connect the dogs of each given breed within the reference set to those within the
test set. The proportions of colors are essentially identical for dogs of the test and reference samples. In
this analysis, a Green color represents Wolf ancestry, Red represents Herder, Blue represents Hunter and Yellow
represents Mastiff. Most modern-day breeds are affiliated with the Hunter group, and most of the customers' dogs
run in this particular analysis show predominantly Hunter ancestry (though each is characterized by its own
proportion of colors).
Results such as these indicate that the assessment of ancient dog ancestry with respect to this deep, 4-population
snapshot of dog history is reliable, in that ancestry proportions are highly characteristic for each breed, as we
expect, since each modern-day breed has a unique origin in time/space as well as admixture among ancient dog breeds.
More informative analyses
If you imagine the dog family tree starting with a single trunk, growing into large branches, smaller sub-branches
and eventually out to leaves (today’s breeds), this 4-population analysis is essentially taking a snapshot of the
dog's ancestry at the base near the trunk. This is very interesting for understanding your dog's ancient ancestry,
but is not very useful for estimating your dog's breed or breed admixture. For example, we can see in Figure 2
that many of the breeds share similar proportions of Hunter deep ancestry. Breeds such as Spaniels, Beagles, Pointers,
Terriers and Hounds were all derived from the same ancient dog population (Hunter) and it is often not possible to
unambiguously identify to which breed a dog belongs using the proportions of Hunter/Wolf-like, or Hunter/Herder etc.
To assess breed, we must perform an analysis that takes a snapshot of more recent dog ancestry – assessing the
sub-branches of the dog family tree, rather than the main branches.
The most likely breakdown of our reference dog sample using our 204 canine SNPs reveals 15 different elements of more
recent dog ancestry. This was determined by computing the probability that the sample broke out into 5, 6, 7, … 20 elements
and choosing the breakdown with the highest probability. These more recent elements of dog ancestry are useful for
inferring dog breeds, as we know them today. With Doggie DNAPrint® 1.0, each breed shows a unique mix of the 15 types
of dog ancestry, and more similarity is observed among dogs of a given breed than among dogs of different breeds (Figure 3).
Inferring the breed type with these 15 elements of ancestry therefore, is a matter of matching the 15-element ancestry
profile of an unknown dog with that for the most closely matching breed, or, in the case of mixed breeds, with the most
closely matching breeds. The 15 elements of dog ancestry we detect with Doggie DNAPrint® 1.0 are shown with different
colors in Figure 4 (note that some of the colors are similar to one another). For example, Siberian Huskies are
characterized with a type of canine ancestry indicated with a peach color, while Bulldogs are characterized with a
type indicated with a dark green ancestry. Rough Collies are characterized with a type of dog ancestry indicated with
light grey color, while Dachshunds are characterized with a type of ancestry indicated with a darker grey color. Tests
similar to that of Figure 3 have shown that the proportion of the 15 elements of canine ancestry (colors in Figure 4)
is highly characteristic for each breed.
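The choice of 15 elements described above is a model-selection step: each candidate number of elements receives a probability, and the most probable one is kept. A minimal sketch of that final step follows. The log-probabilities here are invented placeholders, constructed so that the peak falls at 15 to mirror the text; they are not DNAPrint data, and the function name most_likely_k is hypothetical.

```python
from typing import Mapping


def most_likely_k(log_prob_by_k: Mapping[int, float]) -> int:
    """Return the number of ancestry elements with the highest log-probability."""
    return max(log_prob_by_k, key=lambda k: log_prob_by_k[k])


# Placeholder values only: a made-up curve over K = 5..20 that peaks at K = 15.
illustrative_log_probs = {k: -1000.0 - 2.5 * (k - 15) ** 2 for k in range(5, 21)}

print(most_likely_k(illustrative_log_probs))
```

In practice the probabilities would come from fitting the population model to the reference genotypes once for each candidate number of elements.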
How might a customer use this information?

Figure 4. Doggie DNAPrint® 1.0 analysis of test dogs using a more recent snapshot of canine
ancestry which recognizes 15 basic elements. The figure is laid out precisely as shown in Figure 2, where each of
the 15 elements of dog ancestry is shown with a color, and the proportion of colors for each dog represents the
ancestry mix for that dog. Dogs of each breed are grouped together, black lines separating each breed (note that
the sample size of each breed is different).
How to infer the actual breed – the Doggie DNAPrint® 1.0 database method and problems with other methods based
on breed assignment
With Doggie DNAPrint® 1.0, the measurement of your dog's affiliation with more recent elements of dog ancestry can be
used to infer the breed and/or breed admixture. This is a different, and fundamentally superior method of breed
assessment than used by most other purveyors of canine genetic tests, which perform breed assignment. With assignment
methods, a genetic distance between your dog and the average member of each breed is measured, and the dog is then
assigned to the breed with the lowest value. The problem with measures based on genetic distance is that genetic
distance can arise by admixture as well as phylogenetic history, and as Parker et al., 2004 put it, “the true
evolutionary history of dog breeds is not well represented by … (such methods)… because existing breeds were mixed
to create new breeds…”. In other words, using genetic distance measures, a dog of mixed breed may be assigned to a
third breed rather than to either of the correct breeds, and there is no good way to assign the dog to a breed mixture.
Again, as Parker et al., 2004 stated, “methods based on genetic distance matrices lose information by collapsing all
genotype data for pairs of breeds into a single number.” In lay terms, this is to say that it is overly simplistic to
assign any dog to a breed based on a single number, such as a measure of genetic distance. Rather, we must assess
the dog's ancestry in terms of elements of dog ancestry inferred from global analyses, using many breeds (such as
those shown in Figure 4), and then use the proportion of ancestral elements within each dog to infer the most
likely breed and/or breed admixture. This is accomplished using databases. If your dog shows a pattern of ancient
dog ancestries characteristic of Huskies, for example, searching a database with the dog's proportions should reveal
that other Huskies share similar proportions, to the exclusion of other breeds. If your dog is a Husky/Bulldog mix,
searching the database would reveal that other Husky/Bulldog mixes show similar proportions, and that purebred Huskies
and Bulldogs show less similar proportions, whereas other breeds and breed mixtures show vastly different proportions.
The customer can then infer that their dog is most likely a Husky/Bulldog mix. DNAPrint®’s founder pioneered this
empirical method and described it extensively in his textbook Molecular Photofitting – Predicting Ancestry and
Phenotype using DNA (2008).
Each Doggie DNAPrint® 1.0 customer receives an ancient (4-population) ancestry breakdown of their dog, a more
recent (15-population) ancestry breakdown, and access to our database of purebred and mixed dogs for searching
using their recent (15-population) ancestry breakdown as a query. From the results of this on-line query, the
customer can conclude their dog's most likely breed and/or breed mixture. This method of inferring breed/breed
admixture using matches of empirically observed elements of canine ancestry is fundamentally more accurate than
assignment methods based on genetic distance. We expect fewer false positive and false negative results, and
the method easily accommodates missing breeds in the database. Thus, if your dog's breed is not a part of our
database, its 15-population breakdown is not affected, but a query of the database should reveal that there are
no breeds with high score matches. In this case, as we add more breeds and breed mixtures to the database over
time, the customer can repeat their on-line query (for free, at their leisure) and obtain updates of increasing
accuracy.
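The database query described above can be sketched in a few lines. The text does not say how matches are scored, so this sketch assumes a simple stand-in: the Euclidean distance between two vectors of ancestry proportions, with smaller distances meaning closer matches. The profiles use three elements rather than the full set for brevity, and every number, label and function name (profile_distance, rank_matches, close_matches) is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def profile_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two ancestry-proportion profiles (assumed score)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of elements")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_matches(query: Sequence[float],
                 database: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return every database entry with its distance to the query, closest first."""
    scored = [(label, profile_distance(query, profile))
              for label, profile in database.items()]
    return sorted(scored, key=lambda item: item[1])


def close_matches(query: Sequence[float],
                  database: Dict[str, Sequence[float]],
                  max_distance: float) -> List[Tuple[str, float]]:
    """Keep only entries within max_distance; an empty list means no good match."""
    return [(label, d) for label, d in rank_matches(query, database)
            if d <= max_distance]


# Made-up reference profiles (proportions of three ancestry elements).
database = {
    "Siberian Husky": [0.90, 0.05, 0.05],
    "Bulldog": [0.05, 0.90, 0.05],
    "Husky/Bulldog mix": [0.48, 0.47, 0.05],
    "Beagle": [0.05, 0.05, 0.90],
}

mixed_dog = [0.50, 0.45, 0.05]
for label, d in rank_matches(mixed_dog, database):
    print(f"{label}: {d:.3f}")

# A dog whose breed is absent from the database finds no close match.
unknown_dog = [0.34, 0.33, 0.33]
print(close_matches(unknown_dog, database, max_distance=0.25))
```

With these placeholder numbers the mixed dog ranks closest to the Husky/Bulldog entry, followed at a distance by the two purebreds, while the unknown dog returns an empty list until suitable breeds are added.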
Example of Doggie DNAPrint® 1.0 in action
The author owns two purebred Beagles – Daisy and Duke. Prior to developing Doggie DNAPrint® 1.0, the members of
the family had often remarked how Duke resembles a Basset Hound, whereas Daisy looks like most other Beagles we
had met in the dog parks. Note in Figure 5 that Duke has a more elongated snout and a slightly different skull
shape than Daisy, whose features more closely resemble the typical Beagle. At these dog parks, we learned that
there are at least 2 types of Beagles, with one likely bred more recently in human history using crosses to
achieve more of a scent-hound variety of Beagle. We had thus always assumed that Duke represented one of these
“scent hound” varieties. Doggie DNAPrint® 1.0 gave us a chance to test this hypothesis. In Figure 4 you can see
that Basset Hounds are typified by a large proportion of grey, a moderate amount of blue and a small amount of
yellow canine ancestry and that this particular set of grey/blue/yellow proportions is fairly characteristic
of this breed. Duke is the first bar among the Beagles in Figure 4 and Daisy is the second. Their proportions,
as well as those of the Basset Hounds are reproduced in Figure 5 for easier viewing.

Figure 5. The author's dogs, Duke and Daisy, both Beagles, show different morphology,
with Duke exhibiting a longer snout and a more spherical skull shape, somewhat reminiscent of that for the
typical Basset Hound. Doggie DNAPrint® 1.0 results using the 15 basic elements of canine ancestry appear
to confirm that Duke is more genetically similar to Basset Hounds. Note that Duke possesses significant
“grey” ancestry, typical of Basset Hounds whereas Daisy does not.
One will note that Duke's canine ancestry includes a significant level of grey ancestry, which is typical of Basset
Hounds, whereas Daisy's does not. Note also that both Duke and Daisy cluster clearly with Beagles, based on the
presence of predominantly “red” ancestry, and that the ear length of both Duke and Daisy is typical for Beagles,
as is the coat pattern, whereas Bassets have much longer ears and less black/white mottling (assuring us that Duke
and Daisy are indeed Beagles). The matching of patterns in this way suggests that Duke indeed is more similar
to the typical Basset Hound than Daisy, confirming what the family had always expected based on skull shape
and snout length.
Accuracy
Error in Doggie DNAPrint® 1.0 results is caused by the fact that the difference in sequence for the Ancestry Informative Markers (AIMs) between ancestry groups
is not absolute, but continuous. For example, the frequency of the minor allele of one AIM may be 0.35 in the
HUNTER group, but only about 0.10 in the other groups, so it provides some “power” for resolving ancestry between
HUNTER and the other groups, but not absolute power. Over 204 AIMs, the HUNTER vs. “other” power would be much better than
for one AIM, but still not absolute, and we can consider this departure from absolute power a form of test deficiency.
Though Doggie DNAPrint® 1.0 has exceptional power to resolve between ancient dog ancestry groups, it is not perfect
and this imperfection is what we measure with the computer simulations.
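One way to picture how “power” accumulates across markers is to compute the expected log-likelihood ratio in favor of the correct group. The sketch below uses the 0.35 vs. 0.10 allele frequencies from the example above and makes several simplifying assumptions that are ours, not the document's: every marker has those same two frequencies, a dog carries two independent alleles per marker, and markers are independent of one another. The function names are hypothetical.

```python
import math


def expected_llr_per_allele(p_true: float, p_other: float) -> float:
    """Expected natural-log likelihood ratio (true group vs. other) for one allele.

    This is the Kullback-Leibler divergence between two Bernoulli
    distributions with minor-allele frequencies p_true and p_other.
    """
    return (p_true * math.log(p_true / p_other)
            + (1.0 - p_true) * math.log((1.0 - p_true) / (1.0 - p_other)))


def expected_llr(n_markers: int, p_true: float = 0.35, p_other: float = 0.10) -> float:
    """Expected log-likelihood ratio over n independent markers, two alleles each."""
    return 2 * n_markers * expected_llr_per_allele(p_true, p_other)


# 204 is the size of the SNP panel described earlier in this document.
for n in (1, 10, 50, 204):
    print(f"{n:>3} markers: expected log-likelihood ratio about {expected_llr(n):.2f}")
```

Under these assumptions the evidence grows in proportion to the number of markers, which is why a panel resolves groups far better than any single marker, yet the ratio is always finite, so the resolution is never absolute.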
Since our canine AIMs (or CanAIMs) are not linked to one another, we can easily create simulated samples in a computer,
and measure the mathematical error encountered in measuring admixture with Doggie DNAPrint® 1.0 by comparing the
results for these simulated samples with their expected results. Inspection of the results shows that the average
simulated 100% Hunter sample, for example, registers with about 97.8% Hunter ancestry and erroneously with 0.8%
Herder, 0.7% Wolf-like and 0.7% Mastiff ancestry (Table I). Thus, the average sample of primarily Hunter ancestry is
expected to be about 2.2% away from the real value. The variance (VAR) of the estimates for samples primarily of
Hunter ancestry is about 0.02%, and the mean squared error (MSE) about 0.07%. Considering both non-admixed and admixed
individuals of all possible types of ancestry we can see from Table I that the average estimate is biased by 2.5%,
with a variance of 0.06% and a mean square error of 0.12%, indicating that the estimates are very solid indeed –
significantly more accurate than those for human subpopulations obtained with human Ancestry Informative Markers.
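The bias, variance and mean squared error figures quoted above can be cross-checked with the standard identity MSE = bias² + variance. The short sketch below applies that identity to the numbers in the text, expressed as fractions rather than percentages; the function name is ours.

```python
import math


def mse_from_bias_and_variance(bias: float, variance: float) -> float:
    """Mean squared error via the standard decomposition MSE = bias^2 + variance."""
    return bias ** 2 + variance


# Simulated 100% Hunter samples: bias about 2.2%, variance about 0.02%.
hunter_mse = mse_from_bias_and_variance(0.022, 0.0002)

# All simulated sample types: bias about 2.5%, variance about 0.06%.
overall_mse = mse_from_bias_and_variance(0.025, 0.0006)
overall_rmse = math.sqrt(overall_mse)

print(f"Hunter MSE:   {hunter_mse * 100:.2f}%")
print(f"Overall MSE:  {overall_mse * 100:.2f}%")
print(f"Overall RMSE: {overall_rmse * 100:.1f}%")
```

The results agree with the quoted mean squared errors of about 0.07% and 0.12% after rounding, and the root of the overall figure comes out at roughly three and a half percent.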
Table I. Average results for various simulated samples and their summary statistics.

Each individual estimate is provided with its 90% confidence interval. The estimate may differ from the true value
for any given sample, but on average it will differ from the true value by about 3.4%
(RMSE, Table I), and the true value is expected to be within the confidence intervals shown in Table II 90% of the time. The breadth
of the average 90% confidence interval is about 9 percentage points (Table II).
Table II. Breadth of confidence intervals

For interpreting Doggie DNAPrint® 1.0 results, it is more useful to understand what levels of admixture are
required in order to conclude with 95% certainty that the admixture is real. Table III shows these values,
which were obtained from the simulated samples used for Table I. From Table III we can see that an individual of
primarily HUNTER ancestry needs to see greater than 0.2% HERDER admixture in order to safely conclude that
the HERDER admixture is bona-fide, as opposed to statistical noise. As another example from Table III, an
individual of primarily MASTIFF ancestry would need to see greater than 0.3% WOLF-like admixture to conclude
that the admixture was real as opposed to statistical noise. Overall, taking the average of all of these
values, we can see that in general admixture levels over 0.25% are required in order to safely conclude
that the admixture is real. These are exceptionally low values, and they show that, generally speaking,
Doggie DNAPrint® 1.0 will easily detect the ancestry contributed by a single great-great-great-great grandparent!
Table III. Admixture level above which a reading must fall in order to conclude with 95% certainty that it is the
result of bona-fide ancestry rather than statistical noise
You will note that for individual dogs of most types of ancestry backgrounds, for most types of ancestry,
readings 1% or higher are significant indications of that type of ancestry, but the value differs depending
on the primary ancestry backgrounds and type of admixture.
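The great-great-great-great grandparent remark can be checked with simple arithmetic: on average, an ancestor g generations back contributes 1/2^g of a dog's genome. The sketch below compares that expected contribution with the thresholds quoted above. It gives the expectation only; the fraction actually inherited from a distant ancestor varies from dog to dog because of recombination. The function name is hypothetical.

```python
def expected_ancestor_fraction(generations_back: int) -> float:
    """Average fraction of the genome contributed by one ancestor g generations back."""
    return 0.5 ** generations_back


# Parent = 1 generation back, grandparent = 2, great-grandparent = 3, and so on,
# so a great-great-great-great grandparent is 6 generations back.
fraction = expected_ancestor_fraction(6)

print(f"Expected contribution: {fraction:.4%}")
print(f"Above the 0.25% average threshold: {fraction > 0.0025}")
print(f"Above the 1% rule of thumb: {fraction >= 0.01}")
```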

References
Frudakis, T. (2008). Molecular Photofitting – Predicting Ancestry and Phenotype Using DNA.
Academic Press/Elsevier Publishers, Burlington, MA.
Parker, H., Kim, L., Sutter, N., Carlson, S., Lorentzen, T., Malek, T., Johnson, G.,
DeFrance, H., Ostrander, E. and L. Kruglyak. (2004). Genetic structure of the purebred
domestic dog. Science 304:1160-1164.