Doggie DNAPrint® 1.0 Background

DNAPrint Genomics, Inc. has established itself as a pioneer in the development of innovative consumer genomics testing products. Many of our consumer testing products, such as AncestrybyDNA 2.5, EurasianDNA 1.0 and European DNA 2.0, were enabled by the recent completion of the sequencing phase of the human genome project. These products allow customers to measure their genomic ancestry admixture, or BioGeographical Ancestry – the genetic component of “race” or “ethnicity”. For example, with AncestrybyDNA 2.5 one customer may register as 90% European with 10% sub-Saharan African admixture, and another may register with 100% East Asian genomic ancestry. The human genome sequence provides test developers with a database of genetic markers from which an admixture panel can be constructed and then validated. Until recently, it has not been possible to make similar measurements for species other than Homo sapiens because genome sequence drafts for these other species have not been available.

This scenario has recently changed. As of April 2007, 7.6X coverage of the dog genome has been made publicly available. Positions along the dog genome that vary from dog to dog, called Single Nucleotide Polymorphisms (SNPs), have been databased as well. In July 2004, the CanFam1.0 draft of the dog SNP collection was released, and in May 2005, the CanFam2.0 draft was released. Both sets have been deposited in the publicly available National Center for Biotechnology Information (NCBI) database.

As a result of the release of these data, and our own internal R&D efforts, we are pleased to offer Doggie DNAPrint® 1.0 to the general public. The SNPs we target with Doggie DNAPrint® 1.0 were selected from the Canine Genome SNP database based on their information content for dog ancestry and breed.

Breed Data

Scientists at UC Berkeley and Seattle’s Fred Hutchinson Cancer Research Center have carried out analyses of various breeds using large SNP panels (called “chips”) as well as with sets of other polymorphisms called microsatellites. For example, in 2004 Parker et al. sequenced 75 SNP and 96 microsatellite loci in 85 dog breeds and showed that modern dog breeds are distinct genetic units and that breed can be accurately determined from dog DNA. Looking back relatively far in the dog family tree, Parker et al. noted that there appear to be 4 main dog breed types:

  1. The wolf-like (yellow in the K=4 part of Figure 1)
  2. The Herders (green in the K=4 part of Figure 1)
  3. The Hunters (red in the K=4 part of Figure 1), and
  4. The Mastiff (blue in the K=4 part of Figure 1)

Figure 1 shows some of their data (Figure 3 of Parker et al., 2004), where dog breeds are clustered based on their affinity with the founder or parental dog populations for each of these 4 ancestral groups. Dog breeds such as the Akita, Chow Chow and Siberian Husky fall into the Wolf-like group; Collies, Greyhounds, Borzoi etc. fall into the Herder group; Beagles, Pointers, Terriers etc. fall into the Hunter group; and Bulldogs, Mastiffs, Boxers etc. fall into the Mastiff group. You will note that, due to the history of their origins, each breed is characterized by a unique ratio of admixture among these families.

Elements of dog ancestry

Figure 1. Results from the Parker et al., 2004 analysis of 85 dog breeds with 75 SNP and 96 microsatellite loci. The analysis involves the two most (K=2), three most (K=3) and four most (K=4) basic elements of dog ancestry. Each column represents an individual dog, and each element of dog ancestry is represented by a unique color. The proportion of colors in each column represents the ancient dog ancestry admixture for that dog. This assay is exceptionally powerful for breed analysis when more complex population models are used, but would be significantly more expensive than Doggie DNAPrint® 1.0 and is currently only run for medical and academic research.

A downside of the Parker et al. assay is that it uses microsatellites, which are cumbersome and expensive to analyze. Because of this expense, and because the academic institutions that developed it are not in the business of consumer genetic testing, the Parker et al. panel is not available for dog breed testing by lay-customers. More recently, a US company has developed DNA chips containing large numbers of dog SNPs (two types, one with 26,000 SNPs and another with 125,000 SNPs). These chips would seem to provide the highest possible quality data for breed admixture assessment, but panels with such a large number of markers are expensive to run, and in most cases of dog ancestry the extra expense is not justified. The type of ancestry admixture we measure in modern-day dogs is quite old, and old ancestry is distributed evenly over chromosomes. With an even spread of ancestry, there is less of a concern about missing the “chunks” of ancestry that larger marker panels promise to capture. For example, if 20% of a dog's ancestry is “Group III”, 20% of the dog's markers spread among its chromosomes should be derived from Group III, whether 200 markers are measured (40 from Group III) or 200,000 markers are measured (40,000 from Group III). We gain precision with the larger marker set, but as described in the textbook written by DNAPrint Genomics' Chief Scientific Officer (Frudakis, 2008), the number of markers required to achieve reasonably small standard deviations in admixture estimates is not large to begin with, and beyond a few tens of markers of high information content, the further reduction in statistical error falls off rapidly. As with the microsatellite-based panel of dog markers just discussed, these chips were developed not for lay-customers but for medical researchers and the academic research community, for whom price is less of a concern than for individual lay-customers.
The costs would also be high for a company developing a chip-based dog breed testing service. Any company developing a dog-breed assay needs to build a database
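The marker-count argument above can be illustrated with a rough sketch. It assumes, purely for illustration, that each marker is an independent draw that reveals Group III ancestry with probability equal to the true admixture fraction (a simple binomial model). Real markers carry less information than this, so the absolute numbers are optimistic, and the function name is hypothetical rather than part of the Doggie DNAPrint® 1.0 analysis.

```python
import math


def admixture_standard_error(true_fraction: float, n_markers: int) -> float:
    """Standard error of an admixture estimate under a simple binomial model."""
    return math.sqrt(true_fraction * (1.0 - true_fraction) / n_markers)


# 20% Group III ancestry, measured with panels of increasing size.
for n in (20, 200, 2_000, 200_000):
    expected_markers = 0.20 * n
    se = admixture_standard_error(0.20, n)
    print(f"{n:>7} markers: ~{expected_markers:,.0f} from Group III, "
          f"standard error {se:.4f}")
```

Under this simplified model the error shrinks only with the square root of the marker count, so a thousand-fold increase in markers (200 to 200,000) reduces the standard error roughly 32-fold while multiplying the assay size by 1,000.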

The Test

Recognizing the need for an affordable, mass-production dog breed test, scientists at DNAPrint® scoured the dog genome database and selected 204 especially informative SNPs. This small number of SNPs can be read efficiently with high-throughput instrumentation, and the resulting low cost to DNAPrint®, and hence the customer, puts the analysis within reach of many dog owners. Each Doggie DNAPrint® 1.0 customer receives a breakdown of their dog's deep, or ancient, ancestry, which considers the 4 most basic branches of the canine family tree, as well as a more detailed breakdown of their dog's more recent, breed-specific ancestry.

With respect to the basic and ancient breakdown of the dog family tree into Wolf/Herder/Hunter/Mastiff elements, Doggie DNAPrint® 1.0 results are very similar to those reported by Parker et al., 2004 (Figure 2). In Figure 2, each dog is shown as a column of colors, and the colors indicate the % of ancestry for each type of dog ancestry. We can refer to these elements of dog ancestry by colors or numbers, or we could give them names as Parker et al., 2004 did, though the names are fairly arbitrary. Note that each breed is characterized by its own unique proportions. For the most part, breeds belonging to a given family in the Parker et al., 2004 analysis belong to the same family in the 4-population Doggie DNAPrint® 1.0 analysis (for example, compare Chow Chows, Siberian Huskies, Collies, Shetland Sheepdogs, Beagles, Spaniels, English Bulldogs and Bulldogs between Figures 1 and 2). There are some differences between our Doggie DNAPrint® 1.0 analysis and that described by Parker et al., 2004. For example, Doggie DNAPrint® 1.0 groups German Shepherds with Hunters while Parker et al. group them with Mastiffs. Doggie DNAPrint® 1.0 groups Belgian Sheepdogs with Hunters while Parker et al. group this breed with Herders. There are a few other differences like this, which the careful reader will be able to find, but the results are largely concordant. The differences may be due to inter-individual differences within breeds and sampling effects (both analyses use fairly small samples for each breed, to accommodate a large number of breeds), or may be due to the differences in information content between SNPs (Doggie DNAPrint® 1.0) and microsatellite markers (Parker et al., 2004).

Figure 2. Doggie DNAPrint® 1.0 analysis of 163 dogs from 79 reference breeds. Each column of stacked colors represents a different dog. Each of the four basic elements of ancient dog ancestry is shown with a color, and the proportion of colors for each dog represents the ancestry mix for that dog. Dogs of each breed are grouped together with black lines separating each breed (note that the sample size of each breed is different). With this analysis, we are using Doggie DNAPrint® 1.0 to peer back farther in time to more ancient ancestry affiliations than we do with the more detailed analysis shown in Figure 4. Dogs of the same breed show similar proportions, and most breeds are characterized by a unique ancestry composition, though many of the breeds share similar ancient origins as indicated by similarity in proportion of colors. Blue – Wolf-like ancestry, Yellow – Hunter ancestry, Red – Herder ancestry and Green – Mastiff ancestry. Breeds are listed by number in the legend above and below the plot, as well as below the plot itself, and lines connect the breed in the legend with the dogs in the plot.

Doggie DNAPrint® 1.0 assessments of canine ancestry with respect to this ancient view are robust, meaning that dogs of a given breed type consistently from run to run. For example, consider the analysis in Figure 3. On the left hand side are the same dogs and breeds shown in Figure 2, and on the right hand side we have a set of customer results, as well as test dogs of various breeds (not present in the reference sample of Figure 2 or the left hand side of Figure 3). The test Chow Chows, for example, typed similarly to the reference Chow Chows, as did the Collies, Rough Collies, Siberian Huskies etc. In fact, the results per breed are essentially identical between the reference and the test samples.

Figure 3. Doggie DNAPrint® 1.0 analysis of test dogs using the ancient snapshot of canine ancestry. Reference samples shown in the left hand side of the figure are those shown in Figure 2, and as in that figure, each column of colors represents an individual dog with black lines separating breeds. On the right hand side of the solid black line, we have a set of customer samples as well as a large number of test dogs of known breed. The purpose of this figure is to compare the results for these test dogs to those of the same breed among the reference set on the left. Many of the breeds that are the subject of this test are indicated by boxed names, and lines connect the dogs of each given breed within the reference set to those within the test set. The proportions of colors are essentially identical for dogs of the test and reference samples. In this analysis, a Green color represents Wolf ancestry, Red represents Herder, Blue represents Hunter and Yellow represents Mastiff. Most modern-day breeds are affiliated with the Hunter group, and most of the customers' dogs run in this particular analysis show predominantly Hunter ancestry (though each is characterized by its own proportion of colors).

Results such as these indicate that the assessment of ancient dog ancestry with respect to this deep, 4-population snapshot of dog history is reliable, in that ancestry proportions are highly characteristic for each breed. This is as we expect, since each modern-day breed has a unique origin in time and space as well as its own admixture among ancient dog breeds.

More informative analyses

If you imagine the dog family tree starting with a single trunk, growing into large branches, smaller sub-branches and eventually out to leaves (today’s breeds), this 4-population analysis is essentially taking a snapshot of the dog's ancestry at the base near the trunk. This is very interesting for understanding your dog's ancient ancestry, but is not very useful for estimating your dog's breed or breed admixture. For example, we can see in Figure 2 that many of the breeds share similar proportions of Hunter deep ancestry. Breeds such as Spaniels, Beagles, Pointers, Terriers and Hounds were all derived from the same ancient dog population (Hunter), and it is often not possible to unambiguously identify to which breed a dog belongs using the proportions of Hunter/Wolf-like, Hunter/Herder etc. To assess breed, we must perform an analysis that takes a snapshot of more recent dog ancestry – assessing the sub-branches of the dog family tree, rather than the main branches.

The most likely breakdown of our reference dog sample using our 204 canine SNPs reveals 15 different elements of more recent dog ancestry. This was determined by computing the probability that the sample broke out into 5, 6, 7, … 20 elements and choosing the breakdown with the highest probability. These more recent elements of dog ancestry are useful for inferring dog breeds as we know them today. With Doggie DNAPrint® 1.0, each breed shows a unique mix of the 15 types of dog ancestry, and more similarity is observed among dogs of a given breed than among dogs of different breeds (Figure 4). Inferring the breed type with these 15 elements of ancestry, therefore, is a matter of matching the 15-element ancestry profile of an unknown dog with that of the most closely matching breed or, in the case of mixed breeds, with those of the most closely matching breeds. The 15 elements of dog ancestry we detect with Doggie DNAPrint® 1.0 are shown with different colors in Figure 4 (note that some of the colors are similar to one another). For example, Siberian Huskies are characterized by a type of canine ancestry indicated with a peach color, while Bulldogs are characterized by a type indicated with a dark green color. Rough Collies are characterized by a type of dog ancestry indicated with a light grey color, while Dachshunds are characterized by a type indicated with a darker grey color. Tests similar to that of Figure 3 have shown that the proportion of the 15 elements of canine ancestry (colors in Figure 4) is highly characteristic for each breed.
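The model-selection step described above (trying 5 through 20 elements and keeping the most probable breakdown) can be sketched as follows. The log probabilities here are invented placeholders constructed to peak at 15; the actual values, and the software that produced them, are not given in this document.

```python
from typing import Dict


def most_probable_k(log_probability_by_k: Dict[int, float]) -> int:
    """Return the number of ancestry elements (K) with the highest log probability."""
    return max(log_probability_by_k, key=log_probability_by_k.get)


# Hypothetical log probabilities for K = 5..20 (illustrative only, not real results).
hypothetical_log_probabilities = {
    k: -10_000.0 - 25.0 * (k - 15) ** 2 for k in range(5, 21)
}

best_k = most_probable_k(hypothetical_log_probabilities)
print(f"Most probable number of ancestry elements: {best_k}")
```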

How might a customer use this information?

Elements of dog ancestry

Figure 4. Doggie DNAPrint® 1.0 analysis of test dogs using a more recent snapshot of canine ancestry which recognizes 15 basic elements. The figure is laid out in the same way as Figure 2: each of the 15 elements of dog ancestry is shown with a color, and the proportion of colors for each dog represents the ancestry mix for that dog. Dogs of each breed are grouped together, with black lines separating each breed (note that the sample size of each breed is different).

How to infer the actual breed – the Doggie DNAPrint® 1.0 database method and problems with other methods based on breed assignment

With Doggie DNAPrint® 1.0, the measurement of your dog's affiliation with more recent elements of dog ancestry can be used to infer the breed and/or breed admixture. This is a different, and fundamentally superior, method of breed assessment from that used by most other purveyors of canine genetic tests, which perform breed assignment. With assignment methods, a genetic distance between your dog and the average member of each breed is measured, and the dog is then assigned to the breed with the lowest value. The problem with measures based on genetic distance is that genetic distance can arise by admixture as well as phylogenetic history, and as Parker et al., 2004 put it, “the true evolutionary history of dog breeds is not well represented by … (such methods)… because existing breeds were mixed to create new breeds…”. In other words, using genetic distance measures, a dog of mixed breed may be assigned to a third breed rather than to either of the correct breeds, and there is no good way to assign the dog to a breed mixture. Again, as Parker et al., 2004 stated, “methods based on genetic distance matrices lose information by collapsing all genotype data for pairs of breeds into a single number.” In lay terms, this is to say that it is overly simplistic to assign any dog to a breed based on a single number, such as a measure of genetic distance. Rather, we must assess the dog's ancestry in terms of elements of dog ancestry inferred from global analyses, using many breeds (such as those shown in Figure 4), and then use the proportion of ancestral elements within each dog to infer the most likely breed and/or breed admixture. This is accomplished using databases. If your dog shows a pattern of dog ancestries characteristic of Huskies, for example, searching a database with the dog's proportions should reveal that other Huskies share similar proportions, to the exclusion of other breeds.
If your dog is a Husky/Bulldog mix, searching the database would reveal that other Husky/Bulldog mixes show similar proportions, and that purebred Huskies and Bulldogs show less similar proportions, whereas other breeds and breed mixtures show vastly different proportions. The customer can then infer that their dog is most likely a Husky/Bulldog mix. DNAPrint®’s founder pioneered this empirical method and described it extensively in his textbook Molecular Photofitting – Predicting Ancestry and Phenotype using DNA (2008).
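The database lookup described above might look something like the following sketch. Everything in it is an assumption made for illustration: the scoring rule (one minus half the summed absolute difference between ancestry profiles), the function names, and the reference entries, which use three ancestry elements instead of 15 to keep the example short. The document does not specify how the real database scores matches.

```python
from typing import List, Sequence, Tuple


def match_score(query: Sequence[float], reference: Sequence[float]) -> float:
    """Similarity of two ancestry profiles: 1.0 = identical, 0.0 = no overlap."""
    return 1.0 - 0.5 * sum(abs(q - r) for q, r in zip(query, reference))


def search_database(
    query: Sequence[float],
    database: List[Tuple[str, Sequence[float]]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_n database entries ranked by match score, best first."""
    scored = [(label, match_score(query, profile)) for label, profile in database]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical reference dogs (invented proportions, not real Doggie DNAPrint data).
DATABASE = [
    ("Siberian Husky", (0.90, 0.05, 0.05)),
    ("Siberian Husky", (0.88, 0.07, 0.05)),
    ("Bulldog", (0.05, 0.90, 0.05)),
    ("Husky/Bulldog mix", (0.48, 0.47, 0.05)),
    ("Beagle", (0.05, 0.05, 0.90)),
]

customer_dog = (0.50, 0.45, 0.05)
results = search_database(customer_dog, DATABASE)
for label, score in results:
    print(f"{label}: match score {score:.2f}")
```

In this toy example the Husky/Bulldog mix ranks first, with the purebred Huskies well behind it. A query for a breed missing from the database would simply return low scores for every entry.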

Each Doggie DNAPrint® 1.0 customer receives an ancient (4-population) ancestry breakdown of their dog, a more recent (15-population) ancestry breakdown, and access to our database of purebred and mixed dogs for searching using their recent (15-population) ancestry breakdown as a query. From the results of this on-line query, the customer can conclude their dog's most likely breed and/or breed mixture. This method of inferring breed/breed admixture using matches of empirically observed elements of canine ancestry is fundamentally more accurate than assignment methods based on genetic distance. We expect fewer false positive and false negative results, and the method easily accommodates missing breeds in the database. Thus, if your dog's breed is not a part of our database, its 15-population breakdown is not affected, but a query of the database should reveal that there are no breeds with high-scoring matches. In this case, as we add more breeds and breed mixtures to the database over time, the customer can repeat their on-line query (for free, at their leisure) and obtain updates of increasing accuracy.

Example of Doggie DNAPrint® 1.0 in action

The author owns two purebred Beagles – Daisy and Duke. Prior to developing Doggie DNAPrint® 1.0, the members of the family had often remarked how Duke resembles a Basset Hound, whereas Daisy looks like most other Beagles we had met in the dog parks. Note in Figure 5 that Duke has a more elongated snout and a slightly different skull shape than Daisy, whose features more closely resemble the typical Beagle. At these dog parks, we learned that there are at least 2 types of Beagles, with one likely bred more recently in human history using crosses to achieve more of a scent-hound variety of Beagle. We had thus always assumed that Duke represented one of these “scent hound” varieties. Doggie DNAPrint® 1.0 gave us a chance to test this hypothesis. In Figure 4 you can see that Basset Hounds are typified by a large proportion of grey, a moderate amount of blue and a small amount of yellow canine ancestry, and that this particular set of grey/blue/yellow proportions is fairly characteristic of this breed. Duke is the first bar among the Beagles in Figure 4 and Daisy is the second. Their proportions, as well as those of the Basset Hounds, are reproduced in Figure 5 for easier viewing.

Duke and Daisy

Figure 5. The author's dogs, Duke and Daisy, both Beagles, show different morphology, with Duke exhibiting a longer snout and a more spherical skull shape, somewhat reminiscent of that for the typical Basset Hound. Doggie DNAPrint® 1.0 results using the 15 basic elements of canine ancestry appear to confirm that Duke is more genetically similar to Basset Hounds. Note that Duke possesses significant “grey” ancestry, typical of Basset Hounds whereas Daisy does not.

One will note that Duke's canine ancestry includes a significant level of grey ancestry, which is typical of Basset Hounds, whereas Daisy's does not. Note also that both Duke and Daisy cluster clearly with Beagles, based on the presence of predominantly “red” ancestry, and that the ear length of both Duke and Daisy is typical for Beagles, as is the coat pattern, whereas Bassets have much longer ears and less black/white mottling (assuring us that Duke and Daisy are indeed Beagles). The matching of patterns in this way suggests that Duke is indeed more similar to the typical Basset Hound than Daisy is, confirming what the family had always suspected based on skull shape and snout length.

Accuracy

Error in Doggie DNAPrint® 1.0 results arises because the difference in sequence for the Ancestry Informative Markers (AIMs) between ancestry groups is not absolute, but continuous. For example, the frequency of the minor allele of one AIM may be 0.35 in the HUNTER group, but only about 0.10 in the other groups, so it provides some “power” for resolving HUNTER ancestry from the other groups, but not absolute power. Over 204 AIMs, the HUNTER vs. “other” power would be much better than for one AIM, but still not absolute, and we can consider this departure from absolute power a form of test deficiency. Though Doggie DNAPrint® 1.0 has exceptional power to resolve between ancient dog ancestry groups, it is not perfect, and this imperfection is what we measure with the computer simulations.
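One way to put a number on the “power” of the example marker is the expected log-likelihood ratio per allele, which is the Kullback-Leibler divergence between the two allele-frequency distributions. The sketch below is an illustration only: it is not the statistic used in the Doggie DNAPrint® 1.0 simulations, and it assumes, unrealistically, that every marker in the panel is as informative as the 0.35 vs. 0.10 example.

```python
import math


def expected_log_likelihood_ratio(freq_in_group: float, freq_elsewhere: float) -> float:
    """Expected log-likelihood ratio (nats) per allele drawn from the group.

    For a two-allele marker this equals the Kullback-Leibler divergence
    between the group's allele frequencies and those of the other groups.
    """
    p, q = freq_in_group, freq_elsewhere
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


per_allele = expected_log_likelihood_ratio(0.35, 0.10)
print(f"One allele of one AIM: {per_allele:.3f} nats")

# Unlinked markers contribute independently, so their expected evidence adds up.
for n_markers in (1, 10, 204):
    total = 2 * n_markers * per_allele  # two alleles per marker in a diploid dog
    print(f"{n_markers:>3} AIMs: {total:.1f} nats expected in favour of HUNTER")
```

A single marker barely shifts the odds, which is why no one AIM can resolve ancestry on its own. The evidence accumulates across the panel but remains finite, matching the point that the power is high without being absolute.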

Since our canine AIMs (or CanAIMs) are not linked to one another, we can easily create simulated samples in a computer, and measure the mathematical error encountered in measuring admixture with Doggie DNAPrint® 1.0 by comparing the results for these simulated samples with their expected results. Inspection of the results shows that the average simulated 100% Hunter sample, for example, registers with about 97.8% Hunter ancestry and erroneously with 0.8% Herder, 0.7% Wolf-like and 0.7% Mastiff ancestry (Table I). Thus, the average sample of primarily Hunter ancestry is expected to be about 2.2% away from the real value. The variance (VAR) of the estimates for samples primarily of Hunter ancestry is about 0.02%, and the mean squared error (MSE) about 0.07%. Considering both non-admixed and admixed individuals of all possible types of ancestry we can see from Table I that the average estimate is biased by 2.5%, with a variance of 0.06% and a mean square error of 0.12%, indicating that the estimates are very solid indeed – significantly more accurate than those for human subpopulations obtained with human Ancestry Informative Markers.
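The bias, variance and mean squared error figures quoted above are tied together by the standard identity MSE = variance + bias². The short check below applies that identity to the rounded numbers from the text; the function name is ours, and small discrepancies are to be expected because the inputs are rounded.

```python
import math


def mean_squared_error(bias: float, variance: float) -> float:
    """Mean squared error from the identity MSE = variance + bias squared."""
    return variance + bias ** 2


# Figures quoted in the text, expressed as fractions rather than percentages.
hunter_mse = mean_squared_error(bias=0.022, variance=0.0002)
overall_mse = mean_squared_error(bias=0.025, variance=0.0006)
overall_rmse = math.sqrt(overall_mse)

print(f"Hunter MSE:   {hunter_mse:.2%}")
print(f"Overall MSE:  {overall_mse:.2%}")
print(f"Overall RMSE: {overall_rmse:.1%}")
```

The computed values agree with the quoted mean squared errors of about 0.07% and 0.12%, and the overall figure corresponds to a root mean squared error of roughly 3.5%.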

Table I. Average results for various simulated samples and their summary statistics.

Simulated Results

Each individual estimate is provided with its 90% confidence interval. The estimate may differ from the correct answer for any given sample, but on average it will differ from the true value by about 3.4% (RMSE, Table I), and the correct answer is expected to fall within the confidence intervals shown in Table II 90% of the time. The breadth of the average 90% confidence interval is about 9 percentage points (Table II).

Table II. Breadth of confidence Intervals

Confidence Intervals

For interpreting Doggie DNAPrint® 1.0 results, it is more useful to understand what levels of admixture are required in order to conclude with 95% certainty that the admixture is real. Table III shows these values, which were obtained from the simulated samples used for Table I. From Table III we can see that an individual of primarily HUNTER ancestry needs to see greater than 0.2% HERDER admixture in order to safely conclude that the HERDER admixture is bona-fide, as opposed to statistical noise. As another example from Table III, an individual of primarily MASTIFF ancestry would need to see greater than 0.3% WOLF-like admixture to conclude that the admixture was real as opposed to statistical noise. Overall, taking the average of all of these values, we can see that in general admixture levels over 0.25% are required in order to safely conclude that the admixture is real. These are exceptionally low values, and they show that, generally speaking, Doggie DNAPrint® 1.0 will easily detect the ancestry contributed by a single great-great-great-great grandparent!
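The great-great-great-great grandparent claim follows from simple halving: on average, each generation back contributes half as much ancestry as the one after it. The sketch below compares those expected shares with the 0.25% average threshold quoted above. It uses expected values only; the share actually inherited from any one distant ancestor varies from dog to dog, and the names in the code are ours.

```python
def expected_contribution(generations_back: int) -> float:
    """Expected fraction of ancestry from a single ancestor n generations back."""
    return 0.5 ** generations_back


DETECTION_THRESHOLD = 0.0025  # the 0.25% average level quoted in the text

ANCESTORS = {
    1: "parent",
    2: "grandparent",
    3: "great-grandparent",
    6: "great-great-great-great grandparent",
}

for generations, name in ANCESTORS.items():
    share = expected_contribution(generations)
    detectable = share > DETECTION_THRESHOLD
    print(f"{name}: {share:.2%} expected, above threshold: {detectable}")

# Deepest generation whose expected share still exceeds the threshold.
deepest_detectable = max(
    g for g in range(1, 20) if expected_contribution(g) > DETECTION_THRESHOLD
)
print(f"Deepest generation above threshold: {deepest_detectable}")
```

A single great-great-great-great grandparent sits six generations back and is expected to contribute 1/64, or about 1.56%, of a dog's ancestry, comfortably above the 0.25% level.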

Table III. Admixture level above which a reading must fall in order to conclude with 95% certainty that it is the result of bona-fide ancestry rather than statistical noise

You will note that for individual dogs of most types of ancestry backgrounds, for most types of ancestry, readings 1% or higher are significant indications of that type of ancestry, but the value differs depending on the primary ancestry backgrounds and type of admixture.

95 Percentage Confidence Intervals

References

Frudakis, T. (2008). Molecular Photofitting – Predicting Ancestry and Phenotype using DNA. Academic Press/Elsevier Publishers, Burlington, MA.

Parker, H., Kim, L., Sutter, N., Carlson, S., Lorentzen, T., Malek, T., Johnson, G., DeFrance, H., Ostrander, E. and L. Kruglyak. (2004). Genetic structure of the purebred domestic dog. Science 304:1160-1164.

Copyright © 2007, 2008 DNAPrint Genomics, Inc.