838 F. Supp. 1054 | D.V.I. | 1993
GOVERNMENT OF the VIRGIN ISLANDS, Plaintiff,
v.
William PENN, Defendant.
District Court, Virgin Islands, D. St. Thomas and St. John.
Susan R. Via, Asst. U.S. Atty., D.V.I., Charlotte Amalie, St. Thomas, VI, for plaintiff.
Iver A. Stridiron, Charlotte Amalie, VI, for defendant.
MEMORANDUM OF DECISION
McGLYNN, District Judge, Sitting by Designation.
Before the court, in this prosecution for rape, is the government's motion in limine for pretrial determination of the admissibility of deoxyribonucleic acid ("DNA") profiling test results.
FACTUAL BACKGROUND
The defendant William Penn is charged with raping a woman on March 1, 1991. After the incident was reported, the Virgin Islands Police Department removed from the woman and her home a number of items for examination, including semen stained tissue paper, a semen stained sheet, a semen stained condom, and a cotton swab sample of the semen left in the woman's vagina. Additionally, the woman gave a sample of her blood. The FBI laboratory subsequently instituted a DNA profiling procedure using the evidence taken from the woman and her home, the woman's blood, and the defendant's blood, taken after the defendant was charged. The FBI informed the government that the DNA extracted from the defendant's blood sample matched the DNA found in the various semen stains found in the house and the woman.
PROCEDURAL HISTORY
At the hearing on the motion, the government presented the testimony of FBI Agent Robert Coffin and scientist Kenneth K. Kidd. The defense presented a witness, scientist William M. Shields, at a subsequent hearing. Agent Coffin again appeared in rebuttal and the government provided further testimony by way of an affidavit from Bruce Budowle, the FBI's Program Manager for DNA Research. Defense counsel responded with an affidavit from Shields.
WITNESS CREDENTIALS
Budowle. Bruce Budowle is the FBI's Program Manager for DNA Research in Quantico, Virginia. (Budowle Aff. ("Budowle") at 1.)
Coffin. Robert Coffin has been employed with the FBI for over ten years. (R. 11/5/91 ("Coffin") at 5.) Coffin *1055 has a B.S. and M.S. in chemistry with a special emphasis on biochemistry from Ball State University in Muncie, Indiana. (Coffin at 5.) After completing his degree programs Coffin worked for three years as a biochemist for an unidentified college. (Coffin at 8-9.)
With respect to DNA profile testing, Coffin underwent a six month training period. (Coffin at 9.) The training involved taking college level courses and performing numerous test and actual cases, all under the supervision of qualified examiners. (Coffin at 9.) Coffin now works with the FBI's DNA analysis unit, part of the FBI's Washington laboratory. (Coffin at 5.) The DNA analysis unit performs DNA profile testing using evidence submitted to the unit by law enforcement agencies from across the United States. (Coffin at 6.) Coffin has performed over 130 DNA profile tests. (Coffin at 8.) Coffin remains current in the latest journals, articles, and publications relating to DNA profile testing. (Coffin at 9.)
Shields. William M. Shields has been a Professor of Biology with the State University of New York's College of Environmental Science and Forestry since 1979. (R. 11/5/91 ("Shields") at 4-5.) He teaches courses in animal behavior, ornithology, conservation biology, conservation genetics, evolution, systematic biology, and population genetics. (Shields at 5.) His research is in the fields of animal behavior, behavioral ecology, and evolutionary biology, with an emphasis on population genetics. (Shields at 5.) He holds a B.A. in biology from Rutgers University, New Jersey and an M.S. and PhD. in zoology from Ohio State University. (Shields at 5.) Shields has published numerous papers on population genetics as well as a book on the subject called Philopatry, Inbreeding and the Evolution of Sex. (Shields at 6.) As a Colorado Plateau Distinguished Scholar in Residence at Northern Arizona University, Flagstaff, Shields taught himself molecular techniques related to population biology studies. (Shields at 6.)
Since 1987 Shields has conducted DNA typing of rattlesnakes, swallows, wolves, deer, and chipmunks. (Shields at 6.) He has received grants to do DNA typing from the United States Department of Agriculture, and the states of New Mexico and New York. (Shields at 13.) Shields published a paper on forensic DNA typing for the Promega Symposium in 1992. (Shields at 13.) He has been invited to teach DNA forensic testing to the California Association of Criminologists, the New Hampshire Public Defenders, the Maryland Criminal Defense Association, the Criminal Jurisprudence Society, and to the town of Ithaca and Tompkins County in New York State. (Shields at 14.) He has been invited to debate population genetics issues by the University of Ottawa and its law school at Carleton University. (Shields at 14.) He declined invitations to do consulting work in Germany and England. (Shields at 14.)
Kidd. Kenneth K. Kidd has been with Yale University for over eighteen years. He is currently employed there as a Professor of Genetics, Psychiatry, and Biology. (R. 11/5-6/91 ("Kidd") at 148.) He has a B.A. in biology, a masters and PhD. in population genetics, which was the subject of his post-doctoral work at Stanford University and in Italy with Professor Cavalli-Sforza, a preeminent population geneticist. (Kidd at 148-49.) He has focused his entire professional career on biology and genetics. (Kidd at 149.) At Yale he teaches courses in human genetics, human population genetics, molecular biology, demography, and human evolution. (Kidd at 152-154.) By 1991, Kidd had authored, either alone or in collaboration with others, 226 publications. (Kidd at 162.) In the ten years prior to his testifying, three quarters of his publications related to molecular biology or population genetics. (Kidd at 162-64.)
For the last twenty-five years he has focused his research on human population genetics. (Kidd at 155.) When DNA and molecular technology became powerful research tools, Kidd took a sabbatical to retrain as a molecular biologist at Harvard. (Kidd at 155.) During this year he learned how to actually use the technology in the laboratory. (Kidd at 156.) At Yale he supervises graduate students who are doing laboratory research, which requires that he run a very large laboratory. (Kidd at 155.) His laboratory is one of the few in the world that focuses on *1056 both molecular biology and population genetics. (Kidd at 156.)
One of his research efforts is an attempt to identify genes that cause neuropsychiatric disorders such as schizophrenia and manic depression. (Kidd at 164.) To this end he uses a laboratory procedure that is similar to that used by the FBI when it creates a DNA profile, though his efforts have no immediate forensic applications. (Kidd at 165, 168.) His efforts also include studying DNA polymorphisms from various peoples around the world. (Kidd at 165.) His laboratory has the largest collection of DNA from peoples as diverse as African Pygmies, Chinese, and Melanesians from the New Guinea highlands. (Kidd at 165-66.)
Kidd is one of twelve elected council members who run the Genetics Society of America. (Kidd at 150.) In the American Association for the Advancement of Science, Kidd is a Fellow, an honorary elected position held by individuals who are regarded as having made significant contributions to American science. (Kidd at 150.) He was awarded this honorary position in his capacity as a human geneticist. (Kidd at 151.) He has been on the editorial board of various journals and has participated in human gene mapping workshops that were the predecessors of the Human Genome Project, an international scientific endeavor to map, locate, and sequence all pieces of human DNA. (Kidd at 151, 157.) Kidd was elected to the international body of scientists that is coordinating the Human Genome Project called the Human Genome Organization ("HUGO"). (Kidd at 151.)
In connection with the Human Genome Project and its predecessor workshops, Kidd co-organized one of the Project's international meetings attended by seven hundred scientists from around the world. (Kidd at 157.) He also sits on the HUGO Committee for Gene Mapping, a body of fifteen persons from around the world who coordinate the projects being conducted in different laboratories around the world. (Kidd at 158.)
In the early 1980s Kidd, along with Frank Ruddle, also a Yale Professor, commenced building an electronic database of all genes that have been mapped, which developed into the "human gene mapping library." (Kidd at 159.) For a period of five years Kidd was responsible for assigning identification labels to DNA probes used to help locate genes. (Kidd at 159.) In the project's final two or three years, its budget was one million dollars with a staff of seventeen. Kidd not only ran the research laboratory but was also the project's director. (Kidd at 159.)
STANDARD OF ADMISSIBILITY
In deciding whether this novel scientific evidence is admissible at trial, the court must determine whether "the expert is proposing to testify to (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue." (Daubert v. Merrell Dow Pharmaceuticals, Inc., ___ U.S. ___, ___, 113 S.Ct. 2786, 2796, 125 L.Ed.2d 469 (1993); FED.R.EVID. 402, 702.)
The first requirement, "scientific knowledge," is "an inference or assertion [that is] derived by the scientific method." (Daubert, ___ U.S. at ___, 113 S.Ct. at 2795.) To reflect "scientific knowledge," the expert's testimony "must be supported by appropriate validation i.e., "good grounds," based on what is known." (Id.) The Court warned, however, that "it would be unreasonable to conclude that the subject of scientific testimony must be "known" to a certainty; arguably, there are no certainties in science." (Daubert, ___ U.S. at ___, 113 S.Ct. at 2795.) The "scientific knowledge" requirement, if met, "establishes a standard of evidentiary reliability." (Id.) The focus of this requirement is on the technique or theory that produces the result, not the result itself. (See Daubert, ___ U.S. at ___, 113 S.Ct. at 2797.)
To determine reliability, the court must consider: (1) whether the proffered "theory or technique ... can be (and has been) tested," (Id.), (2) "whether the theory or technique has been subjected to peer review and publication," (Id.), (3) if a "particular scientific technique" is the subject of the expert's testimony, "the court should consider the known or potential rate of error," (Id. (citing United States v. Smith, 869 F.2d 348, 353-354 (7th Cir.1989))), [as well as] "the existence and maintenance of standards controlling the *1057 technique's operation," (Id. ___ U.S. at ___, 113 S.Ct. at 2797 (citing United States v. Williams, 583 F.2d 1194, 1198 (2d Cir. 1978))), and lastly, (4) the degree to which the theory or technique is accepted by a "`relevant scientific community,'" (Id. (quoting United States v. Downing, 753 F.2d 1224, 1238 (3d Cir.1985))).
The second requirement, that the evidence must assist the trier of fact, "goes primarily to relevance." (Daubert, ___ U.S. at ___, 113 S.Ct. at 2795.) Here, the court must determine "`whether expert testimony proffered in the case is sufficiently tied to the facts of the case that it will aid the jury in resolving a factual dispute.'" (Id. (quoting United States v. Downing, 753 F.2d 1224, 1242 (3d Cir.1985))).
In short, when deciding whether to admit scientific evidence at trial, the court must assess "whether the reasoning or methodology underlying the testimony is scientifically valid and ... whether that reasoning or methodology properly can be applied to the facts in issue." (Daubert, ___ U.S. at ___, 113 S.Ct. at 2796.)
This analysis must be done within the framework of the applicable evidentiary rules. Expert opinions based on otherwise inadmissible data are admissible "only if the facts or data are `of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.'" (Daubert, ___ U.S. at ___, 113 S.Ct. at 2798 quoting FED.R.EVID. 703.) Further, relevant evidence may be excluded "if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury...." (Id. quoting FED.R.EVID. 403.) In order to apply the Daubert standard to the FBI's DNA profiling procedure, it is first necessary to understand the structure of the DNA molecule and how it is used to create a DNA profile.
THE STRUCTURE OF THE DNA MOLECULE
Within each human somatic cell nucleus are twenty-three pairs of chromosomes. (Coffin at 18.)[1] A human inherits one set of the chromosomes from his or her mother and the other set from the father. (Coffin at 18.) Each chromosome houses a molecule of DNA. There is DNA in almost every cell of the human body, including white blood cells, skin cells, semen cells, saliva cells, and cells surrounding hair roots. (See Coffin at 19; Eric Lander, DNA Fingerprinting: Science, Law, and the Ultimate Identifier, in THE CODE OF CODES 191, 192 (Daniel J. Kevles & Leroy Hood eds., 1st ed. 1993) ("Lander").) The DNA molecule looks like a ladder twisted into a spiral, or, like a spiral staircase. (See Coffin at 20.) The ladder's vertical sidepieces are composed of phosphate and deoxyribose sugar. The crosspieces, or rungs, of the ladder are called "nucleotides." (Coffin at 21.) The most important component of the nucleotide is its organic "base." (Coffin at 21.) A base is organic material that occurs in any given nucleotide in one of four forms: adenine ("A"), guanine ("G"), cytosine ("C"), or thymine ("T"). (Coffin at 21.) In any given nucleotide, A is always paired with T while C is always paired with G. Consequently, if it is known that a nucleotide, or ladder rung, contains A on one side, then it can be inferred that that nucleotide contains T on the other side. (Coffin at 23.)
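The pairing rule just described can be illustrated with a short Python sketch. The function name `complement_strand` and the example sequence are hypothetical and chosen only for illustration; the rule itself (A with T, C with G) is the one stated in the testimony.

```python
# Pairing rule from the testimony: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(half_rungs: str) -> str:
    """Infer the other side of each ladder rung from the known side.

    `half_rungs` lists the base on one side of each rung, top to bottom.
    """
    return "".join(PAIRS[base] for base in half_rungs)


if __name__ == "__main__":
    known_side = "TATT"  # hypothetical example, one base per rung
    other_side = complement_strand(known_side)
    for left, right in zip(known_side, other_side):
        print(f"{left}-{right}")
```

Knowing one side of the ladder is therefore enough to reconstruct the whole rung sequence, which is the inference the testimony describes.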
In humans, the various sequences in which A, T, C, and G (and their complements) occur along the DNA ladder create a code that determines the various physiological traits of each human being. These different "rung sequences" correspond with different physiological traits. It is as if each rung sequence is a different word, each with a different meaning. (See Coffin at 22.) There are rung sequences along the ladder that are "recognized" and rung sequences that are "anonymous." (See Kidd at 10.) Recognized rung sequences determine traits such as hair color and non-visible traits such as insulin production. (See Kidd at 10.) These recognized rung sequences are often referred to as "genes." (See Coffin at 24.) Anonymous rung sequences are inherited with the recognized *1058 rung sequences but are not known to have any effect, visual or otherwise. (See Coffin at 25.) One kind of anonymous rung sequence is a type of polymorphism called "variable number of tandem repeats (VNTRs)," whose name, as explained below, is self-descriptive. (See Coffin at 25.) Four particular VNTRs found in the genetic code are the focus of the DNA profiling process. (See Coffin at 25.)
Like recognized rung sequences, each VNTRs is composed of its own particular sequence of nucleotides. (Coffin at 26.) The number of nucleotides, or ladder rungs, that constitute the "core sequence" of each VNTRs is different for each particular VNTRs. (Coffin at 26.) For example, the core sequence of one hypothetical VNTRs that contains four nucleotides may be
T-A A-T T-A T-A
While the core sequence of another hypothetical VNTRs that contains three nucleotides may be
G-C T-A C-G.
The core sequence of each particular VNTRs, however, is identical for every person. (Coffin at 26.) Put differently, each VNTRs is like a word in the genetic code that is common to everyone. On each person's DNA ladder, normally two or more of each particular VNTRs core sequence occur in tandem. Thus, if each VNTRs is like a word, then the genetic code stutters when it speaks that word. But in each person, the number of times a VNTRs core sequence reoccurs, or repeats, varies. (See Coffin at 25.) In other words, each person's DNA code is different in how many times it "stutters" that word. One person may have eight blocks of a particular VNTRs while another person may have three blocks of that VNTRs. Obviously a VNTRs consisting of eight blocks is longer than a VNTRs consisting of three blocks. A person who has eight blocks of a given VNTRs can therefore be distinguished from a person with three blocks of that VNTRs. The following is a representation of eight blocks and three blocks of a hypothetical VNTRs with the core sequence:
G-C T-A C-G.
Eight blocks: G-C T-A C-G | G-C T-A C-G | G-C T-A C-G | G-C T-A C-G | G-C T-A C-G | G-C T-A C-G | G-C T-A C-G | G-C T-A C-G
Three blocks: G-C T-A C-G | G-C T-A C-G | G-C T-A C-G[2]
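The idea of a core sequence that repeats in tandem a variable number of times can be sketched in Python as follows. The helper names `build_vntr` and `count_blocks` are hypothetical, and the core sequence is the hypothetical three-rung example used in the text, not an actual VNTRs.

```python
# Hypothetical three-rung core sequence taken from the example in the text.
CORE = ["G-C", "T-A", "C-G"]


def build_vntr(core: list, blocks: int) -> list:
    """Return the rungs of a VNTRs made of `blocks` tandem copies of `core`."""
    return core * blocks


def count_blocks(rungs: list, core: list) -> int:
    """Count how many whole tandem copies of `core` make up `rungs`."""
    count, remainder = divmod(len(rungs), len(core))
    if remainder != 0 or rungs != core * count:
        raise ValueError("rungs are not whole tandem copies of the core")
    return count


if __name__ == "__main__":
    person_a = build_vntr(CORE, 8)  # eight blocks
    person_b = build_vntr(CORE, 3)  # three blocks
    # Same core sequence, different lengths: this is what distinguishes them.
    print(len(person_a), len(person_b))
```

The two persons share the same "word" but differ in how many times it repeats, so the total number of rungs differs.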
Measuring the differing lengths of particular VNTRs that occur in humans is the key concept of the FBI's DNA profiling procedure. (Coffin at 27; see Kidd at 11.) The FBI's procedure centers on four particular VNTRs, each one chosen because the degree to which their lengths vary between persons is high. (See Lander at 193.) The chance of these VNTRs occurring with identical lengths in different persons is therefore relatively low. This is significant because, as *1059 discussed below, the FBI extracts the four particular VNTRs from a human tissue sample found at a crime scene and measures them to see if they match the lengths of VNTRs extracted from a suspect's blood cells. The procedure would be less effective if the chosen VNTRs occurred with identical lengths in many people because the procedure's goal is to identify the VNTRs with distinctive lengths and which therefore distinguish the individual from whom they are derived.
THE FBI'S TESTING PROCEDURE FOR DNA PROFILING
The DNA profiling process can be divided into three parts: the laboratory procedure, matching, and applying principles of population genetics and statistics.
A. The Laboratory Procedure
The laboratory procedure is also called "restriction fragment length polymorphism analysis," because the focus of the procedure is "restriction fragment length polymorphisms ("RFLPs" or "ReFLiPs")." As noted above, the VNTRs is a type of polymorphism. As described below, the laboratory procedure entails the "fragmenting" of the DNA ladder in order to isolate VNTRs into their respective "lengths." The fragmentation is done with "restriction enzymes," which act like molecular scissors.
The procedure begins with the collecting at a crime scene of a cell sample, which may be derived from particles of human tissue, such as skin found under fingernails or a blood stain on a sheet. The DNA molecule is stable and can be preserved even in a dried blood stain. (Lander at 192.) The first step is to separate the DNA in the stain from contaminants and non-DNA components. (Coffin at 27-28; Kidd at 3.) For example, if a blood stained shirt is found at the crime scene, the technician will isolate the DNA in the blood stain from the shirt cloth fibers as well as from the blood's non-DNA components. (Coffin at 28.)
To isolate the DNA, in a procedure called "extraction," the technician breaks open any unbroken cell found in the stain and chemically extracts and purifies the DNA found in the cell's chromosomes. (Coffin at 28.) Sometimes, however, a stain contains cells from more than one person. Such a problem can arise in a rape case where the semen sample is taken from inside the victim with a cotton swab. The swab may contain the victim's vaginal skin cells along with the rapist's semen cells. (Coffin at 28.) Semen cells can be distinguished from other kinds of cells by examining the cells through a microscope. (Coffin at 62.)[3] In such a case, with a procedure called "differential extraction," the laboratory first removes and isolates the DNA from the vaginal skin cells and then removes and isolates the DNA from the semen cells. (Coffin at 28-29.)
After the DNA is isolated and purified, the next step is to cut the DNA ladders into fragments. To fragment the DNA, a "restriction enzyme" called "HAE III" is used which, like molecular scissors, cleaves the DNA ladder. (Coffin at 29.) HAE III, an enzyme purified from a bacterium, is commercially produced and is often used by the scientific community. (Coffin at 30; Kidd at 4.) HAE III is engineered to target the nucleotide sequence
G G C C
and cleaves the DNA ladder from each chromosome between the G nucleotide and C nucleotide each time this sequence occurs, which is often. (Coffin at 30; Kidd at 4, 6.) HAE III is specific in that it will neither target nor cut the double helix at the locus of any other nucleotide sequence. (Coffin at 31.) The HAE III is invariant in that it will target the same locus regardless of from which human tissue the DNA sample is extracted and regardless of the DNA's age. (Coffin at 31.)
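The cutting rule described above can be sketched in Python. This is a simplified illustration, not the laboratory's method: it models only one side of the ladder as a string, and the function name `digest`, its parameters, and the sample strand are hypothetical.

```python
def digest(strand: str, site: str = "GGCC", cut_offset: int = 2) -> list:
    """Cut `strand` inside every occurrence of `site`.

    With the defaults, the cut falls between the second G and the first C
    of each G-G-C-C run, as the testimony describes. No other sequence
    is cut.
    """
    fragments = []
    start = 0
    pos = strand.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(strand[start:cut])
        start = cut
        pos = strand.find(site, pos + 1)
    fragments.append(strand[start:])  # whatever remains after the last cut
    return fragments


if __name__ == "__main__":
    sample = "AAGGCCTTTGGCCA"  # hypothetical strand with two target sites
    for fragment in digest(sample):
        print(fragment, len(fragment))
```

Because the enzyme cuts only at its target sequence, the same DNA always yields the same set of fragment lengths, which is what makes the later length comparison meaningful.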
*1060 The DNA fragments, including the VNTRs fragments, are identified either with "Q" and a number (e.g., Q-1) or "K" and a number (e.g., K-1), according to whether the DNA fragment is taken from a known ("K") source or from a questioned ("Q") source. (Coffin at 33.) DNA fragments derived from the victim's and suspect's blood are "known" fragments, while fragments derived from stains found at the crime scene or in the victim are "questioned." (Coffin at 33.) There will be more than one group of questioned fragments when more than one stain is found at the scene of the crime. In the case under consideration, for example, "[s]emen was found on a condom, ... it was also found on a piece of toilet paper and a sheet," each containing a number of stains. (Coffin at 63-64.)
Next, the laboratory separates the fragments according to size through "electrophoresis." Electrophoresis is a technique commonly used in laboratories. (See Coffin at 31.) For this technique, the technician uses an agarose gel plate that is roughly five inches long, four inches wide, and one quarter inch thick. (Coffin at 32.) The DNA fragments from all twenty-three chromosomes derived from each known and questioned cell sample are grouped into separate lanes in the agarose, much like horses at a race track starting gate. (Coffin at 32-33, 143.) Each lane corresponds to the designation of the fragments. For example, all of the fragments designated "Q-1" are placed in lane "Q-1." A thirty volt negative electrical charge is applied to the agarose plate for seventeen hours. (Coffin at 32; Kidd at 12.) Since DNA is negatively charged, the negative electric charge repels all of the DNA fragments, including the VNTRs, which consequently travel within their respective lanes toward the positive pole. (Coffin at 32; OFFICE OF TECHNOLOGY ASSESSMENT, U.S. CONGRESS, GENETIC WITNESS FORENSIC USES OF DNA TESTS 46 (Washington, DC: U.S. Gov't Printing Office, July 1990) ("GENETIC WITNESS").) The agarose plate acts like a sieve in that the shorter DNA fragments, including the VNTRs, travel away from the electric charge faster than the longer DNA fragments. (See Coffin at 32.) Consequently, when the electrophoresis is complete, the shorter DNA fragments are at a point more distant from the negative electrical charge-point than the longer fragments. This event is significant because the distance travelled by a given VNTRs depends on that VNTRs' size; the FBI can therefore measure each VNTRs' size by measuring the distance each VNTRs travels. (See Coffin at 73.)
After electrophoresis, the fragments are transferred to a nylon membrane that is approximately the same length and width as the agarose gel column. (Coffin at 34.) This process of transferring the fragments is called "Southern blotting," named after Edwin Southern, a biochemist who developed the process in 1975. The membrane retains the DNA fragments in the same positions in which they rested after travelling along the agarose gel. (Kidd at 14; GENETIC WITNESS at 46.) During this transfer, the fragments, including the VNTRs, are sliced down the middle of their respective nucleotides, or ladder rungs. (See Coffin at 34.) The result is that on the membrane, the sliced VNTRs rest among thousands of other single strands of DNA, each with its respective sequence of half-nucleotides. (Coffin at 34.) The single stranded VNTRs may be represented as:
Eight blocks: G T C | G T C | G T C | G T C | G T C | G T C | G T C | G T C
Three blocks: G T C | G T C | G T C
The laboratory next applies the "DNA probes." The DNA probes' targets are the four target VNTRs upon which this process focuses. The four VNTRs are derived from *1061 four different chromosomes. Each VNTRs is identified by its particular location, or "locus," on the DNA ladder. (See Coffin at 143; Kidd at 54.) The name of one VNTRs is "D2S44." The "D" is an abbreviation of "DNA." The "2" refers to the second chromosome, (see supra note 1), which is where this VNTRs is found. (See Coffin at 36.) The "S" is an abbreviation of the word "segment" and "44" refers to the forty-fourth segment of DNA on the second chromosome that has been identified. (Coffin at 36.) The other VNTRs are D17S79, D1S7, and D4S139. (Coffin at 40.) Of course when the DNA probe is applied, these VNTRs are no longer parts of the DNA ladder structure; they are laid out, as the result of Southern blotting, on the nylon membrane with thousands of other single strands of DNA.
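The naming convention just described can be decoded mechanically. The following Python sketch assumes the simple "D, chromosome number, S, segment number" form used for the four loci named in the testimony; the function name `parse_locus` is hypothetical, and the pattern is not meant to cover every locus name in use.

```python
import re


def parse_locus(name: str) -> dict:
    """Split a locus name such as "D2S44" into chromosome and segment numbers.

    Assumes the form D<chromosome>S<segment> with numeric parts only.
    """
    match = re.fullmatch(r"D(\d+)S(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized locus name: {name!r}")
    return {"chromosome": int(match.group(1)), "segment": int(match.group(2))}


if __name__ == "__main__":
    for locus in ["D2S44", "D17S79", "D1S7", "D4S139"]:
        print(locus, parse_locus(locus))
```

Applied to the four names in the testimony, the sketch confirms that each locus sits on a different chromosome.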
The FBI uses four different DNA probes. (Coffin at 35.) Each probe is named for the VNTRs that it targets. One therefore identifies the probe that targets the locus D4S139 as "D4S139." (See Coffin at 35.) The probe is a single strand of DNA that has a sequence of half-nucleotides that complements the half-nucleotide sequence of its target single stranded VNTRs. (See Coffin at 35; Kidd at 16.) For each VNTRs, the technician bathes the nylon membrane in a solution that contains a number of probes targeted to that VNTRs. (Government's Memorandum at 14.)
The single stranded probes then chemically bind, or "hybridize," to the target single stranded VNTRs because complementary single strands of DNA attract one another like magnets. (Coffin at 35; see Kidd at 16; GENETIC WITNESS at 46.) For example, a hypothetical probe with the sequence "CAG" will hybridize with a hypothetical single strand of VNTRs with the sequence "GTC." Each probe will attach to two single strands of VNTRs. This happens because there are only two particular VNTRs sequences to which the probe is targeted. Each of these VNTRs comes from the same pair of chromosomes, one from each of the chromosomes of the chromosomal pair from which they were derived. (See supra, note 2; Kidd at 18.) The result is partially restored DNA ladder fragments that may be represented as:
Eight blocks: G T C | G T C | G T C | G T C | G-C T-A-probe C-G | G T C | G T C | G T C
Three blocks: G T C | G-C T-A-probe C-G | G T C
As discussed in note 2, supra, the only difference between the two VNTRs in each person's set is that they have different lengths. (See Kidd at 18.) Their core sequences, however, are identical. Both target VNTRs may be regarded as different "versions" of one another. The DNA probe therefore binds to both versions of that VNTRs. The resultant hybridized single strands of DNA probe and VNTRs are called "hybrids." Since there are two versions of each VNTRs, each probe will normally produce two hybrids, and ultimately, two "bands" on the "autorad," as discussed below. (Kidd at 49.)
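Hybridization as described here can be sketched in Python. The sketch follows the opinion's simplified picture, in which a probe binds wherever its bases pair position by position with the strand (the hypothetical probe "CAG" binding to "GTC"); it ignores strand direction and laboratory conditions. The function name `probe_binding_sites` and the sample strands are hypothetical.

```python
# Pairing rule: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def probe_binding_sites(strand: str, probe: str) -> list:
    """Return every position in `strand` where `probe` could hybridize.

    The probe binds where the strand carries the base-by-base complement
    of the probe's own sequence.
    """
    target = "".join(PAIRS[base] for base in probe)
    sites = []
    pos = strand.find(target)
    while pos != -1:
        sites.append(pos)
        pos = strand.find(target, pos + 1)
    return sites


if __name__ == "__main__":
    probe = "CAG"                     # hypothetical probe from the text
    vntr_strand = "AAGTCGTCGTCTT"     # hypothetical strand with three blocks
    other_strand = "AAAATTTT"         # hypothetical non-target strand
    print(probe_binding_sites(vntr_strand, probe))
    print(probe_binding_sites(other_strand, probe))
```

The probe finds sites only on the strand that carries the complementary core sequence, which is why only the target VNTRs are labeled among the thousands of other strands on the membrane.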
So that the hybrids can be distinguished from the thousands of single strands of DNA on the membrane, a radioactive molecule, "phosphorus-32," is affixed to the probe prior to hybridization. (Coffin at 35.) Phosphorus-32 is x-ray sensitive and can expose x-ray film. (Coffin at 35.) Two sheets of x-ray film are then taped to the nylon membrane. (Coffin at 38, 95.) The phosphorus-32 labels "light" the areas on the membrane where the hybrids came to rest after electrophoresis. The sheets of x-ray film are consequently exposed at each "lighted" point. The exposure appears on the film as a dark horizontal "band." (Coffin at 38.) Each sheet of exposed x-ray film is called an "autoradiogram," "autorad," or "rad," (Coffin at 38), *1062 and it is the final product of the laboratory procedure. Once the procedure is complete, the technician chemically strips the probe off of the membrane, (Coffin at 39), and applies the other three probes (Coffin at 41; Kidd at 32-3). Ultimately, the laboratory produces four sets of two autorads, each set reflecting the distance on the agarose travelled by the four target VNTRs.
B. Interpreting the Autorad to Find a Match
The next step is to read, or to "interpret," each autorad. The autorad is an image of the nylon membrane after the DNA probe is applied; the autoradiogram is therefore a visual record of what occurred as a result of electrophoresis and the application of the DNA probe.
The three reference columns on the attached appendix represent commercially purchased viral DNA fragments that are placed on the agarose plate along with the known and questioned DNA fragments. (Coffin at 43.) Since the FBI laboratory has used duplicates of these fragments thousands of times, (Coffin at 45), it knows the length and appearance of each one. These fragments are used as "size markers" which, like yard sticks, measure the distance that the DNA fragments travel during electrophoresis. (Coffin at 43.)[4]
The FBI first determines with the human eye whether there is a match between a known sample and a questioned sample. (Coffin at 47.) In the diagram, there is a visual match between K-2 and Q-1. (Coffin at 46.) The match is due to the fact that the VNTRs in the K-2 and Q-1 lanes travelled the same distance during electrophoresis. The match therefore points to the possibility that these came from the same person. The profiles of Q-2 and Q-3, however, are very different from those of K-1 and K-2. The FBI can determine that neither of the persons who contributed K-1 and K-2 could be the source of the Q-2 or Q-3 samples. (Coffin at 47.) In this manner, the FBI excludes approximately 26 percent of its suspects. (Coffin at 42, 47.)
It may be noted that the diagram shows two bands in each lane. There are two bands because, as discussed above, each person has two versions of each VNTRs, one from each chromosome of the chromosomal pair that houses that VNTRs. (See Coffin at 37.) Just as the lengths of VNTRs vary between people, the lengths of VNTRs also differ within the chromosomal pair. Because the VNTRs in each column are of different lengths, both VNTRs in each column travel different distances during electrophoresis. Hence, when the DNA probe and its phosphorus-32 label bind to each VNTRs, each resultant hybrid is expressed on the autorad in a double banded pattern. (Coffin at 37; GENETIC WITNESS at 44.)[5]
When a visual inspection of the autorad reveals an approximate match, the laboratory uses a computer imaging program to increase the accuracy of the measurements. (Coffin at 48, 75.) The program measures the hybrids according to the number of nucleotides, also called "base pairs," each hybrid contains. (Coffin at 48.) Though the program may be more accurate than the human eye, it is nonetheless an imprecise tool in that it produces measurements that vary. (See Coffin at 76.) For example, if the program were to measure the same hybrid ten times, the program could produce ten *1063 different measurements. The program's ability to err, however, is limited. It will never produce a measurement that differs from the hybrid's true length by more than 2.5 percent of that length. (Coffin at 76-77.) For example, if the hybrid's true length is 100 base pairs, then the program will never produce a measurement that is less than 97.5 base pairs or greater than 102.5 base pairs.
To compensate for the program's uncertainty, the FBI laboratory creates a confidence "window" for each hybrid measurement produced by the program. The window, commonly called the "matching window," spans from the hybrid's measured length minus 2.5 percent of that length to the hybrid's measured length plus 2.5 percent of that length. (Coffin at 76-77.) Thus the hybrid's true length falls somewhere inside that window. The windows of each hybrid are compared to see if they overlap. (Coffin at 77.) If the window of a known hybrid and a window of a questioned hybrid overlap, then the FBI declares that the hybrids "match," meaning that they are indistinguishable. (Coffin at 49.) In the case under consideration, none of the hybrids varied in length from one another by more than 1.4 percent, well within the 2.5 percent tolerance. (Coffin at 50.)
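The matching rule can be expressed as a short Python sketch. The 2.5 percent figure comes from the testimony; the function names `match_window` and `windows_overlap` and the sample lengths are hypothetical, and the sketch does not purport to reproduce the FBI's imaging program.

```python
TOLERANCE = 0.025  # plus or minus 2.5 percent, per the testimony


def match_window(measured_length: float) -> tuple:
    """Return the (low, high) window around a measured hybrid length."""
    delta = measured_length * TOLERANCE
    return (measured_length - delta, measured_length + delta)


def windows_overlap(known_length: float, questioned_length: float) -> bool:
    """Declare a match when the two windows share at least one point."""
    known_low, known_high = match_window(known_length)
    quest_low, quest_high = match_window(questioned_length)
    return known_low <= quest_high and quest_low <= known_high


if __name__ == "__main__":
    # Hypothetical measurements, in base pairs.
    print(match_window(100.0))
    print(windows_overlap(1000.0, 1014.0))  # lengths 1.4 percent apart
    print(windows_overlap(1000.0, 1100.0))  # lengths 10 percent apart
```

Note that because each of the two measurements carries its own window, two hybrids can be declared a match even when their measured lengths differ by somewhat more than 2.5 percent, so long as the windows still touch.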
C. Applying Population Genetics and Statistics
There are two possible reasons for a match: either because matching VNTRs fragments came from the same person, that is, the suspect and the assailant are the same person, or, the suspect and the assailant are different persons but, by coincidence, the suspect's VNTRs fragment lengths are identical to those of the assailant. (Kidd at 33.) The next step is to determine the likelihood that such a coincidence would occur. This requires a determination of the probability that a person chosen at random from a given population would have a DNA profile identical to that of the suspect's. To this end, the FBI uses a method called "fixed bin analysis."
An understanding of fixed bin analysis requires an understanding of certain concepts of population genetics, namely "assortative mating" and "population structure." "Assortative mating" is the term used to describe the tendency of humans with similar attributes to mate with each other. (Shields at 48.) For example, members of a racial group tend to mate more with other members of their racial group than they do with members of other racial groups. Similarly, members of a given religion tend to mate more with other members of their religion than with members of other religions. (Shields at 48.) Since humans pass genetic characteristics through reproduction, there is consequently a higher probability that a genetic characteristic of a group member will occur in other members of that group than it will in members of other groups. (See Shields at 48.) This phenomenon is reflected in each group's DNA, wherein each group has rung sequences that occur more often among members of that group than among members of other groups. Similarly, the lengths of VNTRs found in members of the same group are more likely to be similar or identical than are VNTRs lengths between members of different groups.
The way in which each group can be distinguished by its characteristic rung sequences and VNTRs lengths demarcates that group's "genetic structure," or simply, that group's "structure." (Shields at 48.) For the purposes of DNA profiling, the FBI classified groups of humans, each group with its own structure, according to race. In order to measure the frequency with which given lengths of the four target VNTRs occur in a racial group, the FBI constructed racial "population databases." As detailed below, once the frequency with which the VNTRs' lengths occur in a suspect's population is known, one can determine how often VNTRs equal in length to the suspect's occur in a given racial population. The less often they occur, the more distinctive the suspect's DNA profile becomes.
To construct a population database, the FBI first makes DNA profiles of a number of members of a racial population that is statistically significant, meaning that the number *1064 is large enough to give a sufficiently accurate and valid "portrait" of the genetic makeup of the entire racial population. (Kidd at 39-40.) The FBI's caucasian database, for example, consists of DNA profiles of 750 caucasians from South Florida, Texas, and California. (Coffin at 140.) The FBI divides the agarose gel plates into thirty-one sections, or "bins." (Coffin at 78.) The size of each bin is determined by locations on the size marker bands. (Coffin at 56, 78.) The FBI examines the resultant autorads to see into which bins the four VNTRs bands of each population member fall. (Coffin at 57.) The number of bands from the whole population that fall into a particular bin determines that bin's frequency. (Coffin at 52-53.) For example, if VNTRs were derived from a hypothetical population of 100 caucasians, then nine bands occurring in a given bin would make that bin's frequency for caucasians .09. (See Coffin at 54.) Put differently, in this hypothetical example, nine percent of the caucasian population's VNTRs occur in that bin. Each bin is assigned a frequency in this manner. Similarly, if VNTRs were derived from a hypothetical population of 100 hispanics, then six bands occurring in a given bin would make that bin's frequency for hispanics .06. The result of this counting and bin-classification of a racial population's VNTRs is a "population database."
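The counting that yields a bin's frequency can be sketched as follows. This is a minimal illustration under stated assumptions, not the FBI's software: the function name and the sample bin assignments are hypothetical, and the denominator of 100 simply follows the hypothetical populations used in the text above.

```python
from collections import Counter


def build_database(band_bins, denominator):
    """Map each bin number to the share of bands that fell into it.

    band_bins   -- one bin number per observed band (hypothetical data)
    denominator -- the total against which counts are expressed; the
                   hypothetical examples in the text use 100
    """
    counts = Counter(band_bins)
    return {bin_no: n / denominator for bin_no, n in counts.items()}


# Hypothetical data: nine bands in bin 5, six bands in bin 6, and the
# remaining eighty-five bands in bin 7.
observed = [5] * 9 + [6] * 6 + [7] * 85
database = build_database(observed, 100)
print(database[5])  # 0.09
print(database[6])  # 0.06
```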
In fixed bin analysis, the FBI looks at the suspect's autorad and determines into which bin each of the suspect's VNTRs bands fall. The FBI then assigns to each band the frequency of the bin into which the band falls. That is the band's "bin frequency." For example, if a suspect's band falls into a bin with a .09 frequency, then the frequency of the suspect's band is .09. Since each DNA probe and its attendant phosphorous-32 molecule attaches to two versions of the same VNTRs in each person, each DNA probe will be expressed on the autorad as two bands ("band 1" and "band 2"), each assigned its own bin frequency. Once the frequency of each band of a probe is determined, the laboratory can then determine the frequency of the probe itself. This is expressed by the Hardy-Weinberg Principle, (Coffin at 88), a principle upon which the science of population genetics is based that is taught worldwide in introductory probability classes, (see Kidd at 49; 5 Encyclopedia Britannica 702 (15th ed. 1985)). The formula is:
2 × p × q = probe frequency
where p = frequency of band 1
q = frequency of band 2
(See Gov.['s] ex. 7; 88.)
Since the FBI uses up to a total of four probes in each DNA profiling procedure, the FBI may obtain up to four matches. In such a case, the FBI will determine the "composite frequency" of the multiple probes. (Coffin at 90.)
The formula for determining the composite frequency, like that for determining a probe's frequency, is rooted in the fact that the DNA probes were affixed to VNTRs that occurred on different chromosomes. (See Coffin at 89.) In terms of probability mathematics, the respective occurrences of the distinct VNTRs on the different chromosome pairs are "independent events." (See Coffin at 89; Kidd at 50.) They are independent events because the occurrence of a given VNTRs on one chromosome pair is, as far as is known to science, not governed by the occurrence of another VNTRs on a different chromosome pair. (See Coffin at 50.) This principle is expressed in a formula based on the "Products Rule," also taught worldwide in introductory probability classes, (Kidd at 49), which states that for two or more independent events, the probability of both or all events occurring is the product of the probabilities, (Coffin at 91).[6] For four probes, the formula is stated as:
*1065 2pq × 2pq × 2pq × 2pq = composite frequency
where p = frequency of band 1 of each probe
q = frequency of band 2 of each probe
Once the FBI obtains the frequency or the composite frequency, as the case may be, the FBI divides the product into the number one (1) in order to determine the odds that the suspect's DNA profile could randomly occur elsewhere in the suspect's racial population. (Coffin at 94.) In this case, for example, using the United States black database, the FBI determined that the odds in favor of finding a random DNA profile that matches the defendant's is one in approximately 41 million. (Kidd at 55.)
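The two formulas and the final division can be worked through in a brief sketch. The bin frequencies below are hypothetical and were chosen only to show the arithmetic; they are not the frequencies assigned to the defendant's bands, and the function names are likewise illustrative.

```python
from math import prod


def probe_frequency(p, q):
    """Frequency of one probe's two-band pattern: 2 x p x q."""
    return 2 * p * q


def composite_frequency(band_pairs):
    """Product of the probe frequencies, treating the probes as independent."""
    return prod(probe_frequency(p, q) for p, q in band_pairs)


def odds_one_in(frequency):
    """Divide the frequency into one to express the result as '1 in N'."""
    return 1 / frequency


# Hypothetical (band 1, band 2) bin frequencies for four probes.
pairs = [(0.09, 0.06), (0.10, 0.05), (0.08, 0.07), (0.04, 0.12)]

composite = composite_frequency(pairs)
print(f"composite frequency: {composite:.3e}")
print(f"odds: 1 in {odds_one_in(composite):,.0f}")
```

Because each probe contributes a factor well below one, the composite frequency shrinks rapidly as probes are added, which is why four probes can yield odds in the tens of millions.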
DAUBERT APPLIED
There is little question that a DNA profile is relevant to this case. A major issue here is whether the defendant was present at the scene of the crime. Evidence that links a defendant's DNA to DNA gathered at the crime scene is helpful to the jury in that it would tend to make the existence of the fact that defendant was present at the crime scene more probable than it would be without that evidence. (See Daubert, ___ U.S. at ___, 113 S.Ct. at 2796; FED. R. EVID. 401.) The disputes in this case concern whether the DNA profiling process and the way in which the FBI uses the rules of population genetics are reliable.
1. Whether the Proffered Technique Can Be (and Has Been) Tested.
The FBI's DNA profiling process can be verified because the protocol for this process is available in a document entitled "Procedures For The Detection of Restriction Fragment Length Polymorphisms in Human DNA." (See Gov.['s] ex. 10 ("Procedures For The Detection of Restriction Fragment Length Polymorphisms in Human DNA").) With the appropriate knowledge and resources, any person can create a DNA profile in a manner that is identical to that of the FBI. This technique, developed in the late 1970s and early 1980s, has clearly been tested because processes similar to the FBI's DNA profiling process are widely used for diagnostic and research purposes by the medical community as well as numerous laboratories. (Coffin at 59-60; Kidd at 20.) Kidd testified that his laboratory at Yale, along with thousands of laboratories around the world, executes a similar process on a daily basis. (See Kidd at 19-20.) Such widespread repetition and use of the process increases the likelihood that the FBI's system and the results its system obtains are not false. (See Daubert, ___ U.S. at ___, 113 S.Ct. at 2798.) Accordingly, this factor weighs in favor of admitting this evidence.
2. Whether the Theory or Technique Has Been Subject to Peer Review
The FBI's DNA profiling process was subject to peer review from its inception. Before the FBI used the process in case work, it invited many scientists, including Kidd, to Quantico in order to advise and offer criticism regarding the FBI's methodology and use of standards controlling the procedure. (Kidd at 22.) Kidd testified that the FBI implemented at least one of his suggestions. (Kidd at 22.) Also, Agent Coffin stated that the FBI contacted various colleges and laboratories in order to obtain their input. (Coffin at 123.) Given that DNA profiling was subjected to such vigorous peer review, it was likely that "substantive flaws in [the] methodology" were detected. (See Daubert, ___ U.S. at ___, 113 S.Ct. at 2797.)
3. The Existence and Maintenance of Standards Controlling the Technique's Operation and the Known or Potential Rate of Error
Daubert requires the court to examine the standards controlling the DNA profiling process *1066 and the known or potential rate of error that might result from the process. (Daubert, ___ U.S. at ___, 113 S.Ct. at 2797.) In the DNA profiling context, this examination is divided into two parts. In the first part, the court examines the means by which the FBI prevents both potential sources of human error in the execution of the process and potential error caused by imperfections inherent in the process. The second part arises from the fact that the DNA profiling process involves statistics. Since statistics concern estimates, the DNA profiling process involves degrees of uncertainty. The second part of this analysis examines to what extent the FBI attempts to resolve the uncertainties in the defendant's favor.
1. Avoiding Potential Errors.
If an error, caused either by humans or imperfections in the process, prevents the FBI from reaching any stage of the process, the FBI halts the process and no further testing is done on the sample. (Coffin at 41-42.)
A. Cell Identification. Potential error can occur when the FBI identifies the source of the cells gathered at the crime scene. For example, the FBI could identify cells that come from the victim as cells belonging to the assailant. (See Coffin at 29.) Agent Coffin testified, however, that the FBI can distinguish types of cells with a microscope. (Coffin at 29.) There is no difficulty, for example, in distinguishing a sperm cell from a cell taken from the vaginal wall. (See Coffin at 29.) Using the microscope, therefore, seems to be a sufficient safeguard against the error of misidentifying the source of a cell.
B. Degraded DNA. Though the DNA molecule is a stable molecule that can survive in cell samples that are four to five years old, it is capable of degrading. (See Coffin at 117.) DNA degradation, however, is not an insidious source of error. If a DNA profiling analysis is done with degraded DNA, the autorad plainly shows streaks or other distortions where the bands would have appeared had the DNA not degraded. (Coffin at 117.) This conspicuous result prevents the FBI from making a faulty autorad interpretation caused by the use of degraded DNA.
C. Diluted or Degraded HAE III. Potential error arises from the use of the HAE III enzyme which cleaves the DNA ladder at each G-G-C-C rung sequence. (See Coffin 29-30.) If the enzyme somehow becomes diluted, the enzyme could cause error if it were to cleave the DNA ladder at unintended rung sequences without the FBI's knowledge. Agent Coffin testified that to ensure that this does not happen, the FBI verifies each "batch" of HAE III that it uses for the DNA profiling procedure. (Coffin at 122.) To verify each batch, the FBI uses a certain DNA identified as "phi X 174." (Coffin at 119.) The way in which HAE III cuts phi X 174 is known to the FBI. The FBI cuts the phi X 174 with each batch of HAE III. If the cuts are made in the places where the FBI expects them to be made, then the HAE III is sound. (See Coffin at 119-20; Gov.['s] ex. 10 p. 32-33.)
As for the suggestion that HAE III may degrade between the time it is verified and the time it is used, (Coffin at 120), degraded HAE III will simply not cut the DNA molecule. (Coffin at 130.) If there is no cutting, there will be no fragments, and ultimately, no bands. (Coffin at 130.) Consequently, if the HAE III degrades prior to its use in the procedure, then, according to Kidd, the degradation will be evident from examining the autorad. (See Kidd at 27.) The conspicuousness of this result is a sufficient safeguard against the FBI interpreting an autorad that is produced with degraded HAE III. In fact, Kidd stated that every kind of error that can be made during the laboratory procedure will become evident on the autorad. (Kidd at 82.)
D. Imprecise Hybridization. It is possible for the DNA probe to hybridize with the wrong single stranded VNTRs. (Kidd at 17.) But imprecise hybridization can occur only under abnormal laboratory conditions. (Kidd at 17.) Furthermore, if imprecise hybridization does occur, then the resultant autorad would show not just two bands in each lane, but would show multiple bands in each lane, each with different intensities. (Kidd at 26.) The chance that the FBI would *1067 interpret an autorad produced by imprecise hybridization is therefore minimal.
E. The Cell Line Control. With electrophoresis, one concern is that an undetected error might affect the distance that the DNA fragments travel through the agarose. If the fragments do not travel their "true" distances, then the procedure will produce erroneous results. To ensure that the electrophoresis is done properly, the FBI uses a "cell line control." The cell line control, which is a piece of organic matter, is placed on the agarose plate along with the DNA fragments and size markers. (Coffin at 44.)[7] The FBI uses duplicates of the cell line control in every test. (Coffin at 70.) The FBI therefore knows how far along the agarose plate the cell line control should travel during electrophoresis. (Coffin at 44.) The FBI measures that distance with the size markers. If the cell line control does not travel the expected distance, then the FBI knows that the execution of the electrophoresis was in some way faulty. (Coffin at 45-46.) In such a case, the autorad is not interpreted. (Coffin at 45.) If, however, the cell line control's location is that which the FBI expects, the FBI has at least one sign that the execution of the electrophoresis was sound. The use of the cell line control is a sufficient safeguard against the FBI interpreting an autorad produced through faulty electrophoresis. In the case under consideration, the cell line control on the autorad for probe D1S7, and presumably for the other three probes, was in the location that the FBI expected. (Coffin at 96.)
F. The Victim's Blood Sample. Another control is the victim's blood sample. The FBI compares the victim's sample collected from the crime scene with the sample drawn from the victim's blood. (Coffin at 76.) Since both samples contain the same DNA, the samples are expected to match. (Coffin at 76.) If the samples do not match, then the FBI knows something in the process has gone awry. (See Coffin at 76.)
G. Staggered Size Markers. After electrophoresis, the locations of the size markers are slightly staggered because they travel slightly different distances during electrophoresis. (See Coffin at 75.) They travel different distances because the electrical current varies in different parts of the gel. (See Coffin at 75.) If a size marker is placed in an area of the gel where the current is lower, then that size marker will travel slowly. (See Coffin at 75.) If a size marker is placed in an area of the gel where the current is higher, then that size marker will travel quickly. (See Coffin at 75.) The speedier size markers ultimately travel further than the slower size markers. Consequently, the size markers come to rest in a slightly staggered formation. (See Coffin at 75.) Since all measurements in the DNA profiling procedure are based on the size markers, the potential for error here is manifest. Accordingly, the FBI's computer assisted sizing method compensates for the irregularity of the size markers' locations, (Coffin at 75), thus rendering the size markers sound bases of measurement. In the case under consideration, the size markers on the autorads for probe D1S7, and presumably for the autorads for the other three probes, were in the locations that the FBI expected. (Coffin at 96-95.)
H. Human Proficiency. The first step in the matching stage is using the human eye to determine whether any of the questioned bands on the autorad match any of the known bands. (Coffin at 47.) The potential for error in this step plainly depends on the degree to which the human eye can draw inaccurate matches and the degree to which the human brain can draw erroneous conclusions. In fact, the potential for error in the entire DNA profiling process depends in part on the proficiency of the people by whom it is executed. (See Kidd at 83-84.) To address this concern, every quarter year the FBI gives the technicians and agents in the FBI's DNA unit an open proficiency test. (Coffin at 106-07.) The technicians and agents are *1068 also given "blind" proficiency tests in which they work in the laboratory on what they think is a real case without knowing that it is actually a test case. (Coffin at 107.) Moreover, each time an agent draws a conclusion concerning a case, the conclusion is reviewed by another agent and ultimately by a supervisor. (Coffin at 107-08.) These measures minimize the potential for error that could occur due to lack of proficiency.
I. Defective X-Ray Film. The FBI is assured by the x-ray film itself that the x-ray film used to make autorads is not defective. (Coffin at 121.) If the film is defective, it will not expose. If the film is not defective, it will expose. (Coffin at 121.) The FBI needs only to look at the film to determine whether it is defective. (Coffin at 121.) Though sometimes black smudges appear on the autorad as a result of static charges, there have been no known instances where the film is exposed by something other than the phosphorous-32 as a result of a defect in the equipment used to make the autorad. (Coffin at 121-22.)
J. Melted Bands. A long exposure to phosphorous-32 sometimes causes the VNTRs hybrid bands on the autorad to "melt," meaning that the bands become so wide and distorted that there appear to be two adjacent bands instead of a single band. (Shields at 33.) Melting bands would plainly hinder any attempt to accurately determine whether an autorad contains a match. (See Shields at 99.) To guard against this kind of error, the FBI exposes the two sheets of x-ray film that are attached to the nylon membrane for different lengths of time. (Coffin at 95.) The technician is therefore not limited to examining the longer-exposed autorad, which may contain melted bands. If the longer-exposed autorad does contain melted bands, the technician can examine the lesser-exposed autorad, which is less likely to contain melted bands. (Shields at 99.) Consequently, it can be determined whether a match exists despite the existence of melted bands. (Shields at 99.)
2. Resolving Uncertainties In the Defendant's Favor.
A. Straddling Bands. When the FBI uses the matching window to adjust the length of a band, the plus or minus 2.5 percent adjustment may cause the band to "straddle" two bins. The band "straddles" two bins in the sense that without a 2.5 percent adjustment, the band falls into one bin, but with a 2.5 percent adjustment, the band falls into a neighboring bin. (Coffin at 57; Kidd at 47-48.) When this occurs, the FBI compares the frequencies of the two bins and assigns to the band the frequency of the bin with the higher frequency. (Coffin at 57-58; Kidd at 48.) Put differently, the band is deemed to have fallen into the bin with the higher frequency. For example, if a band falls into a bin with a .07 frequency without a 2.5 percent adjustment and falls into a bin with a .71 frequency with a 2.5 percent adjustment, then the band is deemed to fall into the .71 bin. (See Coffin at 58; Kidd at 48.) The suspect's band will therefore be expressed in the final probability calculation as occurring more frequently in the population than it would had the FBI deemed it as falling into the lower frequency bin. (Coffin at 58.) In this way, the FBI resolves the uncertainty created by the straddling band in the suspect's favor.
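The treatment of a straddling band can be illustrated with the following sketch, which assumes that a band is assigned the highest frequency among the bins its 2.5 percent window touches. The bin boundaries, the function name, and the band lengths are hypothetical; the .07 and .71 frequencies are those of the example above.

```python
def assigned_frequency(measured_bp, bins, tolerance=0.025):
    """Return the highest frequency among the bins the band's window touches.

    bins -- list of (start_bp, end_bp, frequency) tuples, each covering
            the half-open range [start_bp, end_bp); hypothetical boundaries
    """
    low = measured_bp * (1 - tolerance)
    high = measured_bp * (1 + tolerance)
    touched = [freq for start, end, freq in bins if low < end and high >= start]
    if not touched:
        raise ValueError("band falls outside every bin")
    return max(touched)


# Two neighboring bins with hypothetical boundaries.
bins = [(900, 1000, 0.07), (1000, 1100, 0.71)]

print(assigned_frequency(950, bins))  # 0.07: window stays inside the first bin
print(assigned_frequency(990, bins))  # 0.71: window straddles both bins
```

Taking the larger of the two frequencies makes the band, and therefore the profile, appear more common in the final calculation.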
B. Declaring A Match. Agent Coffin testified that when he interprets an autorad, if he is "99.9 percent certain" that two bands match, then he will declare the result "inconclusive." (Coffin at 102.) Only when he is "100 percent sure" will he declare the result a "conclusive" match. (Coffin at 102.) Agent Coffin demonstrated this by declining to declare the autorad for probe D17S79 a conclusive match. (Coffin at 100; Kidd at 51.) Agent Coffin declared the autorad inconclusive for two reasons. First, the "known" band patterns for the victim and the defendant were similar, meaning that, in one case, the victim's and the defendant's target VNTRs travelled similar distances during electrophoresis. Consequently, Agent Coffin chose not to declare this autorad conclusive even though the defendant's known band pattern was "consistent" with the band patterns from the semen samples gathered at the crime scene. (Coffin at 101.) Second, in one of the lanes of this autorad, there appear to be three bands, *1069 which raised the "remote possibility" that there was female DNA mixed in with one of the semen samples. (Coffin at 101; Kidd at 80.) Though these shortcomings appear to be insubstantial, they were nonetheless sufficient to make Agent Coffin less than "100 percent sure" that this autorad was a conclusive match. (See Coffin 103-04.) On the other hand, if Kidd were evaluating this autorad for diagnostic purposes in a medical context, Kidd would have declared this autorad a match. (Kidd at 52.) This difference in opinion demonstrates two related aspects of the FBI's DNA profiling procedure: one, there is a degree of subjectivity in the conclusions that may be drawn from the procedure's results and two, the uncertainty that accompanies this subjectivity is resolved in the suspect's favor.
C. The Matching Window. As described above, if a technician determines that two hybrids match, then a computer imaging program is used to increase the accuracy of the measurements. (Coffin at 48, 75.) The program measures the hybrids in terms of how many base pairs each hybrid contains and then produces for each hybrid a measurement called a "base pair value." The base pair values are framed with matching windows; if the windows overlap, then the FBI declares a confirmed match. Throughout the instant proceedings there was considerable discussion regarding the FBI's matching window which revolved around whether Agent Coffin correctly represented the span of the FBI's window and whether the FBI's window is so wide as to prejudice the defendant. (Shields at 29-31, 93-94, 115-121, 140-41, 148-49.) As to the latter issue, Shields stated that a statistically acceptable window should have a span of 1.75 percent. (Shields at 175.) But while these issues point out the potential for statistical error in the use of the matching window, for the purposes of this case, these issues are moot. When the FBI compared the base pair values of the hybrids using the computer imaging program, none of the measurements varied more than approximately 1.4 percent. (Coffin at 50; Shields at 111.) All the experts agree that because the maximum variance was 1.4 percent, no statistical error resulted from the measurement in this case. (Shields at 30-31, 111.) Since there is no dispute over this point, further analysis is unnecessary.
D. Bin Collapsing. As a rule, the FBI will not base its statistical calculations on a bin that contains less than five bands, that is, a bin with a frequency that is lower than .005. (Coffin at 83.) If a given bin happens to have a frequency that is lower than .005, the FBI adjusts the bin to conform to the rule. (Coffin at 83.) The FBI makes the adjustment by "collapsing" bins. In "bin collapsing," the FBI combines a bin with a frequency lower than .005 with a neighboring bin to create a single bin with a total frequency that is higher than .005. (Coffin at 83.) For example, if two neighboring bins have respective frequencies of .002 and .038, the FBI will collapse the two bins to create a bin with the frequency of .040. (Coffin at 83.) Since the resultant bin's frequency is greater than .005 it can be used for the FBI's calculations.
Bin collapsing favors those suspects whose bands, without bin collapsing, would fall into bins with frequencies that are lower than .005. (Coffin at 83.) When a suspect's band falls into a low frequency bin, that means that that suspect's band occurs less frequently in the population than a band that falls into a higher frequency bin. A suspect would prefer that his bands fall into higher frequency bins because that would reflect that his bands occur more frequently in the population and would therefore make him less distinctive. Bin collapsing permits every suspect to avoid the dubious distinction of having bands that are assigned frequencies that are lower than .005. Referring to the above example, a suspect whose band would otherwise fall into the .002 bin will be assigned a .040 frequency for that band. The suspect will therefore benefit because his band will appear to occur twenty times more frequently in the population than it would have appeared had the bins not been collapsed. (See Coffin at 83.)
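Bin collapsing can be sketched as follows. The testimony does not say which neighbor a low frequency bin is combined with, so this illustration assumes, purely for the sake of example, that it is merged with the next bin in order (or with the preceding bin when it is the last). The function name and the third bin's frequency are hypothetical; the .002 and .038 frequencies and the .005 floor come from the text above.

```python
def collapse_bins(frequencies, minimum=0.005):
    """Merge any bin whose frequency is below `minimum` into a neighbor.

    Assumption for illustration: a low bin absorbs the bin that follows
    it; a low bin at the very end is folded into the bin before it.
    """
    collapsed = []
    for freq in frequencies:
        if collapsed and collapsed[-1] < minimum:
            collapsed[-1] += freq  # previous bin was too small, so combine
        else:
            collapsed.append(freq)
    if len(collapsed) > 1 and collapsed[-1] < minimum:
        last = collapsed.pop()
        collapsed[-1] += last  # fold a trailing low bin backward
    return collapsed


before = [0.002, 0.038, 0.100]
after = collapse_bins(before)
print(after)

# Ratio between the collapsed frequency and the original .002 frequency.
print(round(after[0] / before[0]))  # 20
```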
E. Binning the Bands. In a given population database, there are more bands than there are bins. (See Kidd at 47.) Moreover, most bands have a base pair value that is distinctive. Any given bin of the database *1070 must therefore contain bands that have base pair values that vary from one another. (See Kidd at 51.) This fact reflects the FBI's decision to "bin" each of the suspect's bands with bands that have comparable, but nonetheless different, base pair values. (Kidd at 46.)[8] In so doing, the FBI treats the base pair value of the suspect's bands as being indistinguishable from the base pair values of the persons whose bands the FBI bins with the suspect's. (Kidd at 46.) Put differently, rather than assigning each band its own distinct frequency, the FBI assigns all of the bands that fall into a given bin one common frequency: the frequency of the bin. (Kidd at 47.) If the FBI assigned to each band a distinct frequency, then each band's frequency would be minuscule. Each suspect's DNA profile would therefore appear to occur infrequently in the population. But since the FBI assigns to each band the frequency of a bin, the suspect's DNA profile will appear to occur more often in his population and therefore make the suspect less distinctive in the final probability calculation. (Kidd at 47.)
F. Assigning Bin Frequencies Derived from the United States Black Database to the Bands of A Black Suspect from St. Thomas.
In this case, the FBI assigned to the bands of the defendant, a black man from St. Thomas, the bin frequencies derived from the black United States database. (Coffin at 93-94.)[9] Shields claimed that assigning bin frequencies derived from United States blacks to the DNA profile of the defendant could have produced inaccurate probability calculations that were biased against the defendant. According to Shields, the probability calculations may express defendant's DNA profile as being far rarer than it really is. (Shields at 73.) Shields could not be certain about this claim because a black St. Thomas database has yet to be constructed. (See Shields at 74.) Shields' opinion was in part an extrapolation based on a study, called "VNTR Population Data, A Worldwide Study (`Worldwide Study')," which is a compilation of data reflecting bin frequencies on population "substructures" around the world. (See Shields 49-50.)
A racial population has "substructures" when it is composed of constituent groups that have bin frequencies that are distinct from the bin frequencies of its other constituent groups. For example, the black populations of St. Thomas and of the United States may be substructures of the black race. (See Shields at 57.) The authors of the Worldwide Study found that their estimations of differences in bin frequencies between various population substructures are statistically significant, meaning that there is at least a 95 percent probability that the Study's authors are correct in stating that bin frequencies were different. (Budowle at 2; Shields at 56.) This does not mean, however, that the differences in the bin frequencies themselves are necessarily large; it simply means that there is a very good chance that they are different.
If the differences in bin frequencies between two substructures are large, then the potential for error here is clear. The FBI applied the bin frequencies of the United States black population to the defendant's bands. If the differences in bin frequencies between the St. Thomas population's substructure and that of the United States black population are sufficiently large, then the appearance of how often defendant's bands occur in the defendant's population substructure would be distorted. (See Shields at 48-49.) For example, hypothetically, if bin five of the United States database were to have the frequency of .001 while bin five's frequency is .1 in St. Thomas's database, if one existed, the defendant would be prejudiced if the FBI assigned to his band the frequency of bin five from the United States database, .001. The number .001 is one hundred times smaller than .1. The defendant would be prejudiced because his band would appear to be one hundred times less common, which would *1071 make his DNA profile appear to be much more distinctive than it would be in his own substructure.
Shields opined that, until a database for St. Thomas is constructed, it will not be known whether the differences in bin frequencies between United States blacks and St. Thomas blacks are so large as to create such a distortion. Shields therefore opined that the FBI's conclusion that the odds of finding a random match to the defendant's DNA profile were 1 in approximately 41 million was "ludicrous" and not "scientifically defensible." (Shields at 77.)
Budowle, who played a major role in compiling the Worldwide Study, stated, however, that differences in bin frequencies "`do not have forensically significant effects on VNTRs profile frequency estimates when subgroup reference databases from within a major population group are compared.'" (Budowle at 4 (quoting the Worldwide Study).) Budowle explained that this means that if a black suspect's bands were assigned the bin frequencies of black databases from Haiti, South Florida, or Michigan, then the odds in favor of finding a random match with the suspect's profile in any of these populations would be comparably small. (Budowle at 4.) For example, hypothetically, the odds favoring a match for a suspect in the Haiti database might be 1 in 35 million, in a South Florida black database the odds may be 1 in 37.5 million, and in the black Michigan database the odds may be 1 in 39 million. The odds in favor of finding a match in any of these databases are so low as to make these databases virtually interchangeable. As Budowle put it, "the inference on the rarity of the profile would not change with the various estimates." (Budowle at 4 (quoting the Worldwide Study).)
The obvious implication is that even though a jury will not hear exactly how rare the defendant's DNA profile is in St. Thomas' black population, the jury can infer how rare the defendant's DNA profile would be in a database that reflects that population by hearing how rare the defendant's DNA profile is in the United States black database. Given that differences in bin frequencies "`do not have forensically significant effects on VNTRs profile frequency estimates when subgroup reference databases from within a major population group are compared,'" (Budowle at 4 (quoting the Worldwide Study)), any concern that the St. Thomas' black population's bin frequencies are drastically different from those of the United States black population is unwarranted. Though the application of the United States black bin frequencies to the defendant's bands does not produce the precise odds of finding a random match in the defendant's population, the danger of error in such application is so small as to be practically nonexistent.[10]
Budowle's data and his conclusions drawn therefrom confirmed Kidd's opinion. Kidd stated that St. Thomas' "founding" black African population was diverse, (Kidd at 71); that St. Thomas' founding black population was not small but was instead "relatively large as human populations go," (Kidd at 71); that there has been caucasian mixture into the St. Thomas black population's gene pool, (Kidd at 71); that the caucasian mixture came from caucasians of diverse backgrounds, (Kidd at 71); and that the United States black population's gene pool has virtually the same attributes, though perhaps it has acquired a greater caucasian admixture (Kidd at 71). Kidd concluded from these data that St. Thomas's black population should be so genetically similar to the United States' black population that there would be little difference in the two populations' bin frequencies. (See Kidd at 71-72.) Thus, a black St. Thomas database, if it existed, would be almost identical to the black United States database. Assigning the bin frequencies of one to bands of a member of the other would, according to Kidd, not be an error. (See Kidd at 72.)
Kidd also pointed out that the black United States database is essentially an amalgam of the diverse genetic types reflected in a *1072 pure African database, along with a small caucasian admixture, (Kidd at 71-72), and that the genetic types within the black population are highly varied (Kidd at 71-72). Based on this information, Kidd did not expect any of the bins in a St. Thomas database to have a high frequency. (Kidd at 72.) One can infer from his conclusion that Kidd would not expect St. Thomas' black population's structure, like the structure of the black United States population, to be very distinctive. Though the structures of the two populations might not be identical, according to Kidd, they would be comparable. (See Kidd at 72.) In fact, in Kidd's experience of assigning bin frequencies of caucasian and hispanic databases to a Chinese person's bands, he found that the probability calculations based on each database produced similar results. (Kidd at 72.)
From this experience Kidd extrapolated that if a St. Thomas database existed, the odds of finding a random match with defendant's DNA profile would not be far different from the odds of finding a random match in the black United States population, 1 in approximately 41 million. (See Kidd at 72; Gov.['s] ex. 8.) When the odds are so low, according to Kidd, "the numbers are not really different." (Kidd at 72.) Put differently, although the odds of finding a random match with defendant's DNA profile might differ between the United States black database and a St. Thomas black database, due to the low likelihood of a random match in either database, the difference is negligible. (See Kidd at 72.) Kidd summarized his opinion by stating that "I am convinced that I am reasonably so and on very solid grounds, there is no way the black Virgin Islands population could be genetically extremely different from either the U.S. mainland blacks or blacks in Africa." (Kidd at 86.) Consequently, according to Kidd, it was scientifically reliable for the FBI to assign to the defendant's bands the bin frequencies of the United States black population. Though Kidd would be more certain about this conclusion if a black St. Thomas database existed, he did not think a black St. Thomas database was necessary to support his conclusion. (See Kidd at 87.)
Kidd contrasted the genetic makeup of the United States and St. Thomas populations with that of small, isolated tribes. (See Kidd at 58.) Certain tribes, living in places like the Amazon and Papua New Guinea, have small communities with languages that are mutually unintelligible. (Kidd at 58.) Such tribes live in small geographic areas and rarely marry or breed out of the tribe. (Kidd at 58.) Due to inbreeding in isolation for generations, certain tribes have "pronounced" substructuring. (See Kidd at 58.) Kidd cautioned against assigning the bin frequencies of these tribes to the bands of a suspect from the United States or St. Thomas. If the FBI did so, Kidd suggested, the suspect's DNA profile might appear to be much rarer than it would be if the profile were assigned the frequencies of the United States or St. Thomas populations. (See Kidd at 58.) But Kidd rejected the notion that the degree of substructuring in these tribes exists in the black population of St. Thomas. (Kidd at 58-59.)
Shields attempted to bolster his position with data reflecting bin frequencies of various island populations and of Native American tribes living on reservations. Shields showed data comparing the bin frequencies of the population of the Northern Mariana Islands and of the New Zealand Maori tribe to the bin frequencies of United States blacks and caucasians. (Shields at 66.) The data established that there are huge frequency differences between the island populations and the United States populations. (Shields at 67.) Shields stated therefore that "[i]sland populations are different." (Shields at 67.) By Shields' admission, however, unlike United States blacks and caucasians, the Maori and the populations of the Northern Marianas are members of the "micronesian race." (Shields at 66.) The differences in bin frequencies therefore may not be attributable to the fact that the Maori and the Northern Mariana populations live on islands; the differences may instead be attributable to the fact that the Maori and the Northern Mariana populations belong to a race that is neither caucasian nor black.
Shields also drew on data concerning the bin frequencies of numerous Native American *1073 tribes to support his argument. (Shields at 61, 69-73.) The reason Shields focused on these data was that, according to Shields, "Native Americans, even though they don't live on islands, act as if they do." (Shields at 62.) Native American behavior, according to Shields, mirrors the behavior of island populations, specifically St. Thomas, in that generally, Native Americans on reservations have a higher probability of mating with other Native Americans than they do with non-Native Americans, and specifically, a higher probability of mating within their tribes than they do with members of other tribes. (Shields at 62.) Consequently, members of the tribes tend to all have "common ancestry." (Shields at 62.) This behavior results in highly structured populations, meaning that the bin frequencies of Native American tribes are very distinctive. (See Shields at 62.) Thus according to Shields, many members of Native American tribes share the same rung sequences and VNTRs lengths with other members of their respective tribes. Shields opined that the same is true of island populations such as St. Thomas.
A comparison between the populations of Native American reservations and of St. Thomas may not be entirely apt. It is reasonable for one to assume that the founding population of each Native American reservation was composed principally of members of the tribe for whom the reservation land was allotted. In contrast, the founding population of St. Thomas' black population was composed of genetically diverse members who came from different countries in Africa. (Kidd at 59.) Given this fundamental difference in the origins of the various Native American tribal populations and that of St. Thomas' black population, Shields' suggestion that the latter may be as highly structured as the former is not persuasive.
A careful review of the known and potential errors that may arise from the FBI's DNA profiling procedure and of the standards and controls governing the process leads the court to conclude that this Daubert factor weighs in favor of admitting the government's evidence.
4. The Degree to Which DNA Profiling Is Accepted By A Relevant Scientific Community.
The FBI's DNA profiling protocol is not the only way to produce a DNA profile. (Coffin at 112.) Different laboratories use different protocols, each with standards, controls and components that are defined by the system used. (Coffin at 112.) For example, Canada's Royal Mounted Police ("RMP") uses a matching window with a different span. (Coffin at 113.) This, however, does not mean that the RMP rejects the reliability of the FBI's matching window; rather, it means that since the RMP uses a different protocol, it must use a different window. (Coffin at 113.) As Agent Coffin put it, "It's much like driving to Point A to Point B in a Cadillac and a Ford. You can't exchange the parts on them, both give the same results." (Coffin at 131.)
Kidd testified that each year his laboratory does several hundred thousand DNA profiles in a manner that is similar to the way it is done by the FBI. (Kidd at 19.) Additionally, thousands of laboratories around the world do the same on a daily basis. (Kidd at 19.) Furthermore, hundreds of scientific papers related to the process are published each month. (Kidd at 20.) Kidd testified that this process is generally accepted in the community of molecular biologists and related scientific communities. (Kidd at 20.) Moreover, a scientist, whose name the record does not disclose, teaches a laboratory course in which undergraduates use the process to find new polymorphisms. (Kidd at 21.) Kidd stated that the FBI, in performing DNA profiling, uses "standard methodologies that are broadly used in the scientific community. They are doing it correctly." (Kidd at 24-25.)
The court finds that the degree to which the FBI's DNA profiling procedure is accepted in relevant scientific communities is high. This finding weighs in favor of admitting the DNA evidence in this case.
CONCLUSION
Applying the teachings of Daubert to the FBI's DNA profiling process leads the court *1074 to conclude that the process is both relevant and reliable. The information regarding the FBI's DNA profiling protocol as well as the results thereof are admissible as evidence.
APPENDIX
NOTES
[1] Twenty-two of the twenty-three chromosomes are numbered from one to twenty-two, where chromosome-one is the largest and chromosome-twenty-two is the smallest. The twenty-third chromosome, which determines each person's sex, is not given a number.
[2] While this diagram represents the difference in VNTRs lengths between different persons, it also represents the difference in VNTRs lengths within a single person. A single person has two "versions" of any given VNTRs. Each version comes from the same pair of chromosomes, one from each of the chromosomes of the chromosomal pair. (See Kidd at 18.) The two versions are identical in that they have the same core sequence, but the two are different in length. Understanding that each person has two versions of each VNTRs is important to understanding certain aspects of the DNA profiling process.
[3] Specifically, semen can be distinguished by its component of "antigen," which is "secreted by the prostate gland and occurs in semen." (Coffin at 62.)
[4] This diagram was submitted as an exhibit by the government and the government's witness interpreted it for the record. The court notes, however, that in fact the FBI uses four size markers, represented by four reference columns on the autorad. (See Gov.['s] ex. 9B.) This difference in the number of size markers, for the purposes of explaining an autorad, is immaterial.
[5] Sometimes, however, only one band appears. This can occur for one of three reasons. First, the length of each VNTRs from each chromosome of the chromosomal pair could be equal. In such case each VNTRs hybrid will travel the same distance on the agarose plate. Consequently, each hybrid's band is superimposed on the other, thus creating the illusion of a single band. Second, the two VNTRs could be of such similar lengths that on the agarose gel they travel almost the same distance. Consequently, the bands are so close together that they appear as one band. Third, one VNTRs might be so short and therefore travel so fast that it runs off the gel. In such a case, only one VNTRs remains on the gel. Consequently, only one band appears on the autorad. (Coffin at 37-38.)
[6] For example, if Peter throws a pair of balanced dice, since there are a total of 36 possible combinations, there is probability 6/36 = 1/6 that the total number of dots showing is seven because there are six possible dice combinations that produce seven. If Paul also throws a pair, there is probability 4/36 = 1/9 that his total is five because there are four possible dice combinations that produce five. "The probability that Peter throws a seven and that Paul throws a five is the product 1/6 × 1/9 = 1/54, and this is because the two events in question are independent." (25 Encyclopedia Britannica 34 (15th ed. 1985) (emphasis in original).)
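The dice figures in note 6 can be checked by enumerating every outcome of a pair of dice. The following is a minimal Python sketch offered for illustration; the helper name `probability_of_total` is an assumption introduced here and does not come from the record or the quoted source.

```python
from fractions import Fraction
from itertools import product


def probability_of_total(total):
    """Probability that two balanced six-sided dice sum to `total`."""
    rolls = list(product(range(1, 7), repeat=2))       # all 36 combinations
    favorable = [roll for roll in rolls if sum(roll) == total]
    return Fraction(len(favorable), len(rolls))


p_seven = probability_of_total(7)   # six of 36 combinations
p_five = probability_of_total(5)    # four of 36 combinations

# The two throws are independent, so the joint probability is the product.
p_both = p_seven * p_five
print(p_seven, p_five, p_both)  # 1/6 1/9 1/54
```

The multiplication in the last step is valid only because Peter's throw and Paul's throw are independent events, which is the point of the quoted passage.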
[7] All items on the agarose plate, namely the DNA fragments, the size markers, and the cell line control, react to the electrical charge in the same manner, that is, they move toward the positive pole. (Coffin at 33.) Like the hybrids, the size markers and cell line control are also labeled with phosphorus-32 so that their locations after the electrophoresis will appear on the autoradiogram. (See Ex. 10 at 15.)
[8] The FBI bins bands that have similar base pair values together in order to compensate for the imprecision and uncertainties of the base pair measurements produced by the computer imaging program. (See Kidd at 46.)
[9] The FBI also assigned to the defendant's bands the bin frequencies of the United States hispanic, caucasian, and native american databases. Defense counsel raised no argument with regard to these.
[10] Accordingly, the court need not decide whether the "ceiling principle method" should be admitted. The ceiling principle method is a mathematical formula advocated by Shields as an adequate means to remedy what he perceives to be a statistical error in applying the United States black population's bin frequencies to the suspect's bands. (See Shields at 72.)