Some of these definitions were obtained from other, more comprehensive glossaries.
analogs -- Genes or proteins that have common activity/function but not common origin. They are therefore not homologous. The implication is that analogous proteins followed evolutionary pathways from different origins to converge upon the same activity. Thus, analogous genes or proteins are considered a product of convergent evolution.
antigen--- A foreign macromolecule that does not belong to the host organism and that elicits an immune response.
basal group -- In evolutionary theory: The smaller of two sister groups; often used as the outgroup for a study of the larger clade.
BLAST - Basic Local Alignment Search Tool - a set of similarity search programs used to identify sequences in a database that share similarity with your query sequence. This method uses a heuristic algorithm which seeks local, as opposed to global, alignments and can therefore identify isolated regions of similarity. For more information see NCBI's BLAST overview, BLAST 2.0 information, the BLAST manual, Altschul et al., 1990, a list of further references, and the brief description of each of the BLAST programs.
BLASTP, BLASTX, TBLASTN, etc. - see NCBI's summary of the different BLAST programs and the BLAST definition above for more information.
bootstrapping -- In phylogenetic analysis: Repeat the analysis x times, each time on a pseudo-replicate data set made by randomly resampling the characters (alignment columns) of the input data with replacement. The number of times a particular branch is formed in the resulting trees (out of the x replicates) can be used to estimate its support, often loosely read as a probability, which can be indicated on a consensus tree.
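A minimal sketch of the resampling step, assuming aligned sequences of equal length. A real analysis rebuilds a full tree for every replicate; to keep the example short, "the branch that is formed" is simplified here to "which pair of sequences is closest" by simple p-distance. All function names are hypothetical.

```python
import random
from collections import Counter


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Stand-in for tree building: the pair of names with the smallest p-distance."""
    names = sorted(alignment)
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = p_distance(alignment[names[i]], alignment[names[j]])
            if best is None or d < best[0]:  # strict '<' keeps the first pair on ties
                best = (d, (names[i], names[j]))
    return best[1]


def bootstrap_support(alignment, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each pair comes out closest."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    counts = Counter()
    for _ in range(replicates):
        # Resample alignment columns with replacement to make one pseudo-replicate.
        cols = [rng.randrange(length) for _ in range(length)]
        pseudo = {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
        counts[closest_pair(pseudo)] += 1
    return {pair: n / replicates for pair, n in counts.items()}
```

For example, `bootstrap_support({"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TCGAACTTAC", "D": "TGGAACTTCC"}, replicates=200)` reports how often A and B group together across the 200 pseudo-replicates.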
character -- Heritable trait possessed by an organism; characters are usually described in terms of their states, for example: "hair present" vs. "hair absent," where "hair" is the character, and "present" and "absent" are its states.
clade -- Phylogeny: A monophyletic taxon; a group of organisms which includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor. From the Greek word "klados", meaning branch or twig.
cladogram -- A diagram/tree, resulting from a cladistic analysis, which depicts a hypothetical branching sequence of lineages leading to the taxa under consideration. The points of branching within a cladogram are called nodes. All taxa occur at the endpoints of the cladogram.
clustal alignment -- A heuristic multiple alignment as calculated by the Clustal software package, which aligns the sequences progressively along a guide tree. The guide tree is determined from the scores of pairwise alignments of the sequences.
convergence -- Similarities which have arisen independently in two or more organisms that are not closely related. Contrast with homology.
COGs - Clusters of orthologous groups. COGs are groups of related protein sequences that are present in at least 3 phylogenetic lineages (21 complete genomes, representing 17 major phylogenetic lineages, have been used for the analysis so far). Each COG corresponds to an ancient conserved domain, since it must be present in at least 3 of the deeply branching phylogenetic lineages. See the COGs web site for more details.
crown group -- All the taxa descended from a major cladogenesis event, recognized by possessing the clade's synapomorphy. See: stem group.
contig - A contiguous region of DNA sequence constructed by aligning many sequence "reads" (one "read" is the data generated from one sequencing reaction).
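To make the idea of building a contig from overlapping reads concrete, here is a toy greedy overlap assembler. It assumes error-free reads and exact overlaps of at least `min_len` bases; real assemblers tolerate sequencing errors and use overlap or de Bruijn graphs. The function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = sorted(set(reads))
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads
```

With the three reads `"ATTAGACCTG"`, `"CCTGCCGGAA"` and `"GCCGGAATAC"`, the sketch first merges the last two (7-base overlap) and then joins the first read (4-base overlap), giving one contig.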
distance matrix -- Matrix of pairwise distances between sequences (or other attributes), usually calculated from the scores of optimal pairwise alignments. Distances are small for similar sequences. The Protdist program we use computes a distance measure for protein sequences using maximum likelihood estimates based on a particular matrix that indicates the relative likelihood of particular amino acid changes.
endosymbiosis -- When one organism takes up permanent residence within another, such that the two become a single functional organism. Mitochondria and plastids are believed to have resulted from endosymbiosis.
evolutionary tree -- A diagram which depicts the hypothetical phylogeny of the taxa under consideration. The points at which lineages split represent the ancestral taxa of the descendant taxa appearing at the terminal points of the tree.
Expect value (BLAST Expect value) - Sometimes referred to as a probability value. Estimates the statistical significance of a sequence match by specifying the number of matches with a given score that are expected by chance in a search of a database of the given size. An Expect value of 2 for a given score indicates that two matches with this score are expected purely by chance. The Expect value is often set at a certain threshold, and only matches against database sequences that fall at or below it are reported.
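The Karlin-Altschul statistics behind the Expect value can be written either from the raw score S, as E = K·m·n·e^(-λS), or from the bit score S', as E = m·n·2^(-S'), where m and n are the query and database lengths. This sketch uses the plain lengths as an assumption; BLAST itself uses effective lengths corrected for edge effects. Function names are hypothetical.

```python
import math


def expect_from_raw(raw_score, lam, k, query_len, db_len):
    """E = K * m * n * exp(-lambda * S) for a raw score S."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def expect_from_bits(bit_score, query_len, db_len):
    """E = m * n * 2**(-S') for a bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)
```

Because E is proportional to m·n, searching a database twice as large doubles the Expect value of a hit with the same score, which is why the same alignment can look less significant against a bigger database.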
facultative --- adjective indicating that a bacterium is able to grow either in the presence or in the absence of an environmental factor (e.g. for oxygen, facultative anaerobe; for growth inside cells, facultative intracellular)
filtering - Masks off segments of your BLAST query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993). The segments are replaced with XXXXX's or NNNNN's, as viewed in your BLAST output. For more information, see NCBI's description of filtering.
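The sketch below illustrates the general idea of low-complexity masking with a sliding-window Shannon entropy test. It is explicitly not the SEG algorithm, and the window size (12) and entropy cutoff (2.2 bits) are arbitrary assumptions chosen for illustration. Function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Replace every residue in a low-entropy window with mask_char ('N' for DNA)."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

A poly-glutamine run such as `"Q" * 12` embedded in an otherwise varied protein sequence comes back as `XXXXXXXXXXXX`, with the mask spilling a few residues into the flanks, much like the X's seen in filtered BLAST output.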
fomite --- An inanimate object that, when contaminated with a viable pathogen, can transfer the pathogen to a host.
gene - There are two subtly different definitions, depending on whether you are referring to prokaryotic or eukaryotic genes! In both cases a gene is a unit of heredity; however, in eukaryotes this unit may include both the protein-coding region and the RNA-coding region of a DNA sequence. In prokaryotes, a gene refers only to the protein-coding region, because multiple genes may be expressed from a single RNA molecule (an operon).
heterologs -- Heterologs differ in both origin and activity. Genes that are "unique" in activity and sequence are said to be heterologous. Note that genes initially defined as heterologous by syntax (letter matching) may actually be homologous by activity.
hit or hits - Sequence(s) in a database that is (are) found to be similar to a given query sequence - also used as a verb.
homology -- Homologs have common origins but may or may not have common activity. Genes that share an arbitrary threshold level of similarity, determined by alignment of matching bases, are termed homologous. They are inherited from a common ancestor which possessed the structure. This may be difficult to determine when the structure has been modified through descent. Note: homology is a qualitative term (something is either homologous or not), while similarity is the corresponding quantitative term. Therefore, it is more appropriate to refer to xx % similarity (never xx % homology).
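As an example of the quantitative side, percent identity (the strictest form of similarity) can be computed from a pairwise alignment. Conventions for gaps differ between tools; this sketch assumes the denominator is the number of columns where both sequences have a residue. Similarity scores for proteins usually also credit conservative substitutions, which this sketch does not. The function name is hypothetical.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned columns with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

For the alignment `ACGT-A` / `ACTTGA`, the gap column is skipped and 4 of the remaining 5 columns match, so the result is 80 % identity, not "80 % homology".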
hypothesis -- A concept or idea that can be falsified by various scientific methods. (I just put this in to remind everyone including myself :) )
ingroup -- In a cladistic analysis, the set of taxa which are hypothesized to be more closely related to each other than any are to the outgroup.
in silico - In the computer, computer generated.
in vitro - Outside a living organism.
in vivo - In the body, in a living organism.
lineage -- Any continuous line of descent; any series of organisms connected by the reproduction of offspring by parents.
maximum likelihood -- Phylogenetic method that estimates the likelihood of the observed sequence data given a particular tree and a certain model of nucleotide substitution, and favors the tree with the highest likelihood. Its advantages are that it is based on a specific model of sequence evolution, gives a probability at each internode, and uses the complete nucleotide sequence. Its disadvantage is that it takes a long time to compute (relative to other methods).
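The simplest case shows what "maximizing the likelihood" means: two aligned DNA sequences joined by a single branch, under the Jukes-Cantor (JC69) substitution model. This sketch numerically maximizes the log-likelihood over the distance d (a golden-section search, assuming the likelihood is unimodal on the search interval) and can be checked against the known closed form d = -3/4 ln(1 - 4p/3). A full tree search must also optimize many branch lengths and compare topologies, which is why the method is slow. Function names are hypothetical.

```python
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a two-sequence alignment at distance d under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # P(x, x) = pi_x * P(no visible change)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # P(x, y) for one specific y != x
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(seq1, seq2, lo=1e-9, hi=10.0, tol=1e-10):
    """Maximum likelihood JC69 distance by golden-section search on [lo, hi]."""
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    f = lambda d: jc69_log_likelihood(d, n_same, n_diff)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(x1) > f(x2):  # maximum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - g * (b - a)
        else:              # maximum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + g * (b - a)
    return (a + b) / 2


def jc69_closed_form(p):
    """Analytical JC69 distance for proportion p of differing sites (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

If 20 of 100 sites differ, `ml_distance` and `jc69_closed_form(0.2)` agree at about 0.233 substitutions per site; when 75 % or more of the sites differ, the likelihood keeps rising with d and the search simply runs to `hi`.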
monophyletic -- Term applied to a group of organisms which includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor. A monophyletic group is called a clade.
neighbor-joining -- Method for deriving a tree from distances (see distance matrix). At each step it joins the pair of sequences (or clusters) whose joining gives the smallest total tree length, which is usually, but not always, the most similar pair, and then treats that pair as a single node when joining the remaining sequences. The advantage of this method is that it is very fast to compute; however, it is not a very effective method for comparing sequences that have diverged significantly.
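A compact sketch of the standard (Saitou and Nei) neighbor-joining loop: at each step it picks the pair minimizing the Q-criterion Q(i, j) = (n - 2)·d(i, j) - Σd(i, ·) - Σd(j, ·), replaces the pair with a new node, and recomputes distances to it. Branch lengths are omitted to keep it short, and the unrooted result is returned as nested tuples; this representation and the function name are assumptions.

```python
def neighbor_joining(names, dist):
    """Return the unrooted NJ topology as nested tuples.

    names: list of taxon labels; dist: symmetric matrix (list of lists) in the same order.
    """
    nodes = list(names)
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:  # first pair wins on ties
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)  # the joined pair becomes a single node
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)
```

With the five-taxon matrix used in the test below, a and b are joined first, then c joins that pair, leaving d and e as the final pair: `(("c", ("a", "b")), ("d", "e"))`.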
ORF - Open reading frame within a sequence that may be a gene, but has not yet been demonstrated as such.
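A minimal ORF scan, assuming the standard genetic code (start ATG; stops TAA, TAG, TGA), the forward strand only (no reverse complement), and an arbitrary minimum length in codons. Finding an ORF this way does not show that it is a real gene, which is exactly the distinction the definition makes. Names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

For `"CCATGAAATTTTAGGG"` the only ORF is in frame 2, spanning `ATGAAATTTTAG`.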
orthologs - Homologs produced by speciation. Orthologs are genes derived from a common ancestral gene that diverged because the organisms carrying them diverged. They tend to have similar functions.
outgroup -- In a cladistic analysis, any taxon used to help resolve the polarity of characters, and which is hypothesized to be less closely related to each of the taxa under consideration than any are to each other.
paralogs - Homologs produced by gene duplication. Paralogs are genes derived from a common ancestral gene that duplicated within an organism and then subsequently diverged. They tend to have differing functions.
paraphyletic -- Term applied to a group of organisms which includes the most recent common ancestor of all of its members, but not all of the descendants of that most recent common ancestor.
parsimony -- Refers to a rule used to choose among possible trees, which states that the tree implying the fewest changes in character states is the best. This method can be very misleading, especially if "simple" parsimony is used.
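To apply the rule, each candidate tree needs a count of the minimum number of character-state changes it implies. The sketch below uses the Fitch algorithm for "simple" (unweighted) parsimony on a rooted binary tree written as nested 2-tuples of taxon names; the tree encoding and function names are assumptions. Choosing the best tree then means comparing these scores across candidate topologies.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint sets cost one change


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )
```

For the alignment A = `AAG`, B = `AAT`, C = `CCG`, D = `CCT`, the tree `(("A", "B"), ("C", "D"))` needs 4 changes and `(("A", "C"), ("B", "D"))` needs 5, so parsimony prefers the first.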
phylogenetics -- Field of biology that studies the evolutionary relationships between organisms. It includes the discovery of these relationships, and the study of the causes behind this pattern.
phylogeny -- The evolutionary relationships among organisms; the patterns of lineage branching produced by the true evolutionary history of the organisms being considered.
polyphyletic -- Term applied to a group of organisms which does not include the most recent common ancestor of those organisms; the ancestor does not possess the character shared by members of the group.
probability (for BLAST analyses) - see Expect value.
query - For BLAST analyses (and many other analyses), this refers to the sequence you are using to perform your search of a database.
rank -- In traditional taxonomy, taxa are ranked according to their level of inclusiveness. Thus a genus contains one or more species, a family includes one or more genera, and so on.
selection -- Process which favors one feature of organisms in a population over another feature found in the population. This occurs through differential reproduction: those with the favored feature produce more offspring than those with the other feature, such that they become a greater percentage of the population in the next generation.
score (BLAST score) - The score in a BLAST output is usually given in 'bits'. The bit score is defined as: S' (bits) = [lambda * S (raw) - ln K] / ln 2 where lambda and K are Karlin-Altschul parameters. The expression of the score in terms of bits makes it independent of the scoring system used (i.e., which matrix). A more intuitive way to rank results involves the use of the Expect value (see above definition).
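The bit-score formula above translates directly into code. The λ and K used in the usage note are illustrative assumptions; in practice they depend on the scoring matrix and gap costs and are reported in the BLAST output. Function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def raw_from_bits(bits, lam, k):
    """Invert the bit-score formula to recover the raw score S."""
    return (bits * math.log(2) + math.log(k)) / lam
```

For instance, `bit_score(100, 0.267, 0.041)` is roughly 43 bits. Because λ and K absorb the details of the scoring system, two searches run with different matrices can be compared on the bit scale even though their raw scores are not comparable.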
sister group -- Evolutionary theory: The two clades resulting from the splitting of a single lineage.
stem group -- All the taxa in a clade preceding a major cladogenesis event. They are often difficult to recognize because they may not possess synapomorphies found in the crown group.
subject - For BLAST analyses, this refers to the sequence in the database that shares similarity to your query sequence.
systematics -- Field of biology that deals with the diversity of life. Systematics is usually divided into the two areas of phylogenetics and taxonomy.
taxon -- Any named group of organisms, not necessarily a clade.
taxonomy -- The science of naming and classifying organisms.
transfection --- Means two different things to eukaryotic and prokaryotic geneticists! First used for the transformation of prokaryotic cells by protein-free DNA or RNA from viruses. Also refers to the process of genetic transformation in eukaryotic cells.
transformation - A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.
vector --- In epidemiology, this refers to an agent, usually an insect or animal, able to carry pathogens from one organism to another. In genetics, this is a genetic element able to incorporate DNA and be replicated in a cell.
word size threshold (BLAST word size) - This refers to the neighborhood word score threshold (Altschul et al., 1990). The initial word hits are a critical part of the BLAST search method: they act as seeds for initiating searches to find longer regions of similarity. A higher threshold, or a larger word size, means that fewer, stronger word matches are allowed to seed the search for regions of similarity. This makes your search faster, but less sensitive, so weaker similarities may be missed.
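The effect of word size on seeding can be seen with a blastn-style exact-word lookup: index every word of length `word_size` in the subject, then look up each query word. This is a simplified sketch; protein BLAST additionally admits "neighborhood" words that score at least the threshold T under a substitution matrix, and the extension step is omitted here. The function name is hypothetical.

```python
from collections import defaultdict


def word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) for every exact shared word of length word_size."""
    index = defaultdict(list)
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    hits = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            hits.append((q, s))
    return hits
```

Searching `ACGTTGCA` against `TTACGTTGCATT` gives 9 seeds with a word size of 2, 5 with a word size of 4, and a single seed, `(0, 2)`, with a word size of 8: larger words mean fewer seeds to extend (faster), but short or imperfect matches can no longer start an alignment.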
xenologs -- Homologs resulting from horizontal gene transfer. Determining whether a gene of interest was recently transferred into the current host by horizontal gene transfer is frequently non-trivial. Occasionally the %G+C content of a gene may be so different from that of the average gene in the current host that a conclusion of external origin is nearly inescapable; however, it is often unclear whether a gene has horizontal origins.
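A crude %G+C screen of the kind described above, assuming the mean of the supplied genes stands in for the host average and that the deviation threshold (in percentage points) is chosen by the user. A flagged gene is only a candidate for horizontal transfer, not proof of it. Function names are hypothetical.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_outliers(genes, threshold):
    """Names of genes whose %G+C differs from the mean of all genes by more than threshold."""
    values = {name: gc_content(s) for name, s in genes.items()}
    mean = sum(values.values()) / len(values)
    return sorted(name for name, v in values.items() if abs(v - mean) > threshold)
```

With three genes at 40 % G+C and one at 100 %, the mean is 55 %, and a threshold of 25 points flags only the G+C-rich gene.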