Summary class Geromics 2024 HyoungJinChoi

Latest revision as of 12:58, 10 May 2024

2024.03.06

Geromics orientation

2024.03.08

What is theory?

A theory is a rational type of abstract thinking about a phenomenon, or the results of such thinking. The process of contemplative and rational thinking is often associated with such processes as observational study or research. Theories may be scientific, belong to a non-scientific discipline, or no discipline at all. Depending on the context, a theory's assertions might, for example, include generalized explanations of how nature works. The word has its roots in ancient Greek, but in modern use it has taken on several related meanings.

In modern science, the term "theory" refers to scientific theories, a well-confirmed type of explanation of nature, made in a way consistent with the scientific method, and fulfilling the criteria required by modern science. Such theories are described in such a way that scientific tests should be able to provide empirical support for it, or empirical contradiction ("falsify") of it. Scientific theories are the most reliable, rigorous, and comprehensive form of scientific knowledge,^[1] in contrast to more common uses of the word "theory" that imply that something is unproven or speculative (which in formal terms is better characterized by the word hypothesis).^[2] Scientific theories are distinguished from hypotheses, which are individual empirically testable conjectures, and from scientific laws, which are descriptive accounts of the way nature behaves under certain conditions.

Theories guide the enterprise of finding facts rather than of reaching goals, and are neutral concerning alternatives among values.^[3]^: 131 A theory can be a body of knowledge, which may or may not be associated with particular explanatory models. To theorize is to develop this body of knowledge.^[4]^: 46

The word theory or "in theory" is sometimes used outside of science to refer to something which the speaker did not experience or test before.^[5] In science, this same concept is referred to as a hypothesis, and the word "hypothetically" is used both inside and outside of science. In its usage outside of science, the word "theory" is very often contrasted to "practice" (from Greek praxis, πρᾶξις) a Greek term for doing, which is opposed to theory.^[6] A "classical example" of the distinction between "theoretical" and "practical" uses the discipline of medicine: medical theory involves trying to understand the causes and nature of health and sickness, while the practical side of medicine is trying to make people healthy. These two things are related but can be independent, because it is possible to research health and sickness without curing specific patients, and it is possible to cure a patient without knowing how the cure worked.^[a]

full text link : https://en.wikipedia.org/wiki/Theory

2024.03.22

--
Class preparation: Before you attend this week's lecture, I would like to encourage you to watch the following YouTube video:

Title: Mitochondrial Regulation of Stem Cell Aging
Presenter: Danica Chen, PhD (University of California, Berkeley, USA)
YouTube Link: https://www.youtube.com/watch?v=FoJWmaT1ptM

In this video, Professor Danica Chen discusses various ways to protect mitochondria and reverse stem cell aging through sirtuins.
It's an insightful presentation that will undoubtedly enrich our understanding of the topic before our lecture.
--

Mitochondrial Stress is a Driver of Stem Cell Aging

Mitochondrial stress increases in stem cells during aging.
Mitochondrial dysfunction and aging produce similar defects in stem cells.
Stem cells do not age at the same rate; about one third of chronologically aged HSCs exhibit regenerative function similar to healthy young HSCs, coinciding with the health of their mitochondria.

Stem cell

In multicellular organisms, stem cells are undifferentiated or partially differentiated cells that can change into various types of cells and proliferate indefinitely to produce more of the same stem cell. They are the earliest type of cell in a cell lineage.^[1] They are found in both embryonic and adult organisms, but they have slightly different properties in each. They are usually distinguished from progenitor cells, which cannot divide indefinitely, and precursor or blast cells, which are usually committed to differentiating into one cell type.

In mammals, roughly 50 to 150 cells make up the inner cell mass during the blastocyst stage of embryonic development, around days 5–14. These have stem-cell capability. In vivo, they eventually differentiate into all of the body's cell types (making them pluripotent). This process starts with the differentiation into the three germ layers – the ectoderm, mesoderm and endoderm – at the gastrulation stage. However, when they are isolated and cultured in vitro, they can be kept in the stem-cell stage and are known as embryonic stem cells (ESCs).

Adult stem cells are found in a few select locations in the body, known as niches, such as those in the bone marrow or gonads. They exist to replenish rapidly lost cell types and are multipotent or unipotent, meaning they only differentiate into a few cell types or one type of cell. In mammals, they include, among others, hematopoietic stem cells, which replenish blood and immune cells, basal cells, which maintain the skin epithelium, and mesenchymal stem cells, which maintain bone, cartilage, muscle and fat cells. Adult stem cells are a small minority of cells; they are vastly outnumbered by the progenitor cells and terminally differentiated cells that they differentiate into.^[1]

Research into stem cells grew out of findings by Canadian biologists Ernest McCulloch, James Till and Andrew J. Becker at the University of Toronto and the Ontario Cancer Institute in the 1960s.^[2]^[3] As of 2016, the only established medical therapy using stem cells is hematopoietic stem cell transplantation,^[4] first performed in 1958 by French oncologist Georges Mathé. Since 1998 however, it has been possible to culture and differentiate human embryonic stem cells (in stem-cell lines). The process of isolating these cells has been controversial, because it typically results in the destruction of the embryo. Sources for isolating ESCs have been restricted in some European countries and Canada, but others such as the UK and China have promoted the research.^[5] Somatic cell nuclear transfer is a cloning method that can be used to create a cloned embryo for the use of its embryonic stem cells in stem cell therapy.^[6] In 2006, a Japanese team led by Shinya Yamanaka discovered a method to convert mature body cells back into stem cells. These were termed induced pluripotent stem cells (iPSCs).^[7]

full text link : https://en.wikipedia.org/wiki/Stem_cell

How does the total amount of stem cells in humans change over time?
At what age does it reach its maximum and minimum?
Two reference points: (1) fertilization, the start of life; (2) 120 years.

When certain data points are plotted, it seems feasible to converge through statistical methods (considering the number of inflection points).
Are there any papers related to the number of stem cells at various ages in a particular sample?
> No results found in the initial search (only 14 minutes were spent on it).

2024.03.29

Occupations with high life expectancy?

full text link : https://www.hani.co.kr/arti/society/rights/471412.html

2024.04.05

DNA

Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/;^[1] DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.

The two DNA strands are known as polynucleotides as they are composed of simpler monomeric units called nucleotides.^[2]^[3] Each nucleotide is composed of one of four nitrogen-containing nucleobases (cytosine [C], guanine [G], adenine [A] or thymine [T]), a sugar called deoxyribose, and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds (known as the phosphodiester linkage) between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together, according to base pairing rules (A with T and C with G), with hydrogen bonds to make double-stranded DNA. The complementary nitrogenous bases are divided into two groups, the single-ringed pyrimidines and the double-ringed purines. In DNA, the pyrimidines are thymine and cytosine; the purines are adenine and guanine.

Both strands of double-stranded DNA store the same biological information. This information is replicated when the two strands separate. A large part of DNA (more than 98% for humans) is non-coding, meaning that these sections do not serve as patterns for protein sequences. The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases (or bases). It is the sequence of these four nucleobases along the backbone that encodes genetic information. RNA strands are created using DNA strands as a template in a process called transcription, where DNA bases are exchanged for their corresponding bases except in the case of thymine (T), for which RNA substitutes uracil (U).^[4] Under the genetic code, these RNA strands specify the sequence of amino acids within proteins in a process called translation.
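
The base-pairing and transcription rules described above can be illustrated with a short Python sketch. The example sequence and the function names are made up for illustration, and transcription is simplified to the "replace T with U on the coding strand" view; real transcription involves promoters, RNA polymerase and processing that are not modeled here.

```python
# Complement table following the base-pairing rules (A with T, C with G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the antiparallel partner strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand, with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


coding = "ATGGCCTAA"  # hypothetical example sequence
print(reverse_complement(coding))  # TTAGGCCAT
print(transcribe(coding))          # AUGGCCUAA
```

Reversing the strand before complementing reflects the antiparallel orientation of the two strands.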

Within eukaryotic cells, DNA is organized into long structures called chromosomes. Before typical cell division, these chromosomes are duplicated in the process of DNA replication, providing a complete set of chromosomes for each daughter cell. Eukaryotic organisms (animals, plants, fungi and protists) store most of their DNA inside the cell nucleus as nuclear DNA, and some in the mitochondria as mitochondrial DNA or in chloroplasts as chloroplast DNA.^[5] In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm, in circular chromosomes. Within eukaryotic chromosomes, chromatin proteins, such as histones, compact and organize DNA. These compacting structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.

full text link : https://en.wikipedia.org/wiki/DNA

RNA

Ribonucleic acid (RNA) is a polymeric molecule that is essential for most biological functions, either by performing the function itself (non-coding RNA) or by forming a template for the production of proteins (messenger RNA). RNA and deoxyribonucleic acid (DNA) are nucleic acids. The nucleic acids constitute one of the four major macromolecules essential for all known forms of life. RNA is assembled as a chain of nucleotides. Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the nitrogenous bases of guanine, uracil, adenine, and cytosine, denoted by the letters G, U, A, and C) that directs synthesis of specific proteins. Many viruses encode their genetic information using an RNA genome.

Some RNA molecules play an active role within cells by catalyzing biological reactions, controlling gene expression, or sensing and communicating responses to cellular signals. One of these active processes is protein synthesis, a universal function in which RNA molecules direct the synthesis of proteins on ribosomes. This process uses transfer RNA (tRNA) molecules to deliver amino acids to the ribosome, where ribosomal RNA (rRNA) then links amino acids together to form coded proteins.

It has become widely accepted in science^[1] that early in the history of life on Earth, prior to the evolution of DNA and possibly of protein-based enzymes as well, an "RNA world" existed in which RNA served as both living organisms' storage method for genetic information—a role fulfilled today by DNA, except in the case of RNA viruses—and potentially performed catalytic functions in cells—a function performed today by protein enzymes, with the notable and important exception of the ribosome, which is a ribozyme.

Full text link : https://en.wikipedia.org/wiki/RNA

eQTL

Distant and local, trans- and cis-eQTLs, respectively

An expression quantitative trait is an amount of an mRNA transcript or a protein. These are usually the product of a single gene with a specific chromosomal location. This distinguishes expression quantitative traits from most complex traits, which are not the product of the expression of a single gene. Chromosomal loci that explain variance in expression traits are called eQTLs. eQTLs located near the gene-of-origin (gene which produces the transcript or protein) are referred to as local eQTLs or cis-eQTLs. By contrast, those located distant from their gene of origin, often on different chromosomes, are referred to as distant eQTLs or trans-eQTLs.^[3] ^[4] The first genome-wide study of gene expression was carried out in yeast and published in 2002.^[5] The initial wave of eQTL studies employed microarrays to measure genome-wide gene expression; more recent studies have employed massively parallel RNA sequencing. Many expression QTL studies were performed in plants and animals, including humans,^[6] non-human primates^[7]^[8] and mice.^[9]

Some cis eQTLs are detected in many tissue types but the majority of trans eQTLs are tissue-dependent (dynamic).^[10] eQTLs may act in cis (locally) or trans (at a distance) to a gene.^[11] The abundance of a gene transcript is directly modified by polymorphism in regulatory elements. Consequently, transcript abundance might be considered as a quantitative trait that can be mapped with considerable power. These have been named expression QTLs (eQTLs).^[12] The combination of whole-genome genetic association studies and the measurement of global gene expression allows the systematic identification of eQTLs. By assaying gene expression and genetic variation simultaneously on a genome-wide basis in a large number of individuals, statistical genetic methods can be used to map the genetic factors that underpin individual differences in quantitative levels of expression of many thousands of transcripts.^[13] Studies have shown that single nucleotide polymorphisms (SNPs) reproducibly associated with complex disorders ^[14] as well as certain pharmacologic phenotypes ^[15] are found to be significantly enriched for eQTLs, relative to frequency-matched control SNPs. The integration of eQTLs with GWAS has led to development of the transcriptome-wide association study (TWAS) methodology.^[16]^[17]

Detecting eQTLs

Mapping eQTLs is done using standard QTL mapping methods that test the linkage between variation in expression and genetic polymorphisms. The only considerable difference is that eQTL studies can involve a million or more expression microtraits. Standard gene mapping software packages can be used, although it is often faster to use custom code such as QTL Reaper or the web-based eQTL mapping system GeneNetwork. GeneNetwork hosts many large eQTL mapping data sets and provides access to fast algorithms to map single loci and epistatic interactions. As is true in all QTL mapping studies, the final steps in defining DNA variants that cause variation in traits are usually difficult and require a second round of experimentation. This is especially the case for trans eQTLs that do not benefit from the strong prior probability that relevant variants are in the immediate vicinity of the parent gene. Statistical, graphical, and bioinformatic methods are used to evaluate positional candidate genes and entire systems of interactions.^[18]^[19] The development of single-cell technologies and parallel advances in statistical methods have made it possible to define even subtle changes in eQTLs as cell states change.^[20]^[21]
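
As a rough illustration of what testing the association between a genetic polymorphism and expression can look like, the sketch below fits a least-squares line of expression level on allele dosage (0, 1 or 2 copies of an allele) for a single SNP and a single transcript. This is an assumption-laden toy, not the algorithm used by QTL Reaper or GeneNetwork: the data are invented, the function name `linear_fit` is hypothetical, and real analyses add covariates and multiple-testing correction across many SNP-transcript pairs.

```python
def linear_fit(genotypes, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, intercept, r_squared).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    syy = sum((e - mean_e) ** 2 for e in expression)
    slope = sxy / sxx                      # expression change per extra allele copy
    intercept = mean_e - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy)  # share of expression variance explained
    return slope, intercept, r_squared


# Invented toy data: allele dosage and expression level for six individuals.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, intercept, r_squared = linear_fit(dosage, expr)
print(f"slope={slope:.3f} intercept={intercept:.3f} r_squared={r_squared:.3f}")
```

A slope far from zero with a high r_squared suggests the locus explains variance in the expression trait, which is the sense in which a locus is called an eQTL.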

Full text link : https://en.wikipedia.org/wiki/Expression_quantitative_trait_loci

2024.04.12

Proteomics

Proteomics is the large-scale study of proteins.^[1]^[2] Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In addition, other kinds of proteins include antibodies that protect an organism from infection, and hormones that send important signals throughout the body.

The proteome is the entire set of proteins produced or modified by an organism or system. Proteomics enables the identification of ever-increasing numbers of proteins. This varies with time and distinct requirements, or stresses, that a cell or organism undergoes.^[3]

Proteomics is an interdisciplinary domain that has benefited greatly from the genetic information of various genome projects, including the Human Genome Project.^[4] It covers the exploration of proteomes from the overall level of protein composition, structure, and activity, and is an important component of functional genomics.

Proteomics generally denotes the large-scale experimental analysis of proteins and proteomes, but often refers specifically to protein purification and mass spectrometry. Indeed, mass spectrometry is the most powerful method for analysis of proteomes, both in large samples composed of millions of cells^[5] and in single cells.^[6]^[7]

Full text link : https://en.wikipedia.org/wiki/Proteomics

Omics

The branches of science known informally as omics are various disciplines in biology whose names end in the suffix -omics, such as genomics, proteomics, metabolomics, metagenomics, phenomics and transcriptomics. Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms.^[1]

The related suffix -ome is used to address the objects of study of such fields, such as the genome, proteome or metabolome respectively. The suffix -ome as used in molecular biology refers to a totality of some sort; it is an example of a "neo-suffix" formed by abstraction from various Greek terms in -ωμα, a sequence that does not form an identifiable suffix in Greek.

Functional genomics aims at identifying the functions of as many genes as possible of a given organism. It combines different -omics techniques such as transcriptomics and proteomics with saturated mutant collections.^[2]

Full text link : https://en.wikipedia.org/wiki/Omics

-ology

An ology or -logy is a scientific discipline.

Protein

Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.

A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides, containing fewer than 20–30 residues, are rarely considered to be proteins and are commonly called peptides. Adjacent amino acid residues are bonded together by peptide bonds. The sequence of amino acid residues in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; but in certain organisms the genetic code can include selenocysteine and, in certain archaea, pyrrolysine. Shortly after or even during synthesis, the residues in a protein are often chemically modified by post-translational modification, which alters the physical and chemical properties, folding, stability, activity, and ultimately, the function of the proteins. Some proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.

Once formed, proteins only exist for a certain period and are then degraded and recycled by the cell's machinery through the process of protein turnover. A protein's lifespan is measured in terms of its half-life and covers a wide range. They can exist for minutes or years with an average lifespan of 1–2 days in mammalian cells. Abnormal or misfolded proteins are degraded more rapidly either due to being targeted for destruction or due to being unstable.

Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyse biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton, which form a system of scaffolding that maintains cell shape. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle. In animals, proteins are needed in the diet to provide the essential amino acids that cannot be synthesized. Digestion breaks the proteins down for metabolic use.

Proteins may be purified from other cellular components using a variety of techniques such as ultracentrifugation, precipitation, electrophoresis, and chromatography; the advent of genetic engineering has made possible a number of methods to facilitate purification. Methods commonly used to study protein structure and function include immunohistochemistry, site-directed mutagenesis, X-ray crystallography, nuclear magnetic resonance and mass spectrometry.

PPI (Protein-Protein interaction)

Protein–protein interactions (PPIs) are physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by interactions that include electrostatic forces, hydrogen bonding and the hydrophobic effect. Many are physical contacts with molecular associations between chains that occur in a cell or in a living organism in a specific biomolecular context.

Proteins rarely act alone as their functions tend to be regulated. Many molecular processes within a cell are carried out by molecular machines that are built from numerous protein components organized by their PPIs. These physiological interactions make up the so-called interactomics of the organism, while aberrant PPIs are the basis of multiple aggregation-related diseases, such as Creutzfeldt–Jakob and Alzheimer's diseases.

PPIs have been studied with many methods and from different perspectives: biochemistry, quantum chemistry, molecular dynamics, signal transduction, among others.^[1]^[2]^[3] All this information enables the creation of large protein interaction networks^[4] – similar to metabolic or genetic/epigenetic networks – that empower the current knowledge on biochemical cascades and molecular etiology of disease, as well as the discovery of putative protein targets of therapeutic interest.
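
A protein interaction network of the kind mentioned above can be represented as an undirected graph. The following is a minimal sketch, assuming a plain list of interacting pairs as input; the protein names P1 to P4 are placeholders rather than real proteins, and counting interaction partners to find a "hub" is just one simple way to summarize such a network.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network as a dict of neighbor sets."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


# Placeholder interaction pairs (not real data).
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

network = build_network(interactions)
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)  # protein with the most interaction partners
print(hub, degree[hub])  # P3 3
```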

full text link : https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction

String

in computer science

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements.

When a string appears literally in source code, it is known as a string literal or an anonymous string.^[1]

In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set called an alphabet.
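
The points above about character encoding and about fixed versus mutable strings can be seen in a short Python sketch. Python is used here only as one example language: its `str` type is immutable, while `bytearray` is a mutable sequence of bytes. The sample text is arbitrary.

```python
# A string literal; "\u2192" is the right-arrow character.
text = "DNA\u2192RNA"

# The same characters stored as bytes under the UTF-8 character encoding.
encoded = text.encode("utf-8")
print(len(text))     # 7 characters
print(len(encoded))  # 9 bytes: the arrow takes 3 bytes, the letters 1 each

# str is immutable, so "changing" it creates a new string object.
upper_text = text.replace("DNA", "dna")

# bytearray allows elements to be mutated and the length changed in place.
buffer = bytearray(b"ACGT")
buffer[0] = ord("T")  # mutate one element
buffer.extend(b"A")   # change the length
print(bytes(buffer))  # b'TCGTA'
```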

full text link : https://en.wikipedia.org/wiki/String_(computer_science)

in structure
String is a long flexible structure made from fibers twisted together into a single strand, or from multiple such strands which are in turn twisted together. String is used to tie, bind, or hang other objects. It is also used as a material to make things, such as textiles, and in arts and crafts. String is a simple tool, and its use by humans is known to have been developed tens of thousands of years ago.^[1] In Mesoamerica, for example, string was invented some 20,000 to 30,000 years ago, and was made by twisting plant fibers together.^[1] String may also be a component in other tools, and in devices as diverse as weapons, musical instruments, and toys.

full text link : https://en.wikipedia.org/wiki/String_(structure)

2024.04.19

P-value

In null-hypothesis significance testing, the 𝑝-value^{[note 1]} is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.^[2]^[3] A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values is widespread and has been a major topic in mathematics and metascience.^[4]^[5] In 2016, the American Statistical Association (ASA) made a formal statement that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone" and that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result" or "evidence regarding a model or hypothesis".^[6] That said, a 2019 task force by ASA has issued a statement on statistical significance and replicability, concluding with: "p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data".^[7]

In statistics, every conjecture concerning the unknown probability distribution of a collection of random variables representing the observed data 𝑋 in some study is called a statistical hypothesis. If we state one hypothesis only and the aim of the statistical test is to see whether this hypothesis is tenable, but not to investigate other specific hypotheses, then such a test is called a null hypothesis test.

As our statistical hypothesis will, by definition, state some property of the distribution, the null hypothesis is the default hypothesis under which that property does not exist. The null hypothesis is typically that some parameter (such as a correlation or a difference between means) in the populations of interest is zero. Our hypothesis might specify the probability distribution of 𝑋 precisely, or it might only specify that it belongs to some class of distributions. Often, we reduce the data to a single numerical statistic, e.g., 𝑇, whose marginal probability distribution is closely connected to a main question of interest in the study.

The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic 𝑇.^{[note 2]} The lower the p-value is, the lower the probability of getting that result if the null hypothesis were true. A result is said to be statistically significant if it allows us to reject the null hypothesis. All other things being equal, smaller p-values are taken as stronger evidence against the null hypothesis.

Loosely speaking, rejection of the null hypothesis implies that there is sufficient evidence against it.

As a particular example, if a null hypothesis states that a certain summary statistic 𝑇 follows the standard normal distribution 𝑁(0,1), then the rejection of this null hypothesis could mean that (i) the mean of 𝑇 is not 0, or (ii) the variance of 𝑇 is not 1, or (iii) 𝑇 is not normally distributed. Different tests of the same null hypothesis would be more or less sensitive to different alternatives. However, even if we do manage to reject the null hypothesis for all 3 alternatives, and even if we know that the distribution is normal and variance is 1, the null hypothesis test does not tell us which non-zero values of the mean are now most plausible. The more independent observations from the same probability distribution one has, the more accurate the test will be, and the higher the precision with which one will be able to determine the mean value and show that it is not equal to zero; but this will also increase the importance of evaluating the real-world or scientific relevance of this deviation.
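
For the example above, where the statistic 𝑇 follows 𝑁(0,1) under the null hypothesis, a two-sided p-value can be computed directly from the observed value of 𝑇. The sketch below is only an illustration: the function name is hypothetical, and it assumes a two-sided test with both the normality and the unit variance taken as given.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Probability of |T| >= |z| when T follows N(0, 1) under the null hypothesis."""
    # For a standard normal T, P(|T| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


for observed in (0.0, 1.0, 1.96, 3.0):
    print(observed, round(two_sided_p_value(observed), 4))
```

An observed value of about 1.96 gives a p-value close to 0.05, the conventional threshold in many fields, while larger deviations from zero give smaller p-values.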

full text link : https://en.wikipedia.org/wiki/P-value

Log

In mathematics, the logarithm is the inverse function to exponentiation. That means that the logarithm of a number x to the base b is the exponent to which b must be raised to produce x. For example, since 1000 = 10³, the logarithm base 10 of 1000 is 3, or log₁₀ (1000) = 3. The logarithm of x to base b is denoted as log_b (x), or without parentheses, log_b x. When the base is clear from the context or is irrelevant, such as in big O notation, it is sometimes written log x.

The logarithm base 10 is called the decimal or common logarithm and is commonly used in science and engineering. The natural logarithm has the number e ≈ 2.718 as its base; its use is widespread in mathematics and physics, because of its very simple derivative. The binary logarithm uses base 2 and is frequently used in computer science.

Logarithms were introduced by John Napier in 1614 as a means of simplifying calculations.^[1] They were rapidly adopted by navigators, scientists, engineers, surveyors, and others to perform high-accuracy computations more easily. Using logarithm tables, tedious multi-digit multiplication steps can be replaced by table look-ups and simpler addition. This is possible because the logarithm of a product is the sum of the logarithms of the factors:

log_b (xy) = log_b x + log_b y,

provided that b, x and y are all positive and b ≠ 1. The slide rule, also based on logarithms, allows quick calculations without tables, but at lower precision. The present-day notion of logarithms comes from Leonhard Euler, who connected them to the exponential function in the 18th century, and who also introduced the letter e as the base of natural logarithms.^[2]
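
The product rule can be checked numerically with Python's standard `math` module. The values of b, x and y below are arbitrary choices for illustration, and the comparison uses `math.isclose` because floating-point logarithms are approximate.

```python
import math

# Common logarithm: 1000 = 10**3, so log10(1000) is 3.
print(math.log10(1000))

# Product rule: log_b(x * y) equals log_b(x) + log_b(y) for positive b, x, y with b != 1.
b, x, y = 2.0, 8.0, 32.0
lhs = math.log(x * y, b)
rhs = math.log(x, b) + math.log(y, b)
print(math.isclose(lhs, rhs))  # True
```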

Logarithmic scales reduce wide-ranging quantities to smaller scopes. For example, the decibel (dB) is a unit used to express ratio as logarithms, mostly for signal power and amplitude (of which sound pressure is a common example). In chemistry, pH is a logarithmic measure for the acidity of an aqueous solution. Logarithms are commonplace in scientific formulae, and in measurements of the complexity of algorithms and of geometric objects called fractals. They help to describe frequency ratios of musical intervals, appear in formulas counting prime numbers or approximating factorials, inform some models in psychophysics, and can aid in forensic accounting.

The concept of logarithm as the inverse of exponentiation extends to other mathematical structures as well. However, in general settings, the logarithm tends to be a multi-valued function. For example, the complex logarithm is the multi-valued inverse of the complex exponential function. Similarly, the discrete logarithm is the multi-valued inverse of the exponential function in finite groups; it has uses in public-key cryptography.

full text link : https://en.wikipedia.org/wiki/Logarithm

Likelihood

The likelihood function (often simply called the likelihood) is the joint probability mass (or probability density) of observed data viewed as a function of the parameters of a statistical model.^[1]^[2]^[3] Intuitively, the likelihood function 𝐿(𝜃∣𝑥) is the probability of observing data 𝑥 assuming 𝜃 is the actual parameter.

In maximum likelihood estimation, the arg max (over the parameter 𝜃) of the likelihood function serves as a point estimate for 𝜃, while the Fisher information (often approximated by the likelihood's Hessian matrix) indicates the estimate's precision.

In contrast, in Bayesian statistics, parameter estimates are derived from the converse of the likelihood, the so-called posterior probability, which is calculated via Bayes' rule.^[4]

The likelihood function, parameterized by a (possibly multivariate) parameter 𝜃, is usually defined differently for discrete and continuous probability distributions (a more general definition is discussed below). Given a probability density or mass function

𝑥↦𝑓(𝑥∣𝜃),

where 𝑥 is a realization of the random variable 𝑋, the likelihood function is 𝜃↦𝑓(𝑥∣𝜃),

often written
𝐿(𝜃∣𝑥).

In other words, when 𝑓(𝑥∣𝜃) is viewed as a function of 𝑥 with 𝜃 fixed, it is a probability density function, and when viewed as a function of 𝜃 with 𝑥 fixed, it is a likelihood function. In the frequentist paradigm, the notation 𝑓(𝑥∣𝜃) is often avoided and instead 𝑓(𝑥;𝜃) or 𝑓(𝑥,𝜃) are used to indicate that 𝜃 is regarded as a fixed unknown quantity rather than as a random variable being conditioned on.

The likelihood function does not specify the probability that 𝜃 is the truth, given the observed sample 𝑋=𝑥. Such an interpretation is a common error, with potentially disastrous consequences (see prosecutor's fallacy).
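
As a small illustration of these points, the sketch below evaluates a binomial likelihood for a coin-flip experiment. The data (7 heads in 10 flips), the grid search, and the function name binomial_likelihood are assumptions made for this example only.

```python
import math

def binomial_likelihood(theta: float, heads: int, flips: int) -> float:
    """f(x | theta) for x = heads out of flips, viewed as a function of theta."""
    return math.comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# Hypothetical observed data.
heads, flips = 7, 10

# Maximum likelihood estimate: arg max over a grid of candidate theta values.
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=lambda t: binomial_likelihood(t, heads, flips))

# Riemann-sum approximation of the area under the likelihood curve over theta.
# It is not 1, so the likelihood is not a probability density for theta.
area = sum(binomial_likelihood(t, heads, flips) for t in grid) / 1000

print(theta_hat, area)
```

The grid maximum sits at the sample proportion heads / flips, and the area under the curve is far from 1, which is why the likelihood should not be read as the probability that 𝜃 is the truth.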

full text link : https://en.wikipedia.org/wiki/Likelihood_function

E-value

In statistical hypothesis testing, e-values quantify the evidence in the data against a null hypothesis (e.g., "the coin is fair", or, in a medical context, "this new treatment has no effect"). They serve as a more robust alternative to p-values, addressing some shortcomings of the latter.

In contrast to p-values, e-values can deal with optional continuation: e-values of subsequent experiments (e.g. clinical trials concerning the same treatment) may simply be multiplied to provide a new, "product" e-value that represents the evidence in the joint experiment. This works even if, as often happens in practice, the decision to perform later experiments may depend in vague, unknown ways on the data observed in earlier experiments, and it is not known beforehand how many trials will be conducted: the product e-value remains a meaningful quantity, leading to tests with Type-I error control. For this reason, e-values and their sequential extension, the e-process, are the fundamental building blocks for anytime-valid statistical methods (e.g. confidence sequences). Another advantage over p-values is that any weighted average of e-values remains an e-value, even if the individual e-values are arbitrarily dependent. This is one of the reasons why e-values have also turned out to be useful tools in multiple testing.^[1]

E-values can be interpreted in a number of different ways: first, the reciprocal of any e-value is itself a p-value, but a special, conservative one, quite different from p-values used in practice. Second, they are broad generalizations of likelihood ratios and are also related to, yet distinct from, Bayes factors. Third, they have an interpretation as bets. Finally, in a sequential context, they can also be interpreted as increments of nonnegative supermartingales. Interest in e-values has exploded since 2019, when the term 'e-value' was coined and a number of breakthrough results were achieved by several research groups. The first overview article appeared in 2023.^[2]

Let the null hypothesis 𝐻_0 be given as a set of distributions for data 𝑌. Usually 𝑌 = (𝑋_1, …, 𝑋_𝜏) with each 𝑋_𝑖 a single outcome and 𝜏 a fixed sample size or some stopping time. We shall refer to such 𝑌, which represent the full sequence of outcomes of a statistical experiment, as a sample or batch of outcomes. But in some cases 𝑌 may also be an unordered bag of outcomes or a single outcome.

An e-variable or e-statistic is a nonnegative random variable 𝐸 = 𝐸(𝑌) such that under all 𝑃 ∈ 𝐻_0, its expected value is bounded by 1:

𝐸_𝑃[𝐸] ≤ 1.

The value taken by e-variable 𝐸 is called the e-value. In practice, the term e-value (a number) is often used when one is really referring to the underlying e-variable (a random variable, that is, a measurable function of the data).
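
Since e-values generalize likelihood ratios, a likelihood ratio gives a simple concrete case. The sketch below assumes a null hypothesis "the coin is fair" (p = 0.5) and a hypothetical alternative p = 0.7; the function names and the batch outcomes are illustrative assumptions. It checks that the expected value under the null is 1, and that e-values from two batches multiply.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k heads in n flips with heads probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def likelihood_ratio_e(k: int, n: int, p_null: float = 0.5, p_alt: float = 0.7) -> float:
    """Likelihood ratio of the alternative to the null, used here as an e-variable."""
    return binom_pmf(k, n, p_alt) / binom_pmf(k, n, p_null)

n = 20

# Expected value of the e-variable under the null: sum over all outcomes k.
expected_e = sum(binom_pmf(k, n, 0.5) * likelihood_ratio_e(k, n) for k in range(n + 1))

# Two hypothetical batches of 20 flips each, with 14 and 15 heads.
e1 = likelihood_ratio_e(14, 20)
e2 = likelihood_ratio_e(15, 20)
e_product = e1 * e2   # evidence in the joint experiment

print(expected_e, e1, e2, e_product)
```

In this simple fixed-alternative case the product of the two batch e-values equals the likelihood ratio of the pooled data (29 heads in 40 flips), which matches the multiplication property described above.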

full text link : https://en.wikipedia.org/wiki/E-values

2024.05.03

Tetrahymena

As a ciliated protozoan, Tetrahymena thermophila exhibits nuclear dimorphism: two types of cell nuclei. Each cell has a larger, non-germline macronucleus and a small, germline micronucleus at the same time, and the two carry out different functions with distinct cytological and biological properties. This unique versatility allows scientists to use Tetrahymena to identify several key factors regarding gene expression and genome integrity. In addition, Tetrahymena possesses hundreds of cilia and has complicated microtubule structures, making it an optimal model to illustrate the diversity and functions of microtubule arrays.

Because Tetrahymena can be grown in a large quantity in the laboratory with ease, it has been a great source for biochemical analysis for years, specifically for enzymatic activities and purification of sub-cellular components. In addition, with the advancement of genetic techniques it has become an excellent model to study the gene function in vivo. The recent sequencing of the macronucleus genome should ensure that Tetrahymena will be continuously used as a model system.

Tetrahymena thermophila exists in 7 different sexes (mating types) that can reproduce in 21 different combinations, and a single tetrahymena cannot reproduce sexually with itself. Each organism "decides" which sex it will become during mating, through a stochastic process.^[5]^[6]
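
The figure of 21 follows from counting unordered pairs of two different mating types out of 7. A minimal sketch, assuming the mating types are simply labelled I to VII for illustration:

```python
from itertools import combinations

# Labels are used only for illustration.
mating_types = ["I", "II", "III", "IV", "V", "VI", "VII"]

# A cell cannot mate with its own type, so only pairs of distinct types count.
pairs = list(combinations(mating_types, 2))

print(len(pairs))
```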

Studies on Tetrahymena have contributed to several scientific milestones including:

First cell which showed synchronized division, which led to the first insights into the existence of mechanisms which control the cell cycle.^[7]
Identification and purification of the first cytoskeleton based motor protein such as dynein.^[7]
Aid in the discovery of lysosomes and peroxisomes.^[7]
Early molecular identification of somatic genome rearrangement.^[7]
Discovery of the molecular structure of telomeres, telomerase enzyme, the templating role of telomerase RNA and their roles in cellular senescence and chromosome healing (for which a Nobel Prize was won).^[7]
Nobel Prize–winning co-discovery (1989, in Chemistry) of catalytic RNA (ribozyme).^[7]^[8]
Discovery of the function of histone acetylation.^[7]
Demonstration of the roles of posttranslational modification such as acetylation and glycylation on tubulins and discovery of the enzymes responsible for some of these modifications (glutamylation).
Crystal structure of 40S ribosome in complex with its initiation factor eIF1.
First demonstration that two of the "universal" stop codons, UAA and UAG, will code for the amino acid glutamine in some eukaryotes, leaving UGA as the only termination codon in these organisms.^[9]

link : https://en.wikipedia.org/wiki/Tetrahymena

Telomere

A telomere (/ˈtɛləmɪər, ˈtiːlə-/; from Ancient Greek τέλος (télos) 'end', and μέρος (méros) 'part') is a region of repetitive nucleotide sequences associated with specialized proteins at the ends of linear chromosomes (see Sequences). Telomeres are a widespread genetic feature most commonly found in eukaryotes. In most, if not all species possessing them, they protect the terminal regions of chromosomal DNA from progressive degradation and ensure the integrity of linear chromosomes by preventing DNA repair systems from mistaking the very ends of the DNA strand for a double-strand break.

The existence of a special structure at the ends of chromosomes was independently proposed in 1938 by Hermann Joseph Muller, studying the fruit fly Drosophila melanogaster, and in 1939 by Barbara McClintock, working with maize.^[1] Muller observed that the ends of irradiated fruit fly chromosomes did not present alterations such as deletions or inversions. He hypothesized the presence of a protective cap, which he coined "telomeres", from the Greek telos (end) and meros (part).^[2]

In the early 1970s, Soviet theorist Alexei Olovnikov first recognized that chromosomes could not completely replicate their ends; this is known as the "end replication problem". Building on this, and accommodating Leonard Hayflick's idea of limited somatic cell division, Olovnikov suggested that DNA sequences are lost every time a cell replicates until the loss reaches a critical level, at which point cell division ends.^[3]^[4]^[5] According to his theory of marginotomy, DNA sequences at the ends of telomeres are represented by tandem repeats, which create a buffer that determines the number of divisions that a certain cell clone can undergo. Furthermore, it was predicted that a specialized DNA polymerase (originally called a tandem-DNA-polymerase) could extend telomeres in immortal tissues such as germ line, cancer cells and stem cells. It also followed from this hypothesis that organisms with a circular genome, such as bacteria, do not have the end replication problem and therefore do not age.

In 1975–1977, Elizabeth Blackburn, working as a postdoctoral fellow at Yale University with Joseph G. Gall, discovered the unusual nature of telomeres, with their simple repeated DNA sequences composing chromosome ends.^[6] Blackburn, Carol Greider, and Jack Szostak were awarded the 2009 Nobel Prize in Physiology or Medicine for the discovery of how chromosomes are protected by telomeres and the enzyme telomerase.^[7]

During DNA replication, DNA polymerase cannot replicate the sequences present at the 3' ends of the parent strands. This is a consequence of its unidirectional mode of DNA synthesis: it can only attach new nucleotides to an existing 3'-end (that is, synthesis progresses 5'-3') and thus it requires a primer to initiate replication. On the leading strand (oriented 5'-3' within the replication fork), DNA-polymerase continuously replicates from the point of initiation all the way to the strand's end with the primer (made of RNA) then being excised and substituted by DNA. The lagging strand, however, is oriented 3'-5' with respect to the replication fork so continuous replication by DNA-polymerase is impossible, which necessitates discontinuous replication involving the repeated synthesis of primers further 5' of the site of initiation (see lagging strand replication). The last primer to be involved in lagging-strand replication sits near the 3'-end of the template (corresponding to the potential 5'-end of the lagging-strand). Originally it was believed that the last primer would sit at the very end of the template, so that, once it was removed, the DNA-polymerase that substitutes primers with DNA (DNA-Pol δ in eukaryotes)^[note 1] would be unable to synthesize the "replacement DNA" from the 5'-end of the lagging strand, and the template nucleotides previously paired to the last primer would not be replicated.^[8] It has since been questioned whether the last lagging strand primer is placed exactly at the 3'-end of the template, and it was demonstrated that it is rather synthesized at a distance of about 70–100 nucleotides, which is consistent with the finding that DNA in cultured human cells is shortened by 50–100 base pairs per cell division.^[9]

If coding sequences are degraded in this process, potentially vital genetic code would be lost. Telomeres are non-coding, repetitive sequences located at the termini of linear chromosomes to act as buffers for those coding sequences further behind. They "cap" the end-sequences and are progressively degraded in the process of DNA replication.
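
The buffering role of telomeres can be sketched as a toy simulation. Only the loss of 50–100 base pairs per division comes from the text above; the starting length, the critical length, the uniform loss distribution, and the function name are hypothetical values chosen for illustration, not measurements.

```python
import random

def divisions_until_critical(start_bp: int, critical_bp: int,
                             loss_range: tuple = (50, 100), seed: int = 0) -> int:
    """Count divisions until the telomere falls to the critical length.

    Each division removes a random number of base pairs drawn uniformly
    from loss_range (an assumed distribution, for illustration only).
    """
    rng = random.Random(seed)
    length = start_bp
    divisions = 0
    while length > critical_bp:
        length -= rng.randint(*loss_range)
        divisions += 1
    return divisions

# Hypothetical telomere of 10,000 bp with a critical length of 4,000 bp.
n_div = divisions_until_critical(10_000, 4_000)
print(n_div)
```

With these assumed numbers the buffer of 6,000 bp lasts between 60 and 120 divisions, depending on how much is lost each time.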

The "end replication problem" is exclusive to linear chromosomes as circular chromosomes do not have ends lying without reach of DNA-polymerases. Most prokaryotes, relying on circular chromosomes, accordingly do not possess telomeres.^[10] A small fraction of bacterial chromosomes (such as those in Streptomyces, Agrobacterium, and Borrelia), however, are linear and possess telomeres, which are very different from those of the eukaryotic chromosomes in structure and function. The known structures of bacterial telomeres take the form of proteins bound to the ends of linear chromosomes, or hairpin loops of single-stranded DNA at the ends of the linear chromosomes.^[11]

Telomerase

Telomerase, also called terminal transferase,^[1] is a ribonucleoprotein that adds a species-dependent telomere repeat sequence to the 3' end of telomeres. A telomere is a region of repetitive sequences at each end of the chromosomes of most eukaryotes. Telomeres protect the end of the chromosome from DNA damage or from fusion with neighbouring chromosomes. The fruit fly Drosophila melanogaster lacks telomerase, but instead uses retrotransposons to maintain telomeres.^[2]

Telomerase is a reverse transcriptase enzyme that carries its own RNA molecule (e.g., with the sequence 3′-CCCAAUCCC-5′ in Trypanosoma brucei)^[3] which is used as a template when it elongates telomeres. Telomerase is active in gametes and most cancer cells, but is normally absent in most somatic cells.
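
The templating step can be sketched as simple base pairing against the RNA sequence quoted above. This is a simplification for illustration: it copies the whole quoted sequence in one pass and ignores how the enzyme aligns and repositions on the telomere. The function and dictionary names are assumptions.

```python
# RNA template base -> complementary DNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(template_3_to_5: str) -> str:
    """DNA strand (written 5'->3') copied from an RNA template written 3'->5'.

    Because the two strands are antiparallel, reading the template in the
    3'->5' direction yields the new DNA strand directly in the 5'->3' direction.
    """
    return "".join(RNA_TO_DNA[base] for base in template_3_to_5)

template = "CCCAAUCCC"            # 3'-CCCAAUCCC-5', as quoted in the text
dna = reverse_transcribe(template)
print(dna)
```

The resulting DNA sequence contains the repeat unit TTAGGG.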

The existence of a compensatory mechanism for telomere shortening was first found by Soviet biologist Alexey Olovnikov in 1973,^[4] who also suggested the telomere hypothesis of aging and the telomere's connections to cancer and perhaps some neurodegenerative diseases.^[5]

Telomerase in the ciliate Tetrahymena was discovered by Carol W. Greider and Elizabeth Blackburn in 1984.^[6] Together with Jack W. Szostak, Greider and Blackburn were awarded the 2009 Nobel Prize in Physiology or Medicine for their discovery.^[7] Later the cryo-EM structure of telomerase was first reported in T. thermophila, to be followed a few years later by the cryo-EM structure of telomerase in humans.^[8]

The role of telomeres and telomerase in cell aging and cancer was established by scientists at biotechnology company Geron with the cloning of the RNA and catalytic components of human telomerase^[9] and the development of a polymerase chain reaction (PCR) based assay for telomerase activity called the TRAP assay, which surveys telomerase activity in multiple types of cancer.^[10]

The negative stain electron microscopy (EM) structures of human and Tetrahymena telomerases were characterized in 2013.^[11]^[12] Two years later, the first cryo-electron microscopy (cryo-EM) structure of telomerase holoenzyme (Tetrahymena) was determined.^[13] In 2018, the structure of human telomerase was determined through cryo-EM by UC Berkeley scientists.^[14]

full text link : https://en.wikipedia.org/wiki/Telomerase

DNA replication

In molecular biology,^[1]^[2]^[3] DNA replication is the biological process of producing two identical replicas of DNA from one original DNA molecule.^[4] DNA replication occurs in all living organisms acting as the most essential part of biological inheritance. This is essential for cell division during growth and repair of damaged tissues, while it also ensures that each of the new cells receives its own copy of the DNA.^[5] The cell possesses the distinctive property of division, which makes replication of DNA essential.

DNA is made up of a double helix of two complementary strands. The double helix describes the appearance of a double-stranded DNA, which is composed of two linear strands that run opposite to each other and twist together to form a helix.^[6] During replication, these strands are separated. Each strand of the original DNA molecule then serves as a template for the production of its counterpart, a process referred to as semiconservative replication. As a result of semiconservative replication, the new helix will be composed of an original DNA strand as well as a newly synthesized strand.^[7] Cellular proofreading and error-checking mechanisms ensure near perfect fidelity for DNA replication.^[8]^[9]

In a cell, DNA replication begins at specific locations, or origins of replication,^[10] in the genome^[11] which contains the genetic material of an organism.^[12] Unwinding of DNA at the origin and synthesis of new strands, accommodated by an enzyme known as helicase, results in replication forks growing bi-directionally from the origin. A number of proteins are associated with the replication fork to help in the initiation and continuation of DNA synthesis. Most prominently, DNA polymerase synthesizes the new strands by adding nucleotides that complement each (template) strand. DNA replication occurs during the S-stage of interphase.^[13]

DNA replication (DNA amplification) can also be performed in vitro (artificially, outside a cell).^[14] DNA polymerases isolated from cells and artificial DNA primers can be used to start DNA synthesis at known sequences in a template DNA molecule. Polymerase chain reaction (PCR), ligase chain reaction (LCR), and transcription-mediated amplification (TMA) are examples. In March 2021, researchers reported evidence suggesting that a preliminary form of transfer RNA, a necessary component of translation, the biological synthesis of new proteins in accordance with the genetic code, could have been a replicator molecule itself in the very early development of life, or abiogenesis.^[15]^[16]

DNA structure

DNA exists as a double-stranded structure, with both strands coiled together to form the characteristic double helix. Each single strand of DNA is a chain of four types of nucleotides. Nucleotides in DNA contain a deoxyribose sugar, a phosphate, and a nucleobase. The four types of nucleotide correspond to the four nucleobases adenine, cytosine, guanine, and thymine, commonly abbreviated as A, C, G, and T. Adenine and guanine are purine bases,^[17] while cytosine and thymine are pyrimidines. These nucleotides form phosphodiester bonds, creating the phosphate-deoxyribose backbone of the DNA double helix with the nucleobases pointing inward (i.e., toward the opposing strand). Nucleobases are matched between strands through hydrogen bonds to form base pairs. Adenine pairs with thymine (two hydrogen bonds), and guanine pairs with cytosine (three hydrogen bonds).^[18]

DNA strands have a directionality, and the different ends of a single strand are called the "3′ (three-prime) end" and the "5′ (five-prime) end". By convention, if the base sequence of a single strand of DNA is given, the left end of the sequence is the 5′ end, while the right end of the sequence is the 3′ end. The strands of the double helix are anti-parallel, with one being 5′ to 3′, and the opposite strand 3′ to 5′. These terms refer to the carbon atom in deoxyribose to which the next phosphate in the chain attaches. Directionality has consequences in DNA synthesis, because DNA polymerase can synthesize DNA in only one direction by adding nucleotides to the 3′ end of a DNA strand.

The pairing of complementary bases in DNA (through hydrogen bonding) means that the information contained within each strand is redundant. Phosphodiester (intra-strand) bonds are stronger than hydrogen (inter-strand) bonds. Within a strand, phosphodiester bonds connect the 5' carbon atom of one nucleotide to the 3' carbon atom of the next nucleotide, while the hydrogen bonds stabilize DNA double helices across the helix axis but not in the direction of the axis.^[19] This makes it possible to separate the strands from one another. The nucleotides on a single strand can therefore be used to reconstruct nucleotides on a newly synthesized partner strand.^[20]
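
The redundancy between strands can be shown with a short sketch that rebuilds the partner strand from one strand. The example sequence and the function names are assumptions for illustration; the pairing rules and hydrogen-bond counts are the ones stated above.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per base pair: A-T has two, G-C has three.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def partner_strand(strand_5_to_3: str) -> str:
    """Partner strand, also written 5'->3' (the strands are anti-parallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))

def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(HBONDS[base] for base in strand)

strand = "ATGCGT"                  # hypothetical sequence, written 5'->3'
partner = partner_strand(strand)
print(partner, hydrogen_bonds(strand))
```

Applying partner_strand twice returns the original sequence, which is the sense in which each strand carries the full information.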

DNA polymerase

DNA polymerases are a family of enzymes that carry out all forms of DNA replication.^[22] DNA polymerases in general cannot initiate synthesis of new strands but can only extend an existing DNA or RNA strand paired with a template strand. To begin synthesis, a short fragment of RNA, called a primer, must be created and paired with the template DNA strand.

DNA polymerase adds a new strand of DNA by extending the 3′ end of an existing nucleotide chain, adding new nucleotides matched to the template strand, one at a time, via the creation of phosphodiester bonds. The energy for this process of DNA polymerization comes from hydrolysis of the high-energy phosphate (phosphoanhydride) bonds between the three phosphates attached to each unincorporated base. Free bases with their attached phosphate groups are called nucleotides; in particular, bases with three attached phosphate groups are called nucleoside triphosphates. When a nucleotide is being added to a growing DNA strand, the formation of a phosphodiester bond between the proximal phosphate of the nucleotide and the growing chain is accompanied by hydrolysis of a high-energy phosphate bond with release of the two distal phosphate groups as a pyrophosphate. Enzymatic hydrolysis of the resulting pyrophosphate into inorganic phosphate consumes a second high-energy phosphate bond and renders the reaction effectively irreversible.^[Note 1]

In general, DNA polymerases are highly accurate, with an intrinsic error rate of less than one mistake for every 10⁷ nucleotides added.^[23] Some DNA polymerases can also delete nucleotides from the end of a developing strand in order to fix mismatched bases. This is known as proofreading. Finally, post-replication mismatch repair mechanisms monitor the DNA for errors, being capable of distinguishing mismatches in the newly synthesized DNA strand from the original strand sequence. Together, these three discrimination steps enable replication fidelity of less than one mistake for every 10⁹ nucleotides added.^[23]

The rate of DNA replication in a living cell was first measured as the rate of phage T4 DNA elongation in phage-infected E. coli.^[24] During the period of exponential DNA increase at 37 °C, the rate was 749 nucleotides per second. The mutation rate per base pair per replication during phage T4 DNA synthesis is 1.7 per 10⁸.^[25]
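
To get a feel for these numbers, the sketch below applies the quoted error rates and the T4 elongation rate to a hypothetical template of one million nucleotides. The template length, the single-fork assumption, and the function names are illustrative assumptions; the error rates in the text are upper bounds, so the results are upper bounds too.

```python
def expected_errors(nucleotides: int, errors_per_nucleotide: float) -> float:
    """Expected number of mistakes when copying the given number of nucleotides."""
    return nucleotides * errors_per_nucleotide

def copy_time_seconds(nucleotides: int, nt_per_second: float = 749.0) -> float:
    """Time for a single polymerase to copy the template at a constant rate."""
    return nucleotides / nt_per_second

template_length = 1_000_000   # hypothetical template, not a real genome

intrinsic = expected_errors(template_length, 1e-7)   # polymerase alone
overall = expected_errors(template_length, 1e-9)     # with proofreading and mismatch repair
seconds = copy_time_seconds(template_length)         # at the T4 rate of 749 nt/s

print(intrinsic, overall, seconds)
```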

Transposase

A transposase is an enzyme that binds to the ends of a transposon and catalyzes its movement to another part of the genome by a cut-and-paste mechanism or a replicative mechanism, in a process known as transposition. The word "transposase" was first coined by the individuals who cloned the enzyme required for transposition of the Tn3 transposon.^[1] The existence of transposons was postulated in the late 1940s by Barbara McClintock, who was studying the inheritance of maize, but the actual molecular basis for transposition was described by later groups. McClintock discovered that some segments of chromosomes changed their position, jumping between different loci or from one chromosome to another. The repositioning of these transposons (which coded for color) allowed other genes for pigment to be expressed.^[2] Transposition in maize causes changes in color; however, in other organisms, such as bacteria, it can cause antibiotic resistance.^[2] Transposition is also important in creating genetic diversity within species and generating adaptability to changing living conditions.^[3]

Transposases are classified under EC number EC 2.7.7. Genes encoding transposases are widespread in the genomes of most organisms and are the most abundant genes known.^[4] During the course of human evolution, as much as 40% of the human genome has moved around via methods such as transposition of transposons.^[2]

Transposase Tn5

Transposase (Tnp) Tn5 is a member of the RNase superfamily of proteins which includes retroviral integrases. Tn5 can be found in Shewanella and Escherichia bacteria.^[5] The transposon codes for antibiotic resistance to kanamycin and other aminoglycoside antibiotics.^[3]^[6]

Tn5 and other transposases are notably inactive. Because DNA transposition events are inherently mutagenic, the low activity of transposases is necessary to reduce the risk of causing a fatal mutation in the host, and thus eliminating the transposable element. One of the reasons Tn5 is so unreactive is because the N- and C-termini are located in relatively close proximity to one another and tend to inhibit each other. This was elucidated by the characterization of several mutations which resulted in hyperactive forms of transposases. One such mutation, L372P, is a mutation of amino acid 372 in the Tn5 transposase. This amino acid is generally a leucine residue in the middle of an alpha helix. When this leucine is replaced with a proline residue the alpha helix is broken, introducing a conformational change to the C-terminal domain, separating it from the N-terminal domain enough to promote higher activity of the protein.^[3] The transposition of a transposon often needs only three pieces: the transposon, the transposase enzyme, and the target DNA for the insertion of the transposon.^[3] This is the case with Tn5, which uses a cut-and-paste mechanism for moving around transposons.^[3]
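
The cut-and-paste idea can be sketched as a toy string operation on the three pieces named above: the transposon, a donor that carries it, and the target DNA. This is only an illustration with made-up sequences and a hypothetical function name; it ignores end recognition by the enzyme, target-site duplication, and repair of the donor.

```python
def cut_and_paste(donor: str, transposon: str, target: str, insert_at: int) -> tuple:
    """Excise the transposon from the donor and insert it into the target.

    Returns (donor_after, target_after). The transposon leaves the donor
    entirely, which is what distinguishes cut-and-paste from a
    replicative mechanism.
    """
    start = donor.find(transposon)
    if start < 0:
        raise ValueError("transposon not found in donor")
    donor_after = donor[:start] + donor[start + len(transposon):]
    target_after = target[:insert_at] + transposon + target[insert_at:]
    return donor_after, target_after

# Made-up sequences; lowercase marks the transposon for readability.
donor_after, target_after = cut_and_paste("AAAxyzTTT", "xyz", "GGGGCCCC", 4)
print(donor_after, target_after)
```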

Tn5 and most other transposases contain a DDE motif, which is the active site that catalyzes the movement of the transposon. Aspartate-97, aspartate-188, and glutamate-326 make up the active site, which is a triad of acidic residues.^[7] The DDE motif is said to coordinate divalent metal ions, most often magnesium and manganese, which are important in the catalytic reaction.^[7] Because transposase is incredibly inactive, the DDE region is mutated so that the transposase becomes hyperactive and catalyzes the movement of the transposon.^[7] The glutamate is transformed into an aspartate and the two aspartates into glutamates.^[7] Through this mutation, the study of Tn5 becomes possible, but some steps in the catalytic process are lost as a result.^[3]