A Hidden Markov Model is a Bayes Network with these assumptions:
• $Y_t$ depends only on $Y_{t-1}$
• $X_t$ depends only on $Y_t$
The belief network conveys the independence assumption:
$$
\forall i \geq 0,\quad P(S_{i+1} \mid S_i) = P(S_1 \mid S_0)
$$
$$
P(S_i = s) = \sum_{s'} P(S_{i+1} = s \mid S_i = s') \cdot P(S_i = s')
$$
In this equation, $s$ and $s'$ represent states in a Markov chain. Typically, $s$ denotes the current state, while $s'$ (read as “s prime”) denotes a state that the system can transition from into the current state $s$.
The summation over $s'$ indicates that you are summing over all possible previous states from which the system can transition into state $s$. This is part of the definition of a stationary distribution for a Markov chain: the probability of being in any given state $s$ is equal to the sum of the probabilities of transitioning to state $s$ from all possible previous states $s'$, weighted by the probability of being in state $s'$ at the previous time step.
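As a quick sketch (with a made-up 2-state transition matrix, not one from the notes), the stationary distribution can be found by applying this update repeatedly:

```python
import numpy as np

# Hypothetical transition matrix: T[s, s_prime] = P(S_{i+1} = s | S_i = s_prime)
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])

p = np.array([0.5, 0.5])   # some initial distribution over the two states
for _ in range(100):
    p = T @ p              # P(S_{i+1} = s) = sum over s' of P(s | s') * P(S_i = s')

print(p)                   # converges to the stationary distribution, about [0.833, 0.167]
```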
An HMM works much better than naïve Bayes for generating text:
Text generated by a naïve Bayes model (unigram model):
Text generated by an HMM (bigram model):
The Viterbi algorithm is a computationally efficient algorithm for computing the maximum a posteriori (MAP) state sequence
$$
f(x) = \underset{y_1, \ldots, y_d}{\mathrm{argmax}}\; P(y_1, \ldots, y_d \mid x_1, \ldots, x_d)
$$
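A minimal dynamic-programming sketch of this argmax; the initial, transition, and emission probabilities here are assumed inputs, not values given in the notes:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """MAP state sequence for observations `obs`.
    pi[s]  = P(Y_1 = s), A[s, t] = P(Y_k = t | Y_{k-1} = s), B[s, o] = P(X_k = o | Y_k = s).
    """
    T, n_states = len(obs), len(pi)
    logv = np.zeros((T, n_states))            # best log-probability of a path ending in each state
    back = np.zeros((T, n_states), dtype=int)
    logv[0] = np.log(pi) + np.log(B[:, obs[0]])
    for k in range(1, T):
        for t in range(n_states):
            scores = logv[k - 1] + np.log(A[:, t]) + np.log(B[t, obs[k]])
            back[k, t] = np.argmax(scores)
            logv[k, t] = scores[back[k, t]]
    path = [int(np.argmax(logv[-1]))]         # trace back the best path
    for k in range(T - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]
```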
The sum symbol, represented by the Greek letter sigma (Σ), is widely used in mathematics to denote the summation of a sequence of numbers or expressions. When you see this symbol, it means you should add up a series of numbers according to the specified rule. Here’s a breakdown of how it’s typically used:
The summation symbol is written as:
$$
\sum_{i=a}^{b} f(i)
$$
where $i$ is the index of summation, $a$ is the lower limit, $b$ is the upper limit, and $f(i)$ is the expression evaluated and summed for each value of $i$.
Sum of the first 5 natural numbers:
$$
\sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 = 15
$$
Here, $f(i) = i$, and you sum the values of $i$ from 1 to 5.
Sum of the squares of the first 3 positive integers:
$$
\sum_{i=1}^{3} i^2 = 1^2 + 2^2 + 3^2 = 1 + 4 + 9 = 14
$$
In this example, $f(i) = i^2$, so you square each $i$ from 1 to 3 and then add them together.
Sum of a constant over a range:
Suppose you want to add the number 4, five times. The expression would be:
$$
\sum_{i=1}^{5} 4 = 4 + 4 + 4 + 4 + 4 = 20
$$
Here, $f(i) = 4$, which doesn’t depend on $i$. You’re essentially multiplying 4 by the number of terms (5 in this case).
Two nested sums:
$$
\sum_{i=1}^ {5}\sum_{j=2}^ {6} ij
$$
For this, you sum over $j$ from 2 to 6 for each value of $i$ from 1 to 5, and then sum those results. It’s like computing a series within another series. The operation proceeds as follows:
Let’s compute this step-by-step to see the result.
The result of the double summation $\sum_{i=1}^ {5}\sum_{j=2}^ {6} ij$ is 300. This means that when you sum the product of $i$ and $j$ for each $i$ from 1 to 5 and each $j$ from 2 to 6, the total sum is 300.
PS: in python:
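A one-line check of the double sum:

```python
total = sum(i * j for i in range(1, 6) for j in range(2, 7))
print(total)   # 300
```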
Summation notation is a powerful tool in mathematics, especially for dealing with sequences and series, and it’s widely used in various fields such as statistics, physics, and finance.
Similarly, we have product notation, too. The product symbol is represented by the Greek letter pi (Π), not to be confused with the mathematical constant $\pi$ (pi) used for the ratio of a circle’s circumference to its diameter. The product symbol is used to denote the multiplication of a sequence of numbers or expressions, just like the sum symbol is used for addition.
$$
\prod_{i=a}^{b} f(i)
$$
where $i$ runs from $a$ to $b$, and $f(i)$ is the expression being multiplied for each value of $i$.
Product of the first 5 natural numbers (also known as $5!$, factorial of 5):
$$
\prod_{i=1}^{5} i = 1 \times 2 \times 3 \times 4 \times 5 = 120
$$
This multiplies the values of $i$ from 1 to 5.
In mathematics and particularly in machine learning, besides the summation (Σ) and product (Π) notations, another frequently used notation is the integral symbol (∫). While the summation and product notations deal with discrete sequences, the integral symbol is used for continuous functions and is fundamental in calculus. Integrals play a crucial role in various aspects of machine learning, especially in optimization, probability distributions, and understanding the area under curves (such as ROC curves).
The basic structure of an integral is:
$$
\int_{a}^{b} f(x)\, dx
$$
where $a$ and $b$ are the limits of integration, $f(x)$ is the integrand, and $dx$ indicates integration with respect to $x$.
Optimization: Many machine learning models involve optimization problems where the goal is to minimize or maximize some function (e.g., a loss function in neural networks or a cost function in logistic regression). Integrals are essential in solving continuous optimization problems, especially when calculating gradients or understanding the behavior of functions over continuous intervals.
Probability Distributions: In the context of probabilistic models and statistics, integrals are used to calculate probabilities, expected values, and variances of continuous random variables. For example, the area under the probability density function (PDF) of a continuous random variable over an interval gives the probability of the variable falling within that interval.
Feature Extraction and Signal Processing: In machine learning applications involving signal processing or feature extraction from continuous data, integrals are used to calculate various features and transform signals into more useful forms.
Kernel Methods: In machine learning, kernel methods (e.g., support vector machines) utilize integrals in the formulation of kernel functions, which are essential in mapping input data into higher-dimensional spaces for classification or regression tasks.
Deep Learning: In the training of deep neural networks, integrals may not be explicitly visible but are conceptually present in the form of continuous optimization and in the calculation of gradients during backpropagation.
Consider the problem of finding the area under a curve, which is a fundamental concept in machine learning for evaluating model performance (e.g., calculating the area under the ROC curve (AUC) for classification problems). If $f(x)$ represents the curve, the area under $f(x)$ from $a$ to $b$ can be computed by the integral:
$$
\text{Area} = \int_{a}^{b} f(x)\, dx
$$
This integral computes the total area under $f(x)$ between $a$ and $b$, providing a measure of the model’s performance over that interval.
Integrals, along with summation and product notations, form the backbone of many mathematical operations in machine learning, from theoretical underpinnings to practical applications in data analysis, model evaluation, and optimization strategies.
Beyond summation (Σ), product (Π), and integral (∫) notations, there are several other mathematical symbols and concepts that are frequently used in machine learning and statistics. These include:
The gradient is a vector operation that represents the direction and rate of the fastest increase of a scalar function. In machine learning, the gradient is crucial for optimization algorithms like gradient descent, which is used to minimize loss functions. The gradient of a function $f(x_1, x_2, \ldots, x_n)$ with respect to its variables is denoted by:
$$
\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)
$$
The partial derivative represents the rate of change of a function of multiple variables with respect to one of those variables, keeping the others constant. It’s denoted by the symbol ∂. Partial derivatives are essential in the calculation of gradients and in the optimization of machine learning models.
The expectation or expected value of a random variable is a fundamental concept in probability and statistics, denoted by $E[X]$ for a random variable $X$. It represents the average or mean value that $X$ takes over its probability distribution and is crucial in understanding the behavior of models, especially in probabilistic settings.
Variance measures the spread of a random variable’s values and is denoted by $Var(X)$ or $\sigma^2$ for a random variable $X$. The standard deviation, $\sigma$, is the square root of the variance and provides a measure of the dispersion of data points around their mean value. These concepts are vital in assessing the reliability and performance of models.
Covariance and correlation measure the relationship between two random variables. Covariance indicates the direction of the linear relationship between variables, while correlation measures both the strength and direction of this linear relationship. Understanding these relationships is essential in features selection and in modeling the interactions between variables.
Big O notation is used to describe the computational complexity of algorithms, which is crucial in machine learning for understanding the scalability and efficiency of models and algorithms. For example, an algorithm with a complexity of $O(n^2)$ means its execution time or space requirements increase quadratically as the input size $n$ increases.
Matrices and vectors are fundamental in machine learning for representing and manipulating data. Operations such as matrix multiplication, transpose, and inversion are essential for linear algebra, which underpins many machine learning algorithms, including neural networks, PCA (Principal Component Analysis), and SVMs (Support Vector Machines).
Each of these mathematical concepts plays a crucial role in the formulation, analysis, and implementation of machine learning algorithms. They provide the theoretical foundation for understanding model behavior, optimizing performance, and evaluating outcomes in a wide range of applications.
Matrix multiplication is a fundamental operation in linear algebra with extensive applications in mathematics, physics, engineering, computer science, and particularly in machine learning and data analysis. The way matrix multiplication is defined—by taking the dot product of rows and columns—might seem arbitrary at first, but it’s designed to capture several important mathematical and practical concepts.
Understanding how to perform basic operations with matrices—addition, subtraction, multiplication, and division (in a sense)—is crucial in linear algebra, which is foundational for many areas of mathematics, physics, engineering, and especially machine learning. Here’s a brief overview of each operation:
$$
\begin{pmatrix}
a_{11} & \cdots & a_{1j} \\
\vdots & \ddots & \vdots \\
a_{i1} & \cdots & a_{ij}
\end{pmatrix}
$$
Each element $a_{ij}$ of the matrix is indexed by a pair $(i,j)$, where $i$ is the row index and $j$ is the column index.
Matrix addition and subtraction are straightforward operations that are performed element-wise. This means you add or subtract the corresponding elements of the matrices. For these operations to be defined, the matrices must be of the same dimensions.
Addition: If $A$ and $B$ are matrices of the same size, their sum $C = A + B$ is a matrix where each element $c_{ij}$ is the sum of $a_{ij} + b_{ij}$.
Subtraction: Similarly, the difference $C = A - B$ is a matrix where each element $c_{ij}$ is the difference $a_{ij} - b_{ij}$.
If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$, then $A + B = \begin{pmatrix} 6 & 8 \\ 10 & 12 \end{pmatrix}$ and $A - B = \begin{pmatrix} -4 & -4 \\ -4 & -4 \end{pmatrix}$.
Matrix multiplication is more complex and involves a dot product of rows and columns. For two matrices $A$ and $B$ to be multiplied, the number of columns in $A$ must equal the number of rows in $B$. If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, the resulting matrix $C = AB$ will be an $m \times p$ matrix where each element $c_{ij}$ is computed as the dot product of the $i$th row of $A$ and the $j$th column of $B$.
If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$, then $AB = \begin{pmatrix} 1\cdot5+2\cdot7 & 1\cdot6+2\cdot8 \\ 3\cdot5+4\cdot7 & 3\cdot6+4\cdot8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}$.
Matrix division as such doesn’t exist in the way we think of division for real numbers. Instead, we talk about the inverse of a matrix. For matrix $A$ to “divide” another matrix $B$, you would multiply $B$ by the inverse of $A$, denoted as $A^{-1}$. This operation is only defined for square matrices (same number of rows and columns), and not all square matrices have an inverse.
If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, and its inverse $A^{-1} = \begin{pmatrix} -2 & 1 \\ 1.5 & -0.5 \end{pmatrix}$, and you want to “divide” $B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$ by $A$, you would compute $A^{-1}B$.
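A quick numpy check of these operations, using the same $A$ and $B$ as above:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A + B)              # element-wise addition
print(A - B)              # element-wise subtraction
print(A @ B)              # matrix multiplication: [[19, 22], [43, 50]]
A_inv = np.linalg.inv(A)  # inverse of A: [[-2, 1], [1.5, -0.5]]
print(A_inv @ B)          # "dividing" B by A
```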
What happens after the molecule is excited?
Fluorescence properties depend on what happens to the molecule during the ~$10^{-8}$ s during which it is excited. The decay after absorption includes (1) radiative decay ($k_f$) and (2) non-radiative decay ($k_{NR}$).
Fluorescence happens very quickly because the molecule returns to the ground state very quickly. In general, decay brings the electron from the excited state to the ground state (the decay rate $k_f$ is defined in events per second).
Non-radiative decay (e.g., in the form of heat) is faster and does not generate a photon. The energy transfers into surrounding molecules and spreads away; those molecules do not reach an excited state and do not emit photons.
Units: s$^{-1}$ (events per second)

Legend for the rate diagram:
- Black = non-radiative; Red = radiative (photon)
- ABS = absorption (~$10^{15}$)
- IC = internal conversion ($k_{IC} \approx 10^{11}$ to $10^{12}$)
- Q = quenching
- IX = intersystem crossing ($S_1 \rightarrow T_1$: ~$10^{8}$; $T_1 \rightarrow S_0$: ~$10^{2}$)
- Chem = photochemistry
- F = fluorescence ($k_f \approx 10^{8}$); P = phosphorescence ($k_p \approx 10^{2}$)
- Trans = energy transfer ($k_{collision} \approx 10^{10}\ M^{-1}s^{-1}$)
Only a portion of the electrons end up in S1 and decay back to the ground state to generate fluorescence. Most are excited to S2 and decay (convert internally) faster; in that case, less energy is lost through fluorescence. Those transitions are internal conversion. Electrons that cross to T1 (intersystem crossing) generate phosphorescence; that state decays very slowly (phosphorescence persists for seconds or even longer).
What is the concentration of pure water?
The process from S2 to S1 is very fast, and the released energy is easily absorbed and stored by the solvent.
The process from S1 to S0 is relatively slow.
The concentration of bonds in a solvent like water is very high (pure water is about 55 M), so it can store a lot of energy.
Measure fluorescence at fixed λex as a function of λem
Stokes shift: Emission spectrum is always red-shifted (lower energy) compared to the absorption spectrum
(Figures: vibrational relaxation, © libretexts.org; solvent reorganization, © wikipedia. Franck-Condon overlap factors: Prob(0′→2′) ≅ Prob(2′←0′), etc.)
The vibrational relaxation looks almost symmetric, so the peaks from solvent reorganization should correspond to the state changes seen in vibrational relaxation.
The emission (Stokes shift) is always red-shifted (to the right), so the emitted fluorescence always has less energy than the absorbed light.
Solvent Effect

The larger the solvent polarity, the larger the effect (?).
Absorption: ~$10^{-15}$ sec
Solvent reorganization (relaxation): ~$10^{-10}$ sec
Fluorescence: ~$10^{-8}$ sec
Step 1: Permanent Dipoles of solvent re-orient to adjust to the altered dipole of the excited fluorophore.
Step 2: The dipole-dipole interaction in turn stabilizes S1 and destabilizes S0.
Requires:
1. solvent polarity (dielectric constant, ε)
2. mobility of solvent (reorientation of solvent dipoles)
Excite some molecules to $ S_1 $ with a brief pulse of light at $ t = 0 $, $ N_0^ * $ excited state molecules
Decay of the excited state population is exponential:
$$ \frac{dN^ *(t)}{dt} = -(k_ f + k_{NR})N^ *(t) $$
Left side: rate of change of the excited-state population. Right side: chemical rate = total decay rate constant × excited-state concentration.
So: $ N ^ * (t) = N_ 0 ^ * e ^ {-(k_f+k_ {NR})t} = N_0 ^ * e^ {-t/\tau} \quad$
where $N ^ * (t)$ is the number of excited molecules at time t.
Define: fluorescence lifetime $ \tau $
$ \tau = \frac{1}{k_f + k_{NR}} $
$\tau$ is the time at which the excited-state population has decayed to 1/e of its initial value.
Hence, the equation for the decay of the excited state population:
$ N^ *(t) = N_0^ * e^ {-t/\tau} $
The meaning of the fluorescence lifetime
τ has units of time (seconds)
$[N_ {t=\tau}^ *] = \frac{N_ 0^ *}{e} \approx 0.37\,N_0^ *$
After one lifetime following excitation, the probability of a molecule still being in the excited state is about 37%. The shorter the τ the faster the decay.
For fluorophores commonly used in biological systems, τ ≈ 1-10 ns.
Rate up: $S_ 0 \rightarrow S_ 1 = I_ 0$, unit: [# of photons absorbed/sec]
Rate down: $S_ 1 \rightarrow S_ 0 = -(k_ f + k_ {NR}) \cdot N ^*(t)$
Unit: [1/sec][# of photons]
$N^ *(t)$ = concentration of excited state molecules at any time, $t$
$k_ {NR}$ = sum of all non-radiative rate constants
$N^ *(t) = [S_1(t)]$, [# of molecules]
Steady state: rate up = rate down
$0 = \frac{dN ^ *(t)}{dt} = I_ 0 - (k_ f + k_ {NR}) N ^ *(t)$
In the steady state $N ^ *(t)$ is constant $= N ^ * _ {SS}$
$I_0 = (k_f + k_{NR}) N ^ *_{SS}$
$Q_f$ (QY): fraction of excited-state molecules that relax to the ground state by emitting a photon.
photons/sec emitted in steady state
$$ Q_f = \frac{k_ f N^ *_ {SS}}{I_ 0} = \frac{k_ f N^ *_ {SS}}{(k_ f + k_{NR})N^ *_ {SS}} $$
Since $I_0 = (k_f + k_{NR})N^*_{SS}$
photons/sec absorbed in steady state
Quantum yield: $Q_f = \frac{k_f}{(k_f + k_{NR})} = k_f \times \tau$
Recall $\tau = \frac{1}{(k_f + k_{NR})}$
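A small numeric sketch of how $\tau$ and $Q_f$ follow from the rate constants; the values of $k_f$ and $k_{NR}$ below are assumed for illustration (roughly tryptophan-like):

```python
k_f = 1.0e8    # radiative rate constant, 1/s (assumed)
k_NR = 4.0e8   # sum of non-radiative rate constants, 1/s (assumed)

tau = 1.0 / (k_f + k_NR)    # fluorescence lifetime: 2e-9 s = 2 ns
Q_f = k_f / (k_f + k_NR)    # quantum yield: 0.2
print(tau, Q_f, k_f * tau)  # k_f * tau equals Q_f
```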
Absorption and fluorescence spectra of amino acids in water.

“About 300 papers per year abstracted in Biological Abstracts report work that exploits or studies tryptophan (Trp) fluorescence in proteins…”
Vivian et al. Biophysical Journal 2001
Compound | Lifetime (ns) | Absorption λ (nm) | Absorptivity ε (M⁻¹cm⁻¹) | Emission λmax (nm) | Quantum Yield (25°C) |
---|---|---|---|---|---|
Tryptophan | 2.6 | 280 | 5,600 | 348 | 0.20 |
Tyrosine | 3.6 | 274 | 1,400 | 303 | 0.14 |
Phenylalanine | 6.4 | 257 | 200 | 282 | 0.04 |
wtGFP | 3.3/2.8 | 395/475 | 21,000 | 509 | 0.77 |
EGFP (Enhanced; F64L, S65T) | 2.7 | 484 | 56,000 | 507 | 0.60 |
Quenching reduces the fluorescence signal. Static and dynamic quenching produce similar results, but the underlying processes are totally different.
$$F = \sigma \times I \times QY$$
$F$: fluorescence intensity (photons/sec)
$\sigma$: absorption cross-section (cm²)
$I$: excitation light flux, photons/(cm² sec)
$QY$: quantum yield (unitless)
Mirror-image rule: the excitation (absorption) spectrum and the emission spectrum are approximately mirror images of each other.
Fluorescent species $A$ can associate with quencher $Q$ to form a non-fluorescent complex $AQ$:
$$
A + Q \leftrightarrow AQ
$$
The association constant $K_a$ is defined as:
$$
K_a = \frac{[AQ]}{[A][Q]}
$$
The ratio of the fluorescence intensities without and with the quencher present is given by:
$$
\frac{F_0}{F} = \frac{A_{tot}}{A} = \frac{[A] + [AQ]}{[A]} = \frac{[A] + [A][Q]K_a}{[A]} \\ = 1 + [Q]K_a
$$
Here $F$ is the fluorescence intensity with the quencher present (after quenching).
The fluorophore-quencher complex $AQ$ is dark (non-radiative).
The diagram illustrates the energy levels $S_1$ and $S_0$, with $k_f$ representing the rate of fluorescence, $k_{NR}$ the non-radiative decay, and $[Q]k_Q$ the rate of quenching by the quencher $Q$. There’s also an illustrative depiction of a molecule $A^*$ being quenched by $Q$ within a radius of 50Å.
Dynamic quenching acts only on molecules in the excited state.
The energy levels $S_1$ and $S_0$ are shown with $k_f$ representing the rate of fluorescence, $k_{NR}$ the non-radiative decay, and $[Q]k_Q$ the rate of quenching by the quencher $Q$.
Rate of decay due to collision:
$$
\frac{d[S_1]}{dt} = -k_Q[Q][S_1]
$$
Total rate of decay, $S_1 \rightarrow S_0$: $\frac{d[S_1]}{dt} = -(k_f + k_{NR} + k_Q[Q])[S_1]$
The quantum yield in the presence of a quencher:
$$ Q_f^{\theta} = \frac{k_f}{k_f + k_{NR} + k_Q[Q]} $$
No Quencher:$Q_f^0 = \frac{k_f}{k_f + k_{NR}}$
Plus quencher:$Q_f^{\theta} = \frac{k_f}{k_f + k_{NR} + k_Q[Q]}$
The ratio of fluorescence intensities without and with the quencher is given by:
$$
\frac{F_0}{F} = \frac{Q_f^ 0}{Q_f^ {\theta}} = \frac{k_ f}{k_ f + k_ {NR}} \times \frac{k_ f + k_ {NR} + k_ Q[Q]}{k_ f} \\
= 1 + \frac{k_Q[Q]}{k_f + k_{NR}} = 1 + \tau_0 k_Q[Q]
$$
Where $\tau_0 = \frac{1}{k_f + k_{NR}}$ is the fluorescence lifetime (without quencher).
Define: the Stern-Volmer constant
$K_{SV} = k_Q \tau_0$
$ \frac{F_0}{F} = 1 + K_{SV} [Q] $
$K_{SV}$ measures the rate at which quencher molecules collide with fluorophores in the excited state.
The more the fluorophore is protected from solvent, the smaller the value of $K_{SV}$.
(It describes how strong the quencher is: the larger $K_{SV}$, i.e. the steeper the Stern-Volmer plot, the stronger the quenching.)
Dynamic quenching: $\frac{F_0}{F} = 1 + K_{SV} [Q]$
Static quenching: $\frac{F_0}{F} = 1 + K_a [Q]$
In each case, you will get a straight line if you plot $\frac{F_0}{F}$ vs $[Q]$
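A short sketch of the dynamic-quenching line; $k_Q$, $\tau_0$, and the quencher concentrations are assumed values, not data from the notes:

```python
import numpy as np

k_Q = 1.0e10        # collisional quenching rate constant, 1/(M s) (assumed)
tau_0 = 2.0e-9      # lifetime without quencher, s (assumed)
K_SV = k_Q * tau_0  # Stern-Volmer constant: 20 1/M

Q = np.linspace(0.0, 0.5, 6)   # quencher concentrations, M
F0_over_F = 1.0 + K_SV * Q     # a straight line when plotted against [Q]
print(F0_over_F)                # [ 1.  3.  5.  7.  9. 11.]
```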
The differential equation for the decay of excited state molecules $N^*$ is given by:
$$
\frac{dN^ * (t)}{dt} = -(k_f + k_{NR} + k_Q[Q])N^ * (t)
$$
This leads to the solution:
$$
N^ * (t) = N ^ * _ 0 e ^ {-(k_f+k_{NR}+k_Q[Q])t}
$$
And equivalently:
$$
N^ * (t) = N ^ * _0 e^ {-\frac{t}{\tau}}
$$
DYNAMIC QUENCHING: The lifetime of the excited state decreases as the concentration of the quencher is increased.
In the presence of a quencher, the lifetime $\tau$ is given by:
$$
\tau = \frac{1}{(k_f + k_{NR} + k_Q[Q])}
$$
Plus quencher | No quencher |
---|---|
$\tau = \frac{1}{(k_f + k_{NR} + k_Q[Q])}$ | $ \tau_0 = \frac{1}{(k_f + k_{NR})} $ |
$Q_f^{+Q} = \frac{k_f}{k_f + k_{NR} + k_Q[Q]} $ | $ Q_f^{0} = \frac{k_f}{k_f + k_{NR}} $ |
hence
$$
\frac{\tau_0}{\tau} = \frac{Q_f^ {0}}{Q_f^ {+Q}} \approx \frac{F_0}{F}
$$
Stern-Volmer plot will be the same if you plot lifetimes or fluorescence intensity
Static quenching (ground state)
$$
A + Q \leftrightarrow AQ
$$
$$
K_a = \frac{[AQ]}{[A][Q]}
$$
Fluorescent species $A$ can form a non-fluorescent complex $AQ$ with quencher $Q$. The association constant $K_a$ is defined as the ratio of the concentration of the complex to the product of the concentrations of $A$ and $Q$.
The ratio of the fluorescence intensities without and with the quencher is given by:
$$
\frac{F_0}{F} = \frac{A_{tot}}{A} = \frac{[A] + [AQ]}{[A]} = \frac{[A] + [A][Q]K_a}{[A]} = 1 + [Q]K_a
$$
The excited state species $A^*$ has the same properties in the presence of the static quencher. But there is less of it, so the fluorescence intensity decreases.
$$
\frac{F_0}{F} = (1 + k_Q \tau_0 [Q])(1 + K_a [Q]) = (1 + K_{SV} [Q])(1 + K_a [Q])
$$
fluorescence
Trp94 + His18 ⇌ Trp94·(H+His18) DARK
At acidic pH:
Hurtubise RJ (1990) Phosphorimetry: Theory, Instrumentation, and Applications, VCH, New York. ↩︎
GitHub: shenwei356/seqkit
- `-s`: specifies that duplicates should be identified based on sequence content.
- `[input_file]`: replace this with the path to your input FASTA or FASTQ file.
- `-o [output_file]`: specifies the output file; replace `[output_file]` with the desired path for the file containing the sequences after duplicate removal.
- `-D`: write all removed duplicates (and their counts) to the specified file.

In numpy, the dot product can be written np.dot(w,x) or w@x.
Vectors will always be column vectors. Thus:
Vectors are lowercase bold letters; matrices are uppercase bold letters.
Vector and Matrix Gradients
The gradient of a scalar function with respect to a vector (or matrix) is the vector (or matrix) of its partial derivatives.
The symbol $\frac{\partial f}{\partial x_ 1}$ means “partial derivative of f with respect to x1”
© Vladimir Nasteski |
$$ f(x) = w^ T x + b = \sum_{j=0} ^{D-1} w_ j x_ j + b $$
Squared error tends to emphasize large errors and downplay small ones.
One useful criterion (not the only useful criterion, but perhaps the most common) of “minimizing the error” is to minimize the mean squared error:
$$ \mathcal{L} = \frac{1}{2n} \sum_{i=1}^ {n} \varepsilon_i^ 2 = \frac{1}{2n} \sum_{i=1}^ {n} (f(x_ i) - y_ i)^ 2 $$
The factor $\frac{1}{2}$ is included so that, when you differentiate ℒ, the 2 and the $\frac{1}{2}$ cancel each other.
MSE = Parabola
Notice that MSE is a non-negative quadratic function of $f(x_i) = w^T x_i + b$, and therefore a non-negative quadratic function of $w$. Since it's a non-negative quadratic function of $w$, it has a unique minimum that you can compute in closed form! We won't do that today.
$\mathcal{L} = \frac{1}{2n} \sum_{i=1}^ {n} (f(x_ i) - y_ i)^ 2$
The iterative solution to linear regression (gradient descent):
$ w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w} $
$ b \leftarrow b - \eta \frac{\partial \mathcal{L}}{\partial b} $
The loss function $\mathcal{L}$ is defined as:
$$ \mathcal{L} = \frac{1}{2n} \sum_{i=1}^{n} L_i, \quad L_i = \varepsilon_i^2, \quad \varepsilon_i = w^T x_i + b - y_i $$
To find the gradient, we use the chain rule of calculus:
$$ \frac{\partial \mathcal{L}}{\partial w} = \frac{1}{2n} \sum_{i=1}^{n} \frac{\partial L_i}{\partial w}, \quad \frac{\partial L_i}{\partial w} = 2\varepsilon_i \frac{\partial \varepsilon_i}{\partial w}, \quad \frac{\partial \varepsilon_i}{\partial w} = x_i $$
Putting it all together,
$$ \frac{\partial \mathcal{L}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i x_i $$
• Start from random initial values of $w$ and $b$ (at $t = 0$).
• Adjust $w$ and $b$ according to:
$$ w \leftarrow w - \frac{\eta}{n} \sum_{i=1}^{n} \varepsilon_i x_i $$
$$ b \leftarrow b - \frac{\eta}{n} \sum_{i=1}^{n} \varepsilon_i $$
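A minimal sketch of these update rules on made-up one-dimensional data (the data and learning rate are assumptions, not from the notes):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # made-up inputs
y = np.array([1.1, 2.9, 5.2, 6.8])   # made-up targets, roughly y = 2x + 1

w, b, eta = 0.0, 0.0, 0.05           # initial values and learning rate
for _ in range(2000):
    eps = (w * x + b) - y            # epsilon_i = f(x_i) - y_i
    w -= eta * np.mean(eps * x)      # w <- w - (eta/n) * sum_i eps_i x_i
    b -= eta * np.mean(eps)          # b <- b - (eta/n) * sum_i eps_i

print(w, b)                          # approaches roughly 2 and 1
```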
Perceptron is invented before the loss function
Linear classifier: Notation
• The observation $x^T = [x_1, \ldots, x_d]$ is a real-valued vector ($d$ is the number of feature dimensions)
• The class label $y \in Y$ is drawn from some finite set of class labels.
• Usually the output vocabulary, $Y$, is some set of strings. For convenience, though, we usually map the class labels to a sequence of integers, $Y = \{1, \ldots, v\}$, where $v$ is the vocabulary size.
A linear classifier is defined by
$$
f(x) = \text{argmax } Wx + b
$$
where:
$w_k, b_k$ are the weight vector and bias corresponding to class $k$, and the argmax function finds the element of the vector $Wx + b$ with the largest value.
There are a total of $v(d + 1)$ trainable parameters: the elements of the matrix $W$ and the vector $b$.
Notice that in the two-class case, the equation
$$
f(x) = \text{argmax } Wx + b
$$
simplifies to a comparison of two scores: choose class 2 if $(w_2 - w_1)^T x + (b_2 - b_1) > 0$, and class 1 otherwise.
The class boundary is the line whose equation is
$$
(w_2 - w_1)^T x + (b_2 - b_1) = 0
$$
Suppose we have training tokens $(x_i, y_i)$, and we have some initial class vectors $w_1$ and $w_2$. We want to update them as
$$
w_1 \leftarrow w_1 - \eta \frac{\partial \mathcal{L}}{\partial w_1}
$$
$$
w_2 \leftarrow w_2 - \eta \frac{\partial \mathcal{L}}{\partial w_2}
$$
…where $\mathcal{L}$ is some loss function. What loss function makes sense?
The most obvious loss function for a classifier is its classification error rate,
$$
\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)
$$
Where $\ell(\hat{y}, y)$ is the zero-one loss function,
$$
\ell(f(x), y) =
\begin{cases}
0 & \text{if } f(x) = y \\
1 & \text{if } f(x) \neq y
\end{cases}
$$
The problem with the zero-one loss function is that it's not differentiable:
Integer labels can be encoded as one-hot vectors: a one-hot vector is a binary vector in which all elements are 0 except for a single element that's equal to 1.
Deriving the perceptron
The update is applied when a mistake happens (i.e., when the predicted class differs from the true class):
Compute the classifier output $\hat{y} = \text{argmax}_k (w_k^T x + b_k)$
Update the weight vectors as:
$$
w_k \leftarrow
\begin{cases}
w_k - \eta x & \text{if } k = \hat{y} \\
w_k + \eta x & \text{if } k = y \\
w_k & \text{otherwise}
\end{cases}
$$
where $\eta \approx 0.01$ is the learning rate.
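A small sketch of one multi-class perceptron update following the rule above; the number of classes, features, and the single training token are made up for illustration:

```python
import numpy as np

v, d, eta = 3, 4, 0.01               # classes, feature dimensions, learning rate (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(v, d))          # one weight vector w_k per row
b = np.zeros(v)

x = rng.normal(size=d)               # one training token
y = 2                                # its true class label

y_hat = int(np.argmax(W @ x + b))    # classifier output
if y_hat != y:                       # update only when a mistake happens
    W[y_hat] -= eta * x              # push the wrongly predicted class away from x
    W[y] += eta * x                  # pull the correct class toward x
```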
Key idea: $f_c(x) =$ posterior probability of class $c$
Axiom #1, probabilities are non-negative $(f_k(x) \geq 0)$. There are many ways to do this, but one way that works is to choose:
$$
f_c(x) \propto \exp(w_c^T x + b_c)
$$
Axiom #2, probabilities should sum to one $(\sum_{k=1}^{v} f_k(x) = 1)$. This can be done by normalizing:
$$
f(x) = [f_1(x), …, f_v(x)]^T
$$
$$
f_c(x) = \frac{\exp(w_c^T x + b_c)}{\sum_{k=1}^{v} \exp(w_k^T x + b_k)}
$$
where $w_k^T$ is the $k^{th}$ row of the matrix $W$.
For a two-class classifier, we don't really need the vector label. If we define $w = w_1 - w_2$ and $b = b_1 - b_2$, then the softmax simplifies to:
$$
f(Wx + b) =
\begin{bmatrix}
\text{Pr}(Y = 1|x) \\
\text{Pr}(Y = 2|x)
\end{bmatrix} =
\begin{bmatrix}
\frac{1}{1+e^ {-(w^ Tx+b)}} \\
\frac{e^ {-(w^ Tx+b)}}{1+e^ {-(w^ Tx+b)}}
\end{bmatrix} =
\begin{bmatrix}
\sigma(w^Tx + b) \\
1 - \sigma(w^Tx + b)
\end{bmatrix}
$$
… so instead of the softmax, we use a scalar function called the logistic sigmoid function:
$$
\sigma(z) = \frac{1}{1+e^{-z}}
$$
This function is called sigmoid because it is S-shaped.
For $z \to -\infty$, $\sigma(z) \to 0$
For $z \to +\infty$, $\sigma(z) \to 1$
Suppose we have training tokens $(x_i, y_i)$, and we have some initial class vectors $w_1$ and $w_2$. We want to update them as
$$
w_1 \leftarrow w_1 - \eta \frac{\partial \mathcal{L}}{\partial w_1}
$$
$$
w_2 \leftarrow w_2 - \eta \frac{\partial \mathcal{L}}{\partial w_2}
$$
…where $\mathcal{L}$ is some loss function. What loss function makes sense?
The most obvious loss function for a classifier is its classification error rate,
$$
\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{f}(x_i), y_i)
$$
Where $\ell(\hat{y}, y)$ is the zero-one loss function,
$$
\ell(f(x), y) =
\begin{cases}
0 & \text{if } f(x) = y \\
1 & \text{if } f(x) \neq y
\end{cases}
$$
The problem with zero-one loss is that it’s not differentiable.
Suppose we have a softmax output, so we want $f_c(x) \approx \Pr(Y = c|x)$. We can train this by learning $W$ and $b$ to maximize the probability of the training corpus. If we assume all training tokens are independent, we get:
$$
W, b = \underset{W,b}{\text{argmax}} \prod_{i=1}^{n} \Pr(Y = y_i|x_i) = \underset{W,b}{\text{argmax}} \sum_{i=1}^{n} \ln \Pr(Y = y_i|x_i)
$$
But remember that $f_c(x) \approx \Pr(Y = c|x)$! Therefore, maximizing the log probability of training data is the same as minimizing the cross entropy between the neural net and the ground truth:
$$
W, b = \underset{W,b}{\text{argmin}} -\frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_ i, \quad \mathcal{L}_ i = - \log f_ {y_ i}(x_ i)
$$
This loss function:
$$
\mathcal{L} = - \ln f_{y}(x)
$$
is called cross-entropy. It measures the difference in randomness between the true label distribution (a one-hot distribution, which has zero entropy) and the distribution $f(x)$ estimated by the classifier.
Thus
$$
\mathcal{L} = 0 - \ln f_{y}(x)
$$
Since we have these definitions:
$$
\mathcal{L} = - \ln f_{y}(x), \quad f_{y}(x) = \frac{\exp(z_{y})}{\sum_{k=1}^{v} \exp(z_{k})}, \quad z_{c} = w_c^T x + b_c
$$
Then:
$$
\frac{\partial \mathcal{L}}{\partial w_c} = \left( \frac{\partial \mathcal{L}}{\partial z_c} \right) \left( \frac{\partial z_c}{\partial w_c} \right) = \left( \frac{\partial \mathcal{L}}{\partial z_c} \right) x
$$
…where:
$$
\frac{\partial \mathcal{L}}{\partial z_c} =
\begin{cases}
f_{c}(x_i) - 1 & c = y \\
f_{c}(x_i) & c \neq y
\end{cases}
$$
For linear regression, we had:
$$
\frac{\partial \mathcal{L}}{\partial w} = \epsilon x, \quad \epsilon = f(x) - y
$$
For the softmax classifier with cross-entropy loss, we have
$$
\frac{\partial \mathcal{L}}{\partial w_c} = \epsilon_c x
$$
$$
\epsilon_c =
\begin{cases}
f_c(x_i) - 1 & c = y \text{ (output should be 1)} \\
f_c(x_i) & \text{otherwise (output should be 0)}
\end{cases}
$$
Suppose we have a training token $(x, y)$, and we have some initial class vectors $w_c$. Using softmax and cross-entropy loss, we can update the weight vectors as
$$
w_c \leftarrow w_c - \eta \epsilon_c x
$$
…where
$$
\epsilon_c =
\begin{cases}
f_c(x_i) - 1 & c = y_i \\
f_c(x_i) & \text{otherwise}
\end{cases}
$$
In other words, like the perceptron's update, the error signal satisfies:
$$
\begin{cases}
\epsilon_c < 0 & c = y_i \\
\epsilon_c > 0 & \text{otherwise}
\end{cases}
$$
Softmax:
$$ f_c(x) = \frac{\exp(w_c^T x + b_c)}{\sum_{k=1}^{v} \exp(w_k^T x + b_k)} \approx \Pr(Y = c|x) $$
Cross-entropy:
$$ \mathcal{L} = - \ln f_{y}(x) $$
Derivative of the cross-entropy of a softmax:
$$ \frac{\partial \mathcal{L}}{\partial w_c} = \epsilon_c x, \quad \epsilon_c =
\begin{cases}
f_c(x_i) - 1 & c = y \text{ (output should be 1)} \\
f_c(x_i) & \text{otherwise (output should be 0)}
\end{cases} $$
Gradient descent:
$$ w_c \leftarrow w_c - \eta \epsilon_c x $$
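Putting the summary together, a minimal sketch of one softmax + cross-entropy gradient-descent step; the shapes, data, and the bias update are assumptions for illustration:

```python
import numpy as np

v, d, eta = 3, 4, 0.1                  # classes, features, learning rate (assumed)
rng = np.random.default_rng(1)
W = rng.normal(size=(v, d))
b = np.zeros(v)

x = rng.normal(size=d)                 # one training token
y = 0                                  # its true label

z = W @ x + b
f = np.exp(z - z.max())                # softmax (shifted for numerical stability)
f /= f.sum()

eps = f.copy()                         # epsilon_c = f_c(x) for c != y ...
eps[y] -= 1.0                          # ... and f_y(x) - 1 for the true class

W -= eta * np.outer(eps, x)            # w_c <- w_c - eta * epsilon_c * x
b -= eta * eps                         # analogous update for the biases (not spelled out above)
loss = -np.log(f[y])                   # cross-entropy for this token
```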
Biological inspiration: Long-term potentiation
The Titanic sank. You were rescued. You want to know if your friend was also rescued, but you can't find them. Can you use machine learning methods to estimate the probability that your friend survived (i.e., calculate the probability that your friend was also rescued)?
Decision-tree learning*:
© wikipedia |
In each leaf node of this tree:
Number on the left = probability of survival
Number on the right = percentage of all known cases that are explained by this node
A decision tree is an example of a parametric learner
The function f(x) is determined by some learned parameters
In this case, the parameters are:
Should this node split, or not?
If so, which tokens go to the right-hand child?
If not, what is f(x) at the current node?
Titanic shipwreck example:
A mathematical definition of learning
Learning: Given $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, find the function $f(X)$ that minimizes some measure of risk.
Empirical risk, a.k.a. training corpus error: $\frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i), y_i)$
True risk, a.k.a. expected test corpus error: $E\left[\ell(f(X), Y)\right]$
Usually, minimum test error and minimum dev error don’t occur at the same time
… but early stopping based on the test set is cheating,
… so early stopping based on the dev set is the best we can do w/o cheating.
The problem with likelihood: Too many words
What does it mean to say that the words, x, have a particular probability?
Suppose our training corpus contains two sample emails:
Our test corpus is just one email:
One thing we could do is:
Then the approximation formula for $P(X | Y)$ is given by:
$$ P(X = x | Y = y) \approx \prod_{i=1}^{n} P(W = w_i | Y = y) $$
In this context, $W$ represents a word in a document, $X$ represents the document itself, $Y$ represents the class (spam or ham), $w_i$ represents the $i$-th word in the document, and $n$ is the total number of words in the document. The product is taken over all words in the document, assuming that the words are conditionally independent of each other given the class label $Y$.
Why is naïve Bayes naïve?
We call this model "naïve Bayes" because the words aren't really conditionally independent given the label. For example, the sequence "for you" is more common in spam emails than it would be if the words "for" and "you" were conditionally independent.
True Statement:
Naïve Bayes Approximation:
That equation has a computational issue. Suppose that the probability of any given word is roughly $P(W = w_i|Y = y) \approx 10^{-3}$, and suppose that there are $10^3$ words in an email. Then $\prod_{i=1}^{n} P(W = w_i|Y = y) \approx 10^{-3000}$, far below the smallest positive double-precision number (about $10^{-308}$), so it gets rounded off to zero. This phenomenon is called “floating-point underflow”.
Solution
$$f(x) = \underset{y}{\mathrm{argmax}} \left( \ln P(Y = y) + \sum^n_{i=1} \ln P(W = w_i | Y = y) \right)$$
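A tiny sketch of this log-space decision rule; the priors and word probabilities below are made up purely for illustration:

```python
import math

log_prior = {"Spam": math.log(0.4), "Ham": math.log(0.6)}   # made-up priors
likelihood = {                                              # made-up P(W = w | Y = y)
    "Spam": {"approved": 0.02, "prescription": 0.03, "for": 0.05, "you": 0.05},
    "Ham":  {"approved": 0.005, "prescription": 0.001, "for": 0.06, "you": 0.05},
}

words = ["approved", "prescription", "for", "you"]
scores = {y: log_prior[y] + sum(math.log(likelihood[y][w]) for w in words)
          for y in log_prior}
print(max(scores, key=scores.get))   # argmax over classes; "Spam" with these numbers
```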
Remember that the bag-of-words model is unable to represent this fact:
True Statement:
N-Grams:
Bigram naïve Bayes
A bigram naïve Bayes model approximates the bigrams as conditionally independent, instead of the unigrams. For example,
P(X = “approved prescription for you” | Y= Spam) ≈
P(B = “approved prescription” |Y = Spam) ×
P(B = “prescription for” | Y= Spam) ×
P(B = “for you” |Y = Spam)
The prior, P(Y), is usually estimated in one of two ways.
The likelihood, $P(W = w_i|Y = y)$, is also estimated by counting. The “maximum likelihood estimate of the likelihood parameter” is the most intuitively obvious estimate:
$$
P(W=w_i| Y = Spam) = \frac{Count(W=w_i, Y = Spam)}{Count(Y = Spam)}
$$
where Count(W=wi, Y = Spam) means the number of times that the word wi occurs in the Spam portion of the training corpus, and Count(Y = Spam) is the total number of words in the Spam portion.
One of the biggest challenges for naïve Bayes is that it can't handle unobserved events (words that never appeared in the training data).
The basic idea: add $k$ “unobserved observations” to the count of every unigram
Estimated probability of a word that occurred Count(w) times in the training data:
$$ P(W = w) = \frac{k + \text{Count}(W = w)}{k + \sum_v (k + \text{Count}(W = v))} $$
Estimated probability of a word that never occurred in the training data (an “out of vocabulary” or OOV word):
$$ P(W = \text{OOV}) = \frac{k}{k + \sum_v (k + \text{Count}(W = v))} $$
Notice that
$$ P(W = \text{OOV}) + \sum_w P(W = w) = 1 $$
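A short sketch of the smoothed estimates above; the word counts are made up:

```python
from collections import Counter

k = 1.0
counts = Counter({"free": 3, "meeting": 5, "offer": 2})   # hypothetical training counts

denom = k + sum(k + c for c in counts.values())           # k + sum_v (k + Count(v)) = 14 here

def p(word):
    return (k + counts[word]) / denom                     # an unseen (OOV) word gets k / denom

print(p("free"), p("never_seen"))                          # 4/14 and 1/14
print(p("never_seen") + sum(p(w) for w in counts))         # sums to 1, as noted above
```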
Why Network?
A better way to represent knowledge: Bayesian network
Bayes net structure: B → A, E → A, A → J, A → M.
$$
P(B = T \mid J = T, M = T) = \frac{P(B = T, J = T, M = T)}{P(J = T, M = T)} = \frac{P(B = T, J = T, M = T)}{P(B = T, J = T, M = T) + P(B = F, J = T, M = T)}
$$
$$
P(B = T, J = T, M = T) = \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} P(B = T, E = e, A = a, J = T, M = T) \\
= \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} P(B = T)\, P(E = e)\, P(A = a \mid B = T, E = e)\, P(J = T \mid A = a)\, P(M = T \mid A = a)
$$
Structure: B → A, E → A.
Note: independent variables may not be conditionally independent.
Structure: A → J, A → M.
Conditionally independent variables may not be independent.
Full network: B → A, E → A, A → J, A → M.
© wiki |
$$
\frac{I}{I_ 0} = 10^ {-\frac{kc(\Delta y)}{2.303}} = 10^ {- \epsilon c (\Delta y)} = 10^ {-A}
$$
Definition of absorbance (figure © imamagnets).
Application of Absorbance
Use UV-Vis absorbance to calculate the concentration of the molecules like DNA, protein, etc.
$A=\epsilon (\lambda) c l$
Molecule | λ (nm) | ε ×10⁻³ (M⁻¹cm⁻¹) |
---|---|---|
Adenine | 260.5 | 13.4 |
Adenosine | 259.5 | 14.9 |
NADH | 340, 259 | 6.23, 14.4 |
NAD+ | 260 | 18 |
FAD | 450 | 11.2 |
Tryptophan | 280, 219 | 5.6, 47 |
Tyrosine | 274,222,193 | 1.4, 8, 48 |
Phenylalanine | 257, 206, 188 | 0.2, 9.3, 60 |
Histidine | 211 | 5.9 |
Cysteine | 250 | 0.3 |
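A hedged sketch of using $A = \epsilon(\lambda)\, c\, l$ with tryptophan's ε at 280 nm from the table; the measured absorbance and path length are assumed values:

```python
epsilon = 5600.0   # M^-1 cm^-1, tryptophan at 280 nm (from the table)
l = 1.0            # path length, cm (assumed)
A = 0.28           # measured absorbance (assumed)

c = A / (epsilon * l)   # Beer-Lambert: A = epsilon * c * l
print(c)                 # 5e-05 M, i.e. 50 micromolar
```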
The probability per unit time that a molecule in state 1 will end up in state 2 in the presence of an oscillating electromagnetic field at the resonance frequency
Energy of the light = Difference between energy levels
$$
Rate_ {1 → 2} = B_ {12} \rho (\nu)[S_ 1]
$$
The rate constant depends on the transition dipole moment:
$B_{12} \propto \langle \mu \rangle^2$
$\langle \mu \rangle = \int \Psi_2^* (q_e\vec{r})\Psi_1\, dV$
Transition dipole moment:
$\langle \mu \rangle \propto$ overlap between $\Psi_1$ and $\Psi_2$
When a light wave hits a hydrogen atom, the distance $r$ from the nucleus to the electron is far smaller than the wavelength λ.
(Figures: the transition dipole reflects the change of electron distribution upon excitation, © Sentry; transition dipole moment, © wikipedia.)
$$\mu_{mn} = \int_{-\infty}^{\infty} \Psi_n^* \left( \sum_{i=1}^{N} q_e \vec{r}_i \right) \Psi_m\, dr $$
Where $\Psi_m$ is the initial-state wavefunction, $\Psi_n$ is the final-state wavefunction, and $\sum_{i=1}^{N} q_e \vec{r}_i$ is the electric dipole operator summed over the electrons.
Note: each molecular wavefunction depends on the position of BOTH Nucleus AND electron but for now, let’s focus on electron wavefunction (i.e. no structural change of molecular structure or atom position).
Wavefunction overlap: Larger wavefunction overlap of initial and final state means higher transition probability, which generates higher extinction coefficient (Fermi’s golden rule).
Orbital Symmetry:
$\int_{-\infty}^{\infty} f(r)\, dr = 0 \quad \text{if} \quad f(r)$ is an odd function; i.e., $f(-x) = -f(x)$
$\int_{-\infty}^{\infty} f(r)\, dr \neq 0 \quad \text{if} \quad f(r)$ is an even function; i.e., $f(-x) = f(x)$
So: $\Psi_n^* \vec{r}\, \Psi_m$ must be an even function for the transition dipole to be nonzero;
i.e., $\Psi_n^* \Psi_m$ must be an odd function (e.g., the $\pi$ and $\pi^*$ states).
Traditional ways to quantify the “strength of a transition”
Dipole strength: $D_{mn} = |\mu_{mn}|^2 = 9.18 \times 10^{-3} \int \left( \frac{\varepsilon}{\nu} \right) d\nu $
Oscillator strength: $f_{mn} = 4.315 \times 10^{-9} \int \varepsilon(\nu) d\nu $
Area under the spectrum associated with the m→n transition
$ f_{mn} \approx 0.1-1 $ Strong absorption (heme, chlorophyll, organic dyes)
$ f_{mn} \approx 10^{-5} $ Weak absorption
In any molecule with N electrons, the sum of the oscillator strengths from any one state to all of the other states is equal to the number of electrons, N:
$$
\sum_j f_{ij} = N
$$
This means that the area underneath the absorption spectrum is a constant (ground state is the initial state).
If a molecule is perturbed (change environments) then if one transition goes down, another must go up.
Every transition is associated with a transition dipole
The transition dipole is a vector:
σ→σ* transitions often require absorption of photons with energy above the UV-Vis range (200-700 nm).
Transition Rate:
$$ Rate_{1 \rightarrow 2} = B_{12} \rho(\nu) [S_1]$$
Radiation field density
Component of the Electric Field:
$$ E_{\parallel} = |\vec{E}| \cos \theta$$
Density of States:
$$ \rho(\nu) \propto |E_{\parallel}|^2 = |\vec{E}|^2 \cos^2 \theta$$
The density of states $\rho(\nu)$ is proportional to the square of the parallel component of the electric field:
$$ \rho(\nu) \propto |E_{\parallel}|^2 = |\vec{E}|^2 \cos^2 \theta$$
This relationship is depicted through diagrams that illustrate the electric field vector $\vec{E}$ relative to the molecular transition dipole moment $\vec{\mu}$. The alignment of $\vec{E}$ with $\vec{\mu}$ affects the absorption, with maximum absorption when they are parallel and zero absorption when they are perpendicular. This is exemplified by the molecular orientations of adenine shown in the image.
The rates of absorption, emission, and stimulated emission can be described by the following equations:
Rates | Equation |
---|---|
Absorption Rate | $ Rate_{abs} = B_{12} \rho(\nu) [S_1] $ |
Emission Rate | $ Rate_{emi} = -A_{21} [S_2] $ |
Stimulated Emission Rate | $ Rate_{se} = -B_{21} \rho(\nu) [S_2] $ |
At steady state, the rate of upward transitions (absorption) equals the rate of downward transitions (emission and stimulated emission):
$$ B_{12} \rho(\nu) [S_1] = A_{21} [S_2] + B_{21} \rho(\nu) [S_2] $$
A21, B12, and B21 are called Einstein coefficients.
It can be shown that $\frac{A_{21}}{B_{21}} = \frac{8\pi h \nu^3}{c^3}$ (and $B_{12} = B_{21}$):
Faster spontaneous emission at higher Frequency
In a typical UV-Vis spectroscopy (electronic transitions)
Conditions: $ A \gg B \rho(\nu) $
Einstein coefficients relationships: $ B_{12} \rho(\nu) [S_1] = A_{21} [S_2] + B_{21} \rho(\nu) [S_2] \approx A_{21} [S_2] $
Approximations: $ \frac{[S_2]}{[S_1]} \approx \frac{B_{12} \rho(\nu)}{A_{21}} \ll 1 $
Population of states: $ [S_2] \ll [S_1] $
The population of the excited state never builds up to a significant amount.
A laser requires the stimulated emission rate constant, i.e. $B_{21}$, to be much larger than the spontaneous emission rate constant, i.e. $A_{21}$.
Since spontaneous emission is faster at higher frequency, a UV laser is harder to make than a visible-light laser.
The probability $P_i$ of a system being in a state i with energy $E_i$ at temperature T is given by:
$$ P_i = \frac{e^{-\frac{E_i}{k_B T}}}{\sum_{i=0}^{M} e^{-\frac{E_i}{k_B T}}} = \frac{e^{-\frac{E_i}{k_B T}}}{Q} $$
Where $E_i$ is the energy of state $i$, $k_B$ is the Boltzmann constant, $T$ is the absolute temperature, and $Q$ is the partition function (the sum in the denominator).
The ratio of probabilities between two states i and j is given by:
$$ \frac{P_i}{P_j} = e^{-\frac{(E_i - E_j)}{k_B T}} = e^{-\frac{\Delta E}{k_B T}} $$
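A numeric sketch of this ratio for a typical vibrational energy gap at room temperature; the gap value is an assumed, illustrative number:

```python
import math

k_B = 1.381e-23   # Boltzmann constant, J/K
T = 298.0         # temperature, K
dE = 3.4e-20      # energy gap, J (assumed; roughly one vibrational quantum)

ratio = math.exp(-dE / (k_B * T))   # P_i / P_j for states separated by dE
print(ratio)                         # about 2.6e-4: nearly everything sits in the lower state
```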
(Figures: © Mauricio Alcolea Palafox; left panel: wavefunction, right panel: probability of finding the nuclei, © Anas Al-Rabadi; vibrational energy of the nuclei on top of electronic energy, © wikipedia; Jablonski energy diagram, © oe1.com.)
VERTICAL TRANSITION: consider the nuclei to remain in the same place during an electronic transition.
At thermal equilibrium, most molecules will be in the lowest vibrational state
The total wavefunction $\Psi(r, R)$ is a product of the electronic $\Psi_{el}(r, R)$ and nuclear $\Psi_{nuc}(R)$ wavefunctions:
$$ \Psi(r, R) = \Psi_{el}(r, R)\Psi_{nuc}( R) $$
- electrons: refers to $\Psi_{el}(r, R)$
- nuclei: refers to $\Psi_{nuc}(R)$

The transition from vibrational level $i$ of the ground electronic state to vibrational level $j$ of the excited electronic state is given by:
$$ \vec{\mu}_ {g \rightarrow ex,j} = \left( \vec{\mu}_ {g \rightarrow ex} \right) \int \Psi_ {nuc(j)}^* \Psi_ {nuc(i)} dR $$
- Electronic transition dipole moment: $\left( \vec{\mu}_{g \rightarrow ex} \right)$
- Nuclear overlap factor: $\int \Psi_{nuc(j)}^* \Psi_{nuc(i)}\, dR$, also known as the Franck-Condon factor.

(Figure: vibrational structure and the Franck-Condon principle, vertical transitions; panels show vibrational structure and overlapping electronic bands.)
© PSIBERG Team |
to left: Bathochromic or Red shift
to right: Hypsochromic or Blue shift
The nature of the changes are not always simple to predict.
How does the solvent influence the ground and excited states?
More on the dielectric constant
Material 1: High εr, therefore higher ability to cancel out (stabilize) the original source charge
Material 2: Low εr, therefore lower ability to cancel out (stabilize) the original source charge.
The relative permittivity $\varepsilon_r$ as a function of frequency $\omega$ is given by:
$$ \varepsilon_r(\omega) = \frac{\varepsilon(\omega)}{\varepsilon_0} $$
Where $\varepsilon(\omega)$ is the permittivity of the material at frequency $\omega$ and $\varepsilon_0$ is the vacuum permittivity.
Examples of relative permittivity for different materials:
Relative permittivity values for various solvents:
Solvent | Hexane | Ether | Ethanol | Methanol | Water |
---|---|---|---|---|---|
$\varepsilon_r$ | 2 | 4.3 | 25.8 | 31 | 81 |
A small value means the solvent is non-polar; a large value means it is polar.
© Vaishali Gupta |
Compound | λ(nm) | Intensity/ε | Transition with lowest energy |
---|---|---|---|
CH₄ | 122 | intense | σ→σ* (C-H) |
CH₃CH₃ | 130 | intense | σ→σ* (C-C) |
CH₃OH | 183 | 200 | n→σ* (C-O) |
CH₃SH | 235 | 180 | n→σ* (C-S) |
CH₃NH₂ | 210 | 800 | n→σ* (C-N) |
CH₃Cl | 173 | 200 | n→σ* (C-Cl) |
CH₃I | 258 | 380 | n→σ* (C-I) |
CH₂=CH₂ | 165 | 16000 | π→π* (C=C) |
CH₃COCH₃ | 187 | 950 | π→π* (C=O) |
CH₃COCH₃ | 273 | 14 | n→π* (C=O) |
CH₃CSCl₃ | 460 | weak | n→π* (C=S) |
CH₃N=NCCH₃ | 347 | 15 | n→π* (N=N) |
Typical π-π* transitions: the dipole gets larger in the same direction
The more polar excited state is stabilized more, so the transition energy decreases; high solvent polarity results in a RED SHIFT (of the absorption peak).
Typical n-π* transitions: the dipole of the chromophore gets smaller or shifts direction after excitation.
Less stabilization, energy increases; high solvent polarity results in a BLUE SHIFT (of the absorption peak)
© wikipedia |
The solvent effects on the absorption maxima (λmax) for the π→π* and n→π* transitions in acetone:
solvent | Static dielectric constant | λmax (nm) π→π* (Red shift) | λmax (nm) n→π* (Blue shift) |
---|---|---|---|
Hexane | 2 | 229.5 | 327 |
Ether | 4.3 | 230 | 326 |
Ethanol | 25.8 | 237 | 315 |
Methanol | 31 | 238 | 312 |
Water | 81 | 244.5 | 305 |
The transitions are characterized by their molar absorptivities (ε):
© Neera Sharma
La: large excited state dipole (lower energy state)
Lb: Smaller excited state dipole
dipole is induced in the solvent by the dipole of the chromophore (ground state and excited state)
No nuclear movement involved. Purely due to electrons
ground state dipole ( ← )
excited state (↑)
induced dipoles in solvent ( ← )
solvent | Index of refraction |
---|---|
Perfluoropentane | 1.239 |
Water | 1.333 |
Ethanol | 1.362 |
Iso-octane | 1.392 |
Chloroform | 1.446 |
Carbontetrachloride | 1.463 |
Note that water is less polarizable than iso-octane although clearly water is a much more polar solvent (larger dielectric constant)
Vision is due to pigment proteins in rod and cone cells that share the same chromophore, 11-cis retinal; “spectral tuning” arises from interaction with nearby amino acid residues.
Probability:
What is Random Variable?
$P(X=x)$ means the probability that the random variable $X$ takes the value $x$. Here $P(X=x)$ is a number, while $P(X)$ is a distribution.
Example
Event = [Cloud, Cloud, Rain]. In this weather example, P(X) is the probability distribution over outcomes, e.g. P(X = Cloud) = 2/3 and P(X = Rain) = 1/3.
The random variable used in the example above is a discrete random variable, but sometimes we have to use a continuous random variable, for example $X \in \mathbb{R}$ (the set of all real numbers).
Because we have two types of random variables, the way we sum over all possible values differs: a sum over the pmf for discrete variables, and an integral over the pdf for continuous variables.
Date | X=Temperature (°C) | Y=Precipitation |
---|---|---|
January 11 | 4 | cloud |
January 12 | 1 | cloud |
January 13 | -2 | snow |
January 14 | -3 | cloud |
January 15 | -3 | clear |
January 16 | 4 | rain |
For this table, we could have joint random variables P(X=x, Y=y):
P(X=x,Y=y) | snow | rain | cloud | clear |
---|---|---|---|---|
-3 | 0 | 0 | 1/6 | 1/6 |
-2 | 1/6 | 0 | 0 | 0 |
1 | 0 | 0 | 1/6 | 0 |
4 | 0 | 1/6 | 1/6 | 0 |
P(X=x) is the probability that random variable X takes the value of the vector x. This is just a shorthand for the joint distribution of x1, x2, …, xn
When X is a random matrix |
Suppose we know the joint distribution P(X,Y). We want to find the two marginal distributions, P(X) and P(Y).
Going back to the table above, we can read off the marginal distributions; for example, P(Y = cloud) = 1/6 + 1/6 + 1/6 = 1/2.
PS: Some places also write P(X) as P_X(x) or P_X(i), and P(Y) as P_Y(y) or P_Y(j).
With the joint and marginal probabilities, we can now calculate conditional distributions, such as P(Y|X).
Example of a conditional distribution: *P(X|Y = cloud)*
$P(X|Y=\text{cloud}) = \frac{P(X, Y = \text{cloud})}{P(Y = \text{cloud})} = \frac{[\,1/6,\ 0,\ 1/6,\ 1/6\,]}{1/2}$
So, the result is a vector = {1/3, 0, 1/3, 1/3} (over x = -3, -2, 1, 4).
According to the example, we could know that: Joint = Conditional×Marginal; which is:
$$
P(X,Y) = P(X|Y)P(Y)
$$
The expected value of a function is its weighted average, weighted by its pmf or pdf.
The covariance of two random variables is the expected product of their deviations:
$$
Covar(X,Y) = E[(X- E[X])(Y-E[Y])]
$$
Example
Suppose we have two random variables, X and Y, with the following values:
Now we multiply these deviations pairwise and sum them up:
Since we have three observations, we divide the sum by 3 - 1 = 2 (for the sample covariance) or simply by 3 (for the population covariance). So, if we treat these as a population, the covariance is:
for python code:
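A sketch with three made-up observations:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])   # made-up observations of X
y = np.array([1.0, 5.0, 6.0])   # made-up observations of Y

dx = x - x.mean()                # deviations from the mean
dy = y - y.mean()

pop_cov = np.mean(dx * dy)                    # divide by n (population covariance)
sample_cov = np.sum(dx * dy) / (len(x) - 1)   # divide by n - 1 (sample covariance)
print(pop_cov, sample_cov)                    # 3.33... and 5.0
print(np.cov(x, y, bias=True)[0, 1])          # same as pop_cov
```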
In this case, covariance > 0 means X and Y are positively associated, covariance < 0 means they are negatively associated, and covariance = 0 means they are not linearly associated.
Covariance Matrix:
Suppose X = [X1, … , Xn] is a random vector. Its matrix of variances and covariances (a.k.a. covariance matrix) is $\Sigma = E\left[(X - E[X])(X - E[X])^T\right]$.
In other places, the covariance equation is mostly written as:
$$
Cov(X,Y) = \frac{\sum_i (X_i-\bar{X})(Y_i-\bar{Y})}{n}
$$
We can expect these to agree because the sample mean is used as the estimate of the expected value of the random variable.
$Accuracy = \frac{\#\ tokens\ correctly\ classified}{n\ tokens\ total}$
The solution: Confusion Matrix:
Confusion Matrix =
• (m, n)th element is the number of tokens of the mth class that were labeled, by the classifier, as belonging to the nth class.
© Aniruddha Bhandari |
Confusion matrix for a binary classifier
Accuracy on which corpus?
Bayes Error Rate:
$$
\text{Error Rate} = \sum_x P(X=x)\,\underset{y}{\min}\, P(Y \neq y \mid X=x)
$$
Confusion Matrix, Precision & Recall (Sensitivity)
$$P=P(Y =1|f(x)=1)=\frac{TP}{TP+FP}$$
$$R = TPR = P(f(X) =1|Y=1) = \frac{TP}{TP+FN}$$
$E^2 = (m_0 c^2)^2 + (pc)^2$
For energy with rest mass: $E = m_0c^2$
For energy with no rest mass: $E = pc$
Introducing Background:
How to infer? What indirect evidence?
$$
-\frac{\hbar^2}{2m} \frac{d^2 \Psi (x)}{dx^ 2} + V(x)\Psi(x) = E\Psi(x)
$$
We will end up with a series of wavefunctions with associated energies:
$$\Psi(x) \leftrightarrow E_ n $$
The Schrödinger Equation is a fundamental equation in quantum mechanics that describes how the quantum state of a physical system changes over time. It was formulated by Erwin Schrödinger in 1925. There are two forms of the Schrödinger Equation: the time-dependent and the time-independent forms.
$$
P(x_0, t_0)dx = \Psi^ * (x_0, t_0)\Psi(x_0, t_0)dx = |\Psi (x_0, t_0)|^2 dx
$$
P(x0, t0): Probability* of finding the particle (e.g. electron) within an interval
of dx of position x0 at time t0
$$
\int_{-\infty}^{\infty} \Psi^* (x, t) \Psi (x, t)dx = 1
$$
Average value of f(x) over all space
$$
\left \langle f(x) \right \rangle = \frac{ \int_{-\infty}^{\infty} \Psi^* (x) \Psi (x)f(x)dx }{\int_{-\infty}^{\infty} \Psi^* (x) \Psi (x)dx}
$$
Here Particle = electrons in a molecule
Layman language: Potential energy outside the box is infinite.
Mathematical language: Boundary condition
V(x) = 0 for L > x > 0
V(x) = ∞ for x ≥ L, x ≤ 0
$$
\frac{d^ 2 \Psi(x)}{dx^ 2} = \frac{2m} {\hbar^ 2} [V(x)-E]\Psi(x)
$$
Since V(x) is infinite outside the box, Ψ(x) must be zero outside the box
(otherwise $\frac{d^2 Ψ(x)}{dx^2}$ would be infinite: not allowed)
Since Ψ(x) must be continuous, Ψ(x) inside the box must connect smoothly to Ψ(x) outside the box
Hence, $Ψ(0)= Ψ(L) = 0$
$$ \Psi_ n (x) = \sqrt{\frac{2}{L}} sin(\frac{n\pi x}{L}) $$
With this theory, we could measure the energy of the system using light
The electrons and their orbitals
4 π electrons ⇒ 2 orbitals; 8 π electrons ⇒ 4 orbitals
$$
\Delta E = \frac{(n_ 3 ^ 2 - n_ 2^ 2) h^ 2 }{8mL^ 2}
$$
For this compound, There are 4 pi electrons. Two each in the n=1 and n=2 orbitals. (This is due to electron spin, which we will see later).
The absorption is due to promoting an electron from the n=2 to the n=3 orbital.
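A quick numeric sketch of this n = 2 → n = 3 transition energy; the box length L is an assumed value, not one given in the notes:

```python
h = 6.626e-34   # Planck constant, J s
m = 9.109e-31   # electron mass, kg
c = 2.998e8     # speed of light, m/s
L = 0.7e-9      # box length, m (assumed, roughly a short conjugated chain)

dE = (3**2 - 2**2) * h**2 / (8 * m * L**2)   # Delta E for n = 2 -> n = 3
lam = h * c / dE                              # wavelength of the absorbed photon
print(dE, lam)                                # about 6.2e-19 J and 3.2e-7 m (near UV)
```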
TUNNELING effect in Quantum Mechanics
A particle can go "through" an energy barrier instead of needing sufficient energy to go over the barrier. When the wave passes through the energy barrier, it decays exponentially inside the barrier.
If we know the structure of the molecule, can we predict the energy and optical property of the molecule?
Take the hydrogen atom as example.
Here we only need 3 quantum numbers: n, l, and m_l
As you can see, the energy only depends on n:
$E_ n = - \frac{m_ e e^ 4}{8 \epsilon_ 0^ 2 h^2 n^ 2} = - \frac{2.179 × 10 ^ {-18}}{n^ 2}Joule = -\frac{13.6}{n^ 2}eV\ \ \ n = 1, 2, 3… $
© Atkins, The Elements of Physical Chemistry |
Calculate the energy of one photon of green light (532 nm)
$ E = h\nu = \frac{hc}{\lambda} = \frac{hc}{532 × 10^ {-9}m} = 3.73×10^ {-19}J$
Convert J to Hz
$ \nu = \frac{E}{h} = \frac{3.73×10^ {-19}J}{h} = 5.6 × 10^ {14} Hz$
Convert to kJ/mol and kcal/mol
$E_ {total} = N_ A × E = 6.02 × 10^ {23} \frac{1}{mol} × 3.73 × 10^ {-19} J$
$≈ 225000\ J/mol ≈ 225\ kJ/mol $
$≈ 54\ kcal/mol$
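The same unit conversions as a short script (532 nm):

```python
h = 6.626e-34    # J s
c = 2.998e8      # m/s
N_A = 6.022e23   # 1/mol
lam = 532e-9     # m

E = h * c / lam           # energy of one photon, J (~3.7e-19)
nu = E / h                # frequency, Hz (same as c / lam, ~5.6e14)
E_mol = N_A * E           # J/mol
print(E, nu, E_mol / 1000, E_mol / 4184)   # J, Hz, kJ/mol, kcal/mol
```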
Time-Dependent Schrödinger Equation:
$$
i\hbar\frac{\partial}{\partial t}\Psi(\mathbf{r}, t) = \hat{H}\Psi(\mathbf{r}, t)
$$
Here, $\Psi(\mathbf{r}, t)$ is the wave function of the system, $i$ is the imaginary unit, $\hbar$ is the reduced Planck constant, $t$ represents time, $\mathbf{r}$ is the position vector, and $\hat{H}$ is the Hamiltonian operator which represents the total energy of the system.
Time-Independent Schrödinger Equation:
$$
\hat{H}\psi(\mathbf{r}) = E\psi(\mathbf{r})
$$
In this form, $\psi(\mathbf{r})$ is the time-independent wave function, $E$ represents the energy of the system, and other symbols have the same meaning as in the time-dependent equation.
The Schrödinger Equation is a cornerstone of quantum mechanics, providing a mathematical framework for understanding and predicting the behavior of quantum systems. It’s important to note that these equations are usually accompanied by specific boundary conditions or potentials, depending on the physical situation being modeled.
Gustav Kirchhoff
The blackbody is an idealized physical body that absorbs all incident electromagnetic radiation (such as light), regardless of frequency or angle of incidence.
A black body can also emit black-body radiation, which is solely determined by its temperature.
Notice: a "cold" object also emits "light" as long as it is above absolute zero (-273.15 °C). It appears "black" because its peak wavelength is in the infrared range, which the human eye cannot detect.
According to this theory, we can estimate the temperature of stars based on their color.
webbtelescope | astronomy.com |
$ \lambda _{max} = \frac{W}{T} $
Exp:
When λmax = 500nm:
$T = \frac{W}{\lambda_{max}} = \frac{2.9\times 10^{-3}\ m\cdot K}{500\times 10^{-9}\ m} = 5800\ K$
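The same estimate in Python (W is the Wien displacement constant):

```python
W = 2.9e-3          # Wien displacement constant, m*K
lam_max = 500e-9    # observed peak wavelength, m

T = W / lam_max     # lambda_max = W / T  =>  T = W / lambda_max
print(f"T = {T:.0f} K")   # 5800 K
```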
Conflict between observation and expectation
Reason: in the classical picture, the oscillating field of the electromagnetic radiation drives the motion of an oscillator (a spring)
- faster oscillation → higher frequency → higher energy
To fix this conflict, a proportional (quantized) model was proposed: E = nhν (Max Planck)
$$
\rho (\nu, T) = \frac{8\pi\nu^2\, h\nu}{c^3}\,\frac{1}{e^{\frac{h\nu}{k_B T}} - 1}
$$
Explained:
- energy of a photon (with frequency ν): hν
- weight of the photon population at ν (density of states): g(ν) = $\frac{8\pi \nu^2}{c^3}$
- average number of photons at ν: n(ν, T) = $\frac{1}{\exp(\frac{h\nu}{k_BT})-1}$
- ρ(ν, T)dν = hν × g(ν) × n(ν, T)
= $h\nu \times \frac{8\pi \nu^2}{c^3} \times \frac{1}{\exp(\frac{h\nu}{k_BT})-1}$
= $\frac{8\pi h \nu^3}{c^3}\,\frac{1}{e^{\frac{h\nu}{k_BT}} - 1}$
Planck's law shows that, in the limit of high temperature or low frequency, it reduces to the Rayleigh–Jeans form, the expression responsible for the "ultraviolet catastrophe":
|
So, both the Rayleigh–Jeans law and Planck's law agree with observation very well at very low frequency.
But only Planck's law can fit the decrease of intensity at high frequency.
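A small numerical check of that statement, comparing the two laws at a few frequencies (the temperature is an arbitrary example value):

```python
import numpy as np

# Constants (SI units)
h, c, kB = 6.626e-34, 2.998e8, 1.381e-23
T  = 5000.0                               # temperature, K (arbitrary example)
nu = np.array([1e12, 1e13, 1e14, 1e15])   # a few frequencies, Hz

planck   = (8 * np.pi * h * nu**3 / c**3) / (np.exp(h * nu / (kB * T)) - 1)
rayleigh = 8 * np.pi * nu**2 * kB * T / c**3      # classical Rayleigh-Jeans form

# Ratio is ~1 at low frequency and ~0 at high frequency:
# Planck's law avoids the ultraviolet catastrophe.
print(planck / rayleigh)
```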
$$
E = nhν
$$
By using light to eject electrons from a copper surface, they found that the kinetic energy of the ejected electrons depends on the light's frequency
$$
E_{light} = \beta\nu_{light}
$$
Conservation of energy
$$
E_{elec} + \Phi= \beta \nu
$$
Einstein concluded that light must be behaving like a particle in this experiment: PHOTON
E = hν
From Planck's equation to Einstein's equation:
$v = \lambda\nu$
$mv = p = m\lambda\nu$
$\nu = \frac{p}{m\lambda}$
$E = h\nu = \frac{hp}{m\lambda} \rightarrow E = \frac{p^2}{m}$
De Broglie Equation:
$\lambda = \frac{h}{p} = \frac{h}{mv}$
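For example, a quick estimate of the de Broglie wavelength of an electron (the speed below is an assumed value):

```python
h   = 6.626e-34    # Planck constant, J*s
m_e = 9.109e-31    # electron mass, kg

v   = 1.0e6        # electron speed, m/s (assumed value for illustration)
p   = m_e * v      # momentum
lam = h / p        # de Broglie wavelength

print(f"lambda = {lam*1e9:.3f} nm")   # ~0.73 nm
```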
$g(\nu) = \frac{N(E)}{V} = \frac{\text{Number of States}(E)}{\text{Volume}}$
In the equation you’ve provided:
$$
E(x,t) = E^0 \sin\left[2\pi \left(\frac{x}{\lambda} - \frac{t}{T}\right)\right]
$$
This represents a sinusoidal wave function, where $E(x,t)$ is the electric field of the light wave at a position $x$ and at time $t$. Here’s what the terms mean:
The reason both $\lambda$ and $T$ are present in the equation is because they describe different aspects of the wave:
The term $\frac{x}{\lambda} - \frac{t}{T}$ is the phase of the wave, which determines the position of the peaks and troughs of the wave at any given time $t$ and position $x$. It’s not meant to equal zero; instead, it changes with time and position to represent the propagation of the wave through space and time.
The phase changes as time goes by, indicating that the peaks and troughs of the wave are moving. If $\frac{x}{\lambda} - \frac{t}{T}$ were always zero, it would imply a stationary wave, not a propagating one.
The product $2\pi$ times the phase gives you the argument of the sine function in radians, which is necessary because the sine function is periodic with a period of $2\pi$. This means that the wave repeats itself every $2\pi$ radians, which corresponds to one wavelength in space and one period in time.
|
Let’s set the $\lambda$ as 1, T as 10, and x = 0. Then, the equation could be simplified as $E(0, t) = sin[2\pi(0 - \frac{t}{10})]$.
And the change of E(0, t) with t can be shown as the animation below.
E(0, t) as it changes with t | Static view using the x axis as t |
---|---|
If we observe more positions, let’s say, 0 to 10, and using the full function, we could get an animation like below. This is the propagating wave we could observe in 10 different locations.
© psu |
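A minimal sketch of how such snapshots can be generated (matplotlib is assumed; a true animation would redraw these curves in a loop or use matplotlib.animation):

```python
import numpy as np
import matplotlib.pyplot as plt

E0, lam, T = 1.0, 1.0, 10.0             # amplitude, wavelength, period (values used above)
x = np.linspace(0, 10, 500)             # observation positions from 0 to 10

plt.figure()
for t in (0, 2, 4):                     # a few time snapshots
    plt.plot(x, E0 * np.sin(2 * np.pi * (x / lam - t / T)), label=f"t = {t}")
plt.xlabel("x")
plt.ylabel("E(x, t)")
plt.legend()
plt.show()
```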
When t increases, x increases => “+” direction on x;
When t increases, x decreases => “-” direction on x.
$velocity = \frac{\lambda}{T} = \lambda \nu$
So, the energy carried by the light wave is:
Energy of a wave is proportional to the square of the amplitude (in classical mechanics)
$\rho(\nu)$: energy per unit volume
$$
\rho(\nu) = \text{constant} \times (E^0)^2
$$
Wave Vector ($k$): The wave vector is defined as $\frac{2\pi}{\lambda}$, where $\lambda$ is the wavelength of the wave. The wave vector points in the direction of the wave’s propagation and has a magnitude equal to the number of wave cycles per unit distance. The notation $\hat{k}$ represents a unit vector in the direction of $k$, so the wave vector $k$ is sometimes written as $\frac{2\pi}{\lambda} \hat{k}$, emphasizing its direction.
Angular Frequency ($\omega$): This is defined as $2\pi\nu$, where $\nu$ is the frequency of the wave. It represents how many radians the wave cycles through per unit time.
Phase ($\phi$): The phase is a term that allows us to specify where in its cycle the wave is at $t = 0$ and $x = 0$. It lets us define the “zero point” or starting point of the wave at a place other than the origin of our coordinate system.
$$E(x,t) = E^0 \sin(kx - \omega t + \phi)$$
$$H(x,t) = H^0 \sin(kx - \omega t + \phi)$$
Why Standard Form?
The first function form expresses the wave in terms of its wavelength λ and period T, which are perhaps more intuitive when you're first learning about waves. It makes it very clear that the wave repeats itself every wavelength λ in space and every period T in time.
The second function is the standard form. It's particularly useful in more advanced topics like wave interference, diffraction, and quantum mechanics, where the concept of phase space and the relationship between position and momentum (or wavelength and frequency) are crucial.
For example, when you want to model the interference of waves, the first form can become extremely complicated, because the only difference between two waves may be their phase.
In an electromagnetic wave, the electric and magnetic fields are perpendicular to each other and to the direction of wave propagation. The equations show that both fields oscillate in sync (they have the same phase $\phi$) but are described by separate equations since they are perpendicular components.
The term $kx - \omega t$ indicates that the wave is moving in the positive $x$-direction. If the wave were moving in the negative $x$-direction, the sign in front of $\omega t$ would be positive.
The factor $\sin(kx - \omega t + \phi)$ varies between (-1) and (1), causing the electric and magnetic field strengths to oscillate between $-E^0$ to $E^0$ and $-H^0$ to $H^0$, respectively. The wave thus carries energy and, if it is light, can be observed as it interacts with matter.
Waves spread as if each region of space is behaving as a source of new waves of the same frequency and phase.
So, Huygens' principle applied to light gives a wave front.
When we know d, D, x_n, and θ, we can calculate λ (the wavelength):
Two wave interference (center: y1 = -15, y2 = 15) | Three wave interference (y1 = 10, y2 = 0, y3 = -10) |
© gsu |
stlawu.edu |
$$
n\lambda = d\sin\theta_n \approx d\tan\theta_n = d\,\frac{x_n}{D}
$$
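A worked example with assumed slit numbers, using the small-angle relation above:

```python
# Assumed double-slit numbers (not taken from the figures above)
d  = 50e-6       # slit spacing, m
D  = 1.0         # slit-to-screen distance, m
x1 = 1.2e-2      # position of the n = 1 bright fringe on the screen, m
n  = 1

lam = d * x1 / (n * D)     # from n*lambda = d * x_n / D (small-angle approximation)
print(f"lambda = {lam*1e9:.0f} nm")   # 600 nm
```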
|
At the beginning of the story, let's say there are two identical waves with different $\phi$ such that $\phi_2 - \phi_1 = \pi$.
In this case, the intensity of these two waves is:
(Destructive interference)
$ E(x,t) = E^o sin(k_1 x - \omega _1 t) +E^o sin(k_2 x - \omega _2 t) $
$ = 2 E^o \cos\!\left[\frac{k_1 - k_2}{2}x - \frac{\omega _1 - \omega _2}{2}t\right] \sin\!\left[\frac{k_1 + k_2}{2}x - \frac{\omega _1 + \omega _2}{2}t\right] $
$$ 𝑛_1 \cdot sin 𝜃_1 = 𝑛_2 \cdot sin 𝜃_2 $$
Because we know:
So, when we move to the 2D wave, we could have function:
In order to calculate the 2D wave from different emission locations, we need to introduce the initial (source) point b = (bx, by), as sketched below.
|
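A minimal sketch of that idea, assuming each source at point b emits a circular wave $E = E^0\sin(k|r-b| - \omega t)$ and the observed field is the sum over sources:

```python
import numpy as np

# Assumed parameters; b1 and b2 are the emission points (bx, by).
lam, T, E0 = 1.0, 10.0, 1.0
k, omega = 2 * np.pi / lam, 2 * np.pi / T

b1 = (0.0,  15.0)
b2 = (0.0, -15.0)

xx, yy = np.meshgrid(np.linspace(-20, 20, 400), np.linspace(-20, 20, 400))

def wave(b, t):
    """Circular wave emitted from point b, evaluated on the grid at time t."""
    r = np.sqrt((xx - b[0])**2 + (yy - b[1])**2)   # distance from the source
    return E0 * np.sin(k * r - omega * t)

E_total = wave(b1, t=0.0) + wave(b2, t=0.0)        # superposition = interference pattern
```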
Vim is a classic text editor known for efficiency, while NeoVim is its modernized fork with improvements like better plugin support. LunarVim, built on NeoVim, offers a pre-configured setup, making it easier for users to get a powerful, feature-rich environment without the hassle of individual configurations. Ideal for those new to Vim/NeoVim or seeking a ready-to-use development setup, LunarVim combines ease of setup with customizability. It’s particularly appealing for its integrated toolset, active community support, and a balance between functionality and performance, making it a great choice for a streamlined coding experience.
Plug: Nvim-R
Video Tutorial: Rohit Farmer
Instruction following the video: rohitfarmer
Final work:
© Karobben |
Please install the latest NeoVim by following the Neovim document and LunarVim Document
|
Save the lines below in the ~/.config/nvim/init.vim
file to install the plugins.
" Specify a directory for plugins" - Avoid using standard Vim directory names like 'plugin'call plug#begin('~/.vim/plugged')" List of plugins." Make sure you use single quotes" Shorthand notationPlug 'jalvesaq/Nvim-R', { 'branch' : 'stable' }Plug 'ncm2/ncm2'Plug 'roxma/nvim-yarp'Plug 'gaalcaras/ncm-R'Plug 'preservim/nerdtree'Plug 'Raimondi/delimitMate'Plug 'patstockwell/vim-monokai-tasty'Plug 'itchyny/lightline.vim'" Initialize plugin systemcall plug#end()
How to install the plugins
After saving the change, you need to open the file again with `nvim ~/.config/nvim/init.vim`. Then, in command mode (triggered by `:`), input `PlugInstall` (or `PlugUpdate` if you want to update them). When you see the picture below, it means the plugins were installed successfully:
By following the instruction from rohitfarmer’s post, we could add more things at the end of the init.vim
file:
folding behavior
" Set foldbehaviorset tabstop=2 " Number of spaces that a
in the file counts for
set shiftwidth=2 " Number of spaces to use for each step of (auto)indent
set softtabstop=2 " Number of spaces that acounts for while performing editing operations
set expandtab " Use spaces instead of tabsset foldmethod=indent
set foldlevelstart=2 " Start folding at an indent level greater than 2
To quickly unfold all code:
:set nofoldenable
" Set a Local Leader" With a map leader it's possible to do extra key combinations" likew saves the current filelet mapleader = ","let g:mapleader = ","" Plugin Related Settings" NCM2autocmd BufEnter * call ncm2#enable_for_buffer() " To enable ncm2 for all buffers.set completeopt=noinsert,menuone,noselect " :help Ncm2PopupOpen for more " information." NERD Treemap nn :NERDTreeToggle " Toggle NERD tree." Monokai-tastylet g:vim_monokai_tasty_italic = 1 " Allow italics.colorscheme vim-monokai-tasty " Enable monokai theme." LightLine.vim set laststatus=2 " To tell Vim we want to see the statusline.let g:lightline = { \ 'colorscheme':'monokai_tasty', \ }" General NVIM/VIM Settings" Mouse Integrationset mouse=i " Enable mouse support in insert mode." Tabs & Navigationmap nt :tabnew " To create a new tab.map to :tabonly " To close all other tabs (show only the current tab).map tc :tabclose " To close the current tab.map tm :tabmove " To move the current tab to next position.map tn :tabn " To swtich to next tab.map tp :tabp " To switch to previous tab." Line Numbers & Indentationset backspace=indent,eol,start " To make backscape work in all conditions.set ma " To set mark a at current cursor location.set number " To switch the line numbers on.set expandtab " To enter spaces when tab is pressed.set smarttab " To use smart tabs.set autoindent " To copy indentation from current line " when starting a new line.set si " To switch on smart indentation." Searchset ignorecase " To ignore case when searching.set smartcase " When searching try to be smart about cases.set hlsearch " To highlight search results.set incsearch " To make search act like search in modern browsers.set magic " For regular expressions turn magic on." Bracketsset showmatch " To show matching brackets when text indicator " is over them.set mat=2 " How many tenths of a second to blink " when matching brackets." Errorsset noerrorbells " No annoying sound on errors." Color & Fontssyntax enable " Enable syntax highlighting.set encoding=utf8 " Set utf8 as standard encoding and " en_US as the standard language." Enable 256 colors palette in Gnome Terminal.if $COLORTERM == 'gnome-terminal' set t_Co=256endiftry colorscheme desertcatchendtry" Files & Backupset nobackup " Turn off backup.set nowb " Don't backup before overwriting a file.set noswapfile " Don't create a swap file.set ffs=unix,dos,mac " Use Unix as the standard file type." Return to last edit position when opening filesau BufReadPost * if line("'\"") > 1 && line("'\"") <= line("$") | exe "normal! g'\"" | endif
```
Ctrl + W + HJKL  " Move the cursor from window to window
,nt              " Open a new tab
,tn              " Move to the next tab
,tp              " Back to the previous tab

# Code fold behavior
zc - Close (fold) the current fold under the cursor.
zo - Open (unfold) the current fold under the cursor.
za - Toggle between closing and opening the fold under the cursor.
zR - Open all folds in the current buffer.
zM - Close all folds in the current buffer.

# Nvim-R
Ctrl + x + o     " Access the help information (auto fill)
\rf              " Connect to R console.
\rq              " Quit R console.
\ro              " Open object browser.
\d               " Execute current line of code and move to the next line.
\ss              " Execute a block of selected code.
\aa              " Execute the entire script. This is equivalent to source().
\xx              " Toggle comment in an R script.

# NERDTree
,nn              " Toggle NERDTree.
```
You could also include them in the init.vim file.
|
After installing Nvim-R (the master branch), you'll see the error below whenever you open nvim. Just ignore it and it will be fine.
```
Error detected while processing function ROnJobStdout[40]..UpdateSynRhlist[11]..FunHiOtherBf:
line 10:
E117: Unknown function: nvim_set_option_value
Press ENTER or type command to continue
```
To edit text, we need to switch between the reading (normal), visual, and editing (insert) modes. Press `i` to enter editing mode. Press `Esc` or `Ctrl + c` to exit editing mode and return to reading mode. Press `v` to enter selection (visual) mode so you can select words.
© Adrian Y. S. Lee |
Immunoglobulins, commonly known as antibodies, are crucial proteins in the immune system that recognize and bind to specific antigens, such as bacteria and viruses, to help protect the body. Their structure is both unique and complex, consisting of several key components:
Basic Structure: Immunoglobulins are Y-shaped molecules made up of four polypeptide chains - two identical heavy (H) chains and two identical light (L) chains. These chains are held together by disulfide bonds.
Variable (V) and Constant (C) Regions:
Isotypes: Mammals have several classes of immunoglobulins (IgG, IgA, IgM, IgE, IgD), each with different roles in the immune response. These isotypes differ mainly in their heavy chain constant regions.
Glycosylation: Many antibodies are glycosylated, meaning they have carbohydrate groups attached. This glycosylation can affect the antibody’s stability, distribution, and activity.
Light Chain Types: There are two types of light chains in antibodies - kappa (κ) and lambda (λ). An individual antibody will have two identical light chains of one type.
© David B. Roth |
V(D)J recombination is a mechanism in the immune system that generates the immense diversity of antibodies (immunoglobulins) and T cell receptors necessary for the adaptive immune response. This process is named for the three gene segments involved in the recombination: Variable (V), Diversity (D), and Joining (J). So, V(D)J is the recombination unit.
We take 5wl2
as an example:
©PDB 5wl2 |
© pipebio |
How to make it in python:
|
The convert
command in Linux is a part of the ImageMagick suite, a powerful toolset for image manipulation. This command allows you to convert between different image formats, resize images, change image quality, and perform a wide variety of other image transformations.
Here are a few basic examples of what you can do with the convert
command:
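A few typical invocations along those lines (file names are placeholders):

```bash
# Convert between image formats
convert input.png output.jpg

# Resize an image to 800x600 pixels
convert input.jpg -resize 800x600 resized.jpg

# Change the JPEG quality (compression level)
convert input.jpg -quality 75 compressed.jpg
```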
These examples are just the tip of the iceberg in terms of what ImageMagick’s convert
command can do. It’s a very powerful tool with a wide array of options and capabilities. For more detailed information, you can check the manual page (man convert
) or the official ImageMagick documentation.
Certainly! Here are examples demonstrating various capabilities of ImageMagick:
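Some further representative commands (again, file names are placeholders):

```bash
# Rotate an image by 90 degrees
convert input.jpg -rotate 90 rotated.jpg

# Crop a 300x300 region starting at offset (10,10)
convert input.jpg -crop 300x300+10+10 cropped.jpg

# Convert an image to grayscale
convert input.jpg -colorspace Gray gray.jpg

# Combine a series of frames into an animated GIF
convert -delay 20 -loop 0 frame*.png animation.gif
```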
These commands showcase the versatility of ImageMagick. Remember to adjust the file names and parameters according to your specific needs. The ImageMagick documentation provides more detailed information and examples for these and other features.
IMGT® provides a range of databases, tools, and web resources focused on the immune system, particularly on the genetic and structural aspects of immunoglobulins (IG), T cell receptors (TCR), major histocompatibility complex (MHC) of all vertebrate species, and related proteins of the immune system (RPI) of any species. These resources are crucial for research in various fields, including immunology, genetics, bioinformatics, drug design, and personalized medicine.
Key features and offerings of IMGT® include:
Databases: IMGT® offers several databases containing detailed information on IG, TCR, and MHC sequences and structures, along with RPI. These databases are meticulously curated and regularly updated.
Analysis Tools: IMGT® provides tools for sequence analysis, gene identification, and 3D structure determination. IMGT/V-QUEST, for instance, is a widely used tool for the analysis of IG and TCR sequences.
Standardized Nomenclature: IMGT® has established a standardized nomenclature for the description of IG and TCR genetic components, which is essential for consistent communication and research in the field.
Educational Resources: The system also offers educational resources for those new to the field of immunogenetics, including tutorials, glossaries, and comprehensive descriptions of the molecular components of the immune system.
Research and Clinical Applications: The information and tools provided by IMGT® are invaluable for various applications, including research in immunology, genetics, and autoimmunity, as well as in clinical settings for antibody engineering, diagnosis, and understanding of immune disorders.
Category | Name | Description/Focus Area |
---|---|---|
Databases | IMGT/GENE-DB | Database for immunoglobulin (IG) and T cell receptor (TCR) genes of all vertebrate species. |
IMGT/3Dstructure-DB | Database for 3D structures of IG, TCR, MHC, and RPI (related proteins of the immune system). | |
IMGT/LIGM-DB | A comprehensive database of IG and TCR nucleotide sequences from various species. | |
IMGT/PRIMER-DB | Database of primers and probes for IG and TCR gene sequences. | |
IMGT/PROTEIN-DB | Database for IG, TCR, MHC, and RPI protein sequences and structures. | |
Analysis Tools | IMGT/V-QUEST | Tool for the analysis of IG and TCR nucleotide sequences. Identifies V, D, and J gene segments and alleles. |
IMGT/JunctionAnalysis | Tool focused on detailed analysis of the V-J and V-D-J junctions of IG and TCR sequences. | |
IMGT/HighV-QUEST | High-throughput version of IMGT/V-QUEST for next-generation sequencing (NGS) data. | |
IMGT/DomainGapAlign | Tool for the analysis of IG, TCR, and RPI domain sequences and comparison with IMGT reference directory. | |
IMGT/Collier-de-Perles | Tool for two-dimensional (2D) graphical representation of IG and TCR variable domains. | |
Resources | IMGT Education | Educational resources, including tutorials, glossaries, and comprehensive descriptions of immunogenetics. |
IMGT Scientific chart | Standardized nomenclature and classification for IG, TCR, and MHC of humans and other vertebrates. | |
IMGT Repertoire | Compilation of allelic polymorphisms and protein displays for variable (V), diversity (D), and joining (J) genes. | |
Other Services | IMGT/Therapeutic | Information on therapeutic antibodies and fusion proteins for immune applications. |
IMGT/mAb-DB | Specific database for monoclonal antibodies (mAbs). |
IMGT/V-QUEST is a specialized database and analysis tool that is part of the broader IMGT®, the international ImMunoGeneTics information system®. This system is a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TCR), major histocompatibility complex (MHC) of all vertebrate species, and related proteins of the immune system (RPI) of any species.
IMGT/V-QUEST specifically provides detailed analysis of nucleotide sequences for immunoglobulins (IG) and T cell receptors (TCR). It is widely used in immunology and related research fields for tasks such as:
Sequence Analysis: It allows for the identification and delimitation of V, D, and J genes and alleles in the input sequences. This is crucial for understanding the genetic basis of the immune response.
Clonotype Characterization: Researchers use it to characterize the clonotypes (unique T cell or B cell receptor sequences) in an individual’s immune repertoire, which is important in studies of immune system diversity and response.
Somatic Hypermutation Studies: It helps in the analysis of somatic hypermutations, which are critical for understanding adaptive immunity and processes like affinity maturation.
Comparative Immunology: By providing a comprehensive and curated database of IG and TCR sequences across different species, it aids in comparative immunology studies.
As of my last update in April 2023, IMGT/V-QUEST continues to be a valuable resource for immunologists, molecular biologists, and other researchers studying the adaptive immune system. The database is regularly updated to include new findings and sequences, ensuring its relevance and usefulness in the field.
Something You May Want to Know
c16 > g, Q6 > E (++−) means that the nt mutation (c > g) leads to an AA change at codon 6 with the same hydropathy (+) and volume (+) but with different physicochemical properties (−) classes ( 12 )[^Pommié_04].
IMGT/Ontology is a key component of IMGT®, the international ImMunoGeneTics information system®. It represents the first ontology for immunogenetics and immunoinformatics and is a foundational aspect of the IMGT® information system. Developed by Marie-Paule Lefranc and her team, IMGT/Ontology provides a standardized vocabulary and a set of concepts that are essential for the consistent annotation, description, and comparison of the immune system’s components across different species. So, IMGT/Ontology is more of a set of criteria or a framework rather than a specific tool or software.
Key features and aspects of IMGT/Ontology include:
IMGT-ONTOLOGY axioms and concepts
Seven IMGT-ONTOLOGY axioms have been defined: ‘IDENTIFICATION’ [1], ‘DESCRIPTION’ [2], ‘CLASSIFICATION’ [3], ‘NUMEROTATION’ [4][5], ‘LOCALIZATION’, ‘ORIENTATION’, and ‘OBTENTION’. They constitute the Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope [6].
© IMGT Education |
IMGT/Ontology is a critical resource for researchers and professionals in immunology, genetics, bioinformatics, and related fields. It ensures that data and analyses are consistent, reproducible, and interoperable, which is vital in advancing our understanding of the immune system and in developing immunotherapy and other medical applications.
The oldest paper about ontology:
Lefranc, M.-P. From IMGT-ONTOLOGY IDENTIFICATION Axiom to IMGT Standardized Keywords: For Immunoglobulins (IG), T Cell Receptors (TR), and Conventional Genes. Cold Spring Harb Protoc., 1;2011(6): 604-613. pii: pdb.ip82. doi: 10.1101/pdb.ip82(2011) PMID:21632792. ↩︎
Lefranc, M.-P. From IMGT-ONTOLOGY DESCRIPTION Axiom to IMGT Standardized Labels: For Immunoglobulin (IG) and T Cell Receptor (TR) Sequences and Structures. Cold Spring Harb Protoc., 1;2011(6): 614-626. pii: pdb.ip83. doi: 10.1101/pdb.ip83 (2011) PMID:21632791. ↩︎
Lefranc, M.-P. From IMGT-ONTOLOGY CLASSIFICATION Axiom to IMGT Standardized Gene and Allele Nomenclature: For Immunoglobulins (IG) and T Cell Receptors (TR). Cold Spring Harb Protoc., 1;2011(6): 627-632. pii: pdb.ip84. doi: 10.1101/pdb.ip84 (2011) PMID:21632790. ↩︎
Lefranc, M.-P. “IMGT Collier de Perles for the Variable (V), Constant (C), and Groove (G) Domains of IG, TR, MH, IgSF, and MhSF” Cold Spring Harb Protoc. 2011 Jun 1;2011(6). pii: pdb.ip86. doi: 10.1101/pdb.ip86. PMID: 21632788 ↩︎
Lefranc, M.-P. IMGT Unique Numbering for the Variable (V), Constant (C), and Groove (G) Domains of IG, TR, MH, IgSF, and MhSF. Cold Spring Harb Protoc., 1;2011(6). pii: pdb.ip85. doi: 10.1101/pdb.ip85 (2011) PMID: 21632789 ↩︎
Duroux P et al., IMGT-Kaleidoscope, the Formal IMGT-ONTOLOGY paradigm. Biochimie, 90:570-83. Epub 2007 Sep 11 (2008) PMID:17949886 ↩︎
Giudicelli V, Lefranc M P. Ontology for immunogenetics: the IMGT-ONTOLOGY[J]. Bioinformatics, 1999, 15(12): 1047-1054. ↩︎
Duroux P, Kaas Q, Brochet X, et al. IMGT-Kaleidoscope, the formal IMGT-ONTOLOGY paradigm[J]. Biochimie, 2008, 90(4): 570-583. ↩︎
Identification of V(D)J segments: IgBLAST can identify variable (V), diversity (D), and joining (J) gene segments in IG or TCR sequences.
Clonotype Analysis: It helps in determining clonotypes based on V(D)J segment usage, providing insights into the diversity and clonality of IG or TCR repertoires.
Somatic Hypermutation Analysis: It identifies somatic hypermutations in IG sequences and can compare these to germline sequences, which is critical in understanding adaptive immune responses.
Flexible Input Options: IgBLAST can process both nucleotide and protein sequences, and it supports various input formats.
Detailed Alignment Information: It provides detailed alignment results that include information about gene segments, framework regions, complementarity-determining regions (CDRs), and mutations.
Integration with Other Databases: The results can be linked to other NCBI databases for additional information and analysis.
IgBLAST is widely used in immunology and related fields for studying B cell and T cell receptor repertoire, which is crucial for understanding immune responses, vaccine development, and in the study of autoimmune diseases and cancer.
Here is an example of setting it up using conda, from nicwulab/SARS-CoV-2_Abs:
|
In pyir, the database download uses 'http' and fails. By following the error message, we can find the script and change 'http' to 'https', which should solve the problem.
An example of running the program:
Prepare the database
Run the BLAST; a sketch of the command is shown below.
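A sketch of a typical igblastn call (the database paths and file names are placeholders; the germline databases must be prepared first, as described in the IgBLAST documentation):

```bash
igblastn \
    -germline_db_V database/human_V \
    -germline_db_D database/human_D \
    -germline_db_J database/human_J \
    -organism human \
    -query query.fasta \
    -outfmt 19 \
    -out result.tsv    # -outfmt 19 writes an AIRR-format TSV
```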
```
-germline_db_V   Germline database name
-organism        The organism for your query sequence. Supported organisms include human, mouse,
                 rat, rabbit and rhesus_monkey for Ig, and human and mouse for TCR. Custom organism
                 is also supported but you need to supply your own germline annotations (see the
                 IgBLAST web site for details). Default = `human'
```
If you have installed pyir, you can use it to run IgBLAST with fewer parameters.
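For example, a minimal pyir call using the parameters listed below (the input FASTA name is a placeholder):

```bash
pyir query.fasta --outfmt tsv -r Ig -s human -m 4
```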
Key parameters:
```
--sequence_type {nucl,prot}      default: nucl
-m MULTI, --multi MULTI          Number of threads
-o, --out                        default: inputfile.json.gz
--outfmt {tsv,lsjson,dict,json}  suggest: tsv
--igdata IGDATA                  Path to your IGDATA directory.
-r, --receptor {Ig,TCR}          The receptor you are analyzing, immunoglobulin or T cell receptor
-s, --species {human,mouse...}   The species you are analyzing {human,mouse,rabbit,rat,rhesus_monkey}
```
Can I annotate the light chain and heavy chain simultaneously?
IgBLAST is designed to analyze immunoglobulin (IG) sequences, including both heavy and light chains. However, it typically processes and analyzes these chains separately. When you input a sequence that contains both heavy and light chains, IgBLAST might only process the first recognizable sequence, which in your case appears to be the heavy chain.
To analyze both heavy and light chains using IgBLAST, you generally need to input them as separate sequences. This means splitting your combined sequence into two parts - one for the heavy chain and the other for the light chain - and then running IgBLAST for each part individually.
There isn't a parameter in IgBLAST that allows for the simultaneous analysis of both heavy and light chains when they are combined into a single sequence. The tool's algorithm is designed to identify and annotate the V(D)J segments of a single chain at a time, as the structure and sequence features of heavy and light chains are distinct.
If you are consistently working with sequences that contain both chains, you may need to develop a preprocessing step in your workflow to separate these chains before analysis. Alternatively, if such a tool is essential for your work, you might need to look into other bioinformatics tools or custom scripting to first identify and separate the heavy and light chain sequences before feeding them into IgBLAST.
V(D)J Library Construction |
V (Variable): These are gene segments that code for the variable region of an antibody or T-cell receptor. The variable region is responsible for binding to antigens.
D (Diversity): These segments are found in some classes of antibodies and in T-cell receptors. They provide an additional level of diversity to the antigen-binding region.
J (Joining): These gene segments join with the V (and D, where present) segments to complete the variable region of the receptor.
C (Constant): The constant region of the antibody or T-cell receptor is encoded by these segments. This region does not vary much between different antibodies and is responsible for the effector functions of the antibody, such as recruiting other parts of the immune system.
© 10X Genomics |
Click the Link and fill out the information and you could get the download page
|
Reference
|
Documentation: 10X Genomics
|
- `cellranger vdj`: This is the main command being run. `cellranger` is the software package, and `vdj` specifies that you are running the V(D)J analysis pipeline, which is used for assembling and annotating V(D)J sequences from single-cell RNA-Seq data.
- `--id=sample345`: This sets the unique identifier for the run. Here, the identifier is `sample345`. This ID is used to name the output directory.
- `--reference=...`: This specifies the reference dataset to be used for the analysis. The provided path (`/opt/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.1.0`) points to a reference dataset for human V(D)J sequences.
- `--fastqs=...`: This indicates the directory where the FASTQ files are located. FASTQ files are the input files for the Cell Ranger software, containing the sequenced reads.
- `--sample=mysample`: This specifies the name of the sample to be analyzed. It should match the sample name in the FASTQ files.
- `--localcores=8`: This parameter tells Cell Ranger to use 8 CPU cores for the computation. This setting helps to optimize the use of available computational resources.
- `--localmem=64`: This allocates 64 GB of memory (RAM) for the run. This parameter is crucial for ensuring the software has enough memory to process the data without crashing.
A successful cellranger vdj
run should conclude with a message similar to this:
```
Outputs:
- Run summary HTML:                        /home/jdoe/runs/sample345/outs/web_summary.html
- Run summary CSV:                         /home/jdoe/runs/sample345/outs/metrics_summary.csv
- Clonotype info:                          /home/jdoe/runs/sample345/outs/clonotypes.csv
- Filtered contig sequences FASTA:         /home/jdoe/runs/sample345/outs/filtered_contig.fasta
- Filtered contig sequences FASTQ:         /home/jdoe/runs/sample345/outs/filtered_contig.fastq
- Filtered contigs (CSV):                  /home/jdoe/runs/sample345/outs/filtered_contig_annotations.csv
- All-contig FASTA:                        /home/jdoe/runs/sample345/outs/all_contig.fasta
- All-contig FASTA index:                  /home/jdoe/runs/sample345/outs/all_contig.fasta.fai
- All-contig FASTQ:                        /home/jdoe/runs/sample345/outs/all_contig.fastq
- Read-contig alignments:                  /home/jdoe/runs/sample345/outs/all_contig.bam
- Read-contig alignment index:             /home/jdoe/runs/sample345/outs/all_contig.bam.bai
- All contig annotations (JSON):           /home/jdoe/runs/sample345/outs/all_contig_annotations.json
- All contig annotations (BED):            /home/jdoe/runs/sample345/outs/all_contig_annotations.bed
- All contig annotations (CSV):            /home/jdoe/runs/sample345/outs/all_contig_annotations.csv
- Barcodes that are declared to be targetted cells: /home/jdoe/runs/sample345/outs/cell_barcodes.json
- Clonotype consensus FASTA:               /home/jdoe/runs/sample345/outs/consensus.fasta
- Clonotype consensus FASTA index:         /home/jdoe/runs/sample345/outs/consensus.fasta.fai
- Contig-consensus alignments:             /home/jdoe/runs/sample345/outs/consensus.bam
- Contig-consensus alignment index:        /home/jdoe/runs/sample345/outs/consensus.bam.bai
- Clonotype consensus annotations (CSV):   /home/jdoe/runs/sample345/outs/consensus_annotations.csv
- Concatenated reference sequences:        /home/jdoe/runs/sample345/outs/concat_ref.fasta
- Concatenated reference index:            /home/jdoe/runs/sample345/outs/concat_ref.fasta.fai
- Contig-reference alignments:             /home/jdoe/runs/sample345/outs/concat_ref.bam
- Contig-reference alignment index:        /home/jdoe/runs/sample345/outs/concat_ref.bam.bai
- Loupe V(D)J Browser file:                /home/jdoe/runs/sample345/outs/vloupe.vloupe
- V(D)J reference:
    fasta:
      regions:       /home/jdoe/runs/sample345/outs/vdj_reference/fasta/regions.fa
      donor_regions: /home/jdoe/runs/sample345/outs/vdj_reference/fasta/donor_regions.fa
    reference:       /home/jdoe/runs/sample345/outs/vdj_reference/reference.json
- AIRR Rearrangement TSV:                  /home/jdoe/runs/sample345/outs/airr_rearrangement.tsv
- All contig info (ProtoBuf format):       /home/jdoe/runs/sample345/outs/vdj_contig_info.pb
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
```
Once cellranger vdj
has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .vloupe
file in Loupe V(D)J Browser, or refer to the Understanding Output section to explore the data by hand.
```
[error] Pipestance failed. Error log at:
MockC_cs/SC_VDJ_ASSEMBLER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/VDJ_CHEMISTRY_DETECTOR/DETECT_VDJ_RECEPTOR/fork0/chnk0-u22ea849f77/_errors

Log message:
V(D)J Chain detection failed for Sample VDJ-B-293-redo-1 in "/raid/home/wenkanl2/MokC_sc/1_primary_seq".
Total Reads = 1000000
Reads mapped to TR = 30
Reads mapped to IG = 28665
In order to distinguish between the TR and the IG chain the following conditions need to be satisfied:
- A minimum of 10000 total reads
- A minimum of 5.0% of the total reads needs to map to TR or IG
- The number of reads mapped to TR should be at least 3.0x compared to the number of reads mapped to IG or vice versa
Please check the input data and/or specify the chain via the --chain argument.
```
The problem here is with the proportion of reads mapping to TR and IG. Even though you have a significant number of reads mapped to IG, the number of reads mapped to TR is too low to meet the required thresholds.
Resolution:
The message suggests checking the input data or specifying the chain via the `--chain` argument. Explicitly specify whether you are analyzing T-cell receptors (TR) or immunoglobulins (IG) by using the `--chain` flag in your Cell Ranger command. For example, assuming it is B cell data, we could add `--chain IG` to solve this problem, as sketched below.
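Putting it together, the command would look something like this (the FASTQ path is a placeholder):

```bash
cellranger vdj --id=sample345 \
               --reference=/opt/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.1.0 \
               --fastqs=/path/to/fastqs \
               --sample=mysample \
               --chain IG \
               --localcores=8 \
               --localmem=64
```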
SSH, short for Secure Shell, is a network protocol used to securely access and manage a computer over an unsecured network. It provides a secure channel over an unencrypted network, like the internet, allowing users to log into another computer, execute commands remotely, and move files. SSH uses strong encryption to protect the data being transmitted, ensuring confidentiality and integrity of the data against eavesdropping and interception. It’s commonly used by system administrators and IT professionals for managing systems and applications remotely.
Using SSH typically involves two primary components: an SSH client and an SSH server. The server runs on the machine you want to connect to, while the client runs on the machine you’re connecting from. Here’s a basic guide on how to use SSH:
Install SSH Server: On the remote machine (the one you want to access), you need to install an SSH server. For Linux systems, this is often done using the openssh-server
package.
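For example:

```bash
sudo apt update
sudo apt install openssh-server
```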
This example is for Debian-based systems (like Ubuntu). The commands might vary for other systems.
Start and Enable SSH Service: Ensure that the SSH service is started and enabled to start on boot.
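For example, with systemd (the service is named `ssh` on Debian/Ubuntu; some distributions call it `sshd`):

```bash
sudo systemctl start ssh
sudo systemctl enable ssh
```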
Configure SSH Server (Optional): You can configure your SSH server by editing the /etc/ssh/sshd_config
file. This step is optional and typically only necessary for advanced configurations.
Install SSH Client: Most Unix-like systems (Linux, macOS) come with an SSH client pre-installed. For Windows, you can use clients like PuTTY or use the built-in SSH client in Windows 10/11.
Establish an SSH Connection: To connect to the SSH server, you need the IP address or hostname of the server and the username on that system. The basic command is:
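In its general form:

```bash
ssh username@hostname_or_ip
```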
For example, if your username is user
and the server’s IP address is 192.168.1.100
, you would use:
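That is:

```bash
ssh user@192.168.1.100
```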
Authenticate: When you connect for the first time, you’ll be asked to verify the identity of the server. After accepting, you’ll be prompted for the password of the user account you are logging into on the remote machine.
Using SSH: Once connected, you can execute commands on the remote machine as if you were physically present.
Transferring Files (Optional): SSH also allows for secure file transfer using SCP or SFTP.
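For example (paths and host are placeholders):

```bash
# Copy a local file to the remote machine over SSH
scp /path/to/local/file user@192.168.1.100:/path/to/remote/directory

# Or start an interactive SFTP session
sftp user@192.168.1.100
```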
Exiting SSH: To end your SSH session, simply type exit
or press Ctrl+D
.
vim ~/.ssh/config
```
Host home_pc
    HostName 192.168.3.1
    User john
    Port 2322
```
How to log in to this host: `ssh home_pc`
SSH Keys: For better security, it’s recommended to use SSH keys instead of passwords. SSH keys are a pair of cryptographic keys that can be used to authenticate to an SSH server as an alternative to password-based logins.
Firewall Settings: Make sure your firewall settings allow SSH connections (usually on port 22).
Regular Updates: Keep the SSH server software up to date for security.
SSH is a powerful tool for remote administration, but it’s important to use it securely to protect your systems and data.
Generating an SSH key is a security practice for authenticating to an SSH server more securely than using just a password. Here’s why you should do it and how to generate an SSH key:
Enhanced Security: SSH keys are cryptographic keys that are much more secure than passwords. They are almost impossible to decipher using brute force methods.
No Need for Passwords: When you use SSH keys, you don’t need to enter your password every time you connect, which reduces the risk of password theft.
Automation Friendly: SSH keys are ideal for automated processes. Scripts and applications can authenticate without manual password entry.
Access Control: SSH keys can be used to control who can access a server. Only users with the matching private key can access the server configured with the public key.
Open Terminal: Launch the terminal application.
Generate Key Pair: Use the ssh-keygen
command to generate a new SSH key pair.
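A typical invocation:

```bash
ssh-keygen -t rsa -b 4096
```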
This command creates a new SSH key using the RSA algorithm with a key size of 4096 bits, providing a good balance between security and compatibility. You can choose other algorithms like ed25519
which is considered more secure but may not be compatible with older systems.
Specify File to Save the Key: By default, ssh-keygen
will save the key in the ~/.ssh/id_rsa
file. You can specify a different file if you want.
Enter a Passphrase (Optional): For additional security, you can enter a passphrase when prompted. This passphrase will be required whenever the private key is used.
Copy Public Key: After generating your SSH key, you need to add the public key to the ~/.ssh/authorized_keys
file on your SSH server.
Use ssh-copy-id
on Linux/macOS: If you’re using Linux or macOS, you can use ssh-copy-id
to copy your public key to the server easily.
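For example, using the same host as above:

```bash
ssh-copy-id user@192.168.1.100
```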
Manual Copying: If ssh-copy-id
isn’t available or you’re using Windows, you can manually copy the public key text and append it to ~/.ssh/authorized_keys
on the server.
Remember, never share your private key. The public key is what you distribute or add to servers, while the private key should be securely stored and kept private.
If your SSH public key authentication isn’t working, there could be several reasons. Here’s a troubleshooting guide to help you resolve common issues:
- Check the permissions of the `~/.ssh` directory and the `authorized_keys` file. Incorrect permissions can prevent SSH from authenticating using keys.
- The `~/.ssh` directory should have permissions set to `700` (drwx------).
- The `authorized_keys` file should have permissions set to `600` (-rw-------).
- Use the `chmod` command to set these permissions:
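For example:

```bash
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```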
The .ssh
directory and the authorized_keys
file should be owned by the user, not root or any other user. Use the chown
command to set the ownership:
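For example (run as the user who owns the keys):

```bash
chown -R $USER:$USER ~/.ssh
```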
You can also make sure that the home directory permissions are restricted to the user; an overly permissive home directory can cause the same failure.
|
Ensure that the public key in authorized_keys
is in the correct format. It should be a single line starting with ssh-rsa
or ssh-ed25519
, followed by the key, and optionally a comment.
The SSH server configuration file (/etc/ssh/sshd_config
) on the server might restrict public key authentication. Check the following settings:
- `PubkeyAuthentication yes` should be set to allow public key authentication.
- `AuthorizedKeysFile` should point to the correct path, typically `.ssh/authorized_keys`.
- If `PasswordAuthentication` is set to `no`, the server will not fall back to password authentication if key authentication fails.
|
On your client machine, ensure you’re specifying the correct private key. If you’re using a key with a non-default name or location, specify it with the -i
option:
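For example (the key path is a placeholder):

```bash
ssh -i ~/.ssh/my_custom_key user@192.168.1.100
```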
SSH server logs can provide details on why the key authentication is failing. Check the logs for relevant error messages:
/var/log/auth.log
or /var/log/secure
.Ensure that the public and private keys are a matching pair. If you have regenerated or changed keys, make sure the server has the corresponding public key.
If your private key is protected with a passphrase, ensure you’re entering the correct passphrase when prompted.
Confirm there are no network issues preventing SSH access. Firewall settings on either the client or server side can block SSH connections.
Use SSH with the -vvv
option for verbose output, which can give more insights:
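For example:

```bash
ssh -vvv user@192.168.1.100
```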
This will provide detailed debug information about each step of the SSH connection process, potentially highlighting where the issue lies.