PCA


AI: Logistic Regression

Logistic regression is a supervised machine learning algorithm used for binary classification tasks. Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that a given input belongs to a certain class.
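The probability output described above can be sketched in a few lines. This is a minimal illustration, assuming the standard sigmoid link; the weights and bias are hypothetical, hand-picked values rather than parameters fitted to any data, and the 0.5 threshold is a common default rather than a requirement.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Probability that input x belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # linear score
    return sigmoid(z)

def predict_label(x, weights, bias, threshold=0.5):
    """Turn the probability into a binary class label (0 or 1)."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical parameters, chosen by hand for illustration only
weights = [2.0, -1.0]
bias = 0.0

p_boundary = predict_proba([1.0, 2.0], weights, bias)   # linear score is 0
p_positive = predict_proba([3.0, 0.0], weights, bias)   # linear score is 6
```

A linear score of 0 lands exactly on the decision boundary, while larger positive scores push the probability toward 1.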

Regularization

Regularization is a way to keep a model from becoming too complicated. It reduces the risk of overfitting the training data so that the model still makes good predictions on new data. Think of it as adding a 'rule' or 'constraint' that prevents the model from relying too much on any specific feature or predictor.
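One way to see the 'constraint' at work is a small sketch using an L2 (ridge) penalty. The choice of ridge is an assumption made for illustration, since the text does not name a specific method, and the data and the function name `ridge_weight` are hypothetical. For a single feature with no intercept, the penalised least-squares weight has a closed form.

```python
def ridge_weight(xs, ys, lam):
    """Weight w minimising sum((y - w*x)**2) + lam * w**2
    for one feature and no intercept (closed-form solution)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical toy data that follows y = 2x exactly
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = ridge_weight(xs, ys, lam=0.0)    # no penalty
w_reg = ridge_weight(xs, ys, lam=14.0)     # penalty shrinks the weight
```

A larger `lam` pulls the weight toward zero, which is how the penalty limits the model's reliance on a feature.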
pyrosetta © Karobben

Heatmap with GGplot
GGplot: Prism style
OpenMM, Molecular Dynamic Simulation

HDF5 Data Format Introduction

HDF5 (Hierarchical Data Format version 5) is a file format designed for efficiently storing and organizing large, complex datasets. It uses a hierarchical structure of **groups** (like directories) and **datasets** (like files) to store data, supporting multidimensional arrays, metadata, and a wide variety of data types. Key advantages include **compression**, **cross-platform compatibility**, and the ability to handle large datasets that don’t fit in memory. It’s widely used in fields like scientific computing, machine learning, and bioinformatics due to its efficiency and flexibility.
Render Your Protein in Blender with Molecular Nodes

NCBI Data Submit with FTP/ASCP

ASCP (Aspera Secure Copy Protocol) is a fast, reliable protocol for transferring large files, particularly over long distances or in conditions with network latency or packet loss. It uses a technology called fasp (Fast, Adaptive, and Secure Protocol) to maximize available bandwidth, making transfers faster than traditional methods like FTP.
For uploading data to NCBI, ASCP is particularly useful because it efficiently handles large datasets, such as genomic sequences or omics data. Its ability to resume interrupted transfers ensures that if a connection fails during an upload, the transfer continues from where it left off, saving time and bandwidth. ASCP also provides strong encryption, ensuring data security during the upload process.

Softmax

Softmax is a mathematical function commonly used in machine learning, particularly in the context of classification problems. It transforms a vector of raw scores, often called logits, from a model into a vector of probabilities that sum to one. The probabilities generated by the softmax function represent the likelihood of each class being the correct classification.
$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^K e^{z_j}}$$
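The formula can be written out as a short sketch. Subtracting the largest logit before exponentiating is a common numerical-stability trick and an addition here, not part of the formula itself; it leaves the result unchanged because the shift cancels between numerator and denominator. The example logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to one."""
    shift = max(logits)                       # stability: avoid huge exponents
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-class problem
probs = softmax([2.0, 1.0, 0.1])

# Equal logits give equal probabilities, even for very large values
uniform = softmax([1000.0, 1000.0])
```

The largest logit always receives the largest probability, and the ordering of the classes is preserved.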

Support Vector Machine

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. It finds the hyperplane that separates the data into different classes with the largest possible margin. SVM works well with high-dimensional data and can use different kernel functions to transform the data for better separation when it is not linearly separable.
$$f(x) = \operatorname{sign}(w^T x + b)$$
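The decision rule can be sketched for the linear case as follows. The weight vector and bias are hypothetical values picked by hand, not the result of training, and a score of exactly zero is assigned to the positive class here as a convention.

```python
def decision_score(x, w, b):
    """Signed score w^T x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(x, w, b):
    """Linear SVM decision rule: +1 or -1 depending on the sign of the score."""
    return 1 if decision_score(x, w, b) >= 0 else -1

# Hypothetical hyperplane parameters, for illustration only
w = [1.0, -1.0]
b = -0.5

label_a = svm_predict([2.0, 0.0], w, b)   # score is 1.5
label_b = svm_predict([0.0, 2.0], w, b)   # score is -2.5
```

With a kernel, the dot product would be replaced by a kernel function evaluated against the support vectors; this sketch covers only the linear rule shown in the formula.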