0 Posted 2025-04-10Updated 2025-04-10Machine Learning / LM / Proteina minute read (About 178 words)

Rosetta, the Pioneer of Protein Structure Prediction

Rosetta is a comprehensive computational suite that plays a pivotal role in the protein folding field by predicting and designing protein structures based on amino acid sequences. It employs a combination of physics-based energy functions and advanced algorithms, such as fragment assembly and Monte Carlo sampling, to simulate the folding process and explore the vast conformational landscape of proteins. By iteratively optimizing potential structures, Rosetta helps researchers identify low-energy, stable configurations that closely resemble naturally occurring proteins. This tool not only aids in elucidating fundamental principles of protein structure and function but also supports the design of novel proteins and therapeutic interventions, making it an indispensable resource in structural biology and bioengineering.

0 Posted 2025-04-05Updated 2025-04-09Machine Learning / LM / Protein11 minutes read (About 1671 words)

AlphaFold

AlphaFold2

0 Posted 2025-03-03Updated 2025-03-03Paper10 minutes read (About 1463 words)

Protein Loop Refinement

Protein Loop refinement:

0 Posted 2025-01-22Updated 2025-01-22Biology / Bioinformatics / Protein Structure4 minutes read (About 616 words)

esm, Evolutionary Scale Modeling

ESM (Evolutionary Scale Modeling) is a family of large-scale protein language models developed by Meta AI. They’re trained on massive protein sequence databases, learning contextual representations of amino acids purely from sequence data. These representations—often called embeddings—capture both structural and functional clues.
In practice, you feed a protein sequence into ESM to obtain per-residue embeddings, which you can then use for downstream tasks like structure prediction, function annotation, or variant effect prediction. If you batch multiple sequences together, ESM aligns them by adding special start/end tokens and padding shorter sequences to match the longest one. You then slice out the valid embedding region for each protein, ignoring any padding.

0 Posted 2024-10-23Updated 2024-10-30Machine Learning / Data Format7 minutes read (About 1027 words)

HDF5 Data Format Introduction

HDF5 (Hierarchical Data Format version 5) is a file format designed for efficiently storing and organizing large, complex datasets. It uses a hierarchical structure of **groups** (like directories) and **datasets** (like files) to store data, supporting multidimensional arrays, metadata, and a wide variety of data types. Key advantages include **compression**, **cross-platform compatibility**, and the ability to handle large datasets that don’t fit in memory. It’s widely used in fields like scientific computing, machine learning, and bioinformatics due to its efficiency and flexibility.