User Interface
Development resources have been focused on energy functions, on algorithms for searching conformation space, and on a handful of supporting utilities, at the expense of a graphical interface. For graphical display, the ereg package interfaces to external programs (such as PyMOL) through PDB-format files. The user interface consists of 12 commands typed at the macOS (or Linux) command prompt. The following command-line arguments, each replaced by the user with a meaningful string of characters, specify the associated objects.
| Argument | Object specified |
|---|---|
| FAM | A project name. |
| MOL | A molecule or set of molecules that compose a system. |
| CNF | A structure or conformation of the system. |
| SUB | A subset of the set of rigid-geometry degrees of freedom of the system. |
| GRP | A collection of templates to be used in homology model building. |
The user interface is designed to be an easily usable tool set that provides functionality in small, factored units, which can be combined to accomplish a range of modeling studies.
For computers running macOS, ereg includes a simple viewing app (written in Apple's Swift and Metal languages) for graphical display of program-generated structures.
Central Commands
The functionality of the package is accessed through the following 10 commands.
geometry regularization
greg FAM MOL
- regularizes geometry, meaning bond lengths, bond angles, and some torsion angles are adjusted to standard values with minimal movement of atom coordinates
The energy surface is defined for a rigid-geometry model. To access the energy surface for structure prediction, a generalized structure of a molecule or system of molecules must be moved into the subspace of structures consistent with regularized geometry.
The primary use of this command is geometry regularization of experimental structures that make up a collection of templates, in preparation for homology model building. A second, less common, use is geometry regularization of large structures, as an alternative to the ereg command, in preparation for applying structure prediction to localized regions.
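As a minimal illustration of "adjusted to standard values with minimal movement of atom coordinates", the C++ sketch below handles only the simplest case, a single bond length. It is not ereg's algorithm, which regularizes bond lengths, bond angles, and torsions jointly; the names `regularizeBond` and `distance` are hypothetical. For one distance constraint, the displacement that minimizes the sum of squared atom movements splits the correction equally between the two atoms and applies it along the bond axis.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Euclidean distance between two atoms.
double distance(const Vec3& a, const Vec3& b) {
    double dx = b[0] - a[0], dy = b[1] - a[1], dz = b[2] - a[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical helper: set the a-b distance to target_length while
// minimizing the sum of squared atom displacements. For a single
// constraint the optimum moves each atom by half the length error,
// along the bond axis, so the bond midpoint stays fixed.
void regularizeBond(Vec3& a, Vec3& b, double target_length) {
    assert(target_length > 0.0);
    double len = distance(a, b);
    assert(len > 0.0);  // coincident atoms have no bond axis
    double half_shift = 0.5 * (target_length - len);
    // Unit vector from a toward b, computed before either atom moves.
    Vec3 u{(b[0] - a[0]) / len, (b[1] - a[1]) / len, (b[2] - a[2]) / len};
    for (int k = 0; k < 3; ++k) {
        a[k] -= half_shift * u[k];
        b[k] += half_shift * u[k];
    }
}
```

For example, a bond stretched to 1.6 Å with a target of 1.53 Å is shortened by moving each atom 0.035 Å toward the other, leaving the bond midpoint where it was.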
local energy minimization
ereg FAM MOL
- regularizes geometry
- minimizes energy locally with respect to all generalized coordinates
- to prevent movement out of the well of the initial conformation, minimization on the energy surface is accomplished by generating a sequence of up to 9 local minimization trajectories, gradually reducing to zero the weighting coefficient for a set of harmonic distance constraints taken from the initial structure
The ereg command is a better alternative to the greg command for bringing an experimental structure into regularized geometry in preparation for sequence design, prediction of pKa values of ionizable groups, or structure prediction of localized regions. Another common use is evaluating the full energy of a structure for comparison with other structures.
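The constraint-relaxation schedule described for ereg can be sketched in one dimension. This is a toy under stated assumptions, not ereg's minimizer: a double-well function stands in for the energy surface, fixed-step gradient descent stands in for local minimization, and a single harmonic restraint to the starting coordinate `x0` stands in for the set of distance constraints. The geometric decay of the weight in the hypothetical `restraintSchedule` is an illustrative choice; the source states only that the weight is reduced gradually to zero over up to 9 trajectories.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical weight schedule: `stages` restraint weights (at most 9),
// decaying geometrically from w0 and ending at exactly zero.
std::vector<double> restraintSchedule(double w0, double factor, int stages) {
    assert(stages >= 1 && stages <= 9);
    assert(w0 > 0.0 && factor > 0.0 && factor < 1.0);
    std::vector<double> weights;
    double w = w0;
    for (int s = 0; s < stages - 1; ++s) {
        weights.push_back(w);
        w *= factor;
    }
    weights.push_back(0.0);  // final stage: unrestrained energy surface
    return weights;
}

// One stage: fixed-step gradient descent on E(x) + w * (x - x0)^2,
// where dE is the derivative of the toy energy E.
double minimizeStage(const std::function<double(double)>& dE, double x,
                     double x0, double w, double step = 1e-3,
                     int iterations = 20000) {
    for (int i = 0; i < iterations; ++i) {
        double gradient = dE(x) + 2.0 * w * (x - x0);
        x -= step * gradient;
    }
    return x;
}

// Run the stages in order. Each stage starts from the previous stage's
// result, but the restraint always points back to the initial x0.
double restrainedMinimize(const std::function<double(double)>& dE, double x0,
                          const std::vector<double>& weights) {
    double x = x0;
    for (double w : weights) x = minimizeStage(dE, x, x0, w);
    return x;
}
```

With E(x) = (x^2 - 1)^2 and a start at x0 = 0.3, the weights 100, 10, 1, 0.1, 0 walk the coordinate gradually into x = 1, the minimum of the well that contains the starting point.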
segment structure prediction and sequence design
estp FAM MOL CNF SUB
- inputs a compact file specifying a subspace of generalized coordinates to be searched and, at each sequence position, a set of residues to be substituted
- for each sequence variant, generates a collection of local minima on the energy surface by minimizing globally within the subspace of generalized coordinates
- calculates, over the set of sequence variants, a measure dG of free energy of folding
Two of the most useful functionalities of the ereg package are accessed through the estp command. Structure prediction for protein segments is achieved by search through conformation space. Sequence design for thermodynamic stability is achieved by an additional search through sequence space.
For a rigid-geometry mechanical system MOL, using conformation CNF as the starting structure, the estp command searches for the conformation that minimizes the full energy function within a subspace SUB of the full space of motion. The subspace of conformations to be searched consists of 1 to 8 segments, each 7 to 13 residues long, plus a collection of side chains.
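The stated limits on the search subspace (1 to 8 segments of 7 to 13 residues each) amount to a simple check. The `Segment` struct and `isValidSubspace` function below are hypothetical; they do not describe the format of ereg's SUB file, and they ignore the side-chain part of the subspace.

```cpp
#include <cassert>
#include <vector>

// Hypothetical segment: an inclusive range of residue indices.
struct Segment {
    int first;
    int last;
};

// True if the segment list respects the limits stated for estp:
// 1 to 8 segments, each 7 to 13 residues long.
bool isValidSubspace(const std::vector<Segment>& segments) {
    if (segments.empty() || segments.size() > 8) return false;
    for (const Segment& s : segments) {
        assert(s.first <= s.last);  // ranges are expected in order
        int length = s.last - s.first + 1;
        if (length < 7 || length > 13) return false;
    }
    return true;
}
```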
For each sequence in the specified sequence space, the program searches the specified conformation space. The lowest-energy conformation found is used to evaluate dG, an estimate of the free energy of folding. Because different models represent the folded state of the protein and the reference unfolded state, dG for a single sequence is not a physically meaningful measure of stability. However, the change in dG with sequence, ddG = dG(sequence1) - dG(sequence0), does provide a meaningful measure of relative stability.
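Because only differences in dG are meaningful, results are naturally reported relative to a reference sequence. The sketch below computes ddG = dG(variant) - dG(reference) from a map of hypothetical variant names to dG values; it is illustrative and not part of ereg's interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// ddG of every sequence relative to a reference:
// ddG(s) = dG(s) - dG(reference). Only these differences are meaningful.
std::map<std::string, double> relativeStability(
    const std::map<std::string, double>& dG, const std::string& reference) {
    auto ref = dG.find(reference);
    assert(ref != dG.end());  // reference must be among the evaluated sequences
    std::map<std::string, double> ddG;
    for (const auto& [name, value] : dG) ddG[name] = value - ref->second;
    return ddG;
}
```

Under the common convention that free energies of folding are negative for stable proteins, a negative ddG would indicate a stabilizing substitution; the sign convention of ereg's own output should be confirmed before interpreting results this way.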
structure quality assessment
prof FAM MOL CNF
- identifies defects in a structure
- associates with these defects differential free energies of folding
- accumulates, along the sequence, a profile of defect energy density
Structure quality assessment identifies chain segments likely to be improved by application of molecular mechanics-based structure prediction.
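One plausible way to turn a list of defects into an energy-density profile is to sum defect energies per residue and smooth the sums with a sliding window. The sketch below assumes exactly that; the `Defect` struct, the window half-width, and the moving-average smoothing are hypothetical choices, not documented details of prof.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical defect record: residue index and associated energy.
struct Defect {
    int residue;
    double energy;
};

// Sum defect energies per residue, then average over a window of
// +/- `half` residues (truncated at chain ends) to get a density.
std::vector<double> defectProfile(const std::vector<Defect>& defects,
                                  int n_residues, int half) {
    assert(n_residues > 0 && half >= 0);
    std::vector<double> raw(n_residues, 0.0);
    for (const Defect& d : defects) {
        assert(d.residue >= 0 && d.residue < n_residues);
        raw[d.residue] += d.energy;
    }
    std::vector<double> profile(n_residues, 0.0);
    for (int i = 0; i < n_residues; ++i) {
        int lo = std::max(0, i - half);
        int hi = std::min(n_residues - 1, i + half);
        double sum = 0.0;
        for (int j = lo; j <= hi; ++j) sum += raw[j];
        profile[i] = sum / (hi - lo + 1);
    }
    return profile;
}
```

For a single defect of energy 3.0 at residue 2 of a 5-residue chain, with half-width 1, the profile is 0, 1, 1, 1, 0.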
energy refinement of a homology model
rcyc FAM MOL CNF
- generates a collection of 7-residue segments such that the union spans the entire chain or chains
- cycles through this collection of segments performing a limited conformational search with respect to each segment
Common uses of energy refinement are to improve structures created by the hlog or igor commands, or to explore conformations in the region of an experimental structure.
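A collection of 7-residue segments whose union spans a chain can be generated with overlapping windows. The stride of 4 residues, the clamping of the last window to the chain end, and the function name `coveringWindows` are assumptions for illustration; the source does not say how rcyc chooses its segments or how it handles multiple chains.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical window: an inclusive range of residue indices.
struct Window {
    int first;
    int last;
};

// Overlapping windows of `length` residues covering residues 0..n-1.
// Requiring stride <= length guarantees there are no gaps between windows.
std::vector<Window> coveringWindows(int n_residues, int length = 7,
                                    int stride = 4) {
    assert(n_residues >= length && stride >= 1 && stride <= length);
    std::vector<Window> windows;
    int start = 0;
    while (true) {
        start = std::min(start, n_residues - length);  // keep window inside chain
        windows.push_back({start, start + length - 1});
        if (start + length >= n_residues) break;       // chain end reached
        start += stride;
    }
    return windows;
}
```

For a 20-residue chain, the windows start at residues 0, 4, 8, 12, and 13, so every residue lies in at least one window.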
homology model building
hlog FAM MOL GRP
- aligns all structures contained in the group of templates
- partitions this multiple structure alignment into structurally conserved and non-conserved regions
- aligns the target sequence to the multiple structure alignment
- constructs a homology model based on this alignment
Homology model building leads to one of the major applications of molecular mechanics-based structure prediction: structure prediction of surface loops, for which knowledge-based structure prediction may not be reliable.
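The partitioning step can be illustrated by classifying alignment columns by their structural spread across templates. The sketch below assumes that a per-column deviation (for example, an RMS deviation of aligned C-alpha positions) has already been computed and that a single threshold separates conserved from non-conserved columns; the measure, the threshold, and the names `Region` and `partitionAlignment` are hypothetical, not hlog's documented criteria.

```cpp
#include <cassert>
#include <vector>

// Hypothetical region: an inclusive run of alignment columns.
struct Region {
    int first;
    int last;
    bool conserved;
};

// Group columns into maximal runs of conserved (deviation <= threshold)
// and non-conserved (deviation > threshold) columns.
std::vector<Region> partitionAlignment(const std::vector<double>& deviation,
                                       double threshold) {
    assert(threshold >= 0.0);
    std::vector<Region> regions;
    for (int i = 0; i < static_cast<int>(deviation.size()); ++i) {
        bool conserved = deviation[i] <= threshold;
        if (!regions.empty() && regions.back().conserved == conserved)
            regions.back().last = i;  // extend the current run
        else
            regions.push_back({i, i, conserved});  // start a new run
    }
    return regions;
}
```

Deviations 0.5, 0.4, 2.5, 3.0, 0.6 with a threshold of 1.0 give a conserved run (columns 0 to 1), a non-conserved run (columns 2 to 3), and a conserved run (column 4). Non-conserved runs correspond to regions, such as surface loops, that call for molecular mechanics-based prediction.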
ab initio fold prediction
igor FAM MOL
- generates a collection of 16 ab initio folds by global search through the space of element compositions of the low-resolution igor model
A description of the igor model and definitions of element composition and fold are given on the technology page of this website. A common use of predicted folds is as starting points for global energy minimization using the rcyc, estp, or ptra commands.
guided trajectory search
ptra FAM MOL CNF SUB
- generates a trajectory through a large, unconstrained subspace of generalized coordinates
As a tool for search of conformation space, the ptra command complements the estp command. A search directed by the estp command focuses computation on a spatially localized region of a larger structure, and is more efficient when productive motions are concentrated within one or more segments. A search directed by the ptra command enables full chain flexibility, and facilitates unconstrained motions such as changes to the packing configuration of helices and sheets.
A description of the guided trajectory search algorithm is given on the technology page of this website. The primary use of this command is structure prediction for small proteins or, more generally, search through large, unconstrained subspaces of generalized coordinates. A second, less common, use is energy refinement as an alternative to the rcyc command.
ionizable group pKa prediction
ionstate FAM MOL
- regularizes geometry
- iterates over a range of pH values
- for each pH, calculates a probability distribution over the set of ionization states
- for each ionizable group, calculates pKa
A common use of the ionstate command is to calculate the most probable ionization state for a specified pH, and to generate a most probable conformation consistent with this ionization state. This usage enables subsequent modeling of the most probable sequence, including protonation state, for a given pH.
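The per-pH probability distribution over ionization states, and the pKa derived from it, can be sketched by exact enumeration for a handful of coupled groups. Everything in the energy model below is an assumption for illustration: each group has a hypothetical intrinsic pKa, `W[i][j]` is a hypothetical coupling in units of kT added when groups i and j are both protonated, and the pKa is taken as the pH at which the group's protonated fraction crosses 0.5. ereg computes its state energies from its own energy function and structure, which this sketch does not model.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Fraction protonated for each group at one pH, from the Boltzmann
// distribution over all 2^n protonation states (small n only).
std::vector<double> protonatedFractions(
    const std::vector<double>& intrinsicPKa,
    const std::vector<std::vector<double>>& W, double pH) {
    const int n = static_cast<int>(intrinsicPKa.size());
    assert(n > 0 && n < 20);
    const double ln10 = std::log(10.0);
    std::vector<double> weighted(n, 0.0);
    double Z = 0.0;  // partition function
    for (unsigned s = 0; s < (1u << n); ++s) {
        double G = 0.0;  // state free energy in kT, relative to all-deprotonated
        for (int i = 0; i < n; ++i) {
            if (((s >> i) & 1u) == 0) continue;
            G += ln10 * (pH - intrinsicPKa[i]);
            for (int j = i + 1; j < n; ++j)
                if ((s >> j) & 1u) G += W[i][j];
        }
        double p = std::exp(-G);
        Z += p;
        for (int i = 0; i < n; ++i)
            if ((s >> i) & 1u) weighted[i] += p;
    }
    for (double& f : weighted) f /= Z;
    return weighted;
}

// pKa of group k: pH where its protonated fraction crosses 0.5, found on
// a pH grid and refined by linear interpolation. NaN if no crossing.
double titrationPKa(const std::vector<double>& intrinsicPKa,
                    const std::vector<std::vector<double>>& W, int k,
                    double pHmin = 0.0, double pHmax = 14.0, double dpH = 0.01) {
    double prevPH = pHmin;
    double prevF = protonatedFractions(intrinsicPKa, W, prevPH)[k];
    for (double pH = pHmin + dpH; pH <= pHmax + 1e-12; pH += dpH) {
        double f = protonatedFractions(intrinsicPKa, W, pH)[k];
        if (prevF >= 0.5 && f < 0.5)
            return prevPH + (prevF - 0.5) / (prevF - f) * (pH - prevPH);
        prevPH = pH;
        prevF = f;
    }
    return NAN;
}
```

For a single group with intrinsic pKa 4.0 and no coupling, the protonated fraction reduces to the Henderson-Hasselbalch form 1/(1 + 10^(pH - 4.0)), and the computed pKa is 4.0. Two such groups with a repulsive coupling of 2 kT when both are protonated each titrate below pH 4.0. Enumeration grows as 2^n, so it is practical only for small sets of interacting groups.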
docking prediction
edoc FAM MOL1, MOL2, ... MOLn
- generates a collection of 64 docked conformations by global search over a grid
The input structures are packed as rigid bodies. For docking of 2 rigid bodies, the space of conformations is defined by 6 degrees of freedom: translation and rotation of the 2nd body with respect to the 1st. This continuous space is replaced with a grid of 104,857,600 discrete conformations. The search algorithm optimizes a packing score over the discretized space.
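A global search over a discrete grid that keeps the best conformations can be sketched as an exhaustive scan with a bounded heap. The grid layout here (`nt` translation points per axis times `nr` rotations), the names `GridPoint`, `decode`, and `bestK`, and the convention that a lower packing score is better are all assumptions; the source gives only the total of 104,857,600 grid points and the count of 64 retained conformations, not how the six degrees of freedom are discretized or how the score is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical grid point: three translation indices and one rotation index.
struct GridPoint {
    int ix, iy, iz, ir;
};

// Mixed-radix decoding of a flat index laid out as
// index = ((ix * nt + iy) * nt + iz) * nr + ir.
GridPoint decode(std::uint64_t index, int nt, int nr) {
    GridPoint g;
    g.ir = static_cast<int>(index % nr); index /= nr;
    g.iz = static_cast<int>(index % nt); index /= nt;
    g.iy = static_cast<int>(index % nt); index /= nt;
    g.ix = static_cast<int>(index);
    return g;
}

// Score every grid point and keep the k lowest scores, best first.
std::vector<std::pair<double, std::uint64_t>> bestK(
    int nt, int nr, std::size_t k,
    const std::function<double(const GridPoint&)>& score) {
    assert(nt > 0 && nr > 0 && k > 0);
    const std::uint64_t total = static_cast<std::uint64_t>(nt) * nt * nt * nr;
    // Max-heap on score: top() is the worst of the current best k.
    std::priority_queue<std::pair<double, std::uint64_t>> heap;
    for (std::uint64_t i = 0; i < total; ++i) {
        double s = score(decode(i, nt, nr));
        if (heap.size() < k) {
            heap.emplace(s, i);
        } else if (s < heap.top().first) {
            heap.pop();
            heap.emplace(s, i);
        }
    }
    std::vector<std::pair<double, std::uint64_t>> out;
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    std::reverse(out.begin(), out.end());  // heap pops worst first
    return out;
}
```

Calling `bestK(nt, nr, 64, score)` would return the 64 lowest-scoring grid indices, best first; `decode` recovers the translation and rotation indices of each. The bounded heap keeps memory proportional to k rather than to the grid size.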
Computational Requirements
The program, which consists of roughly 124,000 lines of C++ code, was developed for a macOS (or Linux) workstation with at least 8 gigabytes of memory, and preferably more. The source code is compiled with gcc. The calculations are computationally intensive.
Download
The ereg package is being distributed open source under the AGPLv3 license.
The User Manual, included with the distribution, describes the installation.
A short description of the macOS-specific Viewing App is also included with the distribution.
Download ereg_jun2024.tar.gz [72 MB]
This version (jun2024), released in Q2 2024, is the second public release of the software.
Support
We offer support and are responsive to user input.
