parameters.secondary_structure submodule
- cg_openmm.parameters.secondary_structure.bootstrap_native_contacts_expectation(cgmodel, traj_file_list, native_contact_list, native_contact_distances, output_data='output/output.nc', frame_begin=0, sample_spacing=1, native_contact_tol=1.3, num_intermediate_states=0, n_trial_boot=200, conf_percent='sigma', plotfile='Q_vs_T_bootstrap.pdf', homopolymer_sym=False)[source]
Given a cgmodel, native contact definitions, and trajectory file list, this function calculates the fraction of native contacts for all specified frames, and uses a bootstrapping scheme to compute the uncertainties in the Q vs T folding curve. Intended for use after the native contact tolerance has been optimized (with either the helical or the generalized optimization function).
- Parameters
cgmodel (class) – CGModel() class object
traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name
native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.
native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list
frame_begin (int) – Frame at which to start native contacts analysis (default=0)
sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)
native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)
num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)
n_trial_boot (int) – number of trials to run for generating bootstrapping uncertainties (default=200)
conf_percent (float or str) – Confidence level in percent for the reported uncertainties; ‘sigma’ corresponds to 68.27 (default=’sigma’)
plotfile (str) – Path to output file for plotting results (default=’Q_vs_T_bootstrap.pdf’)
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
temp_list ( List( float * unit.simtk.temperature ) ) - The temperature list corresponding to the native contact fraction values
Q_values ( List( float ) ) - The native contact fraction values for all (including inserted intermediates) states
Q_uncertainty ( Tuple ( np.array(float) ) ) - confidence interval for all Q_values computed from bootstrapping
sigmoid_results_boot ( dict ) - dictionary containing the 4 sigmoid parameters and Q_folded (and their confidence interval tuples)
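The role of n_trial_boot and conf_percent can be illustrated with a minimal percentile bootstrap in plain Python. This is a sketch of the general technique only, not the library's implementation: the helper name bootstrap_mean_interval, the sample values, and the percentile indexing are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_mean_interval(samples, n_trial_boot=200, conf_percent=68.27, seed=0):
    """Percentile bootstrap interval for the mean of `samples` (illustrative helper)."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each trial.
    boot_means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_trial_boot)
    )
    # Probability left in each tail, e.g. 15.865% for conf_percent=68.27.
    tail = (100.0 - conf_percent) / 200.0
    lo_idx = int(tail * (n_trial_boot - 1))
    hi_idx = int(round((1.0 - tail) * (n_trial_boot - 1)))
    return statistics.fmean(boot_means), (boot_means[lo_idx], boot_means[hi_idx])

# Hypothetical decorrelated Q samples at a single temperature.
q_samples = [0.82, 0.79, 0.85, 0.88, 0.76, 0.81, 0.84, 0.80, 0.83, 0.78]
q_mean, (q_lo, q_hi) = bootstrap_mean_interval(q_samples)
```

A confidence level of 68.27 percent brackets roughly one standard deviation on either side of the mean for normally distributed bootstrap estimates, which is presumably why it is labelled ‘sigma’.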
- cg_openmm.parameters.secondary_structure.expectations_fraction_contacts(fraction_native_contacts, frame_begin=0, sample_spacing=1, output_data='output/output.nc', num_intermediate_states=0, bootstrap_energies=None)[source]
Given an array of native contact fractions, a .nc output file, and the number of intermediate states to insert into the temperature list, this function calculates the native contact fraction expectation.
- Parameters
fraction_native_contacts (numpy array (float * nframes x nreplicas)) – The fraction of native contacts for all selected frames in the trajectories.
frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)
sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)
output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)
num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)
bootstrap_energies (2d numpy array (float)) – a custom replica_energies array to be used for bootstrapping calculations. Used instead of the energies in the .nc file. (default=None)
- Returns
results ( dict ) - dictionary containing complete temperature list(“T”), native contact fraction expectation (“Q”), and uncertainty of Q (“dQ”)
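The effect of num_intermediate_states on the temperature list can be sketched as follows. The even (linear) spacing between neighboring temperatures is an assumption made for illustration, and insert_intermediate_states is a hypothetical helper, not a cg_openmm function; the library may space the inserted states differently.

```python
def insert_intermediate_states(temperatures, num_intermediate_states=0):
    """Return temperatures with evenly spaced points added between neighbors."""
    full = []
    for t_lo, t_hi in zip(temperatures[:-1], temperatures[1:]):
        step = (t_hi - t_lo) / (num_intermediate_states + 1)
        # Keep the lower simulated state, then add the unsampled states above it.
        full.extend(t_lo + k * step for k in range(num_intermediate_states + 1))
    full.append(temperatures[-1])
    return full

simulated = [200.0, 250.0, 300.0]  # kelvin, hypothetical replica temperatures
expanded = insert_intermediate_states(simulated, num_intermediate_states=1)
```

With one intermediate state per gap, three simulated temperatures expand to five. No trajectory exists at the inserted temperatures, so Q there has to be estimated from the energies of the sampled states.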
- cg_openmm.parameters.secondary_structure.fraction_native_contacts(cgmodel, file_list, native_contact_list, native_contact_distances, frame_begin=0, native_contact_tol=1.3, subsample=True, homopolymer_sym=False)[source]
Given a cgmodel, a list of trajectory files, and the native contact definitions (pairs and native distances), this function calculates the fraction of native contacts for the model.
- Parameters
cgmodel (class) – CGModel() class object
file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name
native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.
native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list
frame_begin (int) – Frame at which to start native contacts analysis (default=0)
native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)
subsample (Boolean) – option to use pymbar subsampleCorrelatedData to detect and return the interval between uncorrelated data points (default=True)
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
Q ( numpy array (float * nframes x nreplicas) ) - The fraction of native contacts for all selected frames in the trajectories.
Q_avg ( numpy array (float * nreplicas) ) - Mean values of Q for each replica.
Q_stderr ( numpy array (float * nreplicas) ) - Standard error of the mean of Q for each replica.
decorrelation_spacing ( int ) - Number of frames between uncorrelated native contact fractions
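The tolerance criterion described for native_contact_tol can be sketched per frame in plain Python. This is a minimal illustration assuming a strict less-than comparison and plain floats in place of Quantity arrays; fraction_native and the distance values are hypothetical.

```python
def fraction_native(frame_distances, native_distances, native_contact_tol=1.3):
    """Fraction of native pairs whose distance is within tol * native distance."""
    formed = sum(
        d < native_contact_tol * d0
        for d, d0 in zip(frame_distances, native_distances)
    )
    return formed / len(native_distances)

# Native distances for four contact pairs, then the same pairs in three frames.
native = [1.0, 1.2, 0.9, 1.1]
frames = [
    [1.05, 1.30, 0.95, 1.20],  # all four contacts formed
    [1.05, 2.40, 0.95, 3.00],  # two contacts formed
    [2.50, 2.40, 1.90, 3.00],  # fully unfolded
]
q_per_frame = [fraction_native(f, native) for f in frames]
```

Because the tolerance is a multiple of each pair's own native distance, longer native contacts are allowed a proportionally larger absolute deviation.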
- cg_openmm.parameters.secondary_structure.fraction_native_contacts_preloaded(cgmodel, traj_dict, native_contact_list, native_contact_distances, frame_begin=0, native_contact_tol=1.3, subsample=True, homopolymer_sym=False)[source]
Given a cgmodel, a dictionary of preloaded MDTraj trajectory objects, and the native contact definitions (pairs and native distances), this function calculates the fraction of native contacts for the model.
- Parameters
cgmodel (class) – CGModel() class object
traj_dict (dict{replica: MDTraj trajectory object}) – A dictionary of preloaded MDTraj trajectory objects
native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.
native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list
frame_begin (int) – Frame at which to start native contacts analysis (default=0)
native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)
subsample (Boolean) – option to use pymbar subsampleCorrelatedData to detect and return the interval between uncorrelated data points (default=True)
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
Q ( numpy array (float * nframes x nreplicas) ) - The fraction of native contacts for all selected frames in the trajectories.
Q_avg ( numpy array (float * nreplicas) ) - Mean values of Q for each replica.
Q_stderr ( numpy array (float * nreplicas) ) - Standard error of the mean of Q for each replica.
decorrelation_spacing ( int ) - Number of frames between uncorrelated native contact fractions
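The relationship between Q, Q_avg, and Q_stderr can be sketched with the standard library. The slicing by a starting frame and a decorrelation spacing, the use of the sample standard deviation, and the helper name replica_statistics are assumptions for illustration, not a description of the library's internals.

```python
import math
import statistics

def replica_statistics(q_frames, frame_begin=0, sample_spacing=1):
    """Mean and standard error of Q for each replica column."""
    # Discard equilibration frames, then keep every sample_spacing-th frame.
    production = q_frames[frame_begin::sample_spacing]
    n_frames = len(production)
    q_avg, q_stderr = [], []
    for replica in zip(*production):  # transpose: one tuple per replica
        q_avg.append(statistics.fmean(replica))
        q_stderr.append(statistics.stdev(replica) / math.sqrt(n_frames))
    return q_avg, q_stderr

# Rows are frames, columns are replicas (nframes x nreplicas), hypothetical values.
q_frames = [
    [0.10, 0.90],
    [0.80, 0.40],
    [0.90, 0.30],
    [0.70, 0.20],
    [0.80, 0.10],
    [0.60, 0.30],
]
q_avg, q_stderr = replica_statistics(q_frames, frame_begin=1, sample_spacing=2)
```

The standard error is only meaningful if the retained frames are approximately uncorrelated, which is the purpose of the spacing returned as decorrelation_spacing.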
- cg_openmm.parameters.secondary_structure.get_helix_contacts(cgmodel, native_structure_file, backbone_type_name='bb', verbose=False)[source]
Given a coarse grained model and positions for the native structure, this function determines which pairs are native contacts. This function assumes helical geometry, with native contacts being backbone pairs interacting as (i) to (i+n) neighbors, where n defines the pairs which on average are the shortest distance apart.
- Parameters
cgmodel (class) – CGModel() class object
native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.
backbone_type_name (str or list(str)) – type name(s) in cgmodel which corresponds to the particles forming the helical backbone (default=’bb’)
verbose (bool) – Option to print detailed statistics for each helical backbone sequence considered
- Returns
native_contact_list - A list of the (i) to (i+n) backbone pairs identified as native contacts.
native_contact_distances - A Quantity numpy array of the native pairwise distances corresponding to native_contact_list
opt_seq_spacing - The (i) to (i+n) number n defining contacting backbone beads
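The idea of picking the spacing n whose (i) to (i+n) pairs are closest on average can be sketched on an idealized helix. The helix geometry, the candidate range for n, and both helper functions are assumptions for illustration; how the library handles several backbone types or excludes bonded neighbors is not covered here.

```python
import math

def mean_pair_distance(positions, n):
    """Average distance between beads i and i+n along the chain."""
    dists = [math.dist(positions[i], positions[i + n])
             for i in range(len(positions) - n)]
    return sum(dists) / len(dists)

def best_sequence_spacing(positions, n_candidates):
    """Return the spacing n whose (i, i+n) pairs are closest on average."""
    return min(n_candidates, key=lambda n: mean_pair_distance(positions, n))

# Hypothetical ideal helix: 4 beads per turn, radius 1.0, rise 0.25 per bead.
helix = [
    (math.cos(i * math.pi / 2), math.sin(i * math.pi / 2), 0.25 * i)
    for i in range(24)
]
# Skip small n, which would just pick out bonded neighbors (an assumption).
opt_seq_spacing = best_sequence_spacing(helix, n_candidates=range(3, 8))
```

For this geometry, beads four apart sit directly above one another, one turn up the helix, so n=4 gives the shortest average distance.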
- cg_openmm.parameters.secondary_structure.get_native_contacts(cgmodel, native_structure_file, native_contact_distance_cutoff)[source]
Given a coarse grained model, positions for the native structure, and cutoff, this function determines which pairs are native contacts.
- Parameters
cgmodel (class) – CGModel() class object
native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.
native_contact_distance_cutoff (Quantity()) – The maximum distance for two nonbonded particles that are defined as native
- Returns
native_contact_list - A list of the nonbonded interactions whose inter-particle distances are less than the native_contact_distance_cutoff.
native_contact_distances - A Quantity numpy array of the native pairwise distances corresponding to native_contact_list
contact_type_dict - A dictionary of {native contact particle type pair: counts}
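A cutoff-based contact search with type counting can be sketched as below. The structure, the bead type names, the explicit set of bonded pairs, and find_native_contacts itself are hypothetical; the library works with a CGModel and Quantity distances, and its exact exclusion rules for bonded neighbors are not shown in this reference.

```python
import math
from collections import Counter
from itertools import combinations

def find_native_contacts(positions, types, bonded_pairs, cutoff):
    """List nonbonded pairs closer than `cutoff`, with distances and type counts."""
    contacts, distances, type_counts = [], [], Counter()
    for i, j in combinations(range(len(positions)), 2):
        if (i, j) in bonded_pairs:
            continue  # only nonbonded pairs can be native contacts
        d = math.dist(positions[i], positions[j])
        if d < cutoff:
            contacts.append((i, j))
            distances.append(d)
            # Sort the type names so (bb, sc) and (sc, bb) share one key.
            type_counts[tuple(sorted((types[i], types[j])))] += 1
    return contacts, distances, dict(type_counts)

# Hypothetical 4-bead structure: a bent backbone with one side chain bead.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
types = ["bb", "bb", "bb", "sc"]
bonded = {(0, 1), (1, 2), (2, 3)}
contacts, distances, contact_type_dict = find_native_contacts(
    positions, types, bonded, cutoff=1.2
)
```

Only the pair (0, 3) falls under the cutoff here; the two diagonal pairs are too far apart and the remaining pairs are bonded.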
- cg_openmm.parameters.secondary_structure.optimize_Q_cut(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, plotfile='native_contacts_opt_2d.pdf', verbose=False, minimizer_options=None, bounds_nc_cut=None, bounds_nc_tol=(1, 2), homopolymer_sym=False)[source]
Given a coarse grained model and a native structure as input, optimize both the distance cutoff defining the native contact pairs and the distance tolerance for scanning the trajectory for native contacts.
- Parameters
cgmodel (class) – CGModel() class object
native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.
traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name
output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)
num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)
frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)
frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)
plotfile (str) – Path to output file for plotting results (default=’native_contacts_opt_2d.pdf’)
verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)
minimizer_options (dict) – dictionary of additional options for scipy.optimize.differential_evolution (default=None)
bounds_nc_cut (tuple) – native contact distance cutoff bounds in distance units - if None, will determine bounds based on backbone sigma parameter (default=None)
bounds_nc_tol (tuple) – native contact tolerance factor bounds (default=(1,2))
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
native_contact_cutoff ( Quantity() ) - The ideal distance below which two nonbonded, interacting particles should be defined as a “native contact”
native_contact_tol ( float ) - tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native contact distances)
opt_results ( dict ) - results of the scipy.optimize.differential_evolution optimization of the native contact cutoff and tolerance
Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’
sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)
sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt
contact_type_dict ( dict ) - a dictionary of {native contact particle type pair: counts}
- cg_openmm.parameters.secondary_structure.optimize_Q_cut_1d(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, native_contact_tol=1.3, plotfile='native_contacts_opt_1d.pdf', verbose=False, brute_step=0.1, bounds=None, homopolymer_sym=False)[source]
Given a coarse grained model and a native structure as input, optimize the distance cutoff defining the native contact pairs, with a fixed distance tolerance factor for scanning the trajectory.
- Parameters
cgmodel (class) – CGModel() class object
native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.
traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name
output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)
num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)
frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)
frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)
native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)
plotfile (str) – Path to output file for plotting results (default=’native_contacts_opt_1d.pdf’)
verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)
brute_step (float) – step size in distance units for brute force native contact cutoff optimization (final optimization searches between intervals) (default=0.1)
bounds (tuple) – bounds in distance units for brute force optimization - if None, will determine bounds based on backbone sigma parameter (default=None)
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
native_contact_cutoff ( Quantity() ) - The ideal distance below which two nonbonded, interacting particles should be defined as a “native contact”
opt_results ( dict ) - results of the native contact cutoff scipy.optimize.minimize optimization
Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’
sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)
sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt
contact_type_dict ( dict ) - a dictionary of {native contact particle type pair: counts}
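The brute_step description (a coarse scan, with the final optimization searching between intervals) can be sketched as a grid scan followed by a local refinement. The quadratic objective is a toy stand-in, since the score the library actually optimizes is not specified in this reference, and the golden-section refinement is a substitute chosen for illustration rather than the scipy routine the library uses.

```python
def brute_then_refine(objective, bounds, brute_step=0.1, tol=1e-6):
    """Coarse grid scan followed by golden-section refinement near the best point."""
    lo, hi = bounds
    n_steps = int(round((hi - lo) / brute_step))
    grid = [lo + k * brute_step for k in range(n_steps + 1)]
    best = min(grid, key=objective)
    # Refine inside the grid intervals on either side of the coarse minimum.
    a, b = max(lo, best - brute_step), min(hi, best + brute_step)
    inv_phi = (5 ** 0.5 - 1) / 2
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if objective(c) < objective(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

# Toy objective with a minimum at 1.234 (stands in for the real score).
cutoff_opt = brute_then_refine(lambda x: (x - 1.234) ** 2, bounds=(0.5, 2.5))
```

The coarse scan locates the right basin cheaply, and the refinement then recovers a value that does not have to lie on the grid.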
- cg_openmm.parameters.secondary_structure.optimize_Q_tol_helix(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, backbone_type_name='bb', plotfile='native_contacts_helix_opt.pdf', verbose=False, brute_step=0.1, homopolymer_sym=False)[source]
Given a coarse grained model and a native structure as input, determine which helical backbone sequences are native contacts, and the optimal distance tolerance for scanning the trajectory for native contacts. Tolerance is determined by brute force scan.
- Parameters
cgmodel (class) – CGModel() class object
native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.
traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name
output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)
num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)
frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)
frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)
backbone_type_name (str or list(str)) – type name(s) in cgmodel which corresponds to the particles forming the helical backbone (default=’bb’)
plotfile (str) – Path to output file for plotting results (default=’native_contacts_helix_opt.pdf’)
verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)
brute_step (float) – step size in native distance multiples for brute force tolerance optimization (final optimization searches between intervals) (default=0.1)
homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)
- Returns
opt_seq_spacing ( int ) - the (i) to (i+n) number n defining contacting backbone beads
native_contact_tol ( float ) - tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native contact distances)
opt_results ( dict ) - results of the native contact tolerance scipy.optimize.minimize optimization
Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’
sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)
sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt
- cg_openmm.parameters.secondary_structure.plot_native_contact_fraction(temperature_list, Q, Q_uncertainty, plotfile='Q_vs_T.pdf', sigmoid_dict=None)[source]
Given a list of temperatures and corresponding native contact fractions, plot Q vs T. If a sigmoid dict from bootstrapping is given, also plot the sigmoid curve. Note that this sigmoid curve is generated by using the mean values of the 4 hyperbolic fitting parameters taken over all bootstrap trials, not a direct fit to the Q vs T data.
- Parameters
temperature_list (List( SIMTK Unit() * number_replicas )) – List of temperatures that will be used to define different replicas (thermodynamic states)
Q (np.array(float * len(temperature_list))) – native contact fraction array for all temperatures in temperature_list
Q_uncertainty (np.array(float * len(temperature_list))) – uncertainty associated with Q
plotfile (str) – Path to output file for plotting results (default=’Q_vs_T.pdf’)
sigmoid_dict (dict) – dictionary containing sigmoid parameter mean values and uncertainties (default=None)
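The distinction between a curve built from mean bootstrap parameters and a direct fit can be sketched as follows. The functional form of tanh_sigmoid is an assumed hyperbolic tangent switch using the parameter names (x0, y0, y1, d); the library's exact expression may differ, and the parameter values are hypothetical.

```python
import math
import statistics

def tanh_sigmoid(T, x0, y0, y1, d):
    """Switch from y0 (T << x0) to y1 (T >> x0) over a width set by d."""
    return (y0 + y1) / 2 - (y0 - y1) / 2 * math.tanh((T - x0) / d)

# Hypothetical fitted parameters (x0, y0, y1, d) from three bootstrap trials.
boot_params = [
    (298.0, 0.90, 0.10, 20.0),
    (300.0, 0.92, 0.12, 22.0),
    (302.0, 0.88, 0.08, 18.0),
]
# Average each parameter over trials, then evaluate one curve from the means.
mean_params = [statistics.fmean(column) for column in zip(*boot_params)]
curve = [(T, tanh_sigmoid(T, *mean_params)) for T in range(200, 401, 50)]
```

Because the sigmoid is nonlinear in x0 and d, a curve evaluated at the mean parameters is generally not identical to the mean of the per-trial curves or to a single fit of the Q vs T data.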
- cg_openmm.parameters.secondary_structure.plot_native_contact_timeseries(Q, time_interval=Quantity(value=1.0, unit=picosecond), frame_begin=0, plot_per_page=3, plotfile='Q_vs_time.pdf', figure_title=None)[source]
Given the native contact fraction timeseries for each replica or state, plot Q vs time.
- Parameters
Q (np.array(float * nframes x len(temperature_list))) – native contact fraction array for all replicas or states
time_interval (Quantity) – interval between energy exchanges (default=1.0*unit.picosecond)
frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)
plot_per_page (int) – number of subplots per pdf page (default=3)
plotfile (str) – Path to output file for plotting results (default=’Q_vs_time.pdf’)
figure_title (str) – title of overall plot (default=None)