parameters.secondary_structure submodule

cg_openmm.parameters.secondary_structure.bootstrap_native_contacts_expectation(cgmodel, traj_file_list, native_contact_list, native_contact_distances, output_data='output/output.nc', frame_begin=0, sample_spacing=1, native_contact_tol=1.3, num_intermediate_states=0, n_trial_boot=200, conf_percent='sigma', plotfile='Q_vs_T_bootstrap.pdf', homopolymer_sym=False)

Given a cgmodel, native contact definitions, and a trajectory file list, this function calculates the fraction of native contacts for all specified frames and uses a bootstrapping scheme to compute the uncertainties in the Q vs T folding curve. It is intended to be used after the native contact tolerance has been optimized with either the helical or the generalized optimization routine.

Parameters
  • cgmodel (class) – CGModel() class object

  • traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name

  • native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.

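  • native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)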

  • frame_begin (int) – Frame at which to start native contacts analysis (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)

  • num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)

  • n_trial_boot (int) – number of trials to run for generating bootstrapping uncertainties (default=200)

  • conf_percent (float or str) – Confidence level in percent for outputting uncertainties; the string ‘sigma’ corresponds to 68.27 (default=’sigma’)

  • plotfile (str) – Path to output file for plotting results (default=’Q_vs_T_bootstrap.pdf’)

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • temp_list ( List( float * unit.simtk.temperature ) ) - The temperature list corresponding to the native contact fraction values

  • Q_values ( List( float ) ) - The native contact fraction values for all (including inserted intermediates) states

  • Q_uncertainty ( Tuple ( np.array(float) ) ) - confidence interval for all Q_values computed from bootstrapping

  • sigmoid_results_boot ( dict ) - dictionary containing the 4 sigmoid parameters and Q_folded (and their confidence interval tuples)
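
The bootstrapping idea behind n_trial_boot and conf_percent can be illustrated with a minimal sketch that does not use cg_openmm. It assumes a plain percentile bootstrap on the mean of synthetic Q samples at a single temperature; the function's actual resampling scheme may differ. The names `bootstrap_ci` and `q_samples` are hypothetical.

```python
import numpy as np

def bootstrap_ci(samples, n_trial_boot=200, conf_percent=68.27, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Resample with replacement and record the mean of each bootstrap trial
    idx = rng.integers(0, n, size=(n_trial_boot, n))
    boot_means = samples[idx].mean(axis=1)
    # Symmetric percentiles that enclose conf_percent of the bootstrap means
    lo_pct = (100.0 - conf_percent) / 2.0
    hi_pct = 100.0 - lo_pct
    lower, upper = np.percentile(boot_means, [lo_pct, hi_pct])
    return samples.mean(), (lower, upper)

# Synthetic native contact fractions (illustrative data only)
q_samples = np.clip(np.random.default_rng(1).normal(0.6, 0.1, size=500), 0.0, 1.0)
q_mean, (q_lo, q_hi) = bootstrap_ci(q_samples)
print(f"Q = {q_mean:.3f}, interval = ({q_lo:.3f}, {q_hi:.3f})")
```

Raising n_trial_boot reduces the noise in the interval bounds at the cost of more resampling work.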

cg_openmm.parameters.secondary_structure.expectations_fraction_contacts(fraction_native_contacts, frame_begin=0, sample_spacing=1, output_data='output/output.nc', num_intermediate_states=0, bootstrap_energies=None)

Given a .nc output file, the per-frame native contact fractions, and the number of intermediate states to insert into the temperature list, this function calculates the native contact fraction expectation at each temperature.

Parameters
  • fraction_native_contacts (numpy array (float * nframes x nreplicas)) – The fraction of native contacts for all selected frames in the trajectories.

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)

  • bootstrap_energies (2d numpy array (float)) – a custom replica_energies array to be used for bootstrapping calculations. Used instead of the energies in the .nc file. (default=None)

Returns

  • results ( dict ) - dictionary containing complete temperature list(“T”), native contact fraction expectation (“Q”), and uncertainty of Q (“dQ”)
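
As a rough illustration of what num_intermediate_states means, the sketch below expands a list of simulated temperatures by placing evenly spaced states between each neighboring pair. The linear spacing is an assumption made for this example, and `insert_intermediate_states` is a hypothetical helper, not part of cg_openmm.

```python
import numpy as np

def insert_intermediate_states(temps, num_intermediate_states=0):
    """Return temps with evenly spaced states inserted between neighbors."""
    temps = np.asarray(temps, dtype=float)
    full = []
    for t_lo, t_hi in zip(temps[:-1], temps[1:]):
        # num_intermediate_states + 2 points span [t_lo, t_hi];
        # drop t_hi here so shared endpoints are not duplicated
        full.extend(np.linspace(t_lo, t_hi, num_intermediate_states + 2)[:-1])
    full.append(temps[-1])
    return np.array(full)

simulated = [200.0, 250.0, 300.0]
full_list = insert_intermediate_states(simulated, num_intermediate_states=1)
print(full_list)
```

With n simulated states and k inserted states per gap, the expanded list has n + (n - 1) * k entries.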

cg_openmm.parameters.secondary_structure.fraction_native_contacts(cgmodel, file_list, native_contact_list, native_contact_distances, frame_begin=0, native_contact_tol=1.3, subsample=True, homopolymer_sym=False)

Given a cgmodel, a list of replica trajectory files, and the native contact pairs with their native distances, this function calculates the fraction of native contacts for the model.

Parameters
  • cgmodel (class) – CGModel() class object

  • file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name

  • native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.

  • native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list

  • frame_begin (int) – Frame at which to start native contacts analysis (default=0)

  • native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)

  • subsample (Boolean) – option to use pymbar subsampleCorrelatedData to detect and return the interval between uncorrelated data points (default=True)

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • Q ( numpy array (float * nframes x nreplicas) ) - The fraction of native contacts for all selected frames in the trajectories.

  • Q_avg ( numpy array (float * nreplicas) ) - Mean values of Q for each replica.

  • Q_stderr ( numpy array (float * nreplicas) ) - Standard error of the mean of Q for each replica.

  • decorrelation_spacing ( int ) - Number of frames between uncorrelated native contact fractions
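
The role of native_contact_tol can be sketched with plain numpy: a contact is counted as formed in a frame when its distance is within the tolerance factor times its native distance. This is a simplified illustration on unitless arrays, with an assumed inclusive comparison; `fraction_native` is a hypothetical helper and the distances are made up.

```python
import numpy as np

def fraction_native(distances, native_distances, native_contact_tol=1.3):
    """distances: (nframes, ncontacts); native_distances: (ncontacts,)."""
    distances = np.asarray(distances, dtype=float)
    native_distances = np.asarray(native_distances, dtype=float)
    # A contact is formed when its distance is within tol * native distance
    formed = distances <= native_contact_tol * native_distances
    # Q per frame is the fraction of native contacts that are formed
    return formed.mean(axis=1)

native_distances = np.array([1.0, 2.0, 1.5])
distances = np.array([
    [1.0, 2.0, 1.5],  # native structure: all contacts formed
    [1.2, 2.7, 1.9],  # second contact exceeds 1.3 * 2.0 = 2.6
    [3.0, 4.0, 5.0],  # fully unfolded frame
])
q = fraction_native(distances, native_distances)
print(q)
```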

cg_openmm.parameters.secondary_structure.fraction_native_contacts_preloaded(cgmodel, traj_dict, native_contact_list, native_contact_distances, frame_begin=0, native_contact_tol=1.3, subsample=True, homopolymer_sym=False)

Given a cgmodel, a dictionary of preloaded MDTraj trajectory objects, and the native contact pairs with their native distances, this function calculates the fraction of native contacts for the model.

Parameters
  • cgmodel (class) – CGModel() class object

  • traj_dict (dict{replica: MDTraj trajectory object}) – A dictionary of preloaded MDTraj trajectory objects

  • native_contact_list (List) – A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_distance_cutoff’.

  • native_contact_distances (Quantity) – A numpy array of the native pairwise distances corresponding to native_contact_list

  • frame_begin (int) – Frame at which to start native contacts analysis (default=0)

  • native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)

  • subsample (Boolean) – option to use pymbar subsampleCorrelatedData to detect and return the interval between uncorrelated data points (default=True)

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • Q ( numpy array (float * nframes x nreplicas) ) - The fraction of native contacts for all selected frames in the trajectories.

  • Q_avg ( numpy array (float * nreplicas) ) - Mean values of Q for each replica.

  • Q_stderr ( numpy array (float * nreplicas) ) - Standard error of the mean of Q for each replica.

  • decorrelation_spacing ( int ) - Number of frames between uncorrelated native contact fractions
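
The per-replica summary values Q_avg and Q_stderr can be sketched as a column-wise mean and standard error of the mean over the retained frames. The sketch assumes the statistics are taken after discarding frames before frame_begin and keeping every `spacing`-th frame; whether the library applies the subsampling at this point is an assumption, and `replica_statistics` is a hypothetical helper.

```python
import numpy as np

def replica_statistics(Q, frame_begin=0, spacing=1):
    """Q: (nframes, nreplicas). Returns the per-replica mean and standard error."""
    Q = np.asarray(Q, dtype=float)
    # Discard equilibration frames, then keep every `spacing`-th frame
    Q_sub = Q[frame_begin::spacing, :]
    n = Q_sub.shape[0]
    Q_avg = Q_sub.mean(axis=0)
    # Standard error of the mean uses the sample standard deviation (ddof=1)
    Q_stderr = Q_sub.std(axis=0, ddof=1) / np.sqrt(n)
    return Q_avg, Q_stderr

Q = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
Q_avg, Q_stderr = replica_statistics(Q)
print(Q_avg, Q_stderr)
```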

cg_openmm.parameters.secondary_structure.get_helix_contacts(cgmodel, native_structure_file, backbone_type_name='bb', verbose=False)

Given a coarse grained model and positions for the native structure, this function determines which pairs are native contacts. It assumes helical geometry, with native contacts being backbone pairs interacting as (i) to (i+n) neighbors, where n is the sequence spacing whose pairs are, on average, separated by the shortest distance.

Parameters
  • cgmodel (class) – CGModel() class object

  • native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.

  • backbone_type_name (str or list(str)) – type name(s) in cgmodel which corresponds to the particles forming the helical backbone (default=’bb’)

  • verbose (bool) – Option to print detailed statistics for each helical backbone sequence considered (default=False)

Returns

  • native_contact_list - A list of the nonbonded interactions whose inter-particle distances are less than the ‘native_contact_cutoff_distance’.

  • native_contact_distances - A Quantity numpy array of the native pairwise distances corresponding to native_contact_list

  • opt_seq_spacing - The (i) to (i+n) number n defining contacting backbone beads
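
The selection of the (i) to (i+n) spacing can be sketched on an idealized helix: compute the mean distance between beads i and i+n for several values of n and keep the n with the smallest mean. The helix geometry, the scanned range of n (3 to 6), and the helper name `best_helix_spacing` are assumptions for this example only.

```python
import numpy as np

def best_helix_spacing(positions, n_min=3, n_max=6):
    """Return the spacing n whose (i, i+n) pairs are closest on average."""
    positions = np.asarray(positions, dtype=float)
    mean_dist = {}
    for n in range(n_min, n_max + 1):
        # Distances between bead i and bead i+n along the backbone
        d = np.linalg.norm(positions[n:] - positions[:-n], axis=1)
        mean_dist[n] = d.mean()
    opt_n = min(mean_dist, key=mean_dist.get)
    return opt_n, mean_dist

# Ideal helix: 100 degrees per bead (3.6 beads per turn), radius 1.0, rise 0.15
i = np.arange(24)
theta = np.radians(100.0) * i
helix = np.column_stack([np.cos(theta), np.sin(theta), 0.15 * i])
opt_n, mean_dist = best_helix_spacing(helix)
print(opt_n, mean_dist)
```

For this geometry the (i, i+4) pairs are the closest, which mirrors the familiar i to i+4 pattern of an alpha-helix-like pitch.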

cg_openmm.parameters.secondary_structure.get_native_contacts(cgmodel, native_structure_file, native_contact_distance_cutoff)

Given a coarse grained model, positions for the native structure, and cutoff, this function determines which pairs are native contacts.

Parameters
  • cgmodel (class) – CGModel() class object

  • native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.

  • native_contact_distance_cutoff (Quantity()) – The maximum distance for two nonbonded particles that are defined as native

Returns

  • native_contact_list - A list of the nonbonded interactions whose inter-particle distances are less than the native_contact_distance_cutoff.

  • native_contact_distances - A Quantity numpy array of the native pairwise distances corresponding to native_contact_list

  • contact_type_dict - A dictionary of {native contact particle type pair: counts}
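
A cutoff-based contact search can be sketched as follows. The sketch works on unitless coordinates and treats a pair as nonbonded when the two particles are at least `min_seq_sep` positions apart in the chain; both choices are assumptions, since the library works with Quantity objects and takes bonding information from the cgmodel. The helper name and the particle types are hypothetical.

```python
import numpy as np
from collections import Counter

def native_contacts_by_cutoff(positions, types, cutoff, min_seq_sep=3):
    """Return contact pairs, their distances, and counts per type pair."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    contact_list, contact_distances = [], []
    type_counts = Counter()
    for i in range(n):
        # Skip pairs closer than min_seq_sep along the chain (assumed bonded)
        for j in range(i + min_seq_sep, n):
            d = np.linalg.norm(positions[j] - positions[i])
            if d < cutoff:
                contact_list.append((i, j))
                contact_distances.append(d)
                type_counts[tuple(sorted((types[i], types[j])))] += 1
    return contact_list, np.array(contact_distances), dict(type_counts)

# Six-bead hairpin on a unit grid (illustrative coordinates)
positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
types = ["a", "b", "a", "a", "b", "a"]
contacts, dists, type_dict = native_contacts_by_cutoff(positions, types, cutoff=1.2)
print(contacts, dists, type_dict)
```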

cg_openmm.parameters.secondary_structure.optimize_Q_cut(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, plotfile='native_contacts_opt_2d.pdf', verbose=False, minimizer_options=None, bounds_nc_cut=None, bounds_nc_tol=(1, 2), homopolymer_sym=False)

Given a coarse grained model and a native structure as input, optimize both the distance cutoff defining the native contact pairs and the distance tolerance for scanning the trajectory for native contacts.

Parameters
  • cgmodel (class) – CGModel() class object

  • native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.

  • traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • plotfile (str) – Path to output file for plotting results (default=’native_contacts_opt_2d.pdf’)

  • verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)

  • minimizer_options (dict) – dictionary of additional options for scipy.optimize.differential_evolution (default=None)

  • bounds_nc_cut (tuple) – native contact distance cutoff bounds in distance units - if None, will determine bounds based on backbone sigma parameter (default=None)

  • bounds_nc_tol (tuple) – native contact tolerance factor bounds (default=(1,2))

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • native_contact_cutoff ( Quantity() ) - The ideal distance below which two nonbonded, interacting particles should be defined as a “native contact”

  • native_contact_tol ( float ) - tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native contact distances)

  • opt_results ( dict ) - results of the native contact cutoff scipy.optimize.minimize optimization

  • Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’

  • sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)

  • sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt

  • contact_type_dict ( dict ) - a dictionary of {native contact particle type pair: counts}
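
The two-parameter search over a distance cutoff and a tolerance factor can be sketched with scipy.optimize.differential_evolution. The objective below is a toy quadratic standing in for the real figure of merit, which this page does not specify; the cutoff bounds and the way minimizer_options is unpacked into the call are likewise assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def toy_objective(x):
    cutoff, tol = x
    # Hypothetical smooth objective with its minimum at cutoff=3.5, tol=1.4
    return (cutoff - 3.5) ** 2 + (tol - 1.4) ** 2

bounds_nc_cut = (2.0, 5.0)  # assumed cutoff bounds, arbitrary distance units
bounds_nc_tol = (1.0, 2.0)  # same range as the documented default (1, 2)
minimizer_options = {"seed": 0, "tol": 1e-8}  # extra keyword options

opt_results = differential_evolution(
    toy_objective, bounds=[bounds_nc_cut, bounds_nc_tol], **minimizer_options
)
native_contact_cutoff, native_contact_tol = opt_results.x
print(native_contact_cutoff, native_contact_tol, opt_results.fun)
```

Differential evolution is a global, derivative-free method, which suits objectives that are noisy or have several local minima within the bounds.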

cg_openmm.parameters.secondary_structure.optimize_Q_cut_1d(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, native_contact_tol=1.3, plotfile='native_contacts_opt_1d.pdf', verbose=False, brute_step=0.1, bounds=None, homopolymer_sym=False)

Given a coarse grained model and a native structure as input, optimize the distance cutoff defining the native contact pairs, with a fixed distance tolerance factor for scanning the trajectory.

Parameters
  • cgmodel (class) – CGModel() class object

  • native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.

  • traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • native_contact_tol (float) – Tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native distance) (default=1.3)

  • plotfile (str) – Path to output file for plotting results (default=’native_contacts_opt_1d.pdf’)

  • verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)

  • brute_step (float) – step size in distance units for brute force native contact cutoff optimization (final optimization searches between intervals) (default=0.1)

  • bounds (tuple) – bounds in distance units for brute force optimization - if None, will determine bounds based on backbone sigma parameter (default=None)

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • native_contact_cutoff ( Quantity() ) - The ideal distance below which two nonbonded, interacting particles should be defined as a “native contact”

  • opt_results ( dict ) - results of the native contact cutoff scipy.optimize.minimize optimization

  • Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’

  • sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)

  • sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt

  • contact_type_dict ( dict ) - a dictionary of {native contact particle type pair: counts}
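
The brute_step description (a coarse scan followed by a final optimization between intervals) can be sketched as a grid scan plus a bounded scalar minimization around the best grid point. The objective is a toy function and `brute_then_refine` is a hypothetical helper; the library's own refinement step may be implemented differently.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def toy_objective(cutoff):
    # Hypothetical objective with its minimum at cutoff = 3.37
    return (cutoff - 3.37) ** 2

def brute_then_refine(func, bounds, brute_step=0.1):
    lo, hi = bounds
    # Coarse scan on a regular grid that includes both bounds
    grid = np.arange(lo, hi + 0.5 * brute_step, brute_step)
    values = np.array([func(x) for x in grid])
    k = int(np.argmin(values))
    # Refine within the grid intervals adjacent to the best coarse point
    left = grid[max(k - 1, 0)]
    right = grid[min(k + 1, len(grid) - 1)]
    result = minimize_scalar(func, bounds=(left, right), method="bounded")
    return result.x, result

best_cutoff, opt_results = brute_then_refine(toy_objective, bounds=(2.0, 5.0))
print(best_cutoff)
```

A smaller brute_step makes the coarse scan less likely to miss a narrow minimum, at the cost of more objective evaluations.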

cg_openmm.parameters.secondary_structure.optimize_Q_tol_helix(cgmodel, native_structure_file, traj_file_list, output_data='output/output.nc', num_intermediate_states=0, frame_begin=0, frame_stride=1, backbone_type_name='bb', plotfile='native_contacts_helix_opt.pdf', verbose=False, brute_step=0.1, homopolymer_sym=False)

Given a coarse grained model and a native structure as input, determine which helical backbone sequences are native contacts, and the optimal distance tolerance for scanning the trajectory for native contacts. Tolerance is determined by brute force scan.

Parameters
  • cgmodel (class) – CGModel() class object

  • native_structure_file (str) – Path to file (‘pdb’ or ‘dcd’) containing particle positions for the native structure.

  • traj_file_list (List( str ) or str) – A list of replica PDB or DCD trajectory files corresponding to the energies in the .nc file, or a single file name

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default=”output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing simulated temperature states (default=0)

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • frame_stride (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • backbone_type_name (str or list(str)) – type name(s) in cgmodel which corresponds to the particles forming the helical backbone (default=’bb’)

  • plotfile (str) – Path to output file for plotting results (default=’native_contacts_helix_opt.pdf’)

  • verbose (bool) – Option to print detailed native contacts information at each iteration (default=False)

  • brute_step (float) – step size in native distance multiples for brute force tolerance optimization (final optimization searches between intervals) (default=0.1)

  • homopolymer_sym (Boolean) – if there is end-to-end symmetry, scan forwards and backwards sequences for highest Q (default=False)

Returns

  • opt_seq_spacing ( int ) - the (i) to (i+n) number n defining contacting backbone beads

  • native_contact_tol ( float ) - tolerance factor beyond the native distance for determining whether a pair of particles is ‘native’ (in multiples of native contact distances)

  • opt_results ( dict ) - results of the native contact tolerance scipy.optimize.minimize optimization

  • Q_expect_results ( dict ) - results of the native contact fraction expectation calculation containing ‘Q’ and ‘T’

  • sigmoid_param_opt ( 1D numpy array ) - optimized sigmoid parameters (x0, y0, y1, d)

  • sigmoid_param_cov ( 2D numpy array ) - estimated covariance of sigmoid_param_opt
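
The returned sigmoid_param_opt and sigmoid_param_cov have the shape of a least-squares fit result with four parameters (x0, y0, y1, d). The sketch below fits a tanh-based switching function to synthetic Q vs T data with scipy.optimize.curve_fit. The functional form is an assumption chosen to match the parameter names; it is not confirmed to be the one cg_openmm uses.

```python
import numpy as np
from scipy.optimize import curve_fit

def tanh_sigmoid(T, x0, y0, y1, d):
    # Assumed form: Q goes from y0 (low T) to y1 (high T),
    # with midpoint x0 and width d
    return 0.5 * (y0 + y1) - 0.5 * (y0 - y1) * np.tanh((T - x0) / d)

# Synthetic folding curve with a small amount of noise (illustrative only)
T = np.linspace(200.0, 400.0, 21)
rng = np.random.default_rng(0)
Q = tanh_sigmoid(T, 300.0, 0.9, 0.1, 20.0) + rng.normal(0.0, 0.005, T.size)

# Initial guess: midpoint of the T range, plateau values from the data
p0 = [T.mean(), Q.max(), Q.min(), 10.0]
sigmoid_param_opt, sigmoid_param_cov = curve_fit(tanh_sigmoid, T, Q, p0=p0)
print(sigmoid_param_opt)
print(np.sqrt(np.diag(sigmoid_param_cov)))  # one-sigma parameter uncertainties
```

Under this form, x0 is the temperature at which Q is halfway between the folded and unfolded plateaus.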

cg_openmm.parameters.secondary_structure.plot_native_contact_fraction(temperature_list, Q, Q_uncertainty, plotfile='Q_vs_T.pdf', sigmoid_dict=None)

Given a list of temperatures and corresponding native contact fractions, plot Q vs T. If a sigmoid dict from bootstrapping is given, also plot the sigmoid curve. Note that this sigmoid curve is generated by using the mean values of the 4 hyperbolic fitting parameters taken over all bootstrap trials, not a direct fit to the Q vs T data.

Parameters
  • temperature_list (List( SIMTK Unit() * number_replicas )) – List of temperatures that will be used to define different replicas (thermodynamic states)

  • Q (np.array(float * len(temperature_list))) – native contact fraction array for all temperatures in temperature_list

  • Q_uncertainty (np.array(float * len(temperature_list))) – uncertainty associated with Q

  • plotfile (str) – Path to output file for plotting results (default=’Q_vs_T.pdf’)

  • sigmoid_dict (dict) – dictionary containing sigmoid parameter mean values and uncertainties (default=None)

cg_openmm.parameters.secondary_structure.plot_native_contact_timeseries(Q, time_interval=Quantity(value=1.0, unit=picosecond), frame_begin=0, plot_per_page=3, plotfile='Q_vs_time.pdf', figure_title=None)

Given the native contact fraction timeseries for each replica or state, plot Q vs time.

Parameters
  • Q (np.array(float * nframes x len(temperature_list))) – native contact fraction array for all replicas or states

  • time_interval (Quantity) – interval between energy exchanges (default=1.0*unit.picosecond)

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • plot_per_page (int) – number of subplots per pdf page (default=3)

  • plotfile (str) – Path to output file for plotting results (default=’Q_vs_time.pdf’)

  • figure_title (str) – title of overall plot (default=None)