thermo.calc submodule

cg_openmm.thermo.calc.bootstrap_heat_capacity(frame_begin=0, sample_spacing=1, frame_end=-1, plot_file='heat_capacity_boot.pdf', output_data='output/output.nc', num_intermediate_states=0, frac_dT=0.05, conf_percent='sigma', n_trial_boot=200, U_kln=None, sparsify_stride=1, compute_entropies=False)[source]

Calculate and plot the heat capacity curve, with uncertainty determined using bootstrapping. Uncorrelated datasets are selected using a random starting frame, repeated n_trial_boot times. Uncertainty in melting point and full-width half maximum of the Cv curve are also returned. If a re-evaluated energy array is specified, the heat capacity of the re-evaluated system is calculated using MBAR reweighting, based on the original energies in the output.nc file.

Note

If using a re-evaluated energy matrix, U_kln should include all frames after frame_begin

Parameters
  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • frame_end (int) – index of last frame to include in heat capacity calculation (default=-1)

  • plot_file (str) – path to filename to output plot (default=’heat_capacity_boot.pdf’)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default = “output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing states in ‘temperature_list’ (default=0)

  • frac_dT (float) – The fraction difference between temperatures points used to calculate finite difference derivatives (default=0.05)

  • conf_percent (float) – Confidence level in percent for outputting uncertainties (default = 68.27 = 1 sigma)

  • n_trial_boot (int) – number of trials to run for generating bootstrapping uncertainties (default=200)

  • U_kln (3d numpy array (float) with dimensions [replica, evaluated_state, frame]) – re-evaluated state energies array to be used for the MBAR calculation (starts at frame_begin)

  • sparsify_stride (int) – apply this stride to the replica energies from file, to match sparsified reevaluated energies (default=1)

  • compute_entropies (bool) – option to compute the entropy changes from lowest simulated temperature to each other temperature defined by num_intermediate_states (default=False)

Returns

  • T_list ( List( float * unit.simtk.temperature ) ) - The temperature list corresponding to the heat capacity values in ‘Cv’

  • Cv_values ( List( float * kJ/mol/K ) ) - The heat capacity values for all (including inserted intermediates) states

  • Cv_uncertainty ( Tuple ( np.array(float) * kJ/mol/K ) ) - confidence interval for all Cv_values computed from bootstrapping

  • Tm_value ( float * unit.simtk.temperature ) - Melting point mean value computed from bootstrapping

  • Tm_uncertainty ( Tuple ( float * unit.simtk.temperature ) ) - confidence interval for melting point computed from bootstrapping

  • Cv_height_value ( float * kJ/mol/K ) - Height of the Cv peak relative to the lowest value over the temperature range used.

  • Cv_height_uncertainty ( Tuple ( np.array(float) * kJ/mol/K ) ) - confidence interval for all Cv_height_value computed from bootstrapping

  • FWHM_value ( float * unit.simtk.temperature ) - Cv full width half maximum mean value computed from bootstrapping

  • FWHM_uncertainty ( Tuple ( float * unit.simtk.temperature ) ) - confidence interval for Cv full width half maximum computed from bootstrapping

  • N_eff_values ( np.array( float ) ) - The bootstrap mean number of effective samples at all simulated and non-simulated (including inserted intermediates) states

cg_openmm.thermo.calc.bootstrap_partial_heat_capacities(array_folded_states, frame_begin=0, sample_spacing=1, frame_end=-1, plot_file='partial_heat_capacity_boot.pdf', output_data='output/output.nc', num_intermediate_states=0, frac_dT=0.05, conf_percent='sigma', n_trial_boot=200)[source]

Given an array classifying each frame into discrete conformational states, compute the heat capacity curve, and contributions from each conformational class to the heat capacity curve, with uncertainties determined using bootstrapping. Uncorrelated datasets are selected using a random starting frame, repeated n_trial_boot times.

Note

array_folded_states should include all frames after frame_begin

Note

The partial contributions to heat capacity must be weighted by the probabilities of finding each conformational class to reconstruct the total heat capacity curve.

Parameters
  • array_folded_states (2d numpy array (int)) – a precomputed array classifying the different conformational states

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • frame_end (int) – index of last frame to include in heat capacity calculation (default=-1)

  • plot_file (str) – path to filename to output plot (default=’heat_capacity_boot.pdf’)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default = “output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing states in ‘temperature_list’ (default=0)

  • frac_dT (float) – The fraction difference between temperatures points used to calculate finite difference derivatives (default=0.05)

  • conf_percent (float) – Confidence level in percent for outputting uncertainties (default = 68.27 = 1 sigma)

  • n_trial_boot (int) – number of trials to run for generating bootstrapping uncertainties (default=200)

Returns

  • T_list ( np.array ( float * unit.simtk.temperature ) ) - The temperature list corresponding to the heat capacity values in ‘Cv’

  • Cv_values ( dict ( np.array ( float * kJ/mol/K ) ) ) - The heat capacity values for all (including inserted intermediates) states

  • Cv_uncertainty ( dict ( Tuple ( np.array(float) * kJ/mol/K ) ) ) - confidence interval for all Cv_values computed from bootstrapping

  • Tm_value ( dict ( float * unit.simtk.temperature ) ) - Melting point mean value computed from bootstrapping

  • Tm_uncertainty ( dict ( Tuple ( float * unit.simtk.temperature ) ) ) - confidence interval for melting point computed from bootstrapping

  • Cv_height_value ( dict ( float * kJ/mol/K ) ) - Height of the Cv peak relative to the lowest value over the temperature range used.

  • Cv_height_uncertainty ( dict ( Tuple ( np.array(float) * kJ/mol/K ) ) ) - confidence interval for all Cv_height_value computed from bootstrapping

  • FWHM_value ( dict ( float * unit.simtk.temperature ) ) - Cv full width half maximum mean value computed from bootstrapping

  • FWHM_uncertainty ( dict ( Tuple ( float * unit.simtk.temperature ) ) ) - confidence interval for Cv full width half maximum computed from bootstrapping

  • U_expect_values ( dict ( np.array ( float * kJ/mol ) ) ) - Energy expectation values for each conformational state, at each T in T_list

  • U_expect_uncertainty ( dict ( Tuple ( np.array(float) * kJ/mol ) ) ) - confidence interval for U_expect_values computed from bootstrapping

cg_openmm.thermo.calc.get_cv_FWHM(Cv_values, T_list)[source]

Internal function for getting the full-width half-maximum, melting point, and peak height from a heat capacity vs T dataset.

Parameters
  • Cv_values (float * 1D np.array) – heat capacity data series (unitless)

  • T_list (float * 1D np.array) – temperature data series (unitless)

Returns

  • FWHM ( float ) - Full width half maximum from heat capacity vs T

  • Tm ( float ) - Melting point from heat capacity vs T

  • Cv_height ( float ) - Relative height of heat capacity peak

cg_openmm.thermo.calc.get_heat_capacity(frame_begin=0, sample_spacing=1, frame_end=-1, output_data='output/output.nc', num_intermediate_states=0, frac_dT=0.05, plot_file=None, bootstrap_energies=None)[source]

Given a .nc output and a number of intermediate states to insert for the temperature list, this function calculates and plots the heat capacity profile.

Parameters
  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • frame_end (int) – index of last frame to include in heat capacity calculation (default=-1)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default = “output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing states in ‘temperature_list’ (default=0)

  • frac_dT (float) – The fraction difference between temperatures points used to calculate finite difference derivatives (default=0.05)

  • plotfile (str) – path to filename to output plot

  • bootstrap_energies (3d numpy array (float)) – a custom replica_energies array to be used for bootstrapping calculations. Used instead of the energies in the .nc file.

Returns

  • Cv ( List( float * kJ/mol/K ) ) - The heat capacity values for all (including inserted intermediates) states

  • dCv ( List( float * kJ/mol/K ) ) - The uncertainty in the heat capacity values for intermediate states

  • new_temp_list ( List( float * unit.simtk.temperature ) ) - The temperature list corresponding to the heat capacity values in ‘Cv’

  • FWHM ( float * unit.simtk.temperature ) - Full width half maximum from heat capacity vs T

  • Tm ( float * unit.simtk.temperature ) - Melting point from heat capacity vs T

  • Cv_height ( float * kJ/mol/K ) - Relative height of heat capacity peak

  • N_eff( np.array( float ) ) - The number of effective samples at all (including inserted intermediates) states

cg_openmm.thermo.calc.get_heat_capacity_derivative(Cv, temperature_list, plotfile='dCv_dT.pdf')[source]

Fit a heat capacity vs T dataset to cubic spline, and compute derivatives

Parameters
  • Cv (Quantity or numpy 1D array) – heat capacity data series

  • temperature_list – List of temperatures used in replica exchange simulations

  • plotfile (str) – path to filename to output plot

Returns

  • dCv_out ( 1D numpy array (float) ) - 1st derivative of heat capacity, from a cubic spline evaluated at each point in Cv)

  • d2Cv_out ( 1D numpy array (float) ) - 2nd derivative of heat capacity, from a cubic spline evaluated at each point in Cv)

  • spline_tck ( scipy spline object (tuple) ) - knot points (t), coefficients (c), and order of the spline (k) fit to Cv data

cg_openmm.thermo.calc.get_heat_capacity_integral_entropy(Cv, temperature_list, plotfile='Cv_entropy_change.pdf')[source]

Fit Cv/T vs T dataset to cubic spline, and compute integrals (delta_S(T)) from T_0 to T_i Note this is the total entropy change from all states upon a temperature change, separate from the entropy of folding.

Parameters
  • Cv (Quantity or numpy 1D array) – heat capacity data series

  • temperature_list – List of temperatures used in replica exchange simulations

  • plotfile (str) – path to filename to output plot (default=’Cv_entropy_change.pdf’)

Returns

  • dCv_out ( 1D numpy array (float) ) - entropy changes from lowest temperature to each temperature in temperature_list

cg_openmm.thermo.calc.get_heat_capacity_reeval(U_kln, output_data='output/output.nc', frame_begin=0, sample_spacing=1, frame_end=-1, num_intermediate_states=0, frac_dT=0.05, plot_file_sim=None, plot_file_reeval=None, bootstrap_energies=None)[source]

Given an array of re-evaluated energies at a non-simulated set of force field parameters, and a corresponding temperature list, compute heat capacity as a function of temperature.

Parameters
  • U_kln (3d numpy array (float) with dimensions [replica, evaluated_state, frame]) – re-evaluated state energies array to be used for the MBAR calculation (first frame is frame_begin)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default = “output/output.nc”)

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • frame_end (int) – index of last frame to include in heat capacity calculation (default=-1)

  • num_intermediate_states (int) – The number of states to insert between existing states in ‘temperature_list’ (default=0)

  • frac_dT (float) – The fraction difference between temperatures points used to calculate finite difference derivatives (default=0.05)

  • plot_file_sim (str) – path to filename to output plot for simulated heat capacity (default=None)

  • plot_file_reeval (str) – path to filename to output plot for reevaluated heat capacity (default=None)

  • bootstrap_energies (3d numpy array (float)) – a custom replica_energies array to be used for bootstrapping calculations. Used instead of the energies in the .nc file. (default=None)

Returns

  • Cv_sim ( np.array( float * kJ/mol/K ) ) - The heat capacity values for all simulated (including inserted intermediates) states

  • dCv_sim ( np.array( float * kJ/mol/K ) ) - The uncertainty of Cv_sim values

  • Cv_reeval ( np.array( float * kJ/mol/K ) ) - The heat capacity values for all reevaluated (including inserted intermediates) states

  • dCv_reeval ( np.array( float * kJ/mol/K ) ) - The uncertainty of Cv_reeval values

  • full_T_list ( np.array( float * unit.simtk.temperature ) ) - The temperature list corresponding to the heat capacity values in ‘Cv’

  • FWHM ( float * unit.simtk.temperature ) - Full width half maximum from heat capacity vs T

  • Tm ( float * unit.simtk.temperature ) - Melting point from heat capacity vs T

  • Cv_height ( float * kJ/mol/K ) - Relative height of heat capacity peak

  • N_eff( np.array( float ) ) - The number of effective samples at all (including inserted intermediates) states

cg_openmm.thermo.calc.get_partial_heat_capacities(array_folded_states, frame_begin=0, sample_spacing=1, frame_end=-1, output_data='output/output.nc', num_intermediate_states=0, frac_dT=0.05, plot_file=None, bootstrap_energies=None)[source]

Given an array classifying each frame into discrete conformational states, compute the heat capacity curve, and contributions from each conformational class to the heat capacity curve. Uncertainties can be calculated using the function bootstrap_partial_heat_capacities.

***TODO: add the uncertainty calculation

Note

array_folded_states should include all frames after frame_begin

Note

The partial contributions to heat capacity must be weighted by the probabilities of finding each conformational class to reconstruct the total heat capacity curve.

Parameters
  • array_folded_states (2d numpy array (int)) – a precomputed array classifying the different conformational states

  • frame_begin (int) – index of first frame defining the range of samples to use as a production period (default=0)

  • sample_spacing (int) – spacing of uncorrelated data points, for example determined from pymbar timeseries subsampleCorrelatedData (default=1)

  • frame_end (int) – index of last frame to include in heat capacity calculation (default=-1)

  • output_data (str) – Path to the output data for a NetCDF-formatted file containing replica exchange simulation data (default = “output/output.nc”)

  • num_intermediate_states (int) – The number of states to insert between existing states in ‘temperature_list’ (default=0)

  • frac_dT (float) – The fraction difference between temperatures points used to calculate finite difference derivatives (default=0.05)

  • plotfile (str) – path to filename to output plot

  • bootstrap_energies (3d numpy array (float)) – a custom replica_energies array to be used for bootstrapping calculations. Used instead of the energies in the .nc file.

Returns

  • Cv ( dict ( List( float * kJ/mol/K ) ) ) - For each conformational class, the heat capacity values for all (including inserted intermediates) states

  • ***dCv ( dict ( List( float * kJ/mol/K ) ) ) - The uncertainty in the heat capacity values for intermediate states (not yet implemented)

  • new_temp_list ( List( float * unit.simtk.temperature ) ) - The temperature list corresponding to the heat capacity values in ‘Cv’

  • FWHM_partial ( dict ( float * unit.simtk.temperature ) ) - For each conformational class, full width half maximum from heat capacity vs T

  • Tm_partial ( dict ( float * unit.simtk.temperature ) ) - For each conformational class, melting point from heat capacity vs T

  • Cv_height_partial ( dict ( float * kJ/mol/K ) ) - For each conformational class, relative height of heat capacity peak

  • U_expect_confs ( dict ( np.array( float * kJ/mol ) ) - For each conformational class, the energy expectations (in kJ/mol) at each T, including intermediate states

cg_openmm.thermo.calc.plot_heat_capacity(Cv, dCv, temperature_list, file_name='heat_capacity.pdf')[source]

Given an array of temperature-dependent heat capacity values and the uncertainties in their estimates, this function plots the heat capacity curve.

Parameters
  • Cv (List( float * kJ/mol/K )) – The heat capacity data to plot.

  • dCv (List( float * kJ/mol/K )) – The uncertainties in the heat capacity data

  • temperature_list – List of temperatures used in replica exchange simulations

  • file_name (str) – The name/path of the file where plotting output will be written, default = “heat_capacity.pdf”

cg_openmm.thermo.calc.plot_partial_heat_capacities(Cv_partial, dCv, temperature_list, file_name='heat_capacity_partial.pdf')[source]

Given an array of temperature-dependent heat capacity values and the uncertainties in their estimates, this function plots the heat capacity curve.

Parameters
  • Cv_partial (dict ( list( float * kJ/mol/K ) )) – The heat capacity data to plot, grouped by conformational state.

  • dCv (dict ( list( float * kJ/mol/K ) )) – The uncertainties corresponding to Cv_partial

  • temperature_list – List of temperatures used in replica exchange simulations

  • file_name (str) – The name/path of the file where plotting output will be written, default = “heat_capacity_partial.pdf”