correlationplus package

Subpackages

Submodules

correlationplus.calculate module

correlationplus.calculate.DCCmatrixCalculation(N, Rvector, R_average)

This function calculates upper triangle of dynamical cross-correlation matrix.
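As a rough illustration of what an upper-triangle cross-correlation calculation involves, the sketch below uses the commonly quoted definition C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr_i is the displacement of atom i from its average position. The function name dcc_upper_triangle and its argument layout are hypothetical. This is not the package's implementation, and only the standard library is used.

```python
import math

def dcc_upper_triangle(frames):
    """Return normalized DCC values for i <= j as a dict {(i, j): C_ij}.

    frames: list of frames; each frame is a list of (x, y, z) tuples.
    Hypothetical helper for illustration only.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Average position of every atom over all frames.
    avg = [
        tuple(sum(f[a][k] for f in frames) / n_frames for k in range(3))
        for a in range(n_atoms)
    ]
    # Displacement of every atom from its average, frame by frame.
    disp = [
        [tuple(f[a][k] - avg[a][k] for k in range(3)) for a in range(n_atoms)]
        for f in frames
    ]

    def mean_dot(i, j):
        # Time average of the dot product of two displacement vectors.
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    cc = {}
    for i in range(n_atoms):
        for j in range(i, n_atoms):  # upper triangle, diagonal included
            norm = math.sqrt(mean_dot(i, i) * mean_dot(j, j))
            cc[(i, j)] = mean_dot(i, j) / norm
    return cc

# Two atoms moving in opposite directions along x.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
cc = dcc_upper_triangle(frames)
```

In this toy trajectory the two atoms are perfectly anti-correlated, so the off-diagonal value is -1 and the diagonal values are 1. An atom that never moves would cause a division by zero in this sketch.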

correlationplus.calculate.calcENM_LMI(selectedAtoms, cut_off, method='ANM', nmodes=100, normalized=True, saveMatrix=True, out_file='nDCC.dat')

Calculate linear mutual information matrix based on elastic network model.

Parameters
  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • cut_off (int) – Cutoff radius in Angstrom unit for ANM or GNM. Default value is 15 for ANM and 10 for GNM.

  • method (string) – This string can only take two values: ANM or GNM. ANM is the default value.

  • nmodes (int) – 100 modes are default for normal mode based nDCC calculations.

  • saveMatrix (bool) – If True, an output file for the correlations will be written.

  • out_file (string) – Output file name for the data matrix. Default value is nDCC.dat.

Returns

ccMatrix – Cross-correlation matrix.

Return type

A numpy square matrix of floats

correlationplus.calculate.calcENMnDCC(selectedAtoms, cut_off, method='ANM', nmodes=100, normalized=True, saveMatrix=True, out_file='nDCC.dat')

Calculate normalized dynamical cross-correlations based on elastic network model.

Parameters
  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • cut_off (int) – Cutoff radius in Angstrom unit for ANM or GNM. Default value is 15 for ANM and 10 for GNM.

  • method (string) – This string can only take two values: ANM or GNM. ANM is the default value.

  • nmodes (int) – 100 modes are default for normal mode based nDCC calculations.

  • saveMatrix (bool) – If True, an output file for the correlations will be written.

  • out_file (string) – Output file name for the data matrix. Default value is nDCC.dat.

Returns

ccMatrix – Cross-correlation matrix.

Return type

A numpy square matrix of floats

correlationplus.calculate.calcMD_LMI(topology, trajectory, startingFrame=0, endingFrame=-1, normalized=True, alignTrajectory=True, atomSelection='protein and name CA', saveMatrix=True, out_file='LMI')

Calculate linear mutual information when a topology and a trajectory file are provided.

Parameters
  • topology (string) – A PDB file.

  • trajectory (string) – A trajectory file in dcd, xtc or trr format.

  • startingFrame (int) – You can specify this value if you want to exclude some initial frames from your cross-correlation calculations. Default value is 0.

  • endingFrame (int) – You can specify this value if you want to include frames only up to a certain ending frame in your cross-correlation calculations. Default value is -1, which indicates the last frame in your trajectory.

  • normalized (bool) – Default value is True and it means that the linear mutual information matrix will be normalized.

  • alignTrajectory (bool) – Default value is True and it means that all frames in the trajectory will be aligned to the initial frame.

  • atomSelection (string) – Default atomSelection string is “protein and name CA”. However, this argument gives flexibility to select some other atoms of the protein or even non protein atoms. Please note that even if you select some other atoms, (if alignTrajectory=True) alignment of the system will still be performed using only Calpha atoms.

  • saveMatrix (bool) – If True, linear mutual information matrix will be written to an output file.

  • out_file (string) – Output file name for the linear mutual information matrix. Default value is LMI and the file extension is .dat.

Returns

lmiMatrix – Linear mutual information matrix.

Return type

A numpy square matrix of floats

correlationplus.calculate.calcMDnDCC(topology, trajectory, startingFrame=0, endingFrame=-1, normalized=True, alignTrajectory=True, saveMatrix=True, out_file='DCC')

Calculate normalized dynamical cross-correlations when a topology and a trajectory file are given.

Parameters
  • topology (string) – A PDB file.

  • trajectory (string) – A trajectory file in dcd, xtc or trr format.

  • startingFrame (int) – You can specify this value if you want to exclude some initial frames from your cross-correlation calculations. Default value is 0.

  • endingFrame (int) – You can specify this value if you want to include frames only up to a certain ending frame in your cross-correlation calculations. Default value is -1, which indicates the last frame in your trajectory.

  • normalized (bool) – Default value is True and it means that the cross-correlation matrix will be normalized.

  • alignTrajectory (bool) – Default value is True and it means that all frames in the trajectory will be aligned to the initial frame.

  • saveMatrix (bool) – If True, cross-correlation matrix will be written to an output file.

  • out_file (string) – Output file name for the cross-correlation matrix. Default value is DCC and the file extension is .dat.

Returns

ccMatrix – Cross-correlation matrix.

Return type

A numpy square matrix of floats

correlationplus.calculate.calcMDtlDCC(topology, trajectory, startingFrame=0, endingFrame=-1, timeLag=0, normalized=True, alignTrajectory=True, saveMatrix=True, out_file='DCC')

Calculate time-lagged normalized dynamical cross-correlations when a topology and a trajectory file are given.

Parameters
  • topology (string) – A PDB file.

  • trajectory (string) – A trajectory file in dcd, xtc or trr format.

  • startingFrame (int) – You can specify this value if you want to exclude some initial frames from your cross-correlation calculations. Default value is 0.

  • endingFrame (int) – You can specify this value if you want to include frames only up to a certain ending frame in your cross-correlation calculations. Default value is -1, which indicates the last frame in your trajectory.

  • timeLag (int) – timeLag is not a time. In fact, it is just an integer specifying frame delay. You have to multiply it with sampling time to obtain real time lag or time delay. For example, if you select timeLag=5 and your sampling time is 200 ps, your time delay/lag is 1 ns.

  • normalized (bool) – Default value is True and it means that the cross-correlation matrix will be normalized. In fact, we should note that time-lagged dynamical cross-correlation matrices are not truly ‘normalized’ like equal time dynamical cross-correlations.

  • alignTrajectory (bool) – Default value is True and it means that all frames in the trajectory will be aligned to the initial frame.

  • saveMatrix (bool) – If True, cross-correlation matrix will be written to an output file.

  • out_file (string) – Output file name for the cross-correlation matrix. Default value is DCC and the file extension is .dat.

Returns

ccMatrix – time lagged cross-correlation matrix.

Return type

A numpy square matrix of floats
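The relation between the timeLag frame count and the physical delay, as described for the timeLag parameter of calcMDtlDCC, can be written out as a short calculation. The helper name frame_lag_to_time is hypothetical and only restates the arithmetic from the parameter description (timeLag=5 with a 200 ps sampling time).

```python
def frame_lag_to_time(time_lag_frames, sampling_time_ps):
    """Convert a frame delay into a physical delay in picoseconds."""
    return time_lag_frames * sampling_time_ps

delay_ps = frame_lag_to_time(5, 200)  # frame delay times sampling time
delay_ns = delay_ps / 1000.0          # convert picoseconds to nanoseconds
```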

correlationplus.calculate.timeLaggedDCCmatrixCalculation(N, Rvector, R_average, timeLag)

This function calculates upper triangle of time-lagged dynamical cross-correlation matrix. If time lag is zero, it gives (equal-time) dynamical cross-correlations.

correlationplus.calculate.writeSparseCorrData(out_file, cMatrix, selectedAtoms, Ctype: bool, symmetric: bool)

This function writes correlation data in sparse format.

In a sparse matrix, only nonzero elements of the matrix are given, in a three-column format: i j C_ij. Here, i and j are the indices of the matrix positions (residue indices, not the residue IDs given in PDB files). It returns nothing.

Parameters
  • out_file (string) – Correlation file to write.

  • cMatrix (A numpy square matrix of floats) – Correlation matrix.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • Ctype (boolean) – If Ctype=True, the location indices i and j start from 0. Otherwise, they are assumed to start from 1.

  • symmetric (boolean) – If True, only the upper (or lower) triangle will be written.

Returns

Return type

Nothing.
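The three-column sparse format described above can be sketched with a small writer. The function write_sparse, its argument names and the number formatting are assumptions made for illustration. The package's own writer also takes selectedAtoms and may format values differently.

```python
import io

def write_sparse(stream, matrix, c_type=True, symmetric=True):
    """Write nonzero elements of a square matrix as 'i j C_ij' lines.

    c_type=True    -> indices start from 0; otherwise from 1.
    symmetric=True -> only the upper triangle (j >= i) is written.
    Hypothetical helper; the real output formatting may differ.
    """
    offset = 0 if c_type else 1
    n = len(matrix)
    for i in range(n):
        start = i if symmetric else 0
        for j in range(start, n):
            value = matrix[i][j]
            if value != 0.0:  # zero elements are skipped in a sparse file
                stream.write(f"{i + offset} {j + offset} {value:.6f}\n")

matrix = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
buffer = io.StringIO()
write_sparse(buffer, matrix, c_type=False, symmetric=True)
lines = buffer.getvalue().splitlines()
```

With 1-based indices and symmetric=True, the 3x3 example yields five lines, one for each nonzero element on or above the diagonal.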

correlationplus.centralityAnalysis module

correlationplus.centralityAnalysis.buildDynamicsNetwork(ccMatrix, distanceMatrix, valueFilter, distanceFilter, selectedAtoms)

This function builds a network (graph) from a dynamical correlation matrix.

The C_ij correlation values are converted to network edges according to -log(abs(C_ij)) (See https://doi.org/10.1073/pnas.0810961106 for details). It gives consistent results only when the C_ij values are in the [0, 1] range.

Parameters
  • ccMatrix (Numpy matrix) – It is a numpy matrix of typically nDCC, nLMI or Generalized Correlations.

  • distanceMatrix (Numpy matrix) – The distances between Calpha atoms of the protein stored in a matrix.

  • valueFilter (float) – The ccMatrix values lower than the valueFilter will be ignored.

  • distanceFilter (float) – The distance values higher than the distanceFilter will be ignored and they will not be considered as edges in a network. This kind of value pruning may work for low conformational change MD simulations or ENM based calculations. However, if there are large scale structural changes, it will be necessary to eliminate the edges based on contacts and their preservation during the entire simulation.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

Returns

Return type

A networkx graph object
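The edge construction described for buildDynamicsNetwork can be sketched without networkx as a dictionary of weighted edges. The helper dynamics_edges is hypothetical. In particular, comparing abs(C_ij) rather than C_ij against valueFilter is an assumption of this sketch, not a statement about the package.

```python
import math

def dynamics_edges(cc, dist, value_filter, distance_filter):
    """Return {(i, j): weight} with weight = -log(|C_ij|).

    Pairs with |C_ij| below value_filter, or with a distance above
    distance_filter, are skipped. Hypothetical helper for illustration.
    """
    edges = {}
    n = len(cc)
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(cc[i][j])
            if c < value_filter or c == 0.0:  # log(0) is undefined
                continue
            if dist[i][j] > distance_filter:
                continue
            edges[(i, j)] = -math.log(c)
    return edges

cc = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.5],
    [0.1, 0.5, 1.0],
]
dist = [
    [0.0, 4.0, 12.0],
    [4.0, 0.0, 6.0],
    [12.0, 6.0, 0.0],
]
edges = dynamics_edges(cc, dist, value_filter=0.3, distance_filter=7.0)
```

Strong correlations give short edges (small weights), so shortest-path based centralities favor routes through highly correlated residue pairs.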

correlationplus.centralityAnalysis.buildSequenceNetwork(ccMatrix, distanceMatrix, valueFilter, distanceFilter, selectedAtoms)

This function builds a network (graph) from a dynamical correlation matrix.

The C_ij correlation values are converted to network edges according to (1.0/abs(C_ij)). It diminishes very fast, but it does not give negative weights for values greater than 1.0.

Parameters
  • ccMatrix (Numpy matrix) – It is a numpy matrix of typically DCC, LMI or any other matrix where the absolute values of the correlations are not between zero and one.

  • distanceMatrix (Numpy matrix) – The distances between Calpha atoms of the protein stored in a matrix.

  • valueFilter (float) – The ccMatrix values lower than the valueFilter will be ignored.

  • distanceFilter (float) – The distance values higher than the distanceFilter will be ignored and they will not be considered as edges in a network. This kind of value pruning may work for low conformational change MD simulations or ENM based calculations. However, if there are large scale structural changes, it will be necessary to eliminate the edges based on contacts and their preservation during the entire simulation.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

Returns

Return type

A networkx graph object
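The difference between the two edge-weight conversions mentioned for buildDynamicsNetwork and buildSequenceNetwork can be checked numerically. The two helper functions below are hypothetical and only restate the formulas -log(abs(C_ij)) and 1.0/abs(C_ij) from the descriptions.

```python
import math

def log_weight(c):
    """-log(|C_ij|): negative as soon as |C_ij| exceeds 1."""
    return -math.log(abs(c))

def inverse_weight(c):
    """1/|C_ij|: stays positive for any nonzero correlation value."""
    return 1.0 / abs(c)

# Compare both conversions below, at, and above |C_ij| = 1.
for c in (0.5, 1.0, 2.5):
    print(c, round(log_weight(c), 3), round(inverse_weight(c), 3))
```

For unnormalized data with values above 1.0, the logarithmic conversion produces negative weights, which most shortest-path algorithms cannot handle, while the inverse conversion remains positive.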

correlationplus.centralityAnalysis.centralityAnalysis(graph, valueFilter, distanceFilter, out_file, centrality, selectedAtoms)

This function calculates various network (graph) centralities of a protein.

This function calculates network centrality measures such as degree, betweenness, closeness, current flow betweenness and eigenvector. It needs Python 3.6 or later to maintain dictionary order.

Parameters
  • graph (object) – It is a Networkx Graph object.

  • valueFilter (float) – The ccMatrix values lower than the valueFilter will be ignored.

  • distanceFilter (float) – The distance values higher than the distanceFilter will be ignored and they will not be considered as edges in a network. This kind of value pruning may work for low conformational change MD simulations or ENM based calculations. However, if there are large scale structural changes, it will be necessary to eliminate the edges based on contacts and their preservation during the entire simulation.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • centrality (string) – It can have ‘degree’, ‘betweenness’, ‘closeness’, ‘current_flow_betweenness’, ‘current_flow_closeness’, ‘eigenvector’ or ‘community’.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

Returns

Return type

Nothing

correlationplus.centralityAnalysis.plotCentralities(centrality, centralityArray, out_file, selectedAtoms, scalingFactor)

Plots the centrality values on a 2D graph.

The centrality values are plotted on a 2D png file. If there are at least two chains, the function produces a figure for each chain.

Parameters
  • centrality (string) – It can have ‘degree’, ‘betweenness’, ‘closeness’, ‘current_flow_betweenness’ or ‘current_flow_closeness’.

  • centralityArray (A numpy data array ?) – It is a numpy matrix of typically nDCC, LMI or Generalized Correlations.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

  • scalingFactor (float) – Sometimes, the values of the centrality arrays are too small. The scaling factor multiplies the array to make the values visible in the Bfactor columns.

Returns

Return type

Nothing

correlationplus.centralityAnalysis.projectCentralitiesOntoProteinPyMol(centrality, centralityArray, out_file, selectedAtoms, scalingFactor)

Produces PyMol output files for visualizing protein centralities.

This function writes a pml file and a PDB file that can be viewed in PyMol. The Bfactor field of the PDB file contains the centrality information. The first N residues with the highest centrality are highlighted in VDW representation. The output files can be visualized with the PyMol program as follows: pymol output.pml

Parameters
  • centrality (string) – It can have ‘degree’, ‘betweenness’, ‘closeness’, ‘current_flow_betweenness’ or ‘current_flow_closeness’.

  • centralityArray (A numpy data array ?) – It is a numpy matrix of typically nDCC, LMI or Generalized Correlations.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

  • scalingFactor (float) – Sometimes, the values of the centrality arrays are too small. The scaling factor multiplies the array to make the values visible in the Bfactor columns.

Returns

Return type

Nothing

correlationplus.centralityAnalysis.projectCentralitiesOntoProteinVMD(centrality, centralityArray, out_file, selectedAtoms, scalingFactor)

Produces VMD output files for visualizing protein centralities.

This function writes a tcl file and a PDB file that can be viewed in VMD. The Bfactor field of the PDB file contains the centrality information. The first N residues with the highest centrality are highlighted in VDW representation.

The output files can be visualized with the VMD (Visual Molecular Dynamics) program as follows:
  i) Load your pdb file, whether via command line or graphical interface.
  ii) Go to Extensions -> Tk Console.
  iii) source vmd-output-general.tcl

It can take some time to load the general script.

Parameters
  • centrality (string) – It can have ‘degree’, ‘betweenness’, ‘closeness’, ‘current_flow_betweenness’ or ‘current_flow_closeness’.

  • centralityArray (A numpy data array ?) – It is a numpy matrix of typically nDCC, LMI or Generalized Correlations.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

  • scalingFactor (float) – Sometimes, the values of the centrality arrays are too small. The scaling factor multiplies the array to make the values visible in the Bfactor columns.

Returns

Return type

Nothing

correlationplus.centralityAnalysis.projectCommunitiesOntoProteinPyMol(sortedCommunities, out_file, selectedAtoms)

Produces PyMol output files for visualizing protein communities.

This function writes a pml file and a PDB file that can be viewed in PyMol. Occupancy field of the protein contains the community information.

The output files can be visualized with PyMol program as follows: pymol outputfile.pml

Parameters
  • sortedCommunities (Iterator over tuples of sets of nodes) – It is a tuple of lists. Each list contains a community.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

Returns

Return type

Nothing

correlationplus.centralityAnalysis.projectCommunitiesOntoProteinVMD(sortedCommunities, out_file, selectedAtoms)

Produces VMD output files for visualizing protein communities.

This function writes a tcl file and a PDB file that can be viewed in VMD. Occupancy field of the protein contains the community information.

The output files can be visualized with the VMD (Visual Molecular Dynamics) program as follows:
  i) Load your pdb file, whether via command line or graphical interface.
  ii) Go to Extensions -> Tk Console.
  iii) source vmd-output-general.tcl

Parameters
  • sortedCommunities (Iterator over tuples of sets of nodes) – It is a tuple of lists. Each list contains a community.

  • out_file (string) – Prefix of the output file. According to the centrality measure, it will be extended.

  • selectedAtoms (object) – This is a prody.parsePDB object of typically CA atoms of a protein.

Returns

Return type

Nothing

correlationplus.visualize module

correlationplus.visualize.cmap_discretize(cmap, N)

Creates a discrete colormap from the continuous colormap cmap.

Parameters
  • cmap (colormap instance, eg. cm.jet.) –

  • N (number of colors.) –

Returns

cmap

Return type

A discrete color map.

Example

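    from numpy import arange, resize
    from matplotlib import cm
    from matplotlib.pyplot import imshow
    x = resize(arange(100), (5, 100))  # 5 x 100 test image
    djet = cmap_discretize(cm.jet, 5)  # jet colormap reduced to 5 colors
    imshow(x, cmap=djet)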

correlationplus.visualize.convertLMIdata2Matrix(inp_file, writeAllOutput: bool)

This function parses an LMI matrix and returns a numpy array. It can handle both the full matrix format and the g_correlation format.

Parameters
  • inp_file (string) – LMI file to read.

  • writeAllOutput (bool) – If True, an output file for the LMI values will be written in matrix format. The matrix does not contain residue names etc.

Returns

cc – LMI values in matrix format.

Return type

A numpy array of float value arrays.

correlationplus.visualize.distanceDistribution(ccMatrix, out_file, title, selectedAtoms, absoluteValues: bool, writeAllOutput: bool)

Plot inter-chain correlations vs distances to png files.

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • minColorBarLimit (signed int) – Mostly, -1 or 0.

  • maxColorBarLimit (unsigned int) – Mostly, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • absoluteValues (bool) – If True, absolute values of the correlations will be considered.

  • writeAllOutput (bool) – If True, an output file for distances and correlations will be written. This can be useful to see their distribution as well as individual values.

Returns

Return type

Nothing

correlationplus.visualize.filterCorrelationMapByDistance(ccMatrix, out_file, title, selectedAtoms, disMinValue, disMaxValue, absoluteValues: bool, writeAllOutput: bool)

Zeroes the correlations of residue pairs that are closer than disMinValue or farther than disMaxValue.

If residues are closer to each other than a certain distance (disMinValue), these correlations are set to zero. If residues are farther from each other than a certain distance (disMaxValue), these correlations are also set to zero. This filter can be useful to get rid of high short-distance correlations or just to visualize correlations in a window of distances. This function returns a filtered ccMatrix.

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • disMinValue (float) – A distance value in Angstrom unit. For example, it is good to remove high correlations for residues within less than 5.0 Angstrom distance to have a clear visualization. Default value is 0.0.

  • disMaxValue (float) – A distance value in Angstrom unit. The residues with this value or higher will not be visualized with PyMol or VMD. Default value is 9999.0 Angstrom.

  • absoluteValues (bool) – If True, an absolute values file will be written.

  • writeAllOutput (bool) – If True, an output file for distances and correlation values can be written. This can be useful to see their distribution as well as individual values.

Returns

Return type

Nothing
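The distance-window filtering described for filterCorrelationMapByDistance can be sketched on plain nested lists. The helper filter_by_distance is hypothetical. Treating a distance exactly equal to disMaxValue as filtered out follows the wording of the disMaxValue description, and the handling of both boundaries is otherwise an assumption.

```python
def filter_by_distance(cc, dist, dis_min=0.0, dis_max=9999.0):
    """Return a copy of cc with pairs outside [dis_min, dis_max) zeroed.

    cc and dist are square nested lists of the same size.
    Hypothetical helper for illustration only.
    """
    n = len(cc)
    filtered = [row[:] for row in cc]  # leave the input matrix untouched
    for i in range(n):
        for j in range(n):
            if dist[i][j] < dis_min or dist[i][j] >= dis_max:
                filtered[i][j] = 0.0
    return filtered

cc = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
dist = [
    [0.0, 3.8, 8.0],
    [3.8, 0.0, 12.0],
    [8.0, 12.0, 0.0],
]
filtered = filter_by_distance(cc, dist, dis_min=5.0, dis_max=10.0)
```

Only the pair at 8.0 Angstrom survives the 5 to 10 Angstrom window. Because the diagonal has zero distance, it is also zeroed in this sketch whenever dis_min is positive.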

correlationplus.visualize.findCommonCorePDB(selectedAtomSet1, selectedAtomSet2)

Finds a common set of residues between different conformations of a protein.

This function assumes that two structures are obtained exactly from the same species and there is not any mutation in any one of them. This can happen when two conformations of a protein are obtained with different number of missing atoms. Under these conditions, we are trying to match two structures and find the common core of two structures that have different number of CA atoms. We will use this information to subtract two correlation maps!

Parameters
  • ccMatrix1 (A numpy square matrix of floats) – The first correlation matrix.

  • ccMatrix2 (A numpy square matrix of floats) – The second correlation matrix.

  • minColorBarLimit (signed int) – If nDCC maps -2. If absndcc or lmi, -1.

  • maxColorBarLimit (unsigned int) – If nDCC maps 2. If absndcc or lmi, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtomSet1 (prody object) – A list of -typically CA- atoms selected from the first PDB file.

  • selectedAtomSet2 (prody object) – A list of -typically CA- atoms selected from the second PDB file.

Returns

commonCoreDictionary – A dictionary of residues in conformation A and corresponding residue in conformation B.

Return type

dictionary
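The idea of a common core between two conformations with different missing residues can be sketched as a dictionary lookup. The helper find_common_core and the choice of (chain ID, residue number, residue name) as the matching key are assumptions for illustration. The package works on ProDy atom selections and may match residues differently.

```python
def find_common_core(residues_a, residues_b):
    """Map indices of residues in A to indices of the same residues in B.

    Each residue is a (chain_id, residue_number, residue_name) tuple.
    Hypothetical helper: matching on this key is an assumption.
    """
    index_in_b = {key: idx for idx, key in enumerate(residues_b)}
    common = {}
    for idx_a, key in enumerate(residues_a):
        if key in index_in_b:
            common[idx_a] = index_in_b[key]
    return common

# Conformation B lacks residues 1 and 3 and has an extra residue 5.
conf_a = [("A", 1, "MET"), ("A", 2, "THR"), ("A", 3, "GLY"), ("A", 4, "LYS")]
conf_b = [("A", 2, "THR"), ("A", 4, "LYS"), ("A", 5, "SER")]
common_core = find_common_core(conf_a, conf_b)
```

The resulting dictionary gives, for each matched residue, its row index in the first correlation matrix and the corresponding row index in the second one, which is what a subtraction of two maps of different sizes needs.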

correlationplus.visualize.generatePNG(ccMatrix, minColorBarLimit, maxColorBarLimit, numOfLabels, out_file, title, selectedAtoms)

Generates a heatmap in PNG format from the ccMatrix.

Parameters
  • ccMatrix (A numpy square matrix of floats) – The first correlation matrix.

  • minColorBarLimit (signed int) – If nDCC maps -1. If absndcc or lmi, 0.

  • maxColorBarLimit (unsigned int) – If nDCC maps 1. If absndcc or lmi, 1.

  • numOfLabels (int) – Number of labels on colorbar.

  • out_file (string) – prefix for the output png files.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

Returns

Return type

Nothing

correlationplus.visualize.interChainCorrelationMaps(ccMatrix, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtoms, saveMatrix)

Plot inter-chain correlations to different png files, if there are at least two chains!

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • minColorBarLimit (signed int) – Mostly, -1 or 0.

  • maxColorBarLimit (unsigned int) – Mostly, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • saveMatrix (bool) – If True, an output file for the correlations will be written.

Returns

Return type

Nothing

correlationplus.visualize.intraChainCorrelationMaps(ccMatrix, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtoms, saveMatrix)

Plot intra-chain correlations to different png files, if there are at least two chains!

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • minColorBarLimit (signed int) – Mostly, -1 or 0.

  • maxColorBarLimit (unsigned int) – Mostly, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • saveMatrix (bool) – If True, an output file for the correlations will be written.

Returns

Return type

Nothing

correlationplus.visualize.overallCorrelationMap(ccMatrix, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtoms)

Plots nDCC maps for the whole structure.

Parameters
  • ccMatrix (A numpy square matrix of floats) –

  • minColorBarLimit (signed int) – Mostly, -1 or 0.

  • maxColorBarLimit (unsigned int) – Mostly, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

Returns

Return type

Nothing

correlationplus.visualize.overallNonUniformDifferenceMap(ccMatrix1, ccMatrix2, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtomSet1, selectedAtomSet2)

Plots the difference map between correlation maps for the entire structure. Sizes of ccMatrix1 and ccMatrix2 are not identical. A mapping for matching residues is performed before difference map plotting.

Parameters
  • ccMatrix1 (A numpy square matrix of floats) – The first correlation matrix.

  • ccMatrix2 (A numpy square matrix of floats) – The second correlation matrix.

  • minColorBarLimit (signed int) – If nDCC maps -2. If absndcc or lmi, -1.

  • maxColorBarLimit (unsigned int) – If nDCC maps 2. If absndcc or lmi, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtomSet1 (prody object) – A list of -typically CA- atoms selected from the first PDB file.

  • selectedAtomSet2 (prody object) – A list of -typically CA- atoms selected from the second PDB file.

Returns

Return type

Nothing

correlationplus.visualize.overallUniformDifferenceMap(ccMatrix1, ccMatrix2, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtoms)

Plots the difference map between correlation maps for the entire structure. Sizes of ccMatrix1 and ccMatrix2 are identical. Only one atom set is sufficient to plot the difference map.

Parameters
  • ccMatrix1 (A numpy square matrix of floats) – The first correlation matrix.

  • ccMatrix2 (A numpy square matrix of floats) – The second correlation matrix.

  • minColorBarLimit (signed int) – If nDCC maps -2. If absndcc or lmi, -1.

  • maxColorBarLimit (unsigned int) – If nDCC maps 2. If absndcc or lmi, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

Returns

Return type

Nothing
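The colorbar limits suggested for difference maps (-2 to 2 for nDCC) follow from subtracting two matrices whose values each lie between -1 and 1. The sketch below shows this with a hypothetical difference_map helper. The subtraction order (first matrix minus second) is an assumption.

```python
def difference_map(cc1, cc2):
    """Element-wise cc1 - cc2 for two square matrices of the same size."""
    if len(cc1) != len(cc2):
        raise ValueError("matrices must have identical sizes")
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(cc1, cc2)]

# Extreme case: full anti-correlation in one map, full correlation in the other.
cc1 = [[1.0, -1.0], [-1.0, 1.0]]
cc2 = [[1.0, 1.0], [1.0, 1.0]]
diff = difference_map(cc1, cc2)
```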

correlationplus.visualize.parseEVcouplingsScores(inp_file, selectedAtoms, writeAllOutput: bool)

This function parses sequence coupling scores obtained from EVCoupling Server at https://evcouplings.org/. The file is in csv format and we tested it only for monomeric cases but it is expected to work on multimeric cases as well. Basically, the function converts column-wise data to an array. It returns a numpy array.

Parameters
  • inp_file (string) – Couplings file to read.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • writeAllOutput (bool) – If True, an output file for the coupling values will be written in matrix format. The matrix does not contain residue names etc. They are obtained from a pdb file you provided.

Returns

cc – EVcoupling values in matrix format.

Return type

A numpy array of float value arrays.

correlationplus.visualize.parseElasticityGraph(inp_file, selectedAtoms, writeAllOutput: bool)

This function parses force constants data (a file with .enm extension) produced by FitNMA program of Patrice Koehl.

The data in this file is in the following format:

  i_Num i_Type i_Resname i_ChainID i_ResID j_Num j_Type j_Resname j_ChainID j_ResID forceConstant
  2 CA THR A 1 9 CA THR A 2 0.844158

Comment lines start with the # character. It returns a numpy array.

Parameters
  • inp_file (string) – Force constants file to read.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • writeAllOutput (bool) – If True, an output file for the coupling values will be written in matrix format. The matrix does not contain residue names etc. They are obtained from a pdb file you provided.

Returns

cc

Return type

A numpy array of float value arrays.
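Reading the eleven-column force constant format described above can be sketched as follows. The helper parse_enm_lines, the atom_numbers argument (a stand-in for the selected atoms) and the symmetric fill are assumptions for illustration, not the package's parser.

```python
def parse_enm_lines(lines, atom_numbers):
    """Build a square force-constant matrix from .enm-style lines.

    atom_numbers: atom serial numbers in matrix order (hypothetical
    stand-in for the selected atoms). Lines starting with '#' are skipped.
    """
    index = {num: k for k, num in enumerate(atom_numbers)}
    n = len(atom_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns 1 and 6 are the atom numbers; column 11 is the force constant.
        i_num, j_num = int(fields[0]), int(fields[5])
        force_constant = float(fields[10])
        i, j = index[i_num], index[j_num]
        matrix[i][j] = matrix[j][i] = force_constant  # assume a symmetric matrix
    return matrix

sample = [
    "# i_Num i_Type i_Resname i_ChainID i_ResID "
    "j_Num j_Type j_Resname j_ChainID j_ResID forceConstant",
    "2 CA THR A 1 9 CA THR A 2 0.844158",
]
kmat = parse_enm_lines(sample, atom_numbers=[2, 9])
```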

correlationplus.visualize.parseSparseCorrData(inp_file, selectedAtoms, Ctype: bool, symmetric: bool, writeAllOutput: bool)

This function parses correlation data given in sparse format.

In a sparse matrix, only nonzero elements of the matrix are given, in a three-column format: i j C_ij. Here, i and j are the indices of the matrix positions (or residue indices). It returns a numpy array.

Parameters
  • inp_file (string) – Couplings file to read.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • Ctype (boolean) – If Ctype=True, the location indices i and j start from 0. Otherwise, they are assumed to start from 1.

  • symmetric (boolean) – If True, the matrix will be made symmetric.

  • writeAllOutput (bool) – If True, an output file for the coupling values will be written in matrix format. The matrix does not contain residue names etc. They are obtained from a pdb file you provided.

Returns

cc

Return type

A numpy array of float value arrays.
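Expanding sparse "i j C_ij" records into a full matrix, including the Ctype index convention and the symmetric option, can be sketched like this. The helper parse_sparse and its explicit size argument n are hypothetical. The package derives the matrix size from selectedAtoms instead.

```python
def parse_sparse(lines, n, c_type=True, symmetric=True):
    """Expand 'i j C_ij' lines into an n x n nested list.

    c_type=True    -> indices in the file start from 0; otherwise from 1.
    symmetric=True -> each value is mirrored to position (j, i).
    Hypothetical helper for illustration only.
    """
    offset = 0 if c_type else 1
    matrix = [[0.0] * n for _ in range(n)]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:  # skip blank or malformed lines
            continue
        i = int(fields[0]) - offset
        j = int(fields[1]) - offset
        value = float(fields[2])
        matrix[i][j] = value
        if symmetric:
            matrix[j][i] = value
    return matrix

sparse_lines = ["1 1 1.0", "1 2 0.5", "2 2 1.0", "2 3 0.2", "3 3 1.0"]
full = parse_sparse(sparse_lines, n=3, c_type=False, symmetric=True)
```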

correlationplus.visualize.projectCorrelationsOntoProteinPyMol(pdb_file, ccMatrix, pml_out_file, selectedAtoms, vminFilter, vmaxFilter, cylinderRadiusScaler, absoluteValues: bool, writeAllOutput: bool)

Produces pml files that contain the correlations between residues i and j.

It produces three types of output files:
  1. A general file that contains all correlations.
  2. (If there are at least two chains) Files that contain interchain correlations.
  3. (If there are at least two chains) Files that contain intrachain correlations of individual chains.

The output files can be visualized with the PyMol program.

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • pml_out_file (string) – prefix for the output pml files.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • vminFilter (float) – Only correlation values greater than this threshold will be written to tcl and pml visualization scripts. For example, 0.3 can be a good threshold for normalized dynamical cross-correlation data.

  • vmaxFilter (float) – Only correlation values equal to or lower than this threshold will be written to tcl and pml visualization scripts. It is useful if you would like to analyze correlations in an interval.

  • cylinderRadiusScaler (a float value.) – It adjusts the radius of the cylinders to be displayed in PyMol. The value is multiplied with the corresponding correlation value. Recommended values are between 0.01-2.00.

  • absoluteValues (bool) – If True, absolute values of the correlations will be considered.

  • writeAllOutput (bool) – If True, an output file for distances and correlations will be written. This can be useful to see their distribution as well as individual values.

Returns

Return type

Nothing
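The interval selection implied by vminFilter and vmaxFilter (greater than the lower threshold, equal to or lower than the upper one) can be sketched as below. The helper select_pairs is hypothetical, and restricting the loop to pairs with i < j is an assumption of this sketch.

```python
def select_pairs(cc, vmin, vmax, absolute_values=True):
    """Return (i, j, value) for i < j with vmin < value <= vmax.

    If absolute_values is True, |C_ij| is compared and reported.
    Hypothetical helper for illustration only.
    """
    selected = []
    n = len(cc)
    for i in range(n):
        for j in range(i + 1, n):
            value = abs(cc[i][j]) if absolute_values else cc[i][j]
            if vmin < value <= vmax:
                selected.append((i, j, value))
    return selected

cc = [
    [1.0, -0.6, 0.2],
    [-0.6, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
pairs = select_pairs(cc, vmin=0.3, vmax=1.0)
signed_pairs = select_pairs(cc, vmin=0.3, vmax=1.0, absolute_values=False)
```

With absolute values, the anti-correlated pair (0, 1) passes the filter, while the pair sitting exactly on the lower threshold does not. Without absolute values, no pair is selected in this example.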

correlationplus.visualize.projectCorrelationsOntoProteinVMD(pdb_file, ccMatrix, vmd_out_file, selectedAtoms, vminFilter, vmaxFilter, cylinderRadiusScaler, absoluteValues: bool, writeAllOutput: bool)

Produces tcl files that contain the correlations between residues i and j.

It produces three types of output files:
  1. A general file that contains all correlations.
  2. (If there are at least two chains) Files that contain interchain correlations.
  3. (If there are at least two chains) Files that contain intrachain correlations of individual chains.

The output files can be visualized with the VMD (Visual Molecular Dynamics) program as follows: load your pdb file, whether via command line or graphical interface, go to Extensions -> Tk Console and then ‘source vmd-output-general.tcl’. It can take some time to load the general script.

Parameters
  • ccMatrix (A numpy square matrix of floats) – Cross-correlation matrix.

  • vmd_out_file (string) – prefix for the output tcl files.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

  • vminFilter (float) – Only correlation values greater than this threshold will be written to tcl and pml visualization scripts. For example, 0.3 can be a good threshold for normalized dynamical cross-correlation data.

  • vmaxFilter (float) – Only correlation values equal to or lower than this threshold will be written to tcl and pml visualization scripts. It is useful if you would like to analyze correlations in an interval.

  • cylinderRadiusScaler (a float value.) – It adjusts the radius of the cylinders to be displayed in VMD. The value is multiplied with the corresponding correlation value. Recommended values are between 0.01-2.00.

  • absoluteValues (bool) – If True, absolute values of the correlations will be considered.

  • writeAllOutput (bool) – If True, an output file for distances and correlations will be written. This can be useful to see their distribution as well as individual values.

Returns

Return type

Nothing

correlationplus.visualize.triangulateMaps(ccMatrix1, ccMatrix2, minColorBarLimit, maxColorBarLimit, out_file, title, selectedAtoms)

Given two correlation maps, it puts them into upper and lower triangles. Sizes of ccMatrix1 and ccMatrix2 are identical. Only one atom set is sufficient to identify the residue IDs.

Parameters
  • ccMatrix1 (A numpy square matrix of floats) – The first correlation matrix.

  • ccMatrix2 (A numpy square matrix of floats) – The second correlation matrix.

  • minColorBarLimit (signed int) – If nDCC maps -1. If absndcc or lmi, 0.

  • maxColorBarLimit (unsigned int) – If nDCC maps 1. If absndcc or lmi, 1.

  • out_file (string) – prefix for the output png files. This prefix will get _overall.png extension.

  • title (string) – Title of the figure.

  • selectedAtoms (prody object) – A list of -typically CA- atoms selected from the parsed PDB file.

Returns

ccMatrixCombined – This is a matrix that contains ccMatrix1 in the upper triangle and ccMatrix2 in the lower triangle.

Return type

A numpy square matrix of floats
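Combining two equally sized maps into one matrix, as described for triangulateMaps, can be sketched with nested lists. The helper triangulate is hypothetical, and taking the diagonal from the first matrix is an assumption of this sketch.

```python
def triangulate(cc1, cc2):
    """Upper triangle from cc1, lower triangle from cc2.

    Taking the diagonal from cc1 is an assumption of this sketch.
    """
    n = len(cc1)
    return [
        [cc1[i][j] if j >= i else cc2[i][j] for j in range(n)]
        for i in range(n)
    ]

cc1 = [[1.0, 0.7], [0.7, 1.0]]
cc2 = [[1.0, 0.1], [0.1, 1.0]]
combined = triangulate(cc1, cc2)
```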

Module contents

Program Name: correlationplus
Author: Mustafa TEKPINAR
Email: tekpinar@buffalo.edu
Purpose: A Python package to calculate, visualize and analyze correlations of proteins.