In this notebook, we will demonstrate the features of BigDFT for creating coarse-grained views of large, complex systems. The notebook will begin by running a large, linear-scaling calculation on the system of choice. We will then use the auto fragmentation feature, which breaks a system down into smaller parts, and the bond order tool, which quantifies the interaction strength between those parts. We will then use these tools together to generate QM/MM runs. Finally, we will perform some basic graph analysis on these systems.
from __future__ import print_function
import pandas as pd
from os.path import join
%matplotlib inline
The first step is to select the system you are interested in studying.
system = "Laccase" # "1CRN" "MG" "Laccase" "Pentacene"
First we set up and run a calculation on the full system. To do the calculation, we first need a system calculator.
from BigDFT import Calculators as C
code = C.SystemCalculator()
code.update_global_options(skip=True)
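Note that the skip=True option tells the calculator to skip runs whose output already exists, so re-executing the notebook will not repeat finished calculations.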
Next we need an input file. We will use basic parameters, but it is important to use the linear-scaling mode and to write the matrices to file.
from BigDFT import Inputfiles as I
inp = I.Inputfile()
inp.set_xc("PBE")
inp.set_hgrid(0.4)
inp.write_support_function_matrices()
inp["import"] = "linear"
Finally, we run the actual calculation.
from BigDFT import Logfiles
log = code.run(input=inp, posinp=join("Geometries", system+".xyz"), name=system, run_dir="Output")
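Before moving on, it can be useful to verify that the run completed by inspecting the returned Logfile object; a minimal check, assuming the standard energy attribute of the Logfile class:
# Sanity check: the Logfile wraps the parsed output of the run
print("Total energy (Ha):", log.energy)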
The next step is to try to break this system down using the auto fragmentation tool. To begin, we will read in the file and make each atom of the system its own fragment.
from BigDFT import Fragments as F
# The XYZReader class has moved between BigDFT versions
try:
    from BigDFT.FragmentIO import XYZReader
except ImportError:
    from BigDFT.XYZ import XYZReader
fullsys = F.System()
with XYZReader(join("Geometries", system+".xyz")) as ifile:
    for i, line in enumerate(ifile):
        fullsys["ATOM:"+str(i)] = F.Fragment([line])
Next we will use the BigDFTool for post-processing the calculation. It has a feature for automatically fragmenting a system; you only need to specify a choice of cutoff.
from BigDFT import PostProcessing as PP
btool = PP.BigDFTool()
We will use a cutoff value of 0.05 for this notebook.
import pickle
pname = join("Cache-BO", system+".pickle")
# Reuse a cached fragmentation if available; otherwise compute and cache it
try:
    with open(pname, "rb") as ifile:
        resys = pickle.load(ifile)
except IOError:
    resys = btool.auto_fragment(fullsys, log, 0.05, verbose=True, criteria="bondorder")
    with open(pname, "wb") as ofile:
        pickle.dump(resys, ofile)
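To get a first feeling for how much the system was coarsened, we can compare the fragment counts before and after:
# auto_fragment merges atoms into larger fragments according to the bond order criterion
print(len(fullsys), "atoms merged into", len(resys), "fragments")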
We can plot the purity values of this fragmented system to verify the procedure.
from matplotlib import pyplot as plt
fig, axs = plt.subplots(1, 1, figsize=(12, 4))
F.plot_fragment_information(axs, {x: abs(resys[x].purity_indicator) for x in resys})
axs.axhline(0.05)
We can also look at the sizes of the various fragments.
from matplotlib import pyplot as plt
fig, axs = plt.subplots(1,1)
axs.plot(sorted([len(x) for x in resys.values()]), 'x--')
axs.set_ylabel("Fragment Size")
axs.set_xlabel("Fragment")
The number of fragments will depend on our choice of cutoff. We can explore a number of different cutoff values and see how that affects the system.
from copy import deepcopy
from BigDFT.PostProcessing import BigDFTool
btool = BigDFTool()
varfrag = deepcopy(fullsys)
df = []
kxs = None
for cutoff in [1.0, 0.5, 0.25, 0.10, 0.0875, 0.075, 0.05, 0.0375, 0.025,
               0.0125, 0.0075, 0.0050, 0.0025, 0.00125]:
    pname = join("Cache-BO", system+"-"+str(cutoff)+".pickle")
    try:
        with open(pname, "rb") as ifile:
            varfrag = pickle.load(ifile)
    except IOError:
        varfrag = btool.auto_fragment(system=varfrag, cutoff=cutoff, log=log,
                                      kxs=kxs, criteria="bondorder")
        with open(pname, "wb") as ofile:
            pickle.dump(varfrag, ofile)
    df.append([cutoff, len(varfrag)])
display(pd.DataFrame(df, columns=["Cutoff", "Number of Fragments"]))
We will also generate an image of the fragmentation. You can load this Tcl script into VMD.
from BigDFT import Visualization as V
vmd = V.VMDGenerator()
vmd.visualize_fragments(resys, join("Viz-BO",system+".tcl"), join("Viz-BO", system+".xyz"))
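From a shell, something like vmd Viz-BO/Laccase.xyz -e Viz-BO/Laccase.tcl (with the file names adjusted to your chosen system) should load the geometry and apply the fragment coloring.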
With the fragmentation established, we will next try QM/MM calculations. For the QM/MM calculation, we also need to know the multipole values so that we can verify the correctness of the results.
resys.set_atom_multipoles(log)
Now we need to pick a target fragment, which is the fragment whose values we wish to reproduce. We want a good signal-to-noise ratio, so we will pick a fragment with a large dipole.
from numpy.linalg import norm
d1_strength = {x: norm(resys[x].d1()) for x in resys}
fig, axs = plt.subplots(1,1, figsize=(12,4))
F.plot_fragment_information(axs, d1_strength)
axs.set_ylabel("Dipole Strength", fontsize=12)
target = max(resys, key=d1_strength.get)
print(target, d1_strength[target])
And save this information to file.
pname = join("Cache-BO", system+"-resys-target.pickle")
with open(pname, "wb") as ofile:
    pickle.dump((target, resys), ofile)
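A later session can then restore both the target and the fragmentation from this cache; a minimal sketch:
# Recover the target fragment id and the fragmented system
with open(pname, "rb") as ifile:
    target, resys = pickle.load(ifile)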
We will use the bond order tool as a measure of interaction strength. In the case of QM/MM, we care about the cumulative sum of the bond order. We will try to drive that sum down until only a small amount of density leaks out of the QM region.
pname = join("Cache-BO", system+"-fbo.pickle")
try:
    with open(pname, "rb") as ifile:
        bondorder = pickle.load(ifile)
except IOError:
    bondorder = btool.fragment_bond_order(resys, [target], resys.keys(), log)
    bondorder = bondorder[target]
    with open(pname, "wb") as ofile:
        pickle.dump(bondorder, ofile)
from numpy import cumsum
fig, axs = plt.subplots(1,1)
axs.set_yscale("log")
axs.set_ylim(1e-7,1)
axs.set_xlim(0,100)
axs.set_xlabel("Fragment", fontsize=12)
axs.set_ylabel("Remaining Bond Order", fontsize=12)
axs.plot(sum(bondorder.values()) - cumsum(sorted(bondorder.values(), reverse=True)), '.--')
Now we can set up the QM/MM calculation, once again using the BigDFTool. This will look at the cumulative bond order and add fragments to the buffer until it is reduced below our choice of cutoff. Along the way, we can also calculate the charge of the QM region, as well as its size and spatial extent.
from copy import deepcopy
df = []
qmmmsys_bo = {}
charges_bo = {}
cutoffs = [100, 1, 0.1, 0.01, 0.001]
kxs = btool.get_matrix_kxs(log)
for cut in cutoffs:
    qmmmsys_bo[cut], mm = btool.create_qmmm_system(resys, log, target, cut, kxs=kxs)
    cv = 0
    for fragid, frag in qmmmsys_bo[cut].items():
        for at in frag:
            cv += at.q0
    charges_bo[cut] = cv
    remainder = sum(bondorder.values()) - sum([bondorder[x] for x in qmmmsys_bo[cut]])
    size = sum([len(x) for x in qmmmsys_bo[cut].values()])
    distance = max([F.pairwise_distance(qmmmsys_bo[cut][target],
                                        qmmmsys_bo[cut][x]) for x in qmmmsys_bo[cut]])
    df.append([cut, remainder, size, charges_bo[cut], distance])
    pname = join("Cache-BO", system+"-qmmm-"+str(cut)+".pickle")
    with open(pname, "wb") as ofile:
        pickle.dump((target, qmmmsys_bo[cut]), ofile)
display(pd.DataFrame(df, columns=["Cutoff", "Bond Order Remainder",
                                  "Size of QM Region", "Charge of QM Region",
                                  "Distance from Target Included"]))
With the buffer region created, we can now go ahead and perform the QM/MM calculations. Note that we round the charge to the nearest electron.
qmmm_logs_bo = {}
for cut in cutoffs:
    qmmm_inp = deepcopy(inp)
    qmmm_inp.setdefault("dft", {})["qcharge"] = 1.0 * round(charges_bo[cut])
    qmmm_logs_bo[cut] = code.run(input=qmmm_inp, posinp=qmmmsys_bo[cut].get_posinp(),
                                 name=system+"-"+str(cut), run_dir="QMMMOut-BO")
    qmmmsys_bo[cut].set_atom_multipoles(qmmm_logs_bo[cut])
Last, we will look at the error in the dipole. We can make this comparison both for the norm of the error and for the angle.
from numpy.linalg import norm
from numpy import dot, arccos, pi
df = []
ref_d0 = resys[target].d0()
ref_cv = 0
for at in resys[target]:
    ref_cv += at.q0
for cut in cutoffs:
    computed_cv = 0
    for at in qmmmsys_bo[cut][target]:
        computed_cv += at.q0
    computed_d0 = qmmmsys_bo[cut][target].d0()
    error = norm(computed_d0 - ref_d0) / norm(ref_d0)
    angle = (180/pi) * arccos(dot(computed_d0, ref_d0) / (norm(computed_d0) * norm(ref_d0)))
    remainder = sum(bondorder.values()) - sum([bondorder[x] for x in qmmmsys_bo[cut]])
    df.append([cut, remainder, computed_cv - ref_cv, error, angle])
    summary = {}
    summary["Charge Error"] = computed_cv - ref_cv
    summary["D1 Error"] = error
    summary["D1 Angle"] = angle
    summary["Size"] = sum([len(x) for x in qmmmsys_bo[cut].values()])
    summary["Remainder"] = remainder
    pname = join("Cache-BO", system+"-"+str(cut)+"-spillageqmmm.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(summary, ofile)
display(pd.DataFrame(df, columns=["Cutoff", "Remainder", "Charge Error",
                                  "Error (Relative Norm)", "Error (Angle Degrees)"]))
Another option is to build a QM/MM region using distance as the criterion. We can handle this case in much the same way, just changing the criteria argument.
from copy import deepcopy
df = []
qmmmsys_distance = {}
charges_distance = {}
distances = [2, 3, 4, 5, 6]
for cut in distances:
    qmmmsys_distance[cut], mm = btool.create_qmmm_system(resys, log,
                                                         target, cut, criteria="distance")
    cv = 0
    for fragid, frag in qmmmsys_distance[cut].items():
        for at in frag:
            cv += at.q0
    charges_distance[cut] = cv
    size = sum([len(x) for x in qmmmsys_distance[cut].values()])
    distance = max([F.pairwise_distance(qmmmsys_distance[cut][target],
                                        qmmmsys_distance[cut][x]) for x in qmmmsys_distance[cut]])
    df.append([cut, size, charges_distance[cut], distance])
display(pd.DataFrame(df, columns=["Cutoff", "Size of QM Region",
                                  "Charge of QM Region", "Distance from Target Included"]))
With this second type of buffer region created, we can now go ahead and perform the QM/MM calculations.
qmmm_logs_distance = {}
for cut in distances:
    qmmm_inp = deepcopy(inp)
    qmmm_inp.setdefault("dft", {})["qcharge"] = 1.0 * round(charges_distance[cut])
    qmmm_logs_distance[cut] = code.run(input=qmmm_inp,
                                       posinp=qmmmsys_distance[cut].get_posinp(),
                                       name=system+"-dist-"+str(cut), run_dir="QMMMOut-BO")
    qmmmsys_distance[cut].set_atom_multipoles(qmmm_logs_distance[cut])
Once again, we do a comparison based on dipole values.
from numpy.linalg import norm
from numpy import dot, arccos, pi
df = []
ref_d0 = resys[target].d0()
ref_cv = 0
for at in resys[target]:
    ref_cv += at.q0
for cut in distances:
    computed_cv = 0
    for at in qmmmsys_distance[cut][target]:
        computed_cv += at.q0
    computed_d0 = qmmmsys_distance[cut][target].d0()
    error = norm(computed_d0 - ref_d0) / norm(ref_d0)
    angle = (180/pi) * arccos(dot(computed_d0, ref_d0) / (norm(computed_d0) * norm(ref_d0)))
    df.append([cut, computed_cv - ref_cv, error, angle])
    summary = {}
    summary["Charge Error"] = computed_cv - ref_cv
    summary["D1 Error"] = error
    summary["D1 Angle"] = angle
    summary["Size"] = sum([len(x) for x in qmmmsys_distance[cut].values()])
    pname = join("Cache-BO", system+"-"+str(cut)+"-distanceqmmm.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(summary, ofile)
display(pd.DataFrame(df, columns=["Cutoff", "Charge Error", "Error (Relative Norm)",
                                  "Error (Angle Degrees)"]))
We can also look at the convergence of the atomic forces for different values of the bond order cutoff.
def get_force_mat(sys, target, log):
    from numpy import zeros
    sys.set_atom_forces(log)
    forcemat = zeros((3, len(sys[target])))
    for i, at in enumerate(sys[target]):
        forcemat[:, i] = at["force"]
    return forcemat
from numpy.linalg import norm
from numpy import std, average
df = []
fullforce = get_force_mat(resys, target, log)
forcerror = {}
for cut in cutoffs:
    qmforce = get_force_mat(qmmmsys_bo[cut], target, qmmm_logs_bo[cut])
    diffmat = fullforce - qmforce
    errors = []
    for i in range(0, diffmat.shape[1]):
        errors.append(norm(diffmat[:, i]))
    comp_errors = []
    for i in range(0, diffmat.shape[1]):
        comp_errors.extend([abs(x) for x in diffmat[:, i]])
    pname = join("Cache-BO", system+"-"+str(cut)+"-forces.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(errors, ofile)
    pname = join("Cache-BO", system+"-"+str(cut)+"-comp_forces.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(comp_errors, ofile)
    df.append([cut, average(errors), max(errors), std(errors), average(comp_errors)])
display(pd.DataFrame(df, columns=["Cutoff", "Norm Error (Average)",
                                  "Norm Error (Max)", "Norm Error (STD)", "Component Error (Average)"]))
Last, we will look at the structure of the system from a graph perspective. In this case, we will use the fragment bond order tool to determine the links in the graph. For each fragment, we sort the other fragments by the strength of the bond order. We then add links to those other fragments in that order until the sum of the remaining bond order drops below a certain threshold.
def graph_bond(sys, threshold, pairwise_bo):
    from numpy import zeros
    mat = zeros((len(sys), len(sys)))
    for i, fragid1 in enumerate(sys):
        spilldict = pairwise_bo[fragid1]
        ifrag = [(x, y) for x, y in enumerate(spilldict)]
        sorted_ifrag = sorted(ifrag, key=lambda x: spilldict[x[1]], reverse=True)
        remainder = sum(spilldict.values())
        for j, frag2 in sorted_ifrag:
            mat[i, j] = 1
            remainder -= spilldict[frag2]
            if remainder < threshold:
                break
    return mat
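Note that the sorted list typically starts with the fragment itself, since a fragment's bond order with itself is by far the largest entry; as a result, every node in the resulting graph carries at least one edge.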
We can compare this to a distance-based metric.
def distance_bond(sys, threshold, log):
    from numpy import zeros
    from BigDFT.Fragments import pairwise_distance
    mat = zeros((len(sys), len(sys)))
    for i, fragid1 in enumerate(sys):
        distdict = {x: pairwise_distance(sys[fragid1], sys[x]) for x in sys.keys()}
        ifrag = [(x, y) for x, y in enumerate(distdict)]
        sorted_ifrag = sorted(ifrag, key=lambda x: distdict[x[1]])
        mat[i, i] = 1
        for j, frag2 in sorted_ifrag:
            if distdict[frag2] > threshold:
                break
            mat[i, j] = 1
    return mat
Here we perform the calculation.
kxs = btool.get_matrix_kxs(log)
pname = join("Cache-BO", system+"-pairwise_bo.pickle")
try:
    with open(pname, "rb") as ifile:
        pairwise_bo = pickle.load(ifile)
except IOError:
    pairwise_bo = btool.fragment_bond_order(resys, resys.keys(), resys.keys(), log, kxs=kxs)
    with open(pname, "wb") as ofile:
        pickle.dump(pairwise_bo, ofile)
from networkx import from_numpy_matrix
bond_mats = {}
for cut in [100, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]:
    pname = join("Cache-BO", system+"-"+str(cut)+"-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            bond_mats[cut] = pickle.load(ifile)
    except IOError:
        bond_mats[cut] = from_numpy_matrix(graph_bond(resys, cut, pairwise_bo))
        with open(pname, "wb") as ofile:
            pickle.dump(bond_mats[cut], ofile)
dist_mats = {}
for cut in [2, 4, 6, 8, 10, 12, 14, 16, 18]:
    pname = join("Cache-BO", system+"-dist-"+str(cut)+"-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            dist_mats[cut] = pickle.load(ifile)
    except IOError:
        dist_mats[cut] = from_numpy_matrix(distance_bond(resys, cut, log))
        with open(pname, "wb") as ofile:
            pickle.dump(dist_mats[cut], ofile)
From these graphs, we can compute certain graph metrics, such as the average clustering or shortest path length.
from networkx import average_clustering, average_shortest_path_length, NetworkXError
df = []
for cut in [100, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]:
    G = bond_mats[cut]
    ac = average_clustering(G)
    try:
        aspl = average_shortest_path_length(G)
    except NetworkXError:
        aspl = "Not Connected"
    df.append(["Bond", cut, ac, aspl])
for cut in [2, 4, 6, 8, 10, 12, 14, 16, 18]:
    G = dist_mats[cut]
    ac = average_clustering(G)
    try:
        aspl = average_shortest_path_length(G)
    except NetworkXError:
        aspl = "Not Connected"
    df.append(["Distance", cut, ac, aspl])
display(pd.DataFrame(df, columns=["Type", "Cutoff", "Average Clustering", "Average Shortest Path Length"]))
Instead of trying different values of the distance/bond order cutoff, we can also fix the bond order threshold and vary the purity indicator cutoff used for the fragmentation.
from networkx import connected_components
df = []
nnodes = []
asplvals = []
for cutoff in [0.075, 0.05, 0.0375, 0.025, 0.0125, 0.0075, 0.0050, 0.0025, 0.00125]:
    pname = join("Cache-BO", system+"-"+str(cutoff)+".pickle")
    with open(pname, "rb") as ifile:
        varfrag = pickle.load(ifile)
    pname = join("Cache-BO", system+"-"+str(cutoff)+"-0.01-avg-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            G = pickle.load(ifile)
    except IOError:
        pairwise_bo = btool.fragment_bond_order(varfrag, varfrag.keys(), varfrag.keys(), log, kxs=kxs)
        G = from_numpy_matrix(graph_bond(varfrag, 0.01, pairwise_bo))
        with open(pname, "wb") as ofile:
            pickle.dump(G, ofile)
    nodes = G.number_of_nodes()
    average_edges = sum([x[1] for x in G.degree()]) / G.number_of_nodes() - 1
    try:
        aspl = average_shortest_path_length(G)
    except NetworkXError:
        # Average over the connected components when the graph is disconnected
        sub_graphs = connected_components(G)
        aspl = 0
        count = 0
        for sub in sub_graphs:
            aspl += average_shortest_path_length(G.subgraph(sub))
            count += 1
        aspl /= count
    df.append([cutoff, nodes, average_edges, aspl])
    nnodes.append(nodes)
    asplvals.append(aspl)
display(pd.DataFrame(df, columns=["Cutoff", "Number of Nodes", "Average Edges per Node", "Average Shortest Path Length"]))
And plot the average shortest path length against the number of nodes (on a logarithmic scale).
fig, axs = plt.subplots(1, 1)
axs.plot(nnodes, asplvals, 'x--')
axs.set_xscale("log", basex=2)
axs.axvline(nnodes[3], label='PI = -0.025')
axs.set_ylabel("Average Shortest Path Length", fontsize=12)
axs.set_xlabel("Nodes", fontsize=12)
axs.legend()