Complexity Reduction

In this notebook, we will demonstrate the features of BigDFT for creating coarse-grained views of large, complex systems. The notebook will begin by running a large, linear-scaling calculation on the system of choice. We will then use the auto fragmentation feature, which breaks a system down into smaller parts. We will continue with the bond order tool, which quantifies the interaction strength between those parts. We will then use these tools together to set up QM/MM runs. Finally, we will perform some basic graph analysis on these systems.

In [1]:
from __future__ import print_function
import pandas as pd
from os.path import join
%matplotlib inline

The first step is to select the system you are interested in studying.

In [2]:
system = "1CRN" # "1CRN" "MG" "Laccase" "Pentacene" 

Setup

First we set up and run a calculation on the full system. To do the calculation, we first need a system calculator.

In [3]:
from BigDFT import Calculators as C
code = C.SystemCalculator()
code.update_global_options(skip=True)
Initialize a Calculator with OMP_NUM_THREADS=1 and command mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft

Next we need an input file. We will use basic parameters, but it is important to use the linear-scaling mode and to write the support function matrices to file.
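
For orientation, here is a rough sketch of the dictionary these helper calls build up. The key layout shown is illustrative and assumed from the standard BigDFT input schema; the authoritative keys are whatever the Inputfiles methods actually set.

    # illustrative sketch only, not generated output
    sketch = {"dft": {"ixc": "PBE",    # from set_xc
                      "hgrids": 0.4},  # from set_hgrid
              "import": "linear"}      # select the linear-scaling profile
    # write_support_function_matrices() additionally enables the output of the
    # support function matrices needed by the post-processing below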

In [4]:
from BigDFT import Inputfiles as I
inp = I.Inputfile()
inp.set_xc("PBE")
inp.set_hgrid(0.4)
inp.write_support_function_matrices()
inp["import"] = "linear"

Finally we run the actual calculation.

In [5]:
from BigDFT import Logfiles
log = code.run(input=inp, posinp=join("Geometries", system+".xyz"), name=system, run_dir="Output")
Copy the posinp file 'Geometries/1CRN.xyz' into 'Output'
Creating the yaml input file "Output/1CRN.yaml"
Run directory Output
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN -s Yes

Fragmentations

The next step is to try to break this system down using the auto fragmentation tool. To begin, we read in the file and make each atom of the system its own fragment.

In [6]:
from BigDFT import Fragments as F
try:
    from BigDFT.FragmentIO import XYZReader
except ImportError:
    from BigDFT.XYZ import XYZReader
fullsys = F.System()
with XYZReader(join("Geometries", system+".xyz")) as ifile:
    for i, line in enumerate(ifile):
        fullsys["ATOM:"+str(i)] = F.Fragment([line])

Next we will use the BigDFTool for post-processing the calculation. It has a feature for automatically fragmenting a system; you only need to specify a choice of cutoff.

In [7]:
from BigDFT import PostProcessing as PP
btool = PP.BigDFTool()

We will use a cutoff value of 0.05 for this notebook.
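
For reference, this cutoff is compared against each fragment's purity indicator. A sketch of the definition used in the complexity-reduction literature (our notation, assuming $K_F$ and $S_F$ are the density kernel and support function overlap matrix restricted to fragment $F$, and $q_F$ is the fragment's electron count):

$$\Pi_F = \frac{1}{q_F}\,\mathrm{Tr}\left[(K_F S_F)^2 - K_F S_F\right]$$

A fragment whose density matrix is exactly idempotent gives $\Pi_F = 0$, so the closer $|\Pi_F|$ stays to zero, the better the fragment decouples from its environment.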

In [8]:
import pickle
pname = join("Cache-BO", system+".pickle")
try:
    with open(pname, "rb") as ifile:
        resys = pickle.load(ifile)
except Exception:  # no cached result yet; compute and cache it
    resys = btool.auto_fragment(fullsys, log, 0.05, verbose=True, criteria="bondorder")
    with open(pname, "wb") as ofile:
        pickle.dump(resys, ofile)

We can plot the purity values of this fragmented system to verify the procedure.

In [9]:
from matplotlib import pyplot as plt
fig, axs = plt.subplots(1,1,figsize=(12,4))
F.plot_fragment_information(axs, {x: abs(resys[x].purity_indicator) for x in resys})
axs.axhline("0.05")
Out[9]:
<matplotlib.lines.Line2D at 0x7f2792da0ac8>

We can also look at the sizes of the various fragments.

In [10]:
from matplotlib import pyplot as plt
fig, axs = plt.subplots(1,1)
axs.plot(sorted([len(x) for x in resys.values()]), 'x--')
axs.set_ylabel("Fragment Size")
axs.set_xlabel("Fragment")
Out[10]:
Text(0.5, 0, 'Fragment')

The number of fragments will depend on our choice of cutoff. We can explore a number of different cutoff values and see how that affects the system.

In [11]:
from copy import deepcopy
from BigDFT.PostProcessing import BigDFTool

btool = BigDFTool()
varfrag = deepcopy(fullsys)

df = []
kxs = None
# cutoffs run from loosest to tightest; each auto_fragment call further
# merges the fragmentation produced by the previous pass
for cutoff in [1.0, 0.5, 0.25, 0.10, 0.0875, 0.075, 0.05, 0.0375, 0.025, 0.0125, 0.0075, 0.0050, 0.0025, 0.00125]:
    pname = join("Cache-BO", system+"-"+str(cutoff)+".pickle")
    try:
        with open(pname, "rb") as ifile:
            varfrag = pickle.load(ifile)
    except Exception:  # no cached result yet; compute and cache it
        varfrag = btool.auto_fragment(system=varfrag, cutoff=cutoff, log=log,
                                      kxs=kxs, criteria="bondorder")
        with open(pname, "wb") as ofile:
            pickle.dump(varfrag, ofile)
    df.append([cutoff, len(varfrag)])
    
display(pd.DataFrame(df, columns=["Cutoff", "Number of Fragments"]))
     Cutoff  Number of Fragments
0   1.00000                  642
1   0.50000                  642
2   0.25000                  204
3   0.10000                  100
4   0.08750                   94
5   0.07500                   61
6   0.05000                   39
7   0.03750                   32
8   0.02500                   20
9   0.01250                    7
10  0.00750                    5
11  0.00500                    1
12  0.00250                    1
13  0.00125                    1

We will also generate an image of the fragmentation; the TCL script produced below can be loaded into VMD.

In [12]:
from BigDFT import Visualization as V
In [13]:
vmd = V.VMDGenerator()
vmd.visualize_fragments(resys, join("Viz-BO",system+".tcl"), join("Viz-BO", system+".xyz"))

QM/MM

With the fragmentation established, we will next try to do QM/MM calculations. For the QM/MM calculation, we will also need to know the multipole values so that we can verify the correctness of the results.

In [14]:
resys.set_atom_multipoles(log)

Now we need to pick a target fragment, which is the fragment whose values we wish to reproduce. We want a good signal-to-noise ratio, so we will try to pick a fragment with a large dipole.
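
The d1() value used below is the fragment dipole assembled from the atomic multipoles. Roughly speaking (a sketch of the idea, not the exact BigDFT implementation), for atoms $a$ with net charges $q_a$, atomic dipoles $\mathbf{p}_a$, and positions $\mathbf{r}_a$, measured about a fragment center $\mathbf{r}_0$:

$$\mathbf{d}_1(F) \approx \sum_{a\in F}\left[\mathbf{p}_a + q_a\,(\mathbf{r}_a - \mathbf{r}_0)\right]$$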

In [15]:
from numpy.linalg import norm
d1_strength = {x: norm(resys[x].d1()) for x in resys}
In [16]:
fig, axs = plt.subplots(1,1, figsize=(12,4))
F.plot_fragment_information(axs, d1_strength)
axs.set_ylabel("Dipole Strength", fontsize=12)
Out[16]:
Text(0, 0.5, 'Dipole Strength')
In [17]:
target = max(resys, key=d1_strength.get)
print(target, d1_strength[target])
FRAG:17 10.17461763285362

And save this information to file.

In [18]:
pname = join("Cache-BO", system+"-resys-target.pickle")
with open(pname, "wb") as ofile:
    pickle.dump((target, resys), ofile)

We will use the bond order tool as a measure of interaction strength. In the case of QM/MM, we care about the cumulative sum of the bond order. We will try to drive that sum down until only a small amount of density leaks out of the QM region.
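
For orientation, the fragment bond order computed by fragment_bond_order is a Mayer-like quantity; a sketch of the definition (assuming $K$ is the density kernel and $S$ the support function overlap matrix, with $\mu$ and $\nu$ running over the support functions of fragments $F$ and $G$):

$$B(F,G) = \sum_{\mu\in F}\sum_{\nu\in G} (KS)_{\mu\nu}\,(KS)_{\nu\mu}$$

Summing $B(\mathrm{target}, G)$ over every fragment $G$ left outside the QM region then measures the density leaked across the boundary, which is the remainder we drive below the cutoff.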

In [19]:
pname = join("Cache-BO", system+"-fbo.pickle")
try:
    with open(pname, "rb") as ifile:
        bondorder = pickle.load(ifile)
except Exception:  # no cached result yet; compute and cache it
    bondorder = btool.fragment_bond_order(resys, [target], resys.keys(), log)
    bondorder = bondorder[target]
    with open(pname, "wb") as ofile:
        pickle.dump(bondorder, ofile)
In [20]:
from numpy import cumsum

fig, axs = plt.subplots(1,1)
axs.set_yscale("log")
axs.set_ylim(1e-7,1)
axs.set_xlim(0,100)
axs.set_xlabel("Fragment", fontsize=12)
axs.set_ylabel("Remaining Bond Order", fontsize=12)

axs.plot(sum(bondorder.values()) - cumsum(sorted(bondorder.values(), reverse=True)), '.--')
Out[20]:
[<matplotlib.lines.Line2D at 0x7f2792969978>]

Now we can set up the QM/MM calculation, once again using the BigDFTool. This will look at the cumulative bond order and add fragments to the buffer until the remainder is reduced below our choice of cutoff. Along the way, we can also calculate the charge of the QM region, as well as its size and radius.

In [21]:
from copy import deepcopy

df = []
qmmmsys_bo = {}
charges_bo = {}

cutoffs = [100, 1, 0.1, 0.01, 0.001]

kxs = btool.get_matrix_kxs(log)

for cut in cutoffs:
    qmmmsys_bo[cut], mm = btool.create_qmmm_system(resys, log, target, cut, kxs=kxs)
    cv = 0
    for fragid, frag in qmmmsys_bo[cut].items():
        for at in frag:
            cv += at.q0
    charges_bo[cut] = cv
    remainder = sum(bondorder.values()) - sum([bondorder[x] for x in qmmmsys_bo[cut]])
    size = sum([len(x) for x in qmmmsys_bo[cut].values()])
    distance = max([F.pairwise_distance(qmmmsys_bo[cut][target], 
                                        qmmmsys_bo[cut][x]) for x in qmmmsys_bo[cut]])
    df.append([cut, remainder, size, charges_bo[cut], distance])
    
    pname = join("Cache-BO", system+"-qmmm-"+str(cut)+".pickle")
    with open(pname, "wb") as ofile:
        pickle.dump((target, qmmmsys_bo[cut]), ofile)
    
display(pd.DataFrame(df, columns=["Cutoff", "Charge Remainder", 
                                  "Size of QM Region", "Charge of QM Region", 
                                  "Distance from Target Included"]))
    Cutoff  Charge Remainder  Size of QM Region  Charge of QM Region  Distance from Target Included
0  100.000          6.862754                 56             0.210669                       0.000000
1    1.000          0.296671                122             0.911266                       2.926834
2    0.100          0.048764                181             0.821209                       4.509318
3    0.010          0.007764                228             0.970707                       4.885366
4    0.001          0.000820                269            -0.080516                       6.867320

With the buffer region created, we can now go ahead and perform the QM/MM calculations. Note that we round the charge to the nearest electron.

In [22]:
qmmm_logs_bo = {}
for cut in cutoffs:
    qmmm_inp = deepcopy(inp)
    qmmm_inp.setdefault("dft",{})["qcharge"] = 1.0 * round(charges_bo[cut])
    qmmm_logs_bo[cut] = code.run(input=qmmm_inp, posinp=qmmmsys_bo[cut].get_posinp(),
                              name=system+"-"+str(cut), run_dir="QMMMOut-BO")
    qmmmsys_bo[cut].set_atom_multipoles(qmmm_logs_bo[cut])
Creating the yaml input file "QMMMOut-BO/1CRN-100.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-100 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-1.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-1 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-0.1.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-0.1 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-0.01.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-0.01 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-0.001.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-0.001 -s Yes

Last, we will look at the error in the dipole. We can make this comparison both for the relative norm of the error and for the angle between the computed and reference dipoles.
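
Concretely, with $\mathbf{d}_{\mathrm{ref}}$ the target fragment's dipole from the full calculation and $\mathbf{d}$ the dipole from the QM/MM run, the two measures computed below are:

$$\epsilon = \frac{\lVert\mathbf{d} - \mathbf{d}_{\mathrm{ref}}\rVert}{\lVert\mathbf{d}_{\mathrm{ref}}\rVert}, \qquad \theta = \frac{180}{\pi}\,\arccos\left(\frac{\mathbf{d}\cdot\mathbf{d}_{\mathrm{ref}}}{\lVert\mathbf{d}\rVert\,\lVert\mathbf{d}_{\mathrm{ref}}\rVert}\right)$$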

In [23]:
from numpy.linalg import norm
from numpy import dot, arccos, pi
df = []

ref_d0 = resys[target].d0()
ref_cv = 0
for at in resys[target]:
    ref_cv += at.q0
    
for cut in cutoffs:
    computed_cv = 0
    for at in qmmmsys_bo[cut][target]:
        computed_cv += at.q0
        
    computed_d0 = qmmmsys_bo[cut][target].d0()
    error = norm(computed_d0 - ref_d0) / norm(ref_d0)
    angle = (180/pi) * arccos(dot(computed_d0, ref_d0)/(norm(computed_d0) * norm(ref_d0)))
    remainder = sum(bondorder.values()) - sum([bondorder[x] for x in qmmmsys_bo[cut]])
    df.append([cut, remainder, computed_cv - ref_cv, error, angle])

    summary = {}
    summary["Charge Error"] = computed_cv - ref_cv
    summary["D1 Error"] = error
    summary["D1 Angle"] = angle
    summary["Size"] = sum([len(x) for x in qmmmsys_bo[cut].values()])
    summary["Remainder"] = remainder
    pname = join("Cache-BO", system+"-"+str(cut)+"-spillageqmmm.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(summary, ofile)
    
display(pd.DataFrame(df, columns=["Cutoff", "Remainder", "Charge Error", 
                                  "Error (Relative Norm)", "Error (Angle Degrees)"]))
    Cutoff  Remainder  Charge Error  Error (Relative Norm)  Error (Angle Degrees)
0  100.000   6.862754     -0.210664               0.545345              25.785099
1    1.000   0.296671      0.116877               0.165159               3.306158
2    0.100   0.048764      0.169230               0.093047               1.231079
3    0.010   0.007764      0.076258               0.063041               0.286826
4    0.001   0.000820      0.006044               0.017176               0.109553

Distance-Based QM/MM

Another option is to build a QM/MM region using distance as the criterion. We can handle this case in much the same way, just changing the criteria argument.

In [24]:
from copy import deepcopy

df = []
qmmmsys_distance = {}
charges_distance = {}

distances = [2, 3, 4, 5, 6]

for cut in distances:
    qmmmsys_distance[cut], mm = btool.create_qmmm_system(resys, log, 
                                                         target, cut, criteria="distance")
    cv = 0
    for fragid, frag in qmmmsys_distance[cut].items():
        for at in frag:
            cv += at.q0
    charges_distance[cut] = cv
    size = sum([len(x) for x in qmmmsys_distance[cut].values()])
    distance = max([F.pairwise_distance(qmmmsys_distance[cut][target], 
                                        qmmmsys_distance[cut][x]) for x in qmmmsys_distance[cut]])
    df.append([cut, size, charges_distance[cut], distance])
    
display(pd.DataFrame(df, columns=["Cutoff", "Size of QM Region", 
                                  "Charge of QM Region", "Distance from Target Included"]))
   Cutoff  Size of QM Region  Charge of QM Region  Distance from Target Included
0       2                 56             0.210669                       0.000000
1       3                122             0.911266                       2.926834
2       4                146             0.890486                       3.657768
3       5                237             0.156348                       4.885366
4       6                237             0.156348                       4.885366

With this second type of buffer region created, we can now go ahead and perform the QM/MM calculations.

In [25]:
qmmm_logs_distance = {}
for cut in distances:
    qmmm_inp = deepcopy(inp)
    qmmm_inp.setdefault("dft",{})["qcharge"] = 1.0 * round(charges_distance[cut])
    qmmm_logs_distance[cut] = code.run(input=qmmm_inp, 
                                       posinp=qmmmsys_distance[cut].get_posinp(),
                                       name=system+"-dist-"+str(cut), run_dir="QMMMOut-BO")
    qmmmsys_distance[cut].set_atom_multipoles(qmmm_logs_distance[cut])
Creating the yaml input file "QMMMOut-BO/1CRN-dist-2.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-dist-2 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-dist-3.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-dist-3 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-dist-4.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-dist-4 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-dist-5.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-dist-5 -s Yes
Creating the yaml input file "QMMMOut-BO/1CRN-dist-6.yaml"
Run directory QMMMOut-BO
Executing command:  mpirun -machinefile /var/lib/oar/1918626 /home/wdawson/binaries/ase/install/bin/bigdft -n 1CRN-dist-6 -s Yes

Once again, we do a comparison based on dipole values.

In [26]:
from numpy.linalg import norm
from numpy import dot, arccos, pi
df = []

ref_d0 = resys[target].d0()
ref_cv = 0
for at in resys[target]:
    ref_cv += at.q0
    
for cut in distances:
    computed_cv = 0
    for at in qmmmsys_distance[cut][target]:
        computed_cv += at.q0
        
    computed_d0 = qmmmsys_distance[cut][target].d0()
    error = norm(computed_d0 - ref_d0) / norm(ref_d0)
    angle = (180/pi) * arccos(dot(computed_d0, ref_d0)/(norm(computed_d0) * norm(ref_d0)))
    df.append([cut, computed_cv - ref_cv, error, angle])
    
    summary = {}
    summary["Charge Error"] = computed_cv - ref_cv
    summary["D1 Error"] = error
    summary["D1 Angle"] = angle
    summary["Size"] = sum([len(x) for x in qmmmsys_distance[cut].values()])
    pname = join("Cache-BO", system+"-"+str(cut)+"-distanceqmmm.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(summary, ofile)
    
display(pd.DataFrame(df, columns=["Cutoff", "Charge Error", "Error (Norm)", 
                                  "Error (Angle Degrees)"]))
   Cutoff  Charge Error  Error (Norm)  Error (Angle Degrees)
0       2     -0.210664      0.545345              25.785094
1       3      0.116875      0.165156               3.306072
2       4      0.229785      0.154074               2.234894
3       5     -0.004744      0.021210               0.468946
4       6     -0.004744      0.021210               0.468950

Forces

We can also look at the convergence of the atomic forces for different values of the bond order cutoff.
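
For each atom $i$ of the target fragment we compare the QM/MM force against the full-system reference. The statistics reported below are taken over the per-atom error norms and the absolute per-component errors:

$$\epsilon_i = \bigl\lVert\mathbf{f}_i^{\mathrm{QM/MM}} - \mathbf{f}_i^{\mathrm{full}}\bigr\rVert, \qquad \epsilon_{i\alpha} = \bigl|f_{i\alpha}^{\mathrm{QM/MM}} - f_{i\alpha}^{\mathrm{full}}\bigr|, \quad \alpha\in\{x,y,z\}$$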

In [27]:
def get_force_mat(sys, target, log):
    from numpy import zeros
    
    # gather the forces on each atom of the target fragment into a 3 x N matrix
    sys.set_atom_forces(log)
    forcemat = zeros((3, len(sys[target])))
    
    for i, at in enumerate(sys[target]):
        forcemat[:,i] = at["force"]
        
    return forcemat
In [28]:
from numpy.linalg import norm
from numpy import std, average

df = []

fullforce = get_force_mat(resys, target, log)
forcerror = {}

for cut in cutoffs:
    qmforce = get_force_mat(qmmmsys_bo[cut], target, qmmm_logs_bo[cut])
    diffmat = fullforce - qmforce
    
    errors = []
    for i in range(0, diffmat.shape[1]):
        errors.append(norm(diffmat[:,i]))
    comp_errors = []
    for i in range(0, diffmat.shape[1]):
        comp_errors.extend([abs(x) for x in diffmat[:,i]])
        
    pname = join("Cache-BO", system+"-"+str(cut)+"-forces.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(errors, ofile)
        
    pname = join("Cache-BO", system+"-"+str(cut)+"-comp_forces.pickle")
    with open(pname, "wb") as ofile:
        pickle.dump(comp_errors, ofile)
        
    df.append([cut, average(errors), max(errors), std(errors), average(comp_errors)])
    
display(pd.DataFrame(df, columns=["Cutoff", "Norm Error (Average)",
                                  "Norm Error (Max)", "Norm Error (STD)", "Component Error (Average)"]))
    Cutoff  Norm Error (Average)  Norm Error (Max)  Norm Error (STD)  Component Error (Average)
0  100.000              0.017433          0.146533          0.028406                   0.008938
1    1.000              0.004399          0.036707          0.006213                   0.002282
2    0.100              0.002271          0.020321          0.003510                   0.001138
3    0.010              0.001839          0.020686          0.003115                   0.000926
4    0.001              0.001715          0.016854          0.003143                   0.000882

Graph Analysis

Last, we will look at the structure of the system from a graph perspective. In this case, we will use the fragment bond order tool to determine the links in the graph. For each fragment, we sort the other fragments by the strength of the bond order. We then add links to those other fragments in that order until the sum of the remaining bond order drops below a certain threshold.

In [29]:
def graph_bond(sys, threshold, pairwise_bo):
    from numpy import zeros
    mat = zeros((len(sys),len(sys)))
    
    for i, fragid1 in enumerate(sys):
        spilldict = pairwise_bo[fragid1]
        # pair each fragment with a column index and sort by descending bond order
        ifrag = [(x, y) for x, y in enumerate(spilldict)]
        sorted_ifrag = sorted(ifrag, key=lambda x: spilldict[x[1]], reverse=True)
        
        # add links until the leftover bond order falls below the threshold
        remainder = sum(spilldict.values())
        for j, frag2 in sorted_ifrag:
            mat[i,j] = 1
            remainder -= spilldict[frag2]
            if remainder < threshold:
                break
    return mat

We can compare this to a distance-based metric.

In [30]:
def distance_bond(sys, threshold, log):
    from numpy import zeros
    from BigDFT.Fragments import pairwise_distance
    
    mat = zeros((len(sys),len(sys)))
    for i, fragid1 in enumerate(sys):
        # sort the other fragments by their distance to fragid1
        distdict = {x: pairwise_distance(sys[fragid1], sys[x]) for x in sys.keys()}
        ifrag = [(x, y) for x, y in enumerate(distdict)]
        sorted_ifrag = sorted(ifrag, key=lambda x: distdict[x[1]])
        mat[i,i] = 1
        # link every fragment closer than the threshold
        for j, frag2 in sorted_ifrag:
            if distdict[frag2] > threshold:
                break
            mat[i,j] = 1
    return mat

Here we perform the calculation.

In [31]:
kxs = btool.get_matrix_kxs(log)
In [32]:
pname = join("Cache-BO", system+"-pairwise_bo.pickle")
try:
    with open(pname, "rb") as ifile:
        pairwise_bo = pickle.load(ifile)
except Exception:  # no cached result yet; compute and cache it
    pairwise_bo = btool.fragment_bond_order(resys, resys.keys(), resys.keys(), log, kxs=kxs)
    with open(pname, "wb") as ofile:
        pickle.dump(pairwise_bo, ofile)
In [33]:
try:
    from networkx import from_numpy_matrix
except ImportError:  # the routine was renamed in networkx 3.0
    from networkx import from_numpy_array as from_numpy_matrix

bond_mats = {}
for cut in [100, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]:
    pname = join("Cache-BO", system+"-"+str(cut)+"-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            bond_mats[cut] = pickle.load(ifile)
    except Exception:  # no cached graph yet; build and cache it
        bond_mats[cut] = from_numpy_matrix(graph_bond(resys, cut, pairwise_bo))
        with open(pname, "wb") as ofile:
            pickle.dump(bond_mats[cut], ofile)
In [34]:
dist_mats = {}
for cut in [2, 4, 6, 8, 10, 12, 14, 16, 18]:
    pname = join("Cache-BO", system+"-dist-"+str(cut)+"-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            dist_mats[cut] = pickle.load(ifile)
    except Exception:  # no cached graph yet; build and cache it
        dist_mats[cut] = from_numpy_matrix(distance_bond(resys, cut, log))
        with open(pname, "wb") as ofile:
            pickle.dump(dist_mats[cut], ofile)

From these graphs, we can compute certain graph metrics, such as the average clustering or the average shortest path length.
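
These are the standard networkx definitions: for a graph with $n$ nodes, where node $u$ has degree $k_u$ and sits on $T(u)$ triangles, and $d(u,v)$ is the shortest-path distance,

$$C = \frac{1}{n}\sum_{u}\frac{2\,T(u)}{k_u(k_u-1)}, \qquad L = \frac{1}{n(n-1)}\sum_{u\neq v} d(u,v).$$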

In [35]:
from networkx import average_clustering, average_shortest_path_length

df = []

for cut in [100, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]:
    G = bond_mats[cut]
    ac = average_clustering(G)
    try:
        aspl = average_shortest_path_length(G)
    except Exception:  # the graph is not connected
        aspl = "Not Connected"
    df.append(["Bond", cut, ac, aspl])
    
for cut in [2, 4, 6, 8, 10, 12, 14, 16, 18]:
    G = dist_mats[cut]
    ac = average_clustering(G)
    try:
        aspl = average_shortest_path_length(G)
    except Exception:  # the graph is not connected
        aspl = "Not Connected"
    df.append(["Distance", cut, ac, aspl])

display(pd.DataFrame(df, columns=["Type", "Cutoff", "Average Clustering", "Average Shortest Path Length"]))
        Type      Cutoff  Average Clustering  Average Shortest Path Length
0       Bond  100.000000            0.000000                 Not Connected
1       Bond    1.000000            0.000000                       5.49663
2       Bond    0.100000            0.403391                       3.23482
3       Bond    0.010000            0.527657                       2.52227
4       Bond    0.001000            0.611333                       2.30904
5       Bond    0.000100            0.656829                       1.94737
6       Bond    0.000010            0.715909                       1.73684
7       Bond    0.000001            0.745218                       1.58974
8   Distance    2.000000            0.000000                 Not Connected
9   Distance    4.000000            0.153419                       3.61404
10  Distance    6.000000            0.541795                       2.49663
11  Distance    8.000000            0.598456                        2.2861
12  Distance   10.000000            0.642010                       2.00945
13  Distance   12.000000            0.664375                       1.78677
14  Distance   14.000000            0.721535                       1.63158
15  Distance   16.000000            0.736407                       1.55196
16  Distance   18.000000            0.761505                       1.47233

Instead of trying different values of the distance/bond cutoff, we can also fix the bond order threshold of the graph and vary the purity indicator cutoff used to fragment the system.

In [36]:
from networkx import connected_components

df = []
nnodes = []
asplvals = []
for cutoff in [0.075, 0.05, 0.0375, 0.025, 0.0125, 0.0075, 0.0050, 0.0025, 0.00125]:
    pname = join("Cache-BO", system+"-"+str(cutoff)+".pickle")
    with open(pname, "rb") as ifile:
        varfrag = pickle.load(ifile)
    
    pname = join("Cache-BO", system+"-"+str(cutoff)+"-0.01-avg-graph.pickle")
    try:
        with open(pname, "rb") as ifile:
            G = pickle.load(ifile)
    except Exception:  # no cached graph yet; build and cache it
        pairwise_bo = btool.fragment_bond_order(varfrag, varfrag.keys(), varfrag.keys(), log, kxs=kxs)
        G = from_numpy_matrix(graph_bond(varfrag, 0.01, pairwise_bo))
        with open(pname, "wb") as ofile:
            pickle.dump(G, ofile)

    nodes = G.number_of_nodes()
    average_edges = sum([x[1] for x in G.degree()])/G.number_of_nodes() - 1
    
    try:
        aspl = average_shortest_path_length(G)
    except Exception:  # not connected; average over the connected components
        sub_graphs = connected_components(G)
        aspl = 0
        count = 0
        for sub in sub_graphs:
            aspl += average_shortest_path_length(G.subgraph(sub))
            count += 1
        aspl /= count
    
    df.append([cutoff, nodes, average_edges, aspl])
    nnodes.append(nodes)
    asplvals.append(aspl)

display(pd.DataFrame(df, columns=["Cutoff", "Number of Nodes", "Average Edges per Node", "Average Shortest Path Length"]))
    Cutoff  Number of Nodes  Average Edges per Node  Average Shortest Path Length
0  0.07500               61                7.295082                      2.909290
1  0.05000               39                6.589744                      2.522267
2  0.03750               32                6.437500                      2.288306
3  0.02500               20                6.600000                      1.931579
4  0.01250                7                4.714286                      1.380952
5  0.00750                5                3.800000                      1.300000
6  0.00500                1                1.000000                      0.000000
7  0.00250                1                1.000000                      0.000000
8  0.00125                1                1.000000                      0.000000

And plot the average shortest path length against the number of nodes on a logarithmic scale.
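
The logarithmic axis is motivated by the small-world picture: for a random graph with $N$ nodes and mean degree $\langle k\rangle$, the average shortest path length grows only logarithmically, $L \approx \ln N / \ln\langle k\rangle$ (a standard random-graph result, quoted here for context rather than derived in this notebook).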

In [37]:
fig, axs = plt.subplots(1,1)
axs.plot(nnodes, asplvals, 'x--')
axs.set_xscale("log", basex=2)  # on matplotlib >= 3.3 the keyword is base=2
axs.axvline(nnodes[3], label='PI = -0.025')
axs.set_ylabel("Average Shortest Path Length", fontsize=12)
axs.set_xlabel("Nodes", fontsize=12)
axs.legend()
Out[37]:
<matplotlib.legend.Legend at 0x7f278d558128>