NWChem: Difference between revisions
From Modelado Foundation
imported>Adam No edit summary |
imported>Adam No edit summary |
||
Line 64: | Line 64: | ||
The additional memory requirements for this work skipping test are minimal as the array only has one dimension of size nbfn. | The additional memory requirements for this work skipping test are minimal as the array only has one dimension of size nbfn. | ||
===Symmetry of g()=== | |||
The g function has several symmetries which can be exploited to reduce the number of times it must be run. | |||
<code>g(i, j, k, l)</code> returns the same value in the following cases: | |||
* The i and j parameters are swapped. | |||
* The k and l parameters are swapped. | |||
* The i,j and k,l pairs are swapped. | |||
Combining all possibly symmetries leads to an overall 8-way symmetry where the following calls all return the same value: | |||
* <code>g(i, j, k, l)</code> | |||
* <code>g(i, j, l, k)</code> | |||
* <code>g(j, i, k, l)</code> | |||
* <code>g(j, i, l, k)</code> | |||
* <code>g(k, l, i, j)</code> | |||
* <code>g(k, l, j, i)</code> | |||
* <code>g(l, k, i, j)</code> | |||
* <code>g(l, k, j, i)</code> | |||
By calculating <code>gg = g(i, j, k, l)</code> once, we can then simply perform multiple updates to the fock matrix. | |||
* <code>update_fock(gg, i, j, k, l)</code> | |||
* <code>update_fock(gg, i, j, l, k)</code> | |||
* <code>update_fock(gg, j, i, k, l)</code> | |||
* <code>update_fock(gg, j, i, l, k)</code> | |||
* <code>update_fock(gg, k, l, i, j)</code> | |||
* <code>update_fock(gg, k, l, j, i)</code> | |||
* <code>update_fock(gg, l, k, i, j)</code> | |||
* <code>update_fock(gg, l, k, j, i)</code> | |||
With <code>update()</code> being a macro or inlined function similar to: | |||
update(gg, a, b, c, d) { | |||
g_fock[a][b] += (gg * g_dens[c][d]); | |||
g_fock[a][c] -= (0.50 * gg * g_dens[b][d]); | |||
} | |||
When taking advantage of the symmetry, there are special cases which need to be considered. Most obvious is that the iteration range of the loops must be modified so that g() is only called once for each equivalent ordering of its arguments then all possible updates to the fock matrix are made with the result. Secondly, when arguments which can be swapped are equal, there are fewer updates to be made to the fock matrix. | |||
As an example, consider the case where i, j, k, and l are all equal to 0. The reference implementation can only process this once since the loop indices can only all be zero once. If all symmetric updates are performed in this case, the result would be that the exact same update is performed eight times since the arguments to <code>update(0, 0, 0, 0)</code> are unchanged by the swap operations. | |||
To ensure proper operation, there are six separate variants of the twoel loops, one for each possible symmetry restriction: | |||
* No restrictions | |||
* i = j | |||
* k = l | |||
* i,j pair is identical to k,l pair | |||
* i = j and k = l | |||
* i, j, k, and l are all equal | |||
With the special cases taken in account, the loop structure for the fully symmetric case is: | |||
for (i = 0; i < nbfn; i++) { | |||
for (j = i + 1; j < nbfn; j++) { | |||
for (k = i; k < nbfn; k++) { | |||
if (k == i) l_start = 1 + j; | |||
else l_start = 1 + k; | |||
for (l = l_start; l < nbfn; l++) { | |||
gg = g(i,j,k,l); | |||
//perform all eight updates | |||
update_fock(gg, i, j, k, l); | |||
update_fock(gg, i, j, l, k); | |||
update_fock(gg, j, i, k, l); | |||
update_fock(gg, j, i, l, k); | |||
update_fock(gg, k, l, i, j); | |||
update_fock(gg, k, l, j, i); | |||
update_fock(gg, l, k, i, j); | |||
update_fock(gg, l, k, j, i); | |||
} | |||
} | |||
} | |||
} | |||
The loop structure for the other five cases can be trivially generated by removing the loops which are no longer independent. The variables should then simply be replaced with the independent index. Finally any duplicate identical updates should be removed. | |||
As an example, the i = j case is: | |||
for (i = 0; i < nbfn; i++) { | |||
// removed j loop | |||
for (k = i; k < nbfn; k++) { | |||
for (l = 1 + k; l < nbfn; l++) { | |||
gg = g(i,i,k,l); // Note that first two arguments are both i | |||
//perform four updates | |||
update_fock(gg, i, i, k, l); | |||
update_fock(gg, i, i, l, k); | |||
// duplicate: update_fock(gg, i, i, k, l); | |||
// duplicate: update_fock(gg, i, i, l, k); | |||
update_fock(gg, k, l, i, i); | |||
// duplicate: update_fock(gg, k, l, i, i); | |||
update_fock(gg, l, k, i, i); | |||
// duplicate: update_fock(gg, l, k, i, i); | |||
} | |||
} | |||
} | |||
} | |||
Note that the special calculation of the starting index for the l loop is only required in the fully symmetric case. In the i = j case, the initial value is: | |||
if (k == i) l_start = 1 + i; | |||
else l_start = 1 + k; | |||
However, we can substitute i with k when k == i: | |||
if (k == i) l_start = 1 + k; | |||
else l_start = 1 + k; | |||
Which removes the need for the conditional assignment since the initial value is <code>1 + k</code> in both cases. | |||
For all other symmetry restrictions, the l loop is removed entirely. |
Revision as of 16:56, February 5, 2013
NWChem | |
---|---|
Location to an image/logo (if any) Image Caption | |
Developer(s) | Institute's name |
Stable Release | x.y.z/Latest Release Date here |
Operating Systems | Linux, Unix, etc. |
Type | Computational Chemistry? |
License | Open Source or else? |
Website | URL here |
NWChem is <...your description here...>
Applications
Kernel Name
Description
Download
Optimizations
Additional Work Skipping
The reference implementation has two tests to skip unnecessary work, one at the i,j level and one at the i,j,k,l level (innermost loop). An additional test can be added to the i,j,k level which will allow skipping entire iterations of the inner most loop.
Reference twoel function:
double twoel(double schwmax) { int i, j, k, l; for (i = 0; i < nbfn; i++) { for (j = 0; j < nbfn; j++) { if ((g_schwarz[i][j] * schwmax ) < tol2e) {icut1 += nbfn * nbfn; continue;} double KLTest = tol2e / g_schwarz[i][j]; for (k = 0; k < nbfn; k++) { for (l = 0; l < nbfn; l++) { if (g_schwarz[k][l] < KLTest) {icut2 ++; continue;} icut3 ++; double gg = g(i, j, k, l); g_fock[i][j] += ( gg * g_dens[k][l]); g_fock[i][k] -= (0.50 * gg * g_dens[j][l]); } } } } return (0.50 * contract_matrices(g_fock, g_dens)); }
The inner-most test, (g_schwarz[k][l] < KLTest)
, can be modified to work in the k loop under certain conditions. If the maximum value of all entries for a given row of g_schwarz is known, that can be compared to KLTest in the k loop and we can skip the l loop entirely in cases where it would not perform any updates.
This requires one additional line to be inserted between the k and l loops:
for (k = 0; k < nbfn; k++) { if (g_schwarz_max_row_value[k] < KLTest) {icut4 ++; continue;} for (l = 0; l < nbfn; l++) {
To enable this additional test, the array g_schwarz_max_row_value should be populated with the maximum value for each row in makesz while it is initializing the g_schwarz matrix.
The additional memory requirements for this work skipping test are minimal as the array only has one dimension of size nbfn.
Symmetry of g()
The g function has several symmetries which can be exploited to reduce the number of times it must be run.
g(i, j, k, l)
returns the same value in the following cases:
- The i and j parameters are swapped.
- The k and l parameters are swapped.
- The i,j and k,l pairs are swapped.
Combining all possibly symmetries leads to an overall 8-way symmetry where the following calls all return the same value:
g(i, j, k, l)
g(i, j, l, k)
g(j, i, k, l)
g(j, i, l, k)
g(k, l, i, j)
g(k, l, j, i)
g(l, k, i, j)
g(l, k, j, i)
By calculating gg = g(i, j, k, l)
once, we can then simply perform multiple updates to the fock matrix.
update_fock(gg, i, j, k, l)
update_fock(gg, i, j, l, k)
update_fock(gg, j, i, k, l)
update_fock(gg, j, i, l, k)
update_fock(gg, k, l, i, j)
update_fock(gg, k, l, j, i)
update_fock(gg, l, k, i, j)
update_fock(gg, l, k, j, i)
With update()
being a macro or inlined function similar to:
update(gg, a, b, c, d) { g_fock[a][b] += (gg * g_dens[c][d]); g_fock[a][c] -= (0.50 * gg * g_dens[b][d]); }
When taking advantage of the symmetry, there are special cases which need to be considered. Most obvious is that the iteration range of the loops must be modified so that g() is only called once for each equivalent ordering of its arguments then all possible updates to the fock matrix are made with the result. Secondly, when arguments which can be swapped are equal, there are fewer updates to be made to the fock matrix.
As an example, consider the case where i, j, k, and l are all equal to 0. The reference implementation can only process this once since the loop indices can only all be zero once. If all symmetric updates are performed in this case, the result would be that the exact same update is performed eight times since the arguments to update(0, 0, 0, 0)
are unchanged by the swap operations.
To ensure proper operation, there are six separate variants of the twoel loops, one for each possible symmetry restriction:
- No restrictions
- i = j
- k = l
- i,j pair is identical to k,l pair
- i = j and k = l
- i, j, k, and l are all equal
With the special cases taken in account, the loop structure for the fully symmetric case is:
for (i = 0; i < nbfn; i++) { for (j = i + 1; j < nbfn; j++) { for (k = i; k < nbfn; k++) { if (k == i) l_start = 1 + j; else l_start = 1 + k; for (l = l_start; l < nbfn; l++) { gg = g(i,j,k,l); //perform all eight updates update_fock(gg, i, j, k, l); update_fock(gg, i, j, l, k); update_fock(gg, j, i, k, l); update_fock(gg, j, i, l, k); update_fock(gg, k, l, i, j); update_fock(gg, k, l, j, i); update_fock(gg, l, k, i, j); update_fock(gg, l, k, j, i); } } } }
The loop structure for the other five cases can be trivially generated by removing the loops which are no longer independent. The variables should then simply be replaced with the independent index. Finally any duplicate identical updates should be removed.
As an example, the i = j case is:
for (i = 0; i < nbfn; i++) { // removed j loop for (k = i; k < nbfn; k++) { for (l = 1 + k; l < nbfn; l++) { gg = g(i,i,k,l); // Note that first two arguments are both i //perform four updates update_fock(gg, i, i, k, l); update_fock(gg, i, i, l, k); // duplicate: update_fock(gg, i, i, k, l); // duplicate: update_fock(gg, i, i, l, k); update_fock(gg, k, l, i, i); // duplicate: update_fock(gg, k, l, i, i); update_fock(gg, l, k, i, i); // duplicate: update_fock(gg, l, k, i, i); } } } }
Note that the special calculation of the starting index for the l loop is only required in the fully symmetric case. In the i = j case, the initial value is:
if (k == i) l_start = 1 + i; else l_start = 1 + k;
However, we can substitute i with k when k == i:
if (k == i) l_start = 1 + k; else l_start = 1 + k;
Which removes the need for the conditional assignment since the initial value is 1 + k
in both cases.
For all other symmetry restrictions, the l loop is removed entirely.