GPU-based multi-layer perceptron as efficient method for approximation complex light models in per-vertex lighting.

This paper describes a display method of the sky color on GeForce FX hardware. Lighting model used here is taken from “Display of the Earth taking into account atmospheric scattering” by Tomoyuki Nishita et.al., however this model is not the only suitable one in the proposed method.

A model of lighting used here is described by the following expression:

_{}

which describes a dependence of observed light color of position and orientation of observer.

_{}- Optical length,
amount of air along viewing direction.

_{}

_{}- Phase function,
amount of light scattered in

_{}
- angle between viewing direction *V*
and vertical direction *U*.

_{}
- angle between *V* and sun direction *L*.

_{
} - angle between *U* and *L*.

_{
}

_{
} – altitude of observer.

_{
}

_{
} – distance between observer and planet
center.

_{
}- scale factor

_{
}- planet radius

I. Optical length

Attenuation of a light beam depends exponentially on optical length. In this implementation optical length is calculated only for acute angles between viewing direction and zenith.

In order to
improve speed of calculations of optical length, values of _{
}
are calculated, and used as lookup table _{
}. Because _{
}
depends exponentially on _{
}, light-air
interactions primarily depends on density, all calculations are made on _{
}. This
allows a constant interval in the above integral. Therefore function _{
}is sampled on _{
}.

A single sample _{
}is evaluated as
follows:

_{
}

where:

_{
}
and_{
}

Function _{
}
is nearly constant at small angles, but it grows rapidly, when _{
}
is close to right
angle. In order to improve precision, and speed, the second dimension of _{
}_{
}

After
lookup table _{
}_{
}

_{
}

While _{
}_{
}

_{
}

Lookup
table _{
}_{
}_{
}

_{
}

where:

_{
}

II. Color of atmosphere

The color of atmosphere is
calculated alike as linear mass of atmosphere. Increased complexity is caused
by the fact that treated directions lie in three-dimensional space, summed
samples depend on four parameters _{
}
, which depend in different manner on integration step _{
}.

Step _{
}*V*. If four coordinates of point of observation are marked as _{
}_{
}*V*
has coordinates calculated as follows:

Let us mark
a distance from planet center to point of observation as _{
}

_{
}

Distance
from planet center to point _{
}

_{
}

Density in
point _{
}

_{
}

As point _{
}*U*
changes due to planet curvature. Angle a changes accordingly:

_{
}

_{
}

Angle b is constant because light beams are considered parallel.

Angle q changes as follows:

_{
}

_{
}

Angle _{
}*U-V* vertices and *L-V* surface. During translation
of point _{
}*V*
only direction *U* is changing, and only in surface *U-V*, so angle
between surfaces *U-V* and *L-V* is constant.

_{
}

Finally, color of atmosphere is equal:

_{
}

Additionally,
when _{
}_{
}_{
}

**I** and **b** consist of three
coordinates: red, green and blue. This gives three expressions used in application
for obtaining the color of atmosphere:

_{
}

_{}

_{}

III. Approximation

In previous points implementation of lighting model was described.

The color
of atmosphere _{
}

In order to avoid rendering multiple layers, the light model was approximated. Additionally approximation is performed directly on GPU’s vertex processor.

Precisely,
function of color of atmosphere** I** is approximated by function _{
}

Neural network used here has a simple topology. It consists of four independent and identical perceptrons, each one for calculating a single output.

Each sub-network
shares the same input, which is four-dimensional vector _{
}

Single sub-network consists of two hidden layers, with eight neurons in first layer, and four in second. Whole network uses 324 weights. Each neuron is biased, and has a following activation function:

_{
}

The form of this approximator is deeply influenced by graphic processor architecture, and exploits its speed in calculating dot products.

Network is learned of the light model by separate program, and after learning process, it is used in GPU for coloring the sky and surface.

Three first outputs are learned of scaled color values:

_{
}_{
}_{
}

, where _{
}. Fourth output is learned of value:

_{
}

, where _{
}
is calculated from
100 thousands of random samples _{
}

This
„normalization” of color values, and separate calculation by independent
sub-networks was necessary to achieve low-error results. As inputs, values of _{
}from ranges _{
}are used. During
learning process, these values are taken in a random fashion with uniform
distribution.

In case of
night, when _{
}, network is learned of value:

_{
}

This case caused multiplication by 0.9, and addition of 0.05 for coefficient M, for better recognition between night and day.

In order to reduce error, and to exploit extrapolation property of perceptron, in case of night R,G and B outputs are not learned.

IV. Color of surface

Method presented above, gives only the color of atmosphere, when observed from inside into space. It is necessary to modify this method, in order to obtain surface color, or color of atmosphere between observer and surface.

This
situation is presented on picture below. Direction *L* is constant, due to
infinite light source. An incoming light beam to observer looking into
direction *V* may be divided into two parts: Light reflected from surface,
and light scattered in atmosphere along direction *V*.

Scattered light part is obtained from approximator by using it twice.

Observer **I**
sees the same atmosphere as observer **II**, plus light scattered between
both observers. Thus color of the light scattered between observer and surface
can be obtained as a difference between neural network’s output for observer,
and output received for probe positioned on watched point on surface.

Reflected light part is obtained by using the model of atmosphere directly.:

_{
}

, where
angles and densities are taken for observers **I** and **II**, *C*
means color of surface with no atmosphere in point **II**, function *F*
is in most general case a BDRF for surface in point **II.** This calculation
is performed on GPU’s fragment processor.

V. Multiple atmospheres

Atmosphere of earth is a multi-component atmosphere. It consists of gaseous part, and aerosols. Each of components scatters incident light differently, and thus must be treated separately. All above mechanisms are described for single atmosphere. In order to obtain proper color of atmosphere for multi-component atmospheres, a small addition is needed.

This
addition is correct, if a single component differs only by phase function *F*,
coefficient **b**, and scaling factor _{
}.

Because all the above calculations are based on densities, rather than on heights, it is necessary to scale all densities to density of the highest atmospheric component:

_{
}

The light model becomes:

_{
}

By this method, model can be expanded to any number of components.

VI. Neural network on GPU

This implementation of perceptron depends on NV_vertex_program2_option extension. It uses both address registers for weights access, and branching and subroutines to make program fit into 256 instruction limit. It uses 9 temporaries and 86 local parameters, and only first vertex attribute. Weights used for matrix multiplication are separated from weights used as bias. One address register is used for access to matrix multiplication weights and it moves by 4 during address addition, while the other is used for access to bias weights and it moves by 1.

There are
two versions of vertex program used, one for observer, and one for probe
positioned on watched point on surface. They differ by method of *U*
vector calculation.

!!ARBvp1.0

OPTION NV_vertex_program2;

ATTRIB iPos = vertex.attrib[0];

PARAM mvinv[4] = { state.matrix.modelview.invtrans };

PARAM mv[4] = { state.matrix.modelview };

PARAM mvp[4] = { state.matrix.mvp };

PARAM lightDir = state.light[0].position;

PARAM const1 = { 1.0, -2.0, 10.0, 0.0 };

PARAM a1const = { 0.0, 0.0, 0.0, 4.0 };

PARAM a2const = { 0.0, 0.0, 0.0, 1.0 };

PARAM xxx = { 1.1111, 0.2, 0.05, -0.05 };

PARAM nn[83] = { program.local[4..86] };

PARAM nndat = program.local[0];

PARAM pdat = program.local[1];

PARAM pdot = program.local[2];

TEMP t1, t2, t3, t4, t5, t6, t7, t8, t9, t10;

ADDRESS addr1, addr2;

OUTPUT oPos = result.position;

ALIAS up = t3;

ALIAS lPos = t4;

ALIAS eye = t5;

#nndat.y is the number of weights used in matrix multiplication.

MOV t1, a2const;

ADD t1.y, t1, nndat.y;

ARL addr1, a1const;

ARL addr2, t1;

#vector U:

MOV t1, mvinv[3];

DP3 up.x, mvinv[0], t1;

DP3 up.y, mvinv[1], t1;

DP3 up.z, mvinv[2], t1;

DP3 t1.w, up, up;

RSQ t1.w, t1.w;

MUL up.xyz, t1.w, up;

#vector L

DP3 lPos.x, mvinv[0], lightDir;

DP3 lPos.y, mvinv[1], lightDir;

DP3 lPos.z, mvinv[2], lightDir;

DP3 t1.w, lPos, lPos;

RSQ t1.w, t1.w;

MUL lPos.xyz, t1.w, lPos;

#this is used for drawing atmosphere layer when planet sphere is scaled by small factor

MOV t2, iPos;

MUL t2.xyz, t2, pdat.x;

MOV t2.w, const1.x;

#vector V

DPH eye.x, t2, mv[0];

DPH eye.y, t2, mv[1];

DPH eye.z, t2, mv[2];

DP3 t1.w, eye, eye;

RSQ t1.w, t1.w;

MUL eye.xyz, t1.w, eye;

DP4 oPos.x, t2, mvp[0];

DP4 oPos.y, t2, mvp[1];

DP4 oPos.z, t2, mvp[2];

DP4 oPos.w, t2, mvp[3];

ALIAS nnIn = t7;

#this is calculation of input of neural network

DP3 nnIn.x, eye, up;

DP3 nnIn.y, eye, lPos;

DP3 nnIn.z, lPos, up;

MOV nnIn.w, pdat.y;

MUL nnIn, nnIn, pdot;

ALIAS arg = t5;

ALIAS res = t6;

ALIAS arg2 = t8;

ALIAS res2 = t9;

#Neural network:

CAL nnSubNet;

MOV t1, res;

CAL nnSubNet;

MOV t2, res;

CAL nnSubNet;

MOV t3, res;

CAL nnSubNet;

MOV t4, res;

DP4 res.x, nn[addr1.y + 0], t1;

DP4 res.y, nn[addr1.y + 1], t2;

DP4 res.z, nn[addr1.y + 2], t3;

DP4 res.w, nn[addr1.y + 3], t4;

#ARA addr1.xy, addr1;

CAL nnBiasFunc (TR);

#final transformation

SGE arg, res, xxx.z;

MUL arg, arg, xxx.x;

ADD res, res, xxx.w;

MUL res, res, arg;

MUL res.w, res.w, nndat.x;

MUL arg.xyz, res, res.w;

MUL arg.xyz, arg, const1.z;

MOV arg.w, const1.x;

MOV result.color.front, arg;

BRA endProg (TR);

#a single subnetwork

nnSubNet:

MOV arg, nnIn;

CAL nnLayer4_8 (TR);

MOV arg, res;

MOV arg2, res2;

CAL nnLayer8_4 (TR);

RET;

#transformation from 4-dim nn-layer to 4-dim nn-layer (unused)

nnLayer4_4:

DP4 res.x, nn[addr1.y + 0], arg;

DP4 res.y, nn[addr1.y + 1], arg;

DP4 res.z, nn[addr1.y + 2], arg;

DP4 res.w, nn[addr1.y + 3], arg;

ARA addr1.xy, addr1;

CAL nnBiasFunc (TR);

RET;

#transformation from 4-dim nn-layer to 8-dim nn-layer

# arg => res, res2

nnLayer4_8:

DP4 res.x, nn[addr1.y + 0], arg;

DP4 res.y, nn[addr1.y + 1], arg;

DP4 res.z, nn[addr1.y + 2], arg;

DP4 res.w, nn[addr1.y + 3], arg;

ARA addr1.xy, addr1;

DP4 res2.x, nn[addr1.y + 0], arg;

DP4 res2.y, nn[addr1.y + 1], arg;

DP4 res2.z, nn[addr1.y + 2], arg;

DP4 res2.w, nn[addr1.y + 3], arg;

ARA addr1.xy, addr1;

CAL nnBiasFunc (TR);

MOV arg2, res;

MOV res, res2;

CAL nnBiasFunc (TR);

MOV res2, res;

MOV res, arg2;

RET;

#transformation from 8-dim nn-layer to 4-dim nn-layer

# arg, arg2 => res

nnLayer8_4:

DP4 res.x, nn[addr1.y + 0], arg;

DP4 res2.x, nn[addr1.y + 1], arg2;

DP4 res.y, nn[addr1.y + 2], arg;

DP4 res2.y, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

DP4 res.z, nn[addr1.y + 0], arg;

DP4 res2.z, nn[addr1.y + 1], arg2;

DP4 res.w, nn[addr1.y + 2], arg;

DP4 res2.w, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

ADD res, res, res2;

CAL nnBiasFunc (TR);

RET;

#transformation from 8-dim nn-layer to 8-dim nn-layer (10th temporary is needed)

# arg, arg2 => res, res2

nnLayer8_8:

DP4 res.x, nn[addr1.y + 0], arg;

DP4 res2.x, nn[addr1.y + 1], arg2;

DP4 res.y, nn[addr1.y + 2], arg;

DP4 res2.y, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

DP4 res.z, nn[addr1.y + 0], arg;

DP4 res2.z, nn[addr1.y + 1], arg2;

DP4 res.w, nn[addr1.y + 2], arg;

DP4 res2.w, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

ADD res, res, res2;

MOV t10, res;

DP4 res.x, nn[addr1.y + 0], arg;

DP4 res2.x, nn[addr1.y + 1], arg2;

DP4 res.y, nn[addr1.y + 2], arg;

DP4 res2.y, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

DP4 res.z, nn[addr1.y + 0], arg;

DP4 res2.z, nn[addr1.y + 1], arg2;

DP4 res.w, nn[addr1.y + 2], arg;

DP4 res2.w, nn[addr1.y + 3], arg2;

ARA addr1.xy, addr1;

ADD res, res, res2;

MOV arg, res;

MOV res, t10;

CAL nnBiasFunc (TR);

MOV t10, res;

MOV res, arg;

CAL nnBiasFunc (TR);

MOV res2, res;

MOV res, t10;

RET;

#Bias addition and perceptron activation function.

nnBiasFunc:

ADD res, nn[addr2.y + 0], res;

MUL res, res, const1.y;

EX2 res.x, res.x;

EX2 res.y, res.y;

EX2 res.z, res.z;

EX2 res.w, res.w;

ADD res, res, const1.xxxx;

RCP res.x, res.x;

RCP res.y, res.y;

RCP res.z, res.z;

RCP res.w, res.w;

ARA addr2.xy, addr2;

RET;

endProg:

END

VII. Results

Images below depict atmosphere and planet form different heights. . This atmosphere consists of two components, one with following parameters:

_{
}

_{
}

and second:

_{
}

_{
}

Color of
surface is constant (_{
}

On Athlon 1700+, GeForce FX 5500, in 1280x1024x32 resolution, when surface mesh consists of 2000 triangles, the program runs with speed of 9 frames per second.

Sea level (r=10000, m=10000)

A bit higher (r=10000, m=10000)

...higher (r=10000, m=10000)

..and higher (r=10000, m=10000)

Sun is low. Observer at sea level. (r=10000, m=10000)

Very high. (r=10000, m=10000)

Dense atmosphere, sea level (r=100, m=100)

Dense atmosphere, space (r=100, m=100)

Neural network used here during learning uses 800.000 random samples of function I.

It is presented below how error was changing during learning in case of 800.000 samples, and in case of 8.000.000 samples. As this charts show, in first case, the network is not fully learned, but this case is much faster, and it yields good results.

Konrad Pietras

keyei 'at' stud.ics.p.lodz.pl

March, 2005