GPU-based multi-layer perceptron as efficient method for approximation complex light models in per-vertex lighting.

This paper describes a display method of the sky color on GeForce FX hardware. Lighting model used here is taken from “Display of the Earth taking into account atmospheric scattering” by Tomoyuki Nishita, however this model is not the only suitable one in the proposed method.

A model of lighting used here is described by the following expression:

which describes a dependence of observed light color of position and orientation of observer.

- Optical length, amount of air along viewing direction.

- Phase function, amount of light scattered in

- angle between viewing direction V and vertical direction U.

- angle between V and sun direction L.

 - angle between U and L.

 - relative density (surface = 1). Real density is contained in coefficient b.

 – altitude of observer.

 – distance between observer and planet center.

- scale factor

- planet radius

I.                   Optical length

Attenuation of a light beam depends exponentially on optical length. In this implementation optical length is calculated only for acute angles between viewing direction and zenith.

In order to improve speed of calculations of optical length, values of are calculated, and used as lookup table . Because depends exponentially on , light-air interactions primarily depends on density, all calculations are made on . This allows a constant interval in the above integral. Therefore function is sampled on .

A single sample is evaluated as follows:



Function is nearly constant at small angles, but it grows rapidly, when is close to right angle. In order to improve precision, and speed, the second dimension of is sampled at constant intervals of

After lookup table is calculated, optical length is sampled using the same intervals as with function :

While decreases during summation in interval-length steps, must be calculated as follows:

Lookup table is used directly for angles, where . For angles, where , value of optical length may be obtained by the following property:


II.                Color of atmosphere

The color of atmosphere is calculated alike as linear mass of atmosphere. Increased complexity is caused by the fact that treated directions lie in three-dimensional space, summed samples depend on four parameters , which depend in different manner on integration step .

Step moves along direction V. If four coordinates of point of observation are marked as , then a point moved a distance in direction V has coordinates calculated as follows:

Let us mark a distance from planet center to point of observation as

Distance from planet center to point is:

Density in point equals:

As point moves, direction U changes due to planet curvature. Angle a changes accordingly:

Angle b is constant because light beams are considered parallel.

Angle q changes as follows:

Angle is measured between surface spanned by U-V vertices and L-V surface. During translation of point along direction V only direction U is changing, and only in surface U-V, so angle between surfaces U-V and L-V is constant.

Finally, color of atmosphere is equal:

Additionally, when and , then point is in planet’s shadow, and the sample is not summed.

I and b consist of three coordinates: red, green and blue. This gives three expressions used in application for obtaining the color of atmosphere:

III.             Approximation

In previous points implementation of lighting model was described.


The color of atmosphere is a function dependent of four parameters. Its complexity causes impossibility of using it directly in real-time working applications. Dobashi, Yamamoto and Nishita in “Interactive Rendering of Atmospheric Scattering Effects Using Graphic Hardware” proposed using many sampling layers, and exploiting blending for integration.

In order to avoid rendering multiple layers, the light model was approximated. Additionally approximation is performed directly on GPU’s vertex processor.

Precisely, function of color of atmosphere I is approximated by function , which is a multi-layer percepron.

Neural network used here has a simple topology. It consists of four independent and identical perceptrons, each one for calculating a single output.

Each sub-network shares the same input, which is four-dimensional vector .

Single sub-network consists of two hidden layers, with eight neurons in first layer, and four in second. Whole network uses 324 weights. Each neuron is biased, and has a following activation function:

The form of this approximator is deeply influenced by graphic processor architecture, and exploits its speed in calculating dot products.

Network is learned of the light model by separate program, and after learning process, it is used in GPU for coloring the sky and surface.

Three first outputs are learned of scaled color values:


, where . Fourth output is learned of value:

, where is calculated from 100 thousands of random samples

This „normalization” of color values, and separate calculation by independent sub-networks was necessary to achieve low-error results. As inputs, values of from ranges are used. During learning process, these values are taken in a random fashion with uniform distribution.

In case of night, when , network is learned of value:


This case caused multiplication by 0.9, and addition of 0.05 for coefficient M, for better recognition between night and day.

In order to reduce error, and to exploit extrapolation property of perceptron, in case of night R,G and B outputs are not learned.

IV.             Color of surface

Method presented above, gives only the color of atmosphere, when observed from inside into space. It is necessary to modify this method, in order to obtain surface color, or color of atmosphere between observer and surface.

This situation is presented on picture below. Direction L is constant, due to infinite light source. An incoming light beam to observer looking into direction V may be divided into two parts: Light reflected from surface, and light scattered in atmosphere along direction V.

Scattered light part is obtained from approximator by using it twice.

Observer I sees the same atmosphere as observer II, plus light scattered between both observers. Thus color of the light scattered between observer and surface can be obtained as a difference between neural network’s output for observer, and output received for probe positioned on watched point on surface.

Reflected light part is obtained by using the model of atmosphere directly.:

, where angles and densities are taken for observers I and II, C means color of surface with no atmosphere in point II, function F is in most general case a BDRF for surface in point II. This calculation is performed on GPU’s fragment processor.

V.                Multiple atmospheres

Atmosphere of earth is a multi-component atmosphere. It consists of gaseous part, and aerosols. Each of components scatters incident light differently, and thus must be treated separately. All above mechanisms are described for single atmosphere. In order to obtain proper color of atmosphere for multi-component atmospheres, a small addition is needed.

This addition is correct, if a single component differs only by phase function F, coefficient b, and scaling factor .

Because all the above calculations are based on densities, rather than on heights, it is necessary to scale all densities to density of the highest atmospheric component:

The light model becomes:

By this method, model can be expanded to any number of components.

VI.             Neural network on GPU

This implementation of perceptron depends on NV_vertex_program2_option extension. It uses both address registers for weights access, and branching and subroutines to make program fit into 256 instruction limit. It uses 9 temporaries and 86 local parameters, and only first vertex attribute. Weights used for matrix multiplication are separated from weights used as bias. One address register is used for access to matrix multiplication weights and it moves by 4 during address addition, while the other is used for access to bias weights and it moves by 1.

There are two versions of vertex program used, one for observer, and one for probe positioned on watched point on surface. They differ by method of U vector calculation.

OPTION NV_vertex_program2;

ATTRIB iPos = vertex.attrib[0];

PARAM mvinv[4] = { state.matrix.modelview.invtrans };
PARAM mv[4] = { state.matrix.modelview };
PARAM mvp[4] = { state.matrix.mvp };
PARAM lightDir = state.light[0].position;
PARAM const1 = { 1.0, -2.0, 10.0, 0.0 };
PARAM a1const = { 0.0, 0.0, 0.0, 4.0 };
PARAM a2const = { 0.0, 0.0, 0.0, 1.0 };
PARAM xxx = { 1.1111, 0.2, 0.05, -0.05 };
PARAM nn[83] = { program.local[4..86] };
PARAM nndat = program.local[0];
PARAM pdat = program.local[1];
PARAM pdot = program.local[2];
TEMP t1, t2, t3, t4, t5, t6, t7, t8, t9, t10;

ADDRESS addr1, addr2;

OUTPUT oPos = result.position;

ALIAS up = t3;
ALIAS lPos = t4;
ALIAS eye = t5;

#nndat.y is the number of weights used in matrix multiplication.
MOV t1, a2const;
ADD t1.y, t1, nndat.y;
ARL addr1, a1const;
ARL addr2, t1;

#vector U:
MOV t1, mvinv[3];
DP3 up.x, mvinv[0], t1;
DP3 up.y, mvinv[1], t1;
DP3 up.z, mvinv[2], t1;
DP3 t1.w, up, up;
RSQ t1.w, t1.w;
MUL, t1.w, up;

#vector L
DP3 lPos.x, mvinv[0], lightDir;
DP3 lPos.y, mvinv[1], lightDir;
DP3 lPos.z, mvinv[2], lightDir;
DP3 t1.w, lPos, lPos;
RSQ t1.w, t1.w;
MUL, t1.w, lPos;

#this is used for drawing atmosphere layer when planet sphere is scaled by small factor
MOV t2, iPos;
MUL, t2, pdat.x;
MOV t2.w, const1.x;

#vector V
DPH eye.x, t2, mv[0];
DPH eye.y, t2, mv[1];
DPH eye.z, t2, mv[2];
DP3 t1.w, eye, eye;
RSQ t1.w, t1.w;
MUL, t1.w, eye;

DP4 oPos.x, t2, mvp[0];
DP4 oPos.y, t2, mvp[1];
DP4 oPos.z, t2, mvp[2];
DP4 oPos.w, t2, mvp[3];

ALIAS nnIn = t7;
#this is calculation of input of neural network
DP3 nnIn.x, eye, up;
DP3 nnIn.y, eye, lPos;
DP3 nnIn.z, lPos, up;
MOV nnIn.w, pdat.y;
MUL nnIn, nnIn, pdot;

ALIAS arg = t5;
ALIAS res = t6;
ALIAS arg2 = t8;
ALIAS res2 = t9;

#Neural network:
CAL nnSubNet;
MOV t1, res;
CAL nnSubNet;
MOV t2, res;
CAL nnSubNet;
MOV t3, res;
CAL nnSubNet;
MOV t4, res;
DP4 res.x, nn[addr1.y + 0], t1;
DP4 res.y, nn[addr1.y + 1], t2;
DP4 res.z, nn[addr1.y + 2], t3;
DP4 res.w, nn[addr1.y + 3], t4;
#ARA addr1.xy, addr1;
CAL nnBiasFunc (TR);

#final transformation
SGE arg, res, xxx.z;
MUL arg, arg, xxx.x;
ADD res, res, xxx.w;
MUL res, res, arg;
MUL res.w, res.w, nndat.x;
MUL, res, res.w;
MUL, arg, const1.z;
MOV arg.w, const1.x;
MOV result.color.front, arg;

BRA endProg (TR);

#a single subnetwork
MOV arg, nnIn;
CAL nnLayer4_8 (TR);
MOV arg, res;
MOV arg2, res2;
CAL nnLayer8_4 (TR);

#transformation from 4-dim nn-layer to 4-dim nn-layer (unused)
DP4 res.x, nn[addr1.y + 0], arg;
DP4 res.y, nn[addr1.y + 1], arg;
DP4 res.z, nn[addr1.y + 2], arg;
DP4 res.w, nn[addr1.y + 3], arg;
ARA addr1.xy, addr1;
CAL nnBiasFunc (TR);

#transformation from 4-dim nn-layer to 8-dim nn-layer
# arg => res, res2
DP4 res.x, nn[addr1.y + 0], arg;
DP4 res.y, nn[addr1.y + 1], arg;
DP4 res.z, nn[addr1.y + 2], arg;
DP4 res.w, nn[addr1.y + 3], arg;
ARA addr1.xy, addr1;
DP4 res2.x, nn[addr1.y + 0], arg;
DP4 res2.y, nn[addr1.y + 1], arg;
DP4 res2.z, nn[addr1.y + 2], arg;
DP4 res2.w, nn[addr1.y + 3], arg;
ARA addr1.xy, addr1;
CAL nnBiasFunc (TR);
MOV arg2, res;
MOV res, res2;
CAL nnBiasFunc (TR);
MOV res2, res;
MOV res, arg2;

#transformation from 8-dim nn-layer to 4-dim nn-layer
# arg, arg2 => res
DP4 res.x, nn[addr1.y + 0], arg;
DP4 res2.x, nn[addr1.y + 1], arg2;
DP4 res.y, nn[addr1.y + 2], arg;
DP4 res2.y, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
DP4 res.z, nn[addr1.y + 0], arg;
DP4 res2.z, nn[addr1.y + 1], arg2;
DP4 res.w, nn[addr1.y + 2], arg;
DP4 res2.w, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
ADD res, res, res2;
CAL nnBiasFunc (TR);

#transformation from 8-dim nn-layer to 8-dim nn-layer (10th temporary is needed)
# arg, arg2 => res, res2
DP4 res.x, nn[addr1.y + 0], arg;
DP4 res2.x, nn[addr1.y + 1], arg2;
DP4 res.y, nn[addr1.y + 2], arg;
DP4 res2.y, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
DP4 res.z, nn[addr1.y + 0], arg;
DP4 res2.z, nn[addr1.y + 1], arg2;
DP4 res.w, nn[addr1.y + 2], arg;
DP4 res2.w, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
ADD res, res, res2;
MOV t10, res;
DP4 res.x, nn[addr1.y + 0], arg;
DP4 res2.x, nn[addr1.y + 1], arg2;
DP4 res.y, nn[addr1.y + 2], arg;
DP4 res2.y, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
DP4 res.z, nn[addr1.y + 0], arg;
DP4 res2.z, nn[addr1.y + 1], arg2;
DP4 res.w, nn[addr1.y + 2], arg;
DP4 res2.w, nn[addr1.y + 3], arg2;
ARA addr1.xy, addr1;
ADD res, res, res2;
MOV arg, res;
MOV res, t10;
CAL nnBiasFunc (TR);
MOV t10, res;
MOV res, arg;
CAL nnBiasFunc (TR);
MOV res2, res;
MOV res, t10;

#Bias addition and perceptron activation function.
ADD res, nn[addr2.y + 0], res;
MUL res, res, const1.y;
EX2 res.x, res.x;
EX2 res.y, res.y;
EX2 res.z, res.z;
EX2 res.w, res.w;
ADD res, res, const1.xxxx;
RCP res.x, res.x;
RCP res.y, res.y;
RCP res.z, res.z;
RCP res.w, res.w;
ARA addr2.xy, addr2;


VII.          Results

Images below depict atmosphere and planet form different heights. . This atmosphere consists of two components, one with following parameters:

and second:

Color of surface is constant ( ), and any variation from this value is due to atmosphere. Surface and atmosphere is a triangular mesh, and aplication uses a slightly modified CLOD algoritm described by Henri Hakl in “Diamond Algorithm” Mesh is based on sides of icosahedron (on left).

On Athlon 1700+, GeForce FX 5500, in 1280x1024x32 resolution, when surface mesh consists of 2000 triangles, the program runs with speed of 9 frames per second.

Sea level (r=10000, m=10000)

A bit higher (r=10000, m=10000)

...higher (r=10000, m=10000)

..and higher (r=10000, m=10000)

Sun is low. Observer at sea level. (r=10000, m=10000)

Very high. (r=10000, m=10000)

Dense atmosphere, sea level (r=100, m=100)

Dense atmosphere, space (r=100, m=100)

Neural network used here during learning uses 800.000 random samples of function I.

It is presented below how error was changing during learning in case of 800.000 samples, and in case of 8.000.000 samples. As this charts show, in first case, the network is not fully learned, but this case is much faster, and it yields good results.

Konrad Pietras

keyei 'at'

March, 2005