The Algorithm
In this section, we will briefly sketch the algorithm we use to compute the
results presented in our paper. In order to ease the understanding, we
abstract from the mathematical and physical background, and refer the reader
to our paper for details.
Initialisation
As a first step, we precompute the force field induced by the input image. To
this extent, we move an imaginary test charge from grid point to grid point,
and evaluate the force acting on this charge. This vector is stored in a big
lookup matrix. In the following, all particles are represented by their
positions.
for all grid points p {
forcefieldp ← 0
for all grid points g ≠ p {
forcefieldp ← forcefieldp + force(p,g)
}
}
Next, we distribute the particles to the image domain. Any distribution for
which the position of each pair of particles is distinct can be used. For a
faster convergence, we arrange them proportionally to the grey value of the
underlying image, though:
until all particles p are distributed {
x ← select_random_pixel()
if x contains no particle {
if random_number(0,max_grey) > grey_value(x) {
place p at x and continue with next p
}
}
}
Evolution
After the initialisation is done, we start the main iterations which are
continued until a certain number of iterations is made, or until the system
converges. The optional 'shaking' step depicted in the pseudocode below is
only required for certain applications (like dithering), and prevents pixels
from getting stuck in a local minimum. Its strength is dependent on the
iteration number. It is stronger in the beginning and weaker in the end. This
simulated annealing strategy is described in more detail in the paper.
for i = 1 .. iterations (or until convergence) {
for all particles p {
forcep ← bilinear_interpolation(forcefield, p)
for all particles q ≠ p {
forcep ← forcep + force(q,p)
}
}
for all particles p {
p ← p + τ ⋅ forcep
if (i MOD 10) = 0 {
p ← p + random_shaking(i)
}
}
}
All extensions described in the previous section
can be included into this framework in a straightforward manner. We will thus
not go into more detail.
GPU-Specific Notes
Due to its high computational complexity, the algorithm we describe in our
paper is designed for CUDA-enabled graphics cards. Here, we obtain a speedup of
about 350 compared to a simple CPU implementation. The reason for this enormous
speedup is given by the high computational complexity of the algorithm and by
efficient memory access patterns.
Our GPU algorithm still adheres to the pseudocode above, but computes the
summation over all forces in a style described by Nyland
et al. in context of general N-Body problems. As one can check,
the interaction of particles in our system is nothing more than a classical
N-Body simulation, and the same holds for the precomputation of the attractive
force field. To this end, our algorithm differs in a lookup of force field
entries during the particle evolution (which we efficiently realise by texture
fetches), by a different distance measure, and by a simplified computation of
increments.
See also: Fast Summation Algorithm
Recently, we developed a fast summation algorithm based on the non-equispaced
fast Fourier transform (NFFT). This algorithm obtains the same quality as the
aforementioned method in a fraction of its runtime. While the algorithm in [1] describes the implementation on the CPU, [2] presents a fast parallel variant for modern GPUs.
-
T. Teuber, G. Steidl, P. Gwosdek, C. Schmaltz, J. Weickert:
Dithering by differences of convex functions.
SIAM Journal on Imaging Sciences, Vol. 4, No. 1, 79-108, January 2011.
Revised version of
Technical Report No. Pr-2010-01,
Department of Mathematics, University of Mannheim, Germany, April 2010.
-
P. Gwosdek, C. Schmaltz, J. Weickert, T. Teuber:
Fast Electrostatic Halftoning.
Technical Report No. 295, Department of Mathematics,
Saarland University, Saarbrücken, Germany, June 2011.
< Extensions Main Page Examples >
|