Pixels flow on the GPU

posted by on 2019.04.10, under openFrameworks

This post is about how to implement a vector field generated by Perlin noise directly on the GPU using GLSL. If you want, you can regard this as the step 0 before implementing fluid simulations with shaders. I have written elsewhere here about vector fields, and how to implement them on the CPU, but let’s recall the idea, and in particular the concept of a “lookup table”. Namely, in our case a lookup table is a two-dimensional array which encodes the vector field: you can imagine a grid with coordinates (i,j), such that to each of its cells a 2-dimensional vector is attached. The motion of a given particle in the vector field is obtained by displacing the particle in the cell (i,j) via the value of the lookup table at the given cell. To represent it on the screen, we compute the new position of the particle, and draw there. This seemingly redundant comment will be relevant later, just wait.
Okay, the procedure above is easy-peasy, if we want to move pixels, we just define a grid as big as the whole screen, so that the cell (i,j) will correspond to the pixel at position (i,j) indeed; we do the computations above, and go home. Boom! Ehm, no, not really. The point is that if you tried to do just that, the iteration over all the pixels in the screen would take so long that you would get barely a frame or two every couple of seconds (probably less). Sorry, about that.
Enter shaders! We can indeed use the GPU to perform all these calculations via a custom made fragment shader. First, we need to sort out how to send the information contained in the lookup table to the shader. Since the table is nothing else than a two-dimensional array, we can write the value of the field directly in a texture. On modern graphic cards, textures are very quick to upload, and we can upload more than one. Wait a minute, aren’t textures supposed to be used for… colors and stuff? Yes, but. A texture is nothing else that a carrier for data: more precisely, at each of its coordinate, it contains the value of red, green, blue and alpha which will be combined to provide the color for the pixel at the given coordinate. We can then use two of the color channels to provide the x and y component of a vector. In this case, I have chosen a unit vector field, i.e. at each point the vector is specified just by an angle, given by the value of Perlin noise at the given coordinate. This datum is written on the blue channel of the ofImage field in the code, from which we will obtain a texture. Another texture will contain the pixels we want to displace: I will refer to it as the “ink” texture. Finally to update the ink texture we will use a “ping-pong technique”, about which I have written here.
Now that we have sorted out a slick way to send data to the GPU, we have to deal with the elephant in the room. As I commented earlier, the CPU algorithm is based on the fact that we calculate the new position of the particle at (x,y) by obtaining the value of the vector field at the very same position, move the particle, and draw something (a circle, a pixel, etc.) “there”. Unfortunately, the fragment shader does not allow to “move” fragments, since everything it knows is the given fragment! This is encapsulated in my favourite motto concerning shaders “It is a lonely world, no one talks to no one else here!”: this means that everything we know about a vertex or a fragment can’t be shared. :(
Luckily, there is a way out, and it comes courtesy of textures. A texture can indeed be “looked up” from any fragment: so, instead of moving the particle, we trace back its motion. In other words, if we are at the fragment in position p, instead of saying “go to the fragment p + field(p) and draw my color”, we say “the color at p is the color of the ink texture at p-field(p)”. There is an explanation why this is a sensible idea, and it has to do with (partial) differential equations, and their (local) flows.
We can now look at the code in openFrameworks, where I have added some mouse interactivity for fun. Notice you need to provide an image to start with.


#include "ofMain.h"
#include "ofApp.h"

int main( ){
    ofGLFWWindowSettings settings;
    settings.setGLVersion(3, 2); //we define the OpenGL version we want to use
    settings.setSize(1024, 680);
    // this kicks off the running of my app
    ofRunApp(new ofApp());



#pragma once

#include "ofMain.h"

class ofApp : public ofBaseApp{

        void setup();
        void update();
        void draw();
        ofImage field;
        ofImage ink;
        ofImage photo;
        ofTexture inkTex;
        ofTexture fieldTex;
        ofShader shader;
        ofFbo fbo;
        ofFbo main;

        float t = 0;
        float mouse_x;
        float mouse_y;

        void keyPressed(int key);
        void keyReleased(int key);
        void mouseMoved(int x, int y );
        void mouseDragged(int x, int y, int button);
        void mousePressed(int x, int y, int button);
        void mouseReleased(int x, int y, int button);
        void mouseEntered(int x, int y);
        void mouseExited(int x, int y);
        void windowResized(int w, int h);
        void dragEvent(ofDragInfo dragInfo);
        void gotMessage(ofMessage msg);


#include "ofApp.h"

void ofApp::setup(){


    //Allocating images and textures

    ink.allocate(ofGetWidth(), ofGetHeight(), OF_IMAGE_COLOR);
    field.allocate(ofGetWidth(), ofGetHeight(), OF_IMAGE_COLOR);
    fbo.allocate(ofGetWidth(), ofGetHeight(), GL_RGB);
    main.allocate(ofGetWidth(), ofGetHeight(), GL_RGB);

    //Setting up the lookup table
    ofPixels pix = field.getPixelsRef();

    for (float x = 0; x < ofGetWidth(); x++) {
        for (float y = 0; y < ofGetHeight(); y++) {
            float st = ofNoise(x * 0.0001, y * 0.0001);
            pix.setColor(x, y, ofColor(0.0, 0.0, st * 2.0 ));



    fieldTex = field.getTexture();

    photo.resize(ofGetWidth(), ofGetHeight());

    inkTex = photo.getTexture();   

    photo.draw(0, 0);

    shader.load("shader.vert", "shader.frag");

void ofApp::update(){
    if (t < 2) {
        t += 0.001;
    mouse_x = (ofGetMouseX() - mouse_x) * 0.1;
    mouse_y = (ofGetMouseY() - mouse_y) * 0.1;

void ofApp::draw(){
        shader.setUniformTexture("tex0", main.getTexture(), 0);
        shader.setUniformTexture("tex1", fieldTex, 1);
        shader.setUniform1f("windowWidth", ofGetWidth());
        shader.setUniform1f("windowHeight", ofGetHeight());
        shader.setUniform1f("mx", mouse_x/ ofGetWidth());
        shader.setUniform1f("my", mouse_y /ofGetHeight() );
        shader.setUniform1f("t", t);
        main.draw(0, 0);
        swap(fbo, main);

       fbo.draw(0, 0);

void ofApp::mousePressed(int x, int y, int button){
    ofSetColor(ofRandom(0.0, 255.0), ofRandom(0.0, 255.0), ofRandom(0.0, 255.0));
    ofDrawCircle(ofGetMouseX(), ofGetMouseY(), 30, 30);

Here are is the vertex shader


#version 150

uniform mat4 modelViewMatrix;
uniform mat4 projectionMatrix;
uniform mat4 textureMatrix;
uniform mat4 modelViewProjectionMatrix;

in vec4 position;
in vec4 color;
in vec4 normal;
in vec2 texcoord;

out vec2 varyingtexcoord;
uniform sampler2DRect tex0;

void main()
    varyingtexcoord = texcoord.xy;
    gl_Position = modelViewProjectionMatrix * position;

and the fragment shader


#version 150

// this is how we obtain the textures
uniform sampler2DRect tex0;
uniform sampler2DRect tex1;

in vec2 varyingtexcoord;
uniform float mx;
uniform float my;
uniform float windowWidth;
uniform float windowHeight;
uniform float t;
out vec4 outputColor;

void main()

    float x = gl_FragCoord.x / windowWidth;
    float y = gl_FragCoord.y / windowHeight;
    float l = sqrt((x - mx) * (x - mx) + (y - my) * (y - my) ) * t;
    vec2 xy = vec2(cos( ((texture(tex1, varyingtexcoord).z) + mx) * 2 * 3.14), sin(((texture(tex1, varyingtexcoord).z) + my) * 2 * 3.14));
    outputColor =  texture(tex0, (varyingtexcoord - l * xy) );

It looks like this

A better result can be obtained by interpolating on the output color, but I was lazy. :)
Why did I mention that this is step 0 in understanding how to implement a fluid simulation (like smoke, dust, etc.) on the GPU? This is related to the fact that the vector field in this case is *fixed*, i.e. it is an external vector field. In a fluid, the velocity field is a solution to the Navier-Stokes equations which can be exactly solved for very few cases, and it is “advected”, i.e. you have to imagine that it gets transported itself by the fluid. A nice article on these topics can be found here: if you get comfortable with the basic concepts in this post, you will able to follow it, modulo probably some of the maths.
Nowadays there are tons of libraries to do fluid simulations, like for instance ofxFluids, ofxFlowTools, etc. Why bother, then? I could say because understanding things is always better, but the truth is I do not have the answer to this question: you will have to find your own. :)



your code can be optimized a bit
You should use that part only once and store the result in a vec4 :

vec4 im1 = texture(tex1, varyingtexcoord);

You also could use the RGB channel of an image to create a distorsion , and define (with an uniform) a distanceMax by channel. For example (it’s really just an example, not a silver bullet)

uniform vec2 distances; //contrains something like (50.0,100.0)

const PI2 = 6.2832…;

vec4 im1 = texture(tex1, varyingtexcoord);
float offsetX = cos( img1.r * PI2) * img1.g * distance.r / windowwidth;
float offsetY = sin(img1.b * PI2) * img1.a * distance.g / windowheight;
vec2 offset = vec2 (offsetX ,offsetY );

gl_FragColor = texture(tex0 , varyingtexcoord + offset);


You also could do this kind of effect directly in the vertex shader, it would be much much more optimized (even if it should work like a charm in the fragment shader)

Thomas Le Coz ( 10/04/2019 at 9:47 am )

    thanks for your comment. As a rule of thumb, my code can *always* be optimized! :)
    Generally, everything I write on this blog has to be understood as a proof of concept, and never as a final version. Not being a computer programmer, I myself have little interest for optimization or advanced techniques unless they provide a clear improvement, like for instance the use of shaders in this post. Actually, I find that optimization sometimes can hinder artistic practice, which is what I am after: I have written about it somewhere, but forgot where…
    Clearly, if I have to make an installation or deploy my code for small devices, or in all the cases where resources are scarce, I am much more careful.
    Your corrections are indeed quite elegant, but I would be very surprised if they introduced a tangible improvement on a restricted scale…
    The comment is nevertheless appreciated.

    me ( 10/04/2019 at 12:08 pm )

“my code can *always* be optimized! ”
Mine too of course ! :)

“I find that optimization sometimes can hinder artistic practice, which is what I am after”

I agree because a beautiful error can emerge from spaghetti-code (it happens to me all the time)

But I also disagree because I believe in “the elegance of code”: a beautiful result can be based on a beautiful and easy to read code. I use to start with speghetti and finish with well-defined variables.

Also, I already tryed to create a kind of GLSL language by myself ( a high level language that auto-translate into assembly-code , targeting Adobe Flash & Stage3D ) and I’m sure that the code should be decomposed into small functions and reuse as much as possible because the real final code executed by the GPU has, sometimes, nothing to do with the GLSL version you wrote, because of the tons of limitation you have in order to execute the calculation.

“I would be very surprised if they introduced a tangible improvement on a restricted scaleā€¦”

in GLSL if you write

float result = 0.0;
if(img.r > 0.5) result = doThis()
else if(img.g>0.3) result = doThat()
else result = doSomething();

on the CPU , a single function will be called.
on the GPU , the three function will be called, and the result of two of them will be multiplyed by zero before being added to the final result “result”

Another example, imagine you have something like that

vec4 v0 = (vec4(1.0) + vec4(2.0) + vec4(3.0) + vec4(4.0) );
vec4 v1 = (vec4(5.0) + vec4(6.0) + vec4(7.0) + vec4(8.0) );
vec4 v2 = (vec4(9.0) + vec4(10.0) + vec4(11.0) + vec4(12.0) );
vec4 v3 = (vec4(13.0) + vec4(14.0) + vec4(15.0) + vec4(16.0) );
vec4 v4 = v0 + v1;
vec4 v5 = v2 + v3;
vec4 result = v4 + v5;

Look simple isn’t it ?

Well, in my opinion this code may requier hundreds of computations if you run it on a low graphic card because you have a finite amount of variable that you can handle at the same time. On some graphic card, you can only handle 8 vec4 at the same time, then you will meet a problem.

The computation is doable and it will be done, but the core of the code will look like the solution of a rubik’s cube – doable but not obvious, with a lot of intermediate steps in order to process

on the first line
“vec4 v0 = (vec4(1.0) + vec4(2.0) + vec4(3.0) + vec4(4.0) );”

already 5 vec4 are involved , if you have only 8 , imagine the complexity that need to be applyed during the translation in order to compute the result :)

Honnestly, I doubt the limitation is so hard today (because 8 , it’s really hard :) ) but even with 16 or 32 , if you don’t care about it you will produce a tons of useless code without knowing it, a tons of complexity and it will impact your feeling with your demo – it usually happens when it’s slow and you really don’t know why cause the code you looks simple –

The faster way (for the machine) to write the code would be something like that

vec4 result = vec4(1.0);
result += vec4(2.0);

result += vec4(16.0);

with this approach, at the end, you keep only one vec4 => “result”

The GLSL make the code feel like CPU code but it’s not at all the same way to work actually, it’s just an easier way to write assembly-code without writing assembly-code by hand.

I think it’s hard to not take it in consideration once you know it.

It was my two cents :)

Thomas Le Coz ( 11/04/2019 at 1:36 am )

    I appreciate your two cents, and the fact that there is people like you taking care of these issues. Indeed, everything on this blog is released under a Creative Commons license, so feel free to modify it at your own taste and preference.
    I hope you will still find the main content of this blog interesting.

    me ( 11/04/2019 at 8:05 am )

(after re-read, my example with the vec4 is not perfect – not complex enough to blow your mind if you try to figure out how to solve it :) – but I hope you get the idea )

Thomas Le Coz ( 11/04/2019 at 1:47 am )

Please Leave a Reply

Current ye@r *

TrackBack URL :