The G80 was introduced two months ago, together with a set of OpenGL extensions exposing all of its new features (cf. my posts on the G80 architecture and OpenGL extensions; they are in French but can be translated with the British flag button at the top left. You can also refer to http://www.g-truc.net/, which provides a good analysis of the G80 OpenGL extensions, also in French). Among them are GPU Shader Model 4 and the new Geometry Shader stage it introduces.
For the first time, geometric primitives can be generated dynamically on the GPU. So far, however, very few examples and tutorials on these features are available. The only tutorial I have been able to find is the one by Xie Yongming; it is a good start, but unfortunately its source code is unavailable.
This simple example shows how to use GLSL Geometry Shaders (EXT_geometry_shader4 and EXT_gpu_shader4) and the new integer texture formats (EXT_texture_integer) to implement the Marching Cubes algorithm entirely on the GPU. The implementation I present is probably not the most efficient, but I think it is a good example of how to use the new shader stage, the new texture formats and binary operations within shaders.
Generalities
The Geometry Shader stage takes place between the Vertex Shader and the viewport clipping stage. It operates on assembled primitives (points, lines or triangles) whose vertices have been transformed by the Vertex Shader, and it generates, for each input primitive, a set of new primitives (whose types are currently limited to points, line strips and triangle strips). These are assembled in another assembly stage before being sent to clipping and rasterization. The program can access all the vertices of the input primitive and can generate an arbitrary number of output primitives, including none. New input primitive types are also provided to pass primitive adjacency information to the shader.
A first, immediate application of this stage is extracting an implicit surface from a scalar field, and I will show a first approach to implementing the Marching Cubes algorithm entirely in a Geometry Shader. I am still running performance benchmarks and optimizations, but it already seems quite efficient compared to classical CPU implementations.
This implementation is based on Paul Bourke's Marching Cubes implementation, which can be found at http://local.wasp.uwa.edu.au/~pbourke/geometry/polygonise/ . That code uses two tables to quickly find the edges on which vertices should be created and the triangulation of these vertices for a marching cubes grid cell.
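To make the role of the first table more concrete, here is a minimal C++ sketch (not Paul Bourke's code) that derives one edge table entry from a cube index. It assumes the corner and edge numbering used by the geometry shader listing of this tutorial, where edge 0 joins corners 0 and 1, edge 8 joins corners 0 and 4, and so on. The names `kEdgeCorners` and `edgeMask` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Corner pairs joined by each of the 12 cube edges, in the same order as the
// vertlist[] entries of the geometry shader.
constexpr int kEdgeCorners[12][2] = {
    {0, 1}, {1, 2}, {2, 3}, {3, 0},
    {4, 5}, {5, 6}, {6, 7}, {7, 4},
    {0, 4}, {1, 5}, {2, 6}, {3, 7}};

// cubeIndex: bit i is set when corner i is below the isolevel.
// Returns a 12-bit mask whose bit e is set when edge e is cut by the
// isosurface, i.e. when its two corners lie on opposite sides.
inline std::uint16_t edgeMask(unsigned cubeIndex) {
    assert(cubeIndex < 256u);
    std::uint16_t mask = 0;
    for (int e = 0; e < 12; ++e) {
        const unsigned a = (cubeIndex >> kEdgeCorners[e][0]) & 1u;
        const unsigned b = (cubeIndex >> kEdgeCorners[e][1]) & 1u;
        if (a != b) mask = static_cast<std::uint16_t>(mask | (1u << e));
    }
    return mask;
}
```

For example, when only corner 0 is inside (cube index 1), edges 0, 3 and 8 are cut, which gives the mask 0x109. A precomputed table simply stores these 256 masks so that they do not have to be rebuilt for every cell.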
The idea of the algorithm is to send point primitives to the Geometry Shader, each point representing one Marching Cubes grid cell. The Geometry Shader operates on each point and generates the set of triangles needed to cut the corresponding cell with the isosurface. The 3D scalar field values are fetched from a floating point 3D texture (the listing below uses the 32-bit GL_ALPHA32F_ARB format), and the two acceleration tables are fetched with integer addressing from 2D integer textures.
Until the next revision of Cg (probably 1.6, which is the revision reported by the driver's GLSL compiler), GLSL is the only high-level language that can program the Geometry Shader stage. This tutorial highlights the interesting portions of the code; the full source code is available in the next section.
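The per-cell classification can be sketched on the CPU as follows. This is only an illustration of the same test the shader performs, under the assumption that a corner counts as "inside" when its value is below the isolevel; `cubeIndex` and `cellIsEmpty` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>

// Builds the 8-bit configuration index of one cell: bit i is set when the
// scalar value at corner i is below the isolevel.
inline int cubeIndex(const std::array<float, 8>& cornerValues, float isolevel) {
    int index = 0;
    for (int i = 0; i < 8; ++i) {
        if (cornerValues[i] < isolevel) index |= 1 << i;
    }
    return index;
}

// A cell whose corners are all on the same side of the surface produces
// no triangle and can be skipped.
inline bool cellIsEmpty(int index) {
    assert(index >= 0 && index <= 255);
    return index == 0 || index == 255;
}
```

The resulting index selects one of the 256 possible configurations, which is what the lookup tables are indexed with.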
Source Code and Demo
The following zip file contains all the C++ source code of this tutorial, a Visual Studio 8 project and a Windows executable (to be used only with a GeForce 8800 or with G80 emulation enabled). It uses GLUT (for Windows: http://www.xmission.com/~nate/glut.html ) for window management and the latest version of the GLEW extension loader library (http://glew.sourceforge.net/ ).
System Setup
With current NVIDIA drivers under Windows (97.44; I did not try under Linux), the new G80 OpenGL extensions and the new GLSL compiler features are not exposed by default. They must be enabled via the NVIDIA OpenGL Emulation Tool (nvemulate) by choosing “G80 (GeForce 8800 GTS, Quadro Fx)” in the “GLSL Compiler Device Support” combo box. This tool can also emulate G80 features on older hardware.
OpenGL initialization
The first step is to create the GLSL program object and to load the Vertex, Geometry and Fragment Shader code from files. The loading code is not detailed here, as it is the classical GLSL loading procedure (you will find it in the source code archive). Note that a Vertex Shader must be present in order to use a Geometry Shader.
//Program object creation
programObject = glCreateProgramObjectARB();

////Shaders loading////
//Geometry Shader loading
initShader("Shaders/TestG80_GS.glsl", GL_GEOMETRY_SHADER_EXT);
//A Geometry Shader requires a Vertex Shader to be used
initShader("Shaders/TestG80_VS.glsl", GL_VERTEX_SHADER_ARB);
//Fragment Shader for per-fragment lighting
initShader("Shaders/TestG80_FS.glsl", GL_FRAGMENT_SHADER_ARB);
////////
Then the Geometry Shader must be set up with its input and output primitive types and its maximum number of output vertices. This last parameter is very important, as high values strongly degrade performance.
//Get max number of geometry shader output vertices
GLint temp;
glGetIntegerv(GL_MAX_GEOMETRY_OUTPUT_VERTICES_EXT, &temp);
std::cout << "Max GS output vertices:" << temp << "\n";

////Setup Geometry Shader////
//Set POINTS primitives as INPUT
glProgramParameteriEXT(programObject, GL_GEOMETRY_INPUT_TYPE_EXT, GL_POINTS);
//Set TRIANGLE STRIP as OUTPUT
glProgramParameteriEXT(programObject, GL_GEOMETRY_OUTPUT_TYPE_EXT, GL_TRIANGLE_STRIP);
//Set maximum number of vertices to be generated by the Geometry Shader to 16
//A marching cubes configuration produces at most 5 triangles (15 vertices);
//16 is the width of one row of the triangle table
//This parameter has an important impact on shader performance
//Its value must be chosen as close as possible to the real maximum number of vertices
glProgramParameteriEXT(programObject, GL_GEOMETRY_VERTICES_OUT_EXT, 16);
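To see where this bound comes from, the following sketch counts the vertices emitted for one configuration by scanning a row of the triangle table up to its -1 terminator. It assumes the row layout of Paul Bourke's `triTable` (16 entries per configuration, at most 15 of them being vertex indices); `kTriTableRowWidth` and `emittedVertexCount` are hypothetical names.

```cpp
#include <cassert>

// Width of one row of the triangle table: up to 5 triangles (15 indices)
// followed by at least one -1 terminator.
constexpr int kTriTableRowWidth = 16;

// Counts the vertices emitted for one configuration by scanning its row
// until the -1 terminator, in the same way the shader loop walks the table.
inline int emittedVertexCount(const int (&row)[kTriTableRowWidth]) {
    int count = 0;
    while (count < kTriTableRowWidth && row[count] != -1) ++count;
    assert(count % 3 == 0);  // rows hold whole triangles only
    return count;
}
```

Taking the maximum of this count over all 256 rows gives the tightest value that could be passed as GL_GEOMETRY_VERTICES_OUT_EXT.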
Then the program is linked and validated. After that, textures are created to store the lookup tables used for each marching cube (and so for each Geometry Shader invocation) to generate triangles. For that we use the new integer texture formats introduced by the GL_EXT_texture_integer extension. These textures allow direct integer fetches within shaders and can be addressed directly with integer coordinates via new texture fetch functions.
//Edge Table texture//
//This texture stores the 256 different configurations of a marching cube.
//This table is indexed with a bitfield of the 8 cube corner states
//(inside or outside the isosurface); each entry is a bitfield of the
//12 edges cut by the isosurface.
//(cf. MarchingCubes.cpp)
glGenTextures(1, &(this->edgeTableTex));
glActiveTexture(GL_TEXTURE1);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, this->edgeTableTex);
//Integer textures must use nearest filtering mode
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
//We create an integer texture with the new GL_EXT_texture_integer formats
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA16I_EXT, 256, 1, 0,
             GL_ALPHA_INTEGER_EXT, GL_INT, &edgeTable);

//Triangle Table texture//
//This texture stores the vertex index list used to
//generate the triangles of each configuration.
//(cf. MarchingCubes.cpp)
glGenTextures(1, &(this->triTableTex));
glActiveTexture(GL_TEXTURE2);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, this->triTableTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA16I_EXT, 16, 256, 0,
             GL_ALPHA_INTEGER_EXT, GL_INT, &triTable);
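The 16 x 256 size of the triangle table texture follows from how the table is laid out in memory. The sketch below shows the correspondence, assuming the table is a contiguous `int[256][16]` array uploaded row by row (which is how glTexImage2D reads tightly packed data); the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kTriTableWidth = 16;    // texture width: entries per configuration
constexpr int kTriTableHeight = 256;  // texture height: one row per configuration

// Offset, in elements, of texel (x = j, y = cubeIndex) inside the array
// handed to glTexImage2D, which stores rows of `width` texels back to back.
inline std::size_t triTableOffset(int cubeIndex, int j) {
    assert(cubeIndex >= 0 && cubeIndex < kTriTableHeight);
    assert(j >= 0 && j < kTriTableWidth);
    return static_cast<std::size_t>(cubeIndex) * kTriTableWidth
         + static_cast<std::size_t>(j);
}
```

This is why the shader has to swap its arguments when fetching: entry j of configuration i lives at texel coordinates (j, i).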
The volume data is simply a distance field generated procedurally for this example, but it could be any scalar field.
//Datafield//
//Store the volume data to polygonise
glGenTextures(1, &(this->dataFieldTex));
glActiveTexture(GL_TEXTURE0);
glEnable(GL_TEXTURE_3D);
glBindTexture(GL_TEXTURE_3D, this->dataFieldTex);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);

//Generate a distance field to the center of the cube
dataField = new float[128*128*128];
for(int k=0; k<128; k++)
  for(int j=0; j<128; j++)
    for(int i=0; i<128; i++){
      dataField[i+j*128+k*128*128] = Vector3f(i, j, k).distance(Vector3f(64,64,64))/64.0f;
    }

glTexImage3D(GL_TEXTURE_3D, 0, GL_ALPHA32F_ARB, 128, 128, 128, 0,
             GL_ALPHA, GL_FLOAT, dataField);
delete [] dataField;
dataField = NULL;
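The listing relies on the `Vector3f` class from the source archive. For readers who want to reproduce the field without it, here is a self-contained sketch of the same computation using only the standard library. The function name `makeDistanceField` and the grid size parameter `n` are hypothetical; the listing above corresponds to n = 128.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fills an n*n*n grid with the distance to the grid centre divided by n/2,
// using the same x-fastest layout as the listing (i + j*n + k*n*n).
inline std::vector<float> makeDistanceField(int n) {
    assert(n > 0);
    const std::size_t size = static_cast<std::size_t>(n);
    const float c = 0.5f * static_cast<float>(n);
    std::vector<float> field(size * size * size);
    for (std::size_t k = 0; k < size; ++k)
        for (std::size_t j = 0; j < size; ++j)
            for (std::size_t i = 0; i < size; ++i) {
                const float dx = static_cast<float>(i) - c;
                const float dy = static_cast<float>(j) - c;
                const float dz = static_cast<float>(k) - c;
                field[i + j * size + k * size * size] =
                    std::sqrt(dx * dx + dy * dy + dz * dz) / c;
            }
    return field;
}
```

The returned vector's `data()` pointer can be passed to glTexImage3D in place of `dataField`, and the vector releases its memory automatically.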
The final initialization step is to assign the uniform variables of the shaders. We precompute the x, y and z offsets (called "decals" in the code) associated with each corner of a marching cube, which lets the Geometry Shader compute the corner positions.
////Samplers assignment///
glUniform1iARB(glGetUniformLocationARB(programObject, "dataFieldTex"), 0);
glUniform1iARB(glGetUniformLocationARB(programObject, "edgeTableTex"), 1);
glUniform1iARB(glGetUniformLocationARB(programObject, "triTableTex"), 2);

////Uniforms parameters////
//Initial isolevel
glUniform1fARB(glGetUniformLocationARB(programObject, "isolevel"), isolevel);
//Step in data 3D texture for gradient computation (lighting)
glUniform3fARB(glGetUniformLocationARB(programObject, "dataStep"), cubeStep.x, cubeStep.y, cubeStep.z);

//Decal for each vertex in a marching cube
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[0]"), 0.0f, 0.0f, 0.0f);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[1]"), cubeStep.x, 0.0f, 0.0f);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[2]"), cubeStep.x, cubeStep.y, 0.0f);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[3]"), 0.0f, cubeStep.y, 0.0f);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[4]"), 0.0f, 0.0f, cubeStep.z);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[5]"), cubeStep.x, 0.0f, cubeStep.z);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[6]"), cubeStep.x, cubeStep.y, cubeStep.z);
glUniform3fARB(glGetUniformLocationARB(programObject, "vertDecals[7]"), 0.0f, cubeStep.y, cubeStep.z);
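The eight offsets follow a fixed pattern, so they can also be generated from a small table instead of being written out by hand. The sketch below is one possible way to do it; the `Vec3` struct, `kCornerBits` and `cornerOffsets` are hypothetical names, and the corner order is the one used by the `vertDecals` uniforms above.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Unit-cube corner coordinates in the order used by the vertDecals uniforms.
constexpr int kCornerBits[8][3] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}};

// Scales the unit corners by the cell size to get the 8 corner offsets.
inline std::array<Vec3, 8> cornerOffsets(const Vec3& cubeStep) {
    assert(cubeStep.x > 0.0f && cubeStep.y > 0.0f && cubeStep.z > 0.0f);
    std::array<Vec3, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = Vec3{static_cast<float>(kCornerBits[i][0]) * cubeStep.x,
                      static_cast<float>(kCornerBits[i][1]) * cubeStep.y,
                      static_cast<float>(kCornerBits[i][2]) * cubeStep.z};
    }
    return out;
}
```

Each resulting offset can then be uploaded in a loop with the same glUniform3fARB call as in the listing.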
OpenGL Rendering loop
The rendering loop only sends one vertex per cell of the marching cubes grid, drawn as points. These points are passed through unchanged by the Vertex Shader. The Geometry Shader is then called for each point and generates the triangles for the grid cell associated with it.
These triangles are then rasterized and lit by a Fragment Shader.
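As an illustration of what the rendering loop submits, the sketch below builds the list of points for an n^3 grid. It assumes the grid covers the cube [-1, 1]^3, which is consistent with the shader remapping positions with (p + 1) / 2 to obtain texture coordinates, and that each point is the minimum corner of its cell. `makeGridPoints` is a hypothetical name, not a function from the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns n*n*n points stored as x, y, z triplets, one per cell, each being
// the minimum corner of a cell of a regular grid covering [-1, 1]^3.
inline std::vector<float> makeGridPoints(int n) {
    assert(n > 0);
    const float step = 2.0f / static_cast<float>(n);
    std::vector<float> points;
    points.reserve(static_cast<std::size_t>(n) * n * n * 3);
    for (int k = 0; k < n; ++k)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i) {
                points.push_back(-1.0f + static_cast<float>(i) * step);
                points.push_back(-1.0f + static_cast<float>(j) * step);
                points.push_back(-1.0f + static_cast<float>(k) * step);
            }
    return points;
}
```

Such an array could be drawn with a vertex array and glDrawArrays(GL_POINTS, 0, n*n*n), with `step` playing the role of `cubeStep` on each axis.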
Geometry Shader GLSL Code
//GLSL version 1.20
#version 120
//New G80 extensions
#extension GL_EXT_geometry_shader4 : enable
#extension GL_EXT_gpu_shader4 : enable

//Volume data field texture
uniform sampler3D dataFieldTex;
//Edge table texture
uniform isampler2D edgeTableTex;
//Triangles table texture
uniform isampler2D triTableTex;
//Global iso level
uniform float isolevel;
//Marching cubes vertices decal
uniform vec3 vertDecals[8];
//Vertices position for fragment shader
varying vec4 position;

//Get vertex i position within current marching cube
vec3 cubePos(int i){
  return gl_PositionIn[0].xyz + vertDecals[i];
}

//Get vertex i value within current marching cube
//(no 'f' suffix on literals: it is not valid in GLSL 1.20)
float cubeVal(int i){
  return texture3D(dataFieldTex, (cubePos(i)+1.0)/2.0).a;
}

//Get triangle table value
int triTableValue(int i, int j){
  return texelFetch2D(triTableTex, ivec2(j, i), 0).a;
}

//Compute interpolated vertex along an edge
vec3 vertexInterp(float isolevel, vec3 v0, float l0, vec3 v1, float l1){
  return mix(v0, v1, (isolevel-l0)/(l1-l0));
}

//Geometry Shader entry point
void main(void) {
  int cubeindex=0;
  float cubeVal0 = cubeVal(0);
  float cubeVal1 = cubeVal(1);
  float cubeVal2 = cubeVal(2);
  float cubeVal3 = cubeVal(3);
  float cubeVal4 = cubeVal(4);
  float cubeVal5 = cubeVal(5);
  float cubeVal6 = cubeVal(6);
  float cubeVal7 = cubeVal(7);

  //Determine the index into the edge table which
  //tells us which vertices are inside of the surface
  cubeindex = int(cubeVal0 < isolevel);
  cubeindex += int(cubeVal1 < isolevel)*2;
  cubeindex += int(cubeVal2 < isolevel)*4;
  cubeindex += int(cubeVal3 < isolevel)*8;
  cubeindex += int(cubeVal4 < isolevel)*16;
  cubeindex += int(cubeVal5 < isolevel)*32;
  cubeindex += int(cubeVal6 < isolevel)*64;
  cubeindex += int(cubeVal7 < isolevel)*128;

  //Cube is entirely in/out of the surface
  if (cubeindex == 0 || cubeindex == 255)
    return;

  vec3 vertlist[12];
  //Find the vertices where the surface intersects the cube
  vertlist[0] = vertexInterp(isolevel, cubePos(0), cubeVal0, cubePos(1), cubeVal1);
  vertlist[1] = vertexInterp(isolevel, cubePos(1), cubeVal1, cubePos(2), cubeVal2);
  vertlist[2] = vertexInterp(isolevel, cubePos(2), cubeVal2, cubePos(3), cubeVal3);
  vertlist[3] = vertexInterp(isolevel, cubePos(3), cubeVal3, cubePos(0), cubeVal0);
  vertlist[4] = vertexInterp(isolevel, cubePos(4), cubeVal4, cubePos(5), cubeVal5);
  vertlist[5] = vertexInterp(isolevel, cubePos(5), cubeVal5, cubePos(6), cubeVal6);
  vertlist[6] = vertexInterp(isolevel, cubePos(6), cubeVal6, cubePos(7), cubeVal7);
  vertlist[7] = vertexInterp(isolevel, cubePos(7), cubeVal7, cubePos(4), cubeVal4);
  vertlist[8] = vertexInterp(isolevel, cubePos(0), cubeVal0, cubePos(4), cubeVal4);
  vertlist[9] = vertexInterp(isolevel, cubePos(1), cubeVal1, cubePos(5), cubeVal5);
  vertlist[10] = vertexInterp(isolevel, cubePos(2), cubeVal2, cubePos(6), cubeVal6);
  vertlist[11] = vertexInterp(isolevel, cubePos(3), cubeVal3, cubePos(7), cubeVal7);

  // Create the triangle
  gl_FrontColor=vec4(cos(isolevel*5.0-0.5), sin(isolevel*5.0-0.5), 0.5, 1.0);
  int i=0;
  //Strange bug with this way, uncomment to test
  //for (i=0; triTableValue(cubeindex, i)!=-1; i+=3) {
  while(true){
    if(triTableValue(cubeindex, i)!=-1){
      //Generate first vertex of triangle//
      //Fill position varying attribute for fragment shader
      position = vec4(vertlist[triTableValue(cubeindex, i)], 1);
      //Fill gl_Position attribute for vertex raster space position
      gl_Position = gl_ModelViewProjectionMatrix * position;
      EmitVertex();

      //Generate second vertex of triangle//
      //Fill position varying attribute for fragment shader
      position = vec4(vertlist[triTableValue(cubeindex, i+1)], 1);
      //Fill gl_Position attribute for vertex raster space position
      gl_Position = gl_ModelViewProjectionMatrix * position;
      EmitVertex();

      //Generate last vertex of triangle//
      //Fill position varying attribute for fragment shader
      position = vec4(vertlist[triTableValue(cubeindex, i+2)], 1);
      //Fill gl_Position attribute for vertex raster space position
      gl_Position = gl_ModelViewProjectionMatrix * position;
      EmitVertex();

      //End triangle strip at first triangle
      EndPrimitive();
    }else{
      break;
    }
    i=i+3; //Comment it for testing the strange bug
  }
}
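For debugging, it can be handy to reproduce the edge interpolation on the CPU and compare it with what the shader emits. The sketch below is a C++ equivalent of the shader's `vertexInterp()`, with GLSL's `mix(a, b, t)` expanded to `a + t * (b - a)`. The `Vec3` struct is a hypothetical stand-in for whatever vector type the host code uses.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// CPU equivalent of the shader's vertexInterp(): linear interpolation between
// v0 and v1 at the parameter where the field crosses the isolevel.
// l0 and l1 are the field values at v0 and v1 and must differ.
inline Vec3 vertexInterp(float isolevel, const Vec3& v0, float l0,
                         const Vec3& v1, float l1) {
    assert(l0 != l1);
    const float t = (isolevel - l0) / (l1 - l0);
    return Vec3{v0.x + t * (v1.x - v0.x),
                v0.y + t * (v1.y - v0.y),
                v0.z + t * (v1.z - v0.z)};
}
```

Note that the shader evaluates this function for all 12 edges, including edges the surface does not cross; those results are simply never referenced by the triangle table, which is why the precondition checked here is not enforced on the GPU side.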
Performance
As I said in the introduction, the code posted here is not optimized at all and is for example purposes only. The archive I provide contains a more optimized version of this source code, improved with the help of Jan Vlietinck (you can visit his webpage: http://users.belgacom.net/gc610902/ ), whom I would like to thank a lot for his help. With this new code, very interesting iso-surface extraction performance can be achieved.
For instance, frame rates from 220 to 500 FPS (depending on the number of triangles generated) are achieved on 128^3 data with a 32^3 marching cubes grid on a GeForce 8800 GTS. With a 64^3 marching cubes grid, I obtained 44 to 77 FPS on the same data and configuration. For comparison, I implemented a simple software marching cubes that achieves approximately 53 FPS (32^3) and 7 FPS (64^3) on the same tests on my Athlon 64 3400+.
But this software code is not optimized at all, and I do not think it can be used as a realistic reference for software performance; it just helps to get an idea. On this subject, if anybody has a good, highly optimized software marching cubes implementation, I would be interested…
More details on these optimizations and the latest version of the code to come soon…
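To put these figures on a common scale, the frame rates can be converted into grid cells processed per second. The helper below is only an illustration (`cellsPerSecond` is a hypothetical name), and the comparison assumes that the second software figure refers to the 64^3 grid.

```cpp
#include <cassert>

// Cells processed per second for an n^3 marching cubes grid rendered at the
// given frame rate (every cell is visited once per frame).
inline long long cellsPerSecond(int n, int fps) {
    assert(n > 0 && fps >= 0);
    return static_cast<long long>(n) * n * n * fps;
}
```

With the numbers quoted above, the 32^3 grid corresponds to roughly 7.2 to 16.4 million cells per second on the GPU, against about 1.7 million for the software version at 53 FPS.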
Contact and feedback
Any comments, feedback and questions on this code and tutorial are welcome. You can send me an e-mail (cyril AT icare3d.org) or leave a comment below. I hope this mini-tutorial will be useful.
References
– NVidia G80 Extensions reference: http://developer.nvidia.com/object/nvidia_opengl_specs.html
– Xie Yongming tutorial: http://appsrv.cse.cuhk.edu.hk/~ymxie/Geometry_Shader/
– Paul Bourke Marching Cubes Tutorial: http://local.wasp.uwa.edu.au/~pbourke/geometry/polygonise/
– Original Lorensen and Cline Article for SIGGRAPH 87: http://www.cs.duke.edu/education/courses/fall01/cps124/resources/p163-lorensen.pdf
January 7th, 2007 on 4:32 pm
Speedup Great work!
With some optimization of the glsl it got about 50% faster.
Most of the speedup comes from not testing for edge intersections and always calculating all of them. Saves a lot of conditional branching:
Modified as:
/**** Geometry Shader Marching Cubes
* Copyright Cyril Crassin, January 2007.
* This code is partially based on the example of
* Paul Bourke "Polygonising a scalar field" located at :
* http://local.wasp.uwa.edu.au/~pbourke/geometry/polygonise/
****/
//GLSL version 1.20
#version 120
//New G80 extensions
#extension GL_EXT_geometry_shader4 : enable
#extension GL_EXT_gpu_shader4 : enable
//Volume data field texture
uniform sampler3D dataFieldTex;
//Edge table texture
uniform isampler2D edgeTableTex;
//Triangles table texture
uniform isampler2D triTableTex;
//Global iso level
uniform float isolevel;
//Marching cubes vertices decal
uniform vec3 vertDecals[8];
//Vertices position for fragment shader
varying vec4 position;
//Get vertex i position within current marching cube
vec3 cubePos(int i){
return gl_PositionIn[0].xyz + vertDecals[i];
}
//Get vertex i value within current marching cube
float cubeVal(int i){
return texture3D(dataFieldTex, (cubePos(i)+1.0f)/2.0f).a;
}
//Get triangle table value
int triTableValue(int i, int j){
return texelFetch2D(triTableTex, ivec2(j, i), 0).a;
}
//Compute interpolated vertex along an edge
vec3 vertexInterp(float isolevel, vec3 v0, float l0, vec3 v1, float l1){
return mix(v0, v1, (isolevel-l0)/(l1-l0));
}
//Geometry Shader entry point
void main(void) {
int cubeindex=0;
float cubeVal0 = cubeVal(0);
float cubeVal1 = cubeVal(1);
float cubeVal2 = cubeVal(2);
float cubeVal3 = cubeVal(3);
float cubeVal4 = cubeVal(4);
float cubeVal5 = cubeVal(5);
float cubeVal6 = cubeVal(6);
float cubeVal7 = cubeVal(7);
//Determine the index into the edge table which
//tells us which vertices are inside of the surface
cubeindex = int(cubeVal0
January 7th, 2007 on 12:25 pm
RE: Tiny mistake OK, thank you, I will correct that. I didn't know whether it was compliant; I assumed it was without checking the spec :roll
(Your post didn't appear immediately because I must validate comments before they are published. I don't do that as some sort of censorship, only because my site was regularly spammed by robots through these comments.)
January 7th, 2007 on 11:59 am
Tiny mistake There is a little mistake that isn't compliant with the GLSL spec.
float cubeVal(int i){
return texture3D(dataFieldTex, (cubePos(i)+1.0f)/2.0f).a;
}
It should be:
float cubeVal(int i){
return texture3D(dataFieldTex, (cubePos(i)+1.0)/2.0).a;
}
The '1.0f' suffix is inherited from the nVidia Cg compiler used behind the GLSL compiler, but it is not valid GLSL.
If you keep it that way, this program will never run on ATI R600…
January 6th, 2007 on 10:27 pm
does it run on 6600 GT? hello
i downloaded the demo, launched nvemulate but the program just crashed 🙁
is it supposed to work on a 6600GT + emulation ?
January 6th, 2007 on 10:42 pm
RE: does it run on 6600 GT? Hello
Yes, I think it should work on a 6600GT with the G80 feature set emulation level set in nvemulate, but I didn't try it myself on a card older than the 8800. Did the console print anything when it crashed? You may have to set "Force software rasterization" in nvemulate.
January 6th, 2007 on 10:48 pm
RE: does it run on 6600 GT? Compilation error: (1) : error C0201: unsupported version 120
(18) : warning C7502: OpenGL does not allow type suffix 'f' on constant literals
"Force software rasterization" did not help
The installed ForceWare driver version is 93.71
January 7th, 2007 on 2:05 am
RE: does it run on 6600 GT? OK, I think you have to install the 97.44 drivers (which are officially reserved for the G80), modified with the 2 files contained in the zip file I have placed at http://www.icare3d.org/files/nv4_disp_97.44_G70MOD.zip , to be able to install the driver on an older card.
To do so, run the 97.44 installation program a first time to extract the driver files and stop after the extraction. Then replace the 2 files in the extraction folder with the ones I have provided, and relaunch the installation from the setup program within the extraction folder. I think emulation will work after that; I have tried it myself with a 7600GT.
January 6th, 2007 on 5:58 pm
Same here, congratulations on your work!
At last a guy from UTBM who stands out from the crowd 😉 It was not such a bad school after all! I graduated in 2002, and the projects were apparently far less interesting than today's!
Well done!
Greg
January 6th, 2007 on 6:33 pm
Thanks JegX and Greg 🙂
As for UTBM, I think there are actually quite a few former students who are doing rather well!
As for the UTBM projects, I don't know whether they are more interesting now. I may have had to bend a few of them a little to make them more interesting 😉
Which specialization did you follow? And what are you doing now?
January 5th, 2007 on 10:11 pm
Very, very good job, it is a real pleasure to come across such a talented enthusiast. I had a quick look through all your projects, it is very interesting, and I am taking note for the future ;).
PS: I could not find your email, so I am writing here.
January 6th, 2007 on 6:30 pm
Thanks! I have been playing around with GPUs for a little while now 😉
My email address is cyril AT icare3d.org.
You work in the field, I imagine?
January 6th, 2007 on 10:45 am
Nice Work! Nice work, Cyril. I have not yet had time to get into SM4/GF8 programming, but you are going to save me some time, which in real-time 3D is more than precious 😉
JeGX – http://www.oZone3D.Net
January 7th, 2007 on 9:08 pm
RE: Speedup Thank you very much for this improvement!
I didn't write the code for speed, and this is precisely the kind of improvement I was looking for. Without checking the generated ASM, I hoped that the GLSL compiler would, for example, correctly reuse fetched texture values, but that does not seem to be the case. Conditional computation of the interpolated values was apparently not a good idea either :roll
On my side, I also got approximately 12% of performance improvement by reordering the point primitives with a swizzled walk to improve spatial coherency, instead of the linear walk I used previously.
I have integrated your modifications into my program, and if you don't object I will post a modified version of all the sources.
Thank you for your help 🙂
January 19th, 2007 on 11:43 am
RE: tri strip vs list? Hello.
In fact the Geometry Shader currently only supports point, line strip or triangle strip output. Other output primitives may be added in future extensions, but I think there should be no performance issue with these primitives if NVidia has chosen to provide them. It is probably a fast path 🙂
January 20th, 2007 on 6:49 pm
🙂 🙂
January 10th, 2007 on 9:49 pm
Pretty interesting! Pretty cool demo. 😉
If you are interested, we are actively looking for developers who are passionate about real-time 3D for our internal projects and for those of our clients (e.g. Epic Games, Google and Sony Computer Entertainment). It could give you the chance to visit Montréal and be well paid to work on cool 3D development projects.
http://www.feelingsoftware.com/content/view/46/68/lang,en/
Christian Laforte
President, Feeling Software
January 19th, 2007 on 10:55 am
tri strip vs list? Just a note: it's a bit confusing that you're setting the output to tri. strips but outputting individual triangles 3 vertices at a time. I guess there's another possible optimization? 🙂
Anyway, thanks for the example, I'll have to play around with this more.
January 8th, 2007 on 10:29 am
Hi Cyril,
Actually I followed the imaging specialization, but it must have been the first or second year it existed, which probably explains the content of the UVs (course units) at the time. I had also used the TX projects to work on what seemed important to me.
I did my ST50 internship in Lyon, in a video game studio. Then I worked for 3 years in Switzerland, at a company working in simulation and GIS (Google Earth style); it went bankrupt in October, and at the moment I am self-employed, still in the 3D field. So if one day I need to hire (let's hope ;), I will think of you 😉
January 8th, 2007 on 9:50 pm
97.44 hack for 6600GT hacking the driver setup worked but as expected, i got 0 fps 🙂
thx
January 8th, 2007 on 9:38 am
Re Speedup Indeed the compiler seems to be not as smart as it could be.
I looked into the ASM to find the right optimizations. For example, the if () assignments are compiled into IF/ENDIF, where they could be done with conditional masking…
Some further ideas for speedup:
Instead of calculating single cubes, calculating 2x2x2 cubes at once may be faster.
This would need only 27 3D texture samples instead of 64, and 54 edge interpolations versus 96.
A second volume texture of half resolution could be used to store the minimum and maximum of 2x2x2 blocks. This could be used to quickly discard non intersecting 2x2x2 blocks.
Jan Vlietinck
March 6th, 2007 on 9:40 am
RE: Nice source Hi,
I didn't know about that presentation. The general idea seems to be the same: implementing marching cubes with geometry shaders using index tables. But, leaving DirectX aside, they don't seem to implement it in exactly the same fashion. Whereas my implementation is mostly single pass and texture based, they seem to access the marching grid vertices by binding a vertex buffer generated by a first vertex shader pass producing a stream out (equivalent to OpenGL transform feedback). It may be interesting to compare the performance of the two implementations on static data.
My goal was only to show an example of Geometry Shader use in OpenGL, and this code was written very quickly for demonstration purposes only. It was then improved, and it now seems to provide quite good performance 🙂
And OpenGL is a lot better than DirectX 10, if only for the fact that you don't need Windows Vista… 😉
March 6th, 2007 on 9:15 am
Nice source Hi, thanks for your great article. But from your references it seems that you were the first to implement it.
http://developer.download.nvidia.com/presentations/2006/siggraph/dx10-effects-siggraph-06.pdf
Is this presentation related to yours?
March 6th, 2007 on 10:28 pm
Nice source II Hi,
Thanks for your reply. Maybe you can come up with other methods too, like B-spline registration or rigid registration.
This code is really nice; the problem is that marching cubes speed is not so critical in my opinion. You compute it only once at the beginning and use the resulting polygons later for rendering.
March 6th, 2007 on 10:31 pm
Comparison with other GPU implementation Hi,
http://www.humus.ca/index.php?page=3D&&start=8
I'm curious how fast it is compared to this one in Metaballs
February 12th, 2007 on 11:58 pm
Crash ASUS GF 8800GTX:
GL_ARB_color_buffer_float GL_ARB_depth_texture GL_ARB_draw_buffers GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_half_float_pixel GL_ARB_imaging GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_shadow GL_ARB_shader_objects GL_ARB_shading_language_100 GL_ARB_texture_border_clamp GL_ARB_texture_compression GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_dot3 GL_ARB_texture_float GL_ARB_texture_mirrored_repeat GL_ARB_texture_non_power_of_two GL_ARB_texture_rectangle GL_ARB_transpose_matrix GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_texture_float GL_ATI_texture_mirror_once GL_S3_s3tc GL_EXT_texture_env_add GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_Cg_shader GL_EXT_depth_bounds_test GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_framebuffer_object GL_EXTX_framebuffer_mixed_formats GL_EXT_framebuffer_sRGB GL_EXT_gpu_program_parameters GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_specular_color GL_EXT_shadow_funcs GL_EXT_stencil_two_side GL_EXT_stencil_wrap GL_EXT_texture3D GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_sRGB 
GL_EXT_texture_shared_exponent GL_EXT_timer_query GL_EXT_vertex_array GL_HP_occlusion_test GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat GL_KTX_buffer_region GL_NV_blend_square GL_NV_copy_depth_to_color GL_NV_depth_buffer_float GL_NV_depth_clamp GL_NV_fence GL_NV_float_buffer GL_NV_fog_distance GL_NV_fragment_program GL_NV_fragment_program_option GL_NV_fragment_program2 GL_NV_framebuffer_multisample_ex GL_NV_gpu_program4 GL_NV_half_float GL_NV_light_max_exponent GL_NV_multisample_filter_hint GL_NV_occlusion_query GL_NV_packed_depth_stencil GL_NV_parameter_buffer_object GL_NV_pixel_data_range GL_NV_point_sprite GL_NV_primitive_restart GL_NV_register_combiners GL_NV_register_combiners2 GL_NV_texgen_reflection GL_NV_texture_compression_latc GL_NV_texture_compression_vtc GL_NV_texture_env_combine4 GL_NV_texture_expand_normal GL_NV_texture_rectangle GL_NV_texture_shader GL_NV_texture_shader2 GL_NV_texture_shader3 GL_NV_transform_feedback GL_NV_vertex_array_range GL_NV_vertex_array_range2 GL_NV_vertex_program GL_NV_vertex_program1_1 GL_NV_vertex_program2 GL_NV_vertex_program2_option GL_NV_vertex_program3 GL_NVX_conditional_render GL_OES_conditional_query GL_SGIS_generate_mipmap GL_SGIS_texture_lod GL_SGIX_depth_texture GL_SGIX_shadow GL_SUN_slice_accum GL_WIN_swap_hint WGL_EXT_swap_control
Compilation error: 8╞>
InitShader: Shaders/TestG80_GS2.glsl Errors:invalid operation
InitShader: Shaders/TestG80_VS.glsl Errors:no error
InitShader: Shaders/TestG80_FS.glsl Errors:no error
Max GS output vertices:20019360
May 11th, 2007 on 9:59 pm
What about 8600 / 8500 ? I own a 8600gt and drivers are 15x.xx windows XP.
Software mode seems to be a little faster (70 compared to 50 fps). Does this mean the shaders aren't hardware-accelerated even though I have a PS4.0 compatible graphics card?
June 22nd, 2007 on 4:29 pm
Compatibility Hello,
Is there no way to use GL ARB extensions rather than GL EXT ones to make the algorithm more compatible with all cards?
I suppose this is what prevents the program from running on an ATI card, for example.
Thanks
June 25th, 2007 on 12:02 pm
RE: Compatibility Hello,
It depends on which extensions you are talking about, but the G80-related EXT extensions were also supposed to be adopted by ATI for the R600. What currently prevents ATI cards from running the program is that the ATI drivers do not yet support the extensions exposing the DX10 features (such as geometry shaders).
June 25th, 2007 on 5:39 pm
RE: Compatibility Hello,
I was just wondering whether there was a way to implement this algorithm using generic extensions that would work on Nvidia as well as on ATI, since ATI is apparently rather slow to update its drivers.
June 25th, 2007 on 6:15 pm
RE: Compatibility In fact this algorithm is entirely designed around geometry shaders, so an extension supporting them is essential. That said, there are techniques to perform surface extraction on the GPU without geometry shaders (in a fragment shader, GPGPU style; I have no reference in mind, but they can be found).
June 26th, 2007 on 10:19 am
RE: Compatibility OK thanks, I will have a look, but this algorithm was the only one I had found when I searched.
April 23rd, 2010 on 7:17 pm
Please port it to gnu/linux I did the port quite easily …
Can it work on ATI / Intel cards?
—
rzr.online.fr/q/gl
June 1st, 2010 on 9:04 pm
luigia Hello everybody,
I'm a computer scientist and I'm working with a particle-based fluid simulator. I have a problem: because the fluid is particle based, I can only see particles. I'm looking for an algorithm able to build isosurfaces corresponding to my particles. I think that your code can do it, but I need your help: do you think that I should save my simulation data (i.e. position and velocity for each particle) in a file and then apply your code to this file, or is it possible to integrate your code in my application? Thank you
July 11th, 2012 on 2:33 am
Hi, Cyril. The link to the demo and source code seems to be broken now. Is there somewhere I can download it?
July 11th, 2012 on 3:03 am
Sorry I broke the link when migrating the website, it is fixed now 🙂