discuss@lists.openscad.org

OpenSCAD general discussion Mailing-list

Multi-threaded render discussion #1

MichaelAtOz
Mon, Apr 17, 2017 12:38 AM

This is to save clogging up GitHub.

Re: Multithreaded CGAL geometry evaluation #1980
https://github.com/openscad/openscad/pull/1980
Snapshot 2017.04.05...multithread is here:
http://files.openscad.org/snapshots/
You need to enable the feature in preferences.

I have some spooky results to collate, it will take a little while.

In the meantime, I'd be interested in the results of anyone rendering the
following CSG file:
random-threads-k=566519-n=4000-parts=64.csg
http://forum.openscad.org/file/n21249/random-threads-k%3D566519-n%3D4000-parts%3D64.csg
(generated by slightly different code than I posted on GitHub)
(anywhere from <2 minutes to 5+ minutes to render depending on the PC; I
wouldn't try it on under-powered systems)

If you don't get an error, restrict the openscad.exe process CPU affinity to
a reduced number of CPUs (roughly 50% to 75% of the cores/threads) and render
again.


Admin - PM me if you need anything, or if I've done something stupid...

Unless specifically shown otherwise above, my contribution is in the Public Domain; to the extent possible under law, I have waived all copyright and related or neighbouring rights to this work. Obviously inclusion of works of previous authors is not included in the above.

The TPP is no simple “trade agreement.”  Fight it! http://www.ourfairdeal.org/  time is running out!

View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249.html
Sent from the OpenSCAD mailing list archive at Nabble.com.

MichaelAtOz
Mon, Apr 17, 2017 1:32 AM

tl;dr - concurrency issues (presumably) & weird scheduling behavior.

This is all on my new box:
SSD, 16 GB, i7-3770 (3.5/3.9 GHz turbo), 4 cores hyperthreaded (8 threads)
Windows 7 64-bit (fresh install, no updates applied, still un-authenticated)
Snapshot 2017.04.05/64
Microsoft Sysinternals Process Explorer monitoring
https://technet.microsoft.com/en-us/sysinternals/bb545021.aspx
Little else running.

I first came across the error using the random-thread code on my 2-core
(non-hyperthreaded) system, then was able to reproduce it by using the same
key seed. Then, as Marius pointed out, using the exported CSG would ensure
repeatability.

So I whittled the threads down to the CSG file posted above. On my new box:
"Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores"

All results below are rendering that file, unless mentioned otherwise. All
renders were preceded by a Flush Caches.

Background.
The i7-3770 has 4 hyperthreaded cores. From
https://en.wikipedia.org/wiki/Hyper-threading : "For each processor core
that is physically present, the operating system addresses two virtual
(logical) cores and shares the workload between them when possible. The main
function of hyper-threading is to increase the number of independent
instructions in the pipeline; it takes advantage of superscalar
architecture, in which multiple instructions operate on separate data in
parallel. With HTT, one physical core appears as two processors to the
operating system, allowing concurrent scheduling of two processes per core.
In addition, two or more processes can use the same resources: if resources
for one process are not available, then another process can continue if its
resources are available."

So to Windows it looks like 8 CPUs, 0-7; each pair (0-1, 2-3, 4-5, 6-7) is one
core, with one physical processor and one 'logical' processor. "the logical
processors in a hyper-threaded core share the execution resources. These
resources include the execution engine, caches, and system bus interface;
the sharing of resources allows two logical processors to work with each
other more efficiently, and allows a logical processor to borrow resources
from a stalled logical core (assuming both logical cores are associated with
the same physical core). A processor stalls when it is waiting for data it
has sent for so it can finish processing the present thread."

So, in terms of CPU use, one thread at 100% shows as 12.5% total CPU usage
(1/8). With pipelining, well-cached threads can be seen using 12.5% on each
of the 8 logical CPUs.
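
The 12.5% figure is simply one logical CPU's share of the whole machine. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and only there for illustration.

```python
def share_of_total(busy_logical_cpus: float, logical_cpus: int = 8) -> float:
    """Percent of total CPU shown when `busy_logical_cpus` are fully busy.

    The default of 8 logical CPUs matches the 4-core hyperthreaded box
    described above; pass a different count for other machines.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 * busy_logical_cpus / logical_cpus


if __name__ == "__main__":
    # One saturated thread, then four, on an 8-thread CPU.
    for busy in (1, 4, 8):
        print(busy, "busy ->", share_of_total(busy), "% total")
```

So a process monitor reading of "~12.x%" per thread in the results means that thread is saturating one logical CPU.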

Note that the multi-threaded C code uses a non-yielding spinlock: a tight
loop waiting for the lock to be freed. I speculate here that, if I wrote the
kernel and/or the compiler optimiser, I would detect a spin condition and
yield to other processes rather than chewing the CPU.
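
To make the "yield instead of spinning" idea concrete, here is a rough Python sketch of a lock that retries a non-blocking acquire but gives up the CPU between attempts. It is not the code from the pull request (which I have not reproduced here); the class name and the demo are assumptions for illustration only.

```python
import threading
import time


class YieldingSpinLock:
    """Spin on a non-blocking acquire, yielding between attempts.

    Hypothetical illustration: a pure spinlock would retry in a tight loop,
    this variant lets other threads run while it waits.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while not self._lock.acquire(blocking=False):
            time.sleep(0)  # give other threads a chance to run

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "YieldingSpinLock":
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.release()


def run_demo(n_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads under the lock."""
    lock = YieldingSpinLock()
    counter = [0]

    def worker() -> None:
        for _ in range(increments):
            with lock:
                counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_demo(4, 1000))
```

The trade-off is the usual one: yielding costs latency when the lock is held only briefly, but avoids burning a logical CPU when the holder cannot make progress, for example when affinity leaves fewer CPUs than runnable threads.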

Windows allows the "processor affinity" to be set per process: 8 check boxes
select which logical CPUs (threads) are available to the process, all 8 by
default.

Results.
tl;dr - changing the affinity causes a CGAL error in some configurations (I
speculate that this is due to a concurrency issue) and affects the thread
behavior in strange ways (I speculate that the spinlock may have some issue
with CPU cache architecture peculiarities).

End of part 1 - I need a break...


View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21250.html

Marius Kintel
Mon, Apr 17, 2017 1:48 AM

On Apr 16, 2017, at 20:38, MichaelAtOz oz.at.michael@gmail.com wrote:

> This is to save clogging up GitHub.
>
> Re Multithreaded CGAL geometry evaluation #1980
> https://github.com/openscad/openscad/pull/1980 .
> Snapshot 2017.04.05...multithread here
> http://files.openscad.org/snapshots/ . You need to enable the feature in
> preferences.
>
> I have some spooky results to collate, it will take a little while.
>
> In the mean time, I'd be interested in results of anyone rendering the
> following CSG file.

$ time ./OpenSCAD.app/Contents/MacOS/OpenSCAD random-threads.csg --enable thread-traversal -o out.stl
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
real 14m40.628s
user 18m8.360s
sys 5m29.745s

This ran without errors, but only saturated 4 out of 8 cores.
Mac OS X doesn’t do per-process CPU affinity..

-Marius

MichaelAtOz
Mon, Apr 17, 2017 3:11 AM

Part 2.

Results. Cont.

I indicate the affinity as one on/off digit (1/0) per thread, for threads
0-7, i.e. A=11111111 means all CPUs are available.
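
For anyone wanting to tabulate the runs below, here is a small Python sketch that decodes this notation. It assumes the leftmost digit is logical CPU 0 (the message does not say which end is CPU 0) and uses the pairing 0-1, 2-3, 4-5, 6-7 per physical core described in part 1. All function names are hypothetical.

```python
from typing import List, Set


def parse_affinity(a: str) -> List[int]:
    """Return the logical CPU numbers enabled in a string like '11110000'.

    Assumption: the leftmost digit is logical CPU 0.
    """
    if not a or any(ch not in "01" for ch in a):
        raise ValueError("affinity must be a non-empty string of 0s and 1s")
    return [i for i, ch in enumerate(a) if ch == "1"]


def to_bitmask(cpus: List[int]) -> int:
    """Integer mask with bit n set for each enabled logical CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def physical_cores(cpus: List[int]) -> Set[int]:
    """Physical cores touched, assuming logical CPUs 2k and 2k+1 share core k."""
    return {cpu // 2 for cpu in cpus}


if __name__ == "__main__":
    for a in ("11110000", "10101010", "00011000", "01010101"):
        cpus = parse_affinity(a)
        print(a, cpus, hex(to_bitmask(cpus)), sorted(physical_cores(cpus)))
```

Under those assumptions, A=11110000 exposes four logical CPUs on only two physical cores, while A=10101010 and A=01010101 each expose one logical CPU on every physical core.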

I had a textual display of the threads and a performance graph visible.
This is a representative example of a multi-threaded run; the lines on the
CPU graph are around (manually sized) 1 thread (12.5%). I forget the
configuration at the time.
http://forum.openscad.org/file/n21253/cpu_mem.png


A baseline.

A=01010101 -  thread feature disabled (affinity doesn't matter)
One thread @ 12.x% most of the time.

Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 60
CGAL cache size in bytes: 104662192
Total rendering time: 0 hours, 4 minutes, 15 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a01010101-feature-off-no_error.png


All following results have multi-thread feature enabled.


Affinity=11110000
4 (busy) threads @~12.x%, to tail-off one thread @12.x%
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 60
CGAL cache size in bytes: 104662192
Total rendering time: 0 hours, 3 minutes, 17 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a11110000-no_error.png


Affinity=11111111
Used all threads; chunks mostly @~12.x%, some chunks @~7-9%
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
ERROR: CGAL error in CGALUtils::applyBinaryOperator union: CGAL ERROR:
assertion violation! Expr: cet->get_index() == ce->twin()->get_index() File:
/opt/mxe/usr/x86_64-w64-mingw32.static/include/CGAL/Nef_3/SNC_external_structure.h
Line: 1169
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 63
CGAL cache size in bytes: 104536656
Total rendering time: 0 hours, 3 minutes, 10 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    25902
Halfedges:  77706
Edges:      38853
Halffacets:  40120
Facets:      20060
Volumes:      3399
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a11111111-error.png

Note the stats are different.
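
A quick way to see exactly which statistics differ is to parse the "Name: number" lines from two console logs and subtract. This is a sketch only; the function names are hypothetical, and the two sample logs are trimmed copies of the baseline and A=11111111 output above.

```python
import re
from typing import Dict, Tuple

STAT_LINE = re.compile(r"^\s*([A-Za-z]+):\s+(\d+)\s*$")


def parse_stats(log: str) -> Dict[str, int]:
    """Collect single-word 'Name: <integer>' lines (Vertices, Facets, ...)."""
    stats = {}
    for line in log.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def diff_stats(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, Tuple[int, int, int]]:
    """Map each differing stat to (value in a, value in b, b - a)."""
    return {k: (a[k], b[k], b[k] - a[k]) for k in a if k in b and a[k] != b[k]}


BASELINE = """Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451"""

ALL_CPUS = """Simple:        yes
Vertices:    25902
Halfedges:  77706
Edges:      38853
Halffacets:  40120
Facets:      20060
Volumes:      3399"""

if __name__ == "__main__":
    for name, row in diff_stats(parse_stats(BASELINE), parse_stats(ALL_CPUS)).items():
        print(name, row)
```

Every count is lower in the failing run, which is consistent with part of the geometry going missing after the assertion fires.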

Affinity=10101010
Watching the threads, it appeared to use all 4 CPUs (~8 threads at ~6%), but
see the CPU graph
[perhaps I wasn't paying attention, so that could be a misinterpretation...]
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 60
CGAL cache size in bytes: 104662192
Total rendering time: 0 hours, 3 minutes, 31 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a10101010-no_error.png


A=11010111
Used most of 6 CPUs at times
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
ERROR: CGAL error in CGALUtils::applyBinaryOperator union: CGAL ERROR:
assertion violation! Expr: e_below != SHalfedge_handle() File:
/opt/mxe/usr/x86_64-w64-mingw32.static/include/CGAL/Nef_3/SNC_FM_decorator.h
Line: 417
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 66
CGAL cache size in bytes: 35077472
Total rendering time: 0 hours, 1 minutes, 30 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:      570
Halfedges:    1710
Edges:        855
Halffacets:    610
Facets:        305
Volumes:        11
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a11010111-error.png

Note stats and missing objects in display. Different error.


A=00011000
Used 2 CPUs
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 60
CGAL cache size in bytes: 104662192
Total rendering time: 0 hours, 3 minutes, 23 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a00011000-no_error.png

This is representative of the chunks where CPU was < 12.x%

http://forum.openscad.org/file/n21253/threads-a%3D00011000.png

Red is a finished thread (showing CPU at the time it finished). Green is a
new thread since the last refresh, i.e. it shows two sequential pairs of
threads.


This is the weird scheduling one...

A=01010101
Only used 1 CPU
Compiling design (CSG Tree generation)...
Rendering Polygon Mesh using CGAL...
Threaded traversal phase 1: Generating 4003 leaf geometries
Threaded traversal phase 2: Spawning 16075 threads on 8 cores
Geometries in cache: 15916
Geometry cache size in bytes: 11829776
CGAL Polyhedrons in cache: 60
CGAL cache size in bytes: 104662192
Total rendering time: 0 hours, 4 minutes, 2 seconds
Top level object is a 3D object:
Simple:        yes
Vertices:    26334
Halfedges:  79002
Edges:      39501
Halffacets:  40766
Facets:      20383
Volumes:      3451
Rendering finished.

http://forum.openscad.org/file/n21253/csg-a01010101-no_error.png

This is representative of the thread display

http://forum.openscad.org/file/n21253/threads-a%3D01010101.png

ie shows three sequential threads.


Can there be a non-concurrency reason why the errors do or do not occur,
purely based on the number of threads available?

There were multiple other CGAL errors, which I'll post next, using different
geometry.

End of part 2.


View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21253.html

MichaelAtOz
Mon, Apr 17, 2017 3:23 AM

Note that all of the above were 3D objects.


View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21254.html

MichaelAtOz
Mon, Apr 17, 2017 4:15 AM

kintel wrote

> $ time ./OpenSCAD.app/Contents/MacOS/OpenSCAD random-threads.csg --enable thread-traversal -o out.stl
> Threaded traversal phase 1: Generating 4003 leaf geometries
> Threaded traversal phase 2: Spawning 16075 threads on 8 cores
> real 14m40.628s
> user 18m8.360s
> sys 5m29.745s
>
> This ran without errors, but only saturated 4 out of 8 cores.
> Mac OS X doesn’t do per-process CPU affinity..
>
> -Marius

Hmmm...

a. IANALG (I am not a Linux guru)
b. Does OS X have this?
http://man7.org/linux/man-pages/man8/numactl.8.html
Or perhaps kick off two competing renders; the point of the exercise is to
cause thread contention, possibly affecting the order of execution. Though
I'm wondering if the spinlock is not as atomic as it should be, given the
CPU cache architecture...

I have been thinking that there may be a need for a 'threads offset' option,
with a value of +/-n, so you can use fewer threads or over-commit them. Fewer
if you want to reserve capacity for other things, like funny kitten videos,
while it renders in the background; over-commit because, as you see above,
there are periods where CPU is free but no threads are available to use it.
Experience will show what degree of lock contention occurs when
over-committing...
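
As a rough sketch of what such an option might compute (this is only the proposal above turned into arithmetic, not an existing OpenSCAD setting; the function and parameter names are made up):

```python
import os
from typing import Optional


def worker_count(offset: int = 0, detected: Optional[int] = None) -> int:
    """Number of worker threads for a hypothetical 'threads offset' option.

    offset < 0 reserves capacity for other programs, offset > 0 over-commits.
    At least one worker is always kept so rendering can still proceed.
    """
    if detected is None:
        detected = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, detected + offset)


if __name__ == "__main__":
    for off in (-6, -2, 0, 4):
        print("offset", off, "->", worker_count(off, detected=8), "workers")
```

Clamping to one worker is a design choice made here so that a large negative offset degrades to single-threaded rendering rather than to an error.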


View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21255.html

MichaelAtOz
Mon, Apr 17, 2017 10:56 PM

kintel wrote

> This ran without errors, but only saturated 4 out of 8 cores.
> Mac OS X

I presume you have had occasions where >4 threads are saturated?
What model machine & CPU do you have & OS X version?
My detailed Apple knowledge is in the garage with my 1st gen Mac 128KB...

I'm wondering if OS X handles locks differently...



View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21270.html
Sent from the OpenSCAD mailing list archive at Nabble.com.

MK
Marius Kintel
Tue, Apr 18, 2017 12:55 AM

On Apr 17, 2017, at 18:56, MichaelAtOz <oz.at.michael@gmail.com> wrote:

> I presume you have had occasions where >4 threads are saturated?
> What model machine & CPU do you have & OS X version?

This might just be a side effect of how this was implemented. I’ve got a quad-core i7 with hyperthreading, so a heavily CPU-bound process with spinlocks may very well grab one entire core. This is on OS X 10.10.5.

-Marius

M
MichaelAtOz
Tue, Apr 18, 2017 2:20 AM

It grabbed all 8 on Windows, frequently at ~12.5% each, hence both
hyperthreads of each core at ~100%. That must be good (CPU) cache
interleaving, or threads sitting there spinning on the lock; I wish there
was a way to tell. Other times it was ~6-7% max. I'm not sure what's
happening then, possibly both halves getting cache misses. (That was 3D.)
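
For anyone checking the percentages above, here is the arithmetic as a small sketch. It assumes the per-thread figures are shares of the whole machine (so one fully busy logical CPU out of 8 shows as 12.5%); `per_cpu_utilisation` is a hypothetical helper name.

```python
def per_cpu_utilisation(share_of_total_pct: float, logical_cpus: int) -> float:
    """Convert a thread's share of total machine CPU into utilisation
    of the single logical CPU it is running on (assumed one thread per CPU)."""
    return share_of_total_pct * logical_cpus


# Figures quoted in the post, on a machine with 8 logical CPUs.
for share in (12.5, 7.0, 6.0):
    busy = per_cpu_utilisation(share, 8)
    print(f"{share}% of total -> {busy:.0f}% of one logical CPU")
```

On that reading, the ~6-7% periods correspond to each logical CPU being only about half busy.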

Anyway...

I know this is Shotgun-Testing® (lots of random stuff), but I had many goes
with 256 shotguns firing ~1.5 MegaThreads® (my new clothing line) of
slightly-non-trivial 2D geometries, without an error (detected). So I think
2D may be more thread safe. I suspect it is older & more mature?

However threads (8) were getting ~5% each for most of the peak shown here,
~50% total CPU.

http://forum.openscad.org/file/n21279/2D_1%2Bmillion_threads-dispatch_thead_%404%25cpu_for_most_of_the_peak.png

I'm about to do performance comparisons.



View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21279.html
Sent from the OpenSCAD mailing list archive at Nabble.com.

C
codifies
Wed, Apr 19, 2017 9:34 AM

I tried your stress test from the first post of this thread, but I'm not
sure how to monkey with affinity under Linux, never really having had the
need. Anyhow, it works okay, but I did notice that the very last thread
seems to take a significant portion of the overall render time.

I then tested it with a number of "sweeps" (a module that usually produces
180-360 hulls in a loop). With multiple sweeps (>3) it keeps all the
cores busy, but again with the last thread taking its time solo...
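
A toy model of why a solo last thread limits the gain from more cores. It assumes the last thread is a final step that depends on everything before it, which is a guess and not confirmed anywhere in this thread; the task durations are made-up numbers, and `makespan` and `render_time` are hypothetical names.

```python
import heapq


def makespan(task_seconds, workers):
    """Greedy scheduling: each task goes to the worker that frees up first.
    Returns the time at which the last worker finishes."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for seconds in task_seconds:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + seconds)
    return max(free_at)


def render_time(leaf_seconds, final_seconds, workers):
    """Parallel phase followed by one final step that runs alone."""
    return makespan(leaf_seconds, workers) + final_seconds


leaves = [1.0] * 64   # 64 independent one-second tasks (invented figures)
final_step = 30.0     # one long final step that cannot be split (invented)
for workers in (1, 4, 8):
    print(workers, "workers:", render_time(leaves, final_step, workers), "s")
```

In this model the parallel phase shrinks as workers are added, but the final step stays the same length, so it ends up dominating the total.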

With a single sweep only one core was ever active. I did try tinkering with
the loop, i.e. a group inside the loop and not using children(), but
couldn't seem to interest it in using multiple cores.

While this isn't a "catch all" solution, it is certainly a great step
forward, and even if it means changing coding style a little to make it
easier to portion up, it's still worth using!

Is there an easy way in Linux to alter core affinity, to test for any issues
with the stress test?
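
One possible approach on Linux, as a minimal sketch: Python's `os.sched_setaffinity` restricts the calling process to a set of CPUs, and child processes started afterwards inherit that mask. `pick_cpus` and `restrict_current_process` are hypothetical helper names, and the 50%/75% fractions follow the suggestion in the first post. The `taskset` tool from util-linux (e.g. `taskset -c 0-3 <command>`) does the same job from the shell.

```python
import os


def pick_cpus(available, fraction):
    """Return the lowest-numbered `fraction` of the available CPU ids,
    always keeping at least one."""
    ordered = sorted(available)
    keep = max(1, int(len(ordered) * fraction))
    return set(ordered[:keep])


def restrict_current_process(fraction):
    """Pin this process to a subset of its currently allowed CPUs.
    Linux only: os.sched_setaffinity does not exist on macOS or Windows."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    cpus = pick_cpus(os.sched_getaffinity(0), fraction)
    os.sched_setaffinity(0, cpus)
    return cpus


# Which CPU ids would be kept on an 8-thread machine at 50% and 75%.
print(sorted(pick_cpus(range(8), 0.5)))
print(sorted(pick_cpus(range(8), 0.75)))
```

Calling `restrict_current_process(0.5)` and then starting OpenSCAD from the same script with `subprocess.run` should leave the render confined to half the CPUs, since the child inherits the affinity mask.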

great work!

--
View this message in context: http://forum.openscad.org/Multi-threaded-render-discussion-1-tp21249p21297.html
Sent from the OpenSCAD mailing list archive at Nabble.com.
