In November 2021 I wrote a couple of posts bemoaning the headaches of the Apple Intel-to-ARM architecture shift (part I and part II), before arriving at a solution that works for me (raku on docker on ubuntu on vftools, as set out at the end of part II).
One of the drivers for choosing this option was to get the whole of my (raku) stack running natively on ARM (`--platform linux/arm64`) and so gain the performance boost of the new M1 CPU architecture.
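For reference, the platform can be requested explicitly when running the image — a minimal sketch (the `rakudo-star` image name and the shell steps are illustrative, not necessarily the exact setup used here):

```shell
# Run a Raku container, explicitly requesting the arm64 build
docker run --platform linux/arm64 -it rakudo-star bash

# Inside the container, confirm the native architecture
uname -m          # aarch64 on an M1 host
```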
Going back to something I posted in January 2021 – where I ran through the progressive speed-ups that refactoring my code had achieved – there were a couple of test timings that I could now rerun.
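Timings like those below can be taken with something along these lines (a sketch — the exact invocation in the original post may differ); the first run includes module compilation, while the second reuses the precompilation cache:

```shell
# First run: raku compiles Physics::Measure (slow, "first compile")
time raku -e 'use Physics::Measure :ALL;'

# Second run: the precompiled bytecode is reused (fast, "pre-compiled")
time raku -e 'use Physics::Measure :ALL;'
```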
**1.2GHz/8GB/2-core macOS Intel Core M**

| secs user time | first compile | pre-compiled |
|---|---|---|
| `use Physics::Measure :ALL;` | 13 | 2.8 |

**3.2GHz/8GB/8-core macOS Apple M1**

| secs user time | first compile | pre-compiled |
|---|---|---|
| `use Physics::Measure :ALL;` | 5.4 | 1.5 |

**Improvement**

| % improvement | first compile | pre-compiled |
|---|---|---|
| `use Physics::Measure :ALL;` | 141% | 87% |
So – the longer-running tasks show a throughput improvement of up to 2.4x … my configuration efforts and cash have been quite beneficial, then!
I also wanted to check the load across CPUs…
So it seems that my raku toolchain is fairly evenly loading the 4 performance cores … but this quick compile does not run long enough to get above about 60% load. The even spread is a very encouraging aspect of raku's built-in concurrency support.
[Also – bear in mind that the new results are running in a docker container on an ubuntu VM on vftools on macOS – so there are several new layers of virtualisation in play.]
Finally, I ran a full-on raku load to check whether all 4 performance CPUs would be kept busy …
raku -e '.say for ^∞'
This settles at %CPU = 121+76+24+10 = 231% (where 400% is the theoretical max of 100% per CPU times 4 CPUs). Green is the user process (terminal) and red the system process (vftools). Again, it is good to see raku evenly spreading the load across all performance cores.
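To push the cores harder with user-level parallelism, raku's built-in `race` (or its order-preserving sibling `hyper`) can be used — a minimal sketch, where the range and `batch` size are illustrative choices:

```raku
# race partitions the input into batches and maps each batch
# on a worker thread from the default thread pool
my $total = (1..10_000_000).race(batch => 100_000).map({ $_ * $_ }).sum;
say $total;
```

Watching the CPU monitor while this runs should show all cores engaged, since each batch is dispatched to a separate worker.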
Hmmm – wonder if I can get something to use the GPUs 😉
As ever, your feedback and comments are very welcome…