raku on the M1 – up to 2.4x faster

In November 2021 I wrote a couple of posts bemoaning the headaches of Apple's Intel-to-ARM architecture shift (part I and part II) before arriving at a solution that works for me: raku on docker on ubuntu on vftools, as set out at the end of part II.

One of the reasons for choosing this option was to get the whole of my raku stack running natively on ARM (--platform linux/arm64) and so get the performance boost of the new M1 CPU architecture.

Going back to something I posted in January 2021 – where I ran through the progressive speed ups that refactoring my code had achieved – there were a couple of test timings that I can now rerun.

Jan 2021 (1.2GHz/8GB/2 core macOS Intel Core M), secs user time

                              first compile   pre-compiled
use Physics::Measure;         10              1.2
use Physics::Measure :ALL;    13              2.8

Jan 2022 (3.2GHz/8GB/8 core macOS Apple M1), secs user time

                              first compile   pre-compiled
use Physics::Measure;         4.5             0.9
use Physics::Measure :ALL;    5.4             1.5

% improvement

                              first compile   pre-compiled
use Physics::Measure;         122%            33%
use Physics::Measure :ALL;    141%            87%

So – the longer running tasks are showing a throughput improvement of up to 2.4x (13 secs down to 5.4 secs for the first compile with :ALL) … my configuration efforts and cash have been quite beneficial, then!
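For anyone who wants to check the arithmetic, here is a minimal sketch of how the speed-up factors and percentages fall out of the timings. The `improvement` sub is a hypothetical helper written for this illustration, not part of Physics::Measure, and it assumes "% improvement" means (old / new - 1) x 100.

```raku
# Hypothetical helper: speed-up factor and percentage improvement
# from an old and a new timing (both in seconds of user time).
sub improvement(Numeric $old, Numeric $new) {
    my $factor = $old / $new;                  # e.g. 13 / 5.4 is roughly 2.4
    return %( factor => $factor, percent => ($factor - 1) * 100 );
}

# Timings copied from the tables above: statement, phase, Jan 2021, Jan 2022.
my @timings =
    ('use Physics::Measure;',      'first compile', 10,  4.5),
    ('use Physics::Measure;',      'pre-compiled',  1.2, 0.9),
    ('use Physics::Measure :ALL;', 'first compile', 13,  5.4),
    ('use Physics::Measure :ALL;', 'pre-compiled',  2.8, 1.5);

for @timings -> ($stmt, $phase, $old, $new) {
    my %r = improvement($old, $new);
    say sprintf('%-27s %-14s %.1fx  %3.0f%%', $stmt, $phase, %r<factor>, %r<percent>);
}
```

Under that assumption the loop should reproduce the percentages in the table, with the 2.4x headline coming from the first compile of `use Physics::Measure :ALL;`.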

I also wanted to check the load across CPUs…

So it seems that my raku toolchain is fairly evenly loading the 4 performance cores … but this quick compile does not run long enough to get above about 60% load. The even spread is a very encouraging aspect of the built-in raku concurrency support.

[Also – bear in mind that the new results are running in a docker container on an ubuntu VM on vftools on macOS – so there is a new level of system capability in play.]

Finally, I ran a full on raku load to check if all 4 performance CPUs would be kept busy …

$> raku -e '.say for ^∞'

This settles at %CPU = 121+76+24+10 ≈ 230% (where 400% is the theoretical max of 100% per CPU times 4 CPUs). Green is the user process (terminal) and red the system process (vftools). Again, it is good to see raku spreading the load across all performance cores.
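As a quick sanity check on those figures, the sketch below sums the four per-core readings quoted above and compares the total with the 400% ceiling. The variable names are my own and purely illustrative.

```raku
# Per-core %CPU readings quoted in the text (one entry per performance core).
my @cpu-percent = 121, 76, 24, 10;

my $total   = @cpu-percent.sum;            # combined %CPU across the cores
my $ceiling = 100 * @cpu-percent.elems;    # 100% per core times 4 cores

say sprintf('total %d%% of %d%% (%.0f%% of the ceiling)',
            $total, $ceiling, 100 * $total / $ceiling);
```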

Hmmm – wonder if I can get something to use the GPUs 😉

As ever, your feedback and comments are very welcome…

~p6steve

3 Comments

  1. p6steve says:

    As mentioned in my previous post, the Intel => ARM change pain for developer tools is not a raku-only phenomenon, as can be seen here … where the “solution” is to run everything under the Rosetta translation layer and thus forgo a lot of the speed potential … https://alexslobodnik.medium.com/apple-m1-python-pandas-and-homebrew-20f14828ccc7

