Rails RSpec Elasticsearch Parallel Test Suite
Rails RSpec Parallel Test Suite
A straightforward suite with parallel_tests is simple and clever. Simply append ENV['TEST_ENV_NUMBER'] to your test database name, and parallel_tests will create 16 unique databases if you run with 16 processes.
This is step one.
database.yml:
test:
  <<: *default
  database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>
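A quick way to see which database name each process ends up with: the sketch below renders that ERB line with Ruby's stdlib ERB for a few TEST_ENV_NUMBER values. It assumes parallel_tests' default numbering (blank for the first process, then "2", "3", ...), and database_line_for is a hypothetical helper for illustration only.

```ruby
require "erb"

# The database line from database.yml; Rails runs ERB over the file before parsing the YAML.
DB_LINE = "database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>"

# Hypothetical helper: render DB_LINE as a process with the given
# TEST_ENV_NUMBER would see it (nil means the variable is unset).
def database_line_for(test_env_number)
  previous = ENV['TEST_ENV_NUMBER']
  if test_env_number.nil?
    ENV.delete('TEST_ENV_NUMBER')
  else
    ENV['TEST_ENV_NUMBER'] = test_env_number
  end
  ERB.new(DB_LINE).result
ensure
  # Restore the caller's environment so the sketch has no side effects.
  if previous.nil?
    ENV.delete('TEST_ENV_NUMBER')
  else
    ENV['TEST_ENV_NUMBER'] = previous
  end
end

[nil, "", "2", "16"].each { |n| puts "#{n.inspect} => #{database_line_for(n)}" }
# nil => database: myapp_test
# "" => database: myapp_test
# "2" => database: myapp_test2
# "16" => database: myapp_test16
```

Note that the first parallel process and a plain, non-parallel rspec run both use myapp_test.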
Add gem 'parallel_tests' to the :development and :test groups of your Gemfile, and give it a meaningful configuration:
.rspec_parallel
--format progress
--format ParallelTests::RSpec::RuntimeLogger --out tmp/parallel_runtime_rspec.log
--format ParallelTests::RSpec::SummaryLogger --out tmp/spec_summary.log
--format ParallelTests::RSpec::FailuresLogger --out tmp/failing_specs.log
Then prepare the databases for the spec run:
RAILS_ENV=test rails parallel:drop parallel:create db:migrate parallel:prepare
Run the specs in parallel:
RAILS_ENV=test rails parallel:spec
Elasticsearch and parallel_tests strategies
Enter Elasticsearch. Now your parallel spec processes all sabotage each other, sending records from 16 unique databases to the same Elasticsearch instance, possibly at the same time.
Boom! Tests fail.
Googling the error takes you to a few places with little detail, but you should find two approaches:
A - Simulated Mutex
https://github.com/grosser/parallel_tests/wiki#disable-parallel-run-for-certain-cases-cucumber
What you are trying to do is have all 16 parallel processes look for the same lock file and check whether they are cleared to go ahead. If another process is holding the file, the others have to wait.
Hence only one parallel process at a time interacts with Elasticsearch.
In Ruby, inside your RSpec.configure block (e.g. in rails_helper.rb):
config.around(:each, search: true) do |example|
  lock_file = "tmp/elasticsearch.tmp"
  if ENV['PARALLEL_TEST_GROUPS']
    ## Simulating a mutex to avoid parallel runs of elasticsearch
    sleep(0.2) while File.exist?(lock_file) # File.exists? was removed in Ruby 3.2
    File.open(lock_file, "w") {}
  end
  begin
    Searchkick.callbacks(true) do
      example.run
    end
  ensure
    # release the "mutex" even if something raises, so the other processes don't wait forever
    File.delete(lock_file) if ENV['PARALLEL_TEST_GROUPS'] && File.exist?(lock_file)
  end
end
Notice that I check ENV['PARALLEL_TEST_GROUPS']?
That's because ENV['TEST_ENV_NUMBER'] is blank for the first parallel process, so you can't tell it apart from a plain non-parallel run. A huge gotcha; I wish it followed the Liskov principle and was always a number.
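One way to sidestep the gotcha is to normalize the value once. A minimal sketch, assuming parallel_tests' default numbering where the first process gets a blank TEST_ENV_NUMBER; parallel_process_number is a hypothetical helper, not part of the gem. (parallel_tests also has a --first-is-1 option that gives the first process "1"; check that your version supports it.)

```ruby
# Hypothetical helper (not part of parallel_tests): return the parallel
# process number as an Integer, treating the blank first process as 1.
# Returns nil when the specs are not running under parallel_tests.
def parallel_process_number(env = ENV)
  return nil if env['PARALLEL_TEST_GROUPS'].to_s.empty?

  number = env['TEST_ENV_NUMBER'].to_s
  number.empty? ? 1 : Integer(number)
end

# Usage:
#   parallel_process_number({ 'PARALLEL_TEST_GROUPS' => '4', 'TEST_ENV_NUMBER' => '' })  # => 1
#   parallel_process_number({ 'PARALLEL_TEST_GROUPS' => '4', 'TEST_ENV_NUMBER' => '3' }) # => 3
#   parallel_process_number({})                                                          # => nil
```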
Anyway, about the simulated mutex: I get sporadic errors. It really is a weak mutex, because checking for the file and creating it are two separate steps; the processes are fast, so two of them can both see that the file isn't there and both move forward.
I would choose this approach only if few specs need the mutex. If many do, your processes spend too much time waiting in line.
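If you do go with option A, the race comes from checking for the file and creating it in two separate steps. Here is a sketch of a swapped-in technique, not the one from the wiki: let the operating system do the locking with Ruby's File#flock. An exclusive flock blocks until no other process holds it, and it is released when the file is closed, even if the block raises. with_elasticsearch_lock and tmp/elasticsearch.lock are hypothetical names.

```ruby
require "fileutils"

# Hypothetical helper: run the block while holding an exclusive, OS-level
# lock on lock_path. Unlike polling with File.exist?, acquiring the lock is
# atomic, so two processes can never both get in.
def with_elasticsearch_lock(lock_path = "tmp/elasticsearch.lock")
  FileUtils.mkdir_p(File.dirname(lock_path))
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    file.flock(File::LOCK_EX) # blocks until no other process holds the lock
    yield
  end # closing the file releases the lock, even if the block raises
end

# Usage inside RSpec.configure (assumes Searchkick, as in the hook above):
#   config.around(:each, search: true) do |example|
#     run = -> { Searchkick.callbacks(true) { example.run } }
#     ENV['PARALLEL_TEST_GROUPS'] ? with_elasticsearch_lock(&run) : run.call
#   end
```

This removes the sporadic race but not the queueing: search specs still run one at a time.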
B - Spin up one in-memory Elasticsearch test cluster per process
This is how I ended up solving it, but I don't really love this either.
Still, it cuts the spec run from 8 min (single process) to 3 min 2 s (5 processes) on my 8-core (16-thread) machine.
Along the way, I blew up my computer several times; only a forced shutdown with the power button helped.
The problem is memory: Elasticsearch both preallocates too much and uses too much.
rails_helper.rb:
Insert this right after require 'rspec/rails', before the rest of the app is loaded:
require 'elasticsearch/extensions/test/cluster'

# Checking for PARALLEL_TEST_GROUPS as ENV['TEST_ENV_NUMBER'] is blank in the first parallel process (!)
# Only an explicit "true" uses the system Elasticsearch, so setting the variable to "false" starts a test cluster
run_es_test_cluster = ENV['USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD'] != 'true' || ENV['PARALLEL_TEST_GROUPS'].present?

if run_es_test_cluster
  port = 9250 + ENV['TEST_ENV_NUMBER'].to_i # "" => 9250, "2" => 9252, ...
  ENV['ELASTICSEARCH_URL'] = "http://localhost:#{port}" # required by searchkick
  es_options = {
    port: port,
    cluster_name: "cluster#{ENV['TEST_ENV_NUMBER']}",
    path_data: "tmp/elasticsearch_test#{ENV['TEST_ENV_NUMBER']}",
    number_of_nodes: 1,
    timeout: 120,
    command: ENV['ELASTICSEARCH_BINARY']
  }
  Elasticsearch::Extensions::Test::Cluster.start(**es_options)
end
# ... then, inside the RSpec.configure block:
# stop the Elasticsearch cluster after the test run
if run_es_test_cluster
  config.after :suite do
    Elasticsearch::Extensions::Test::Cluster.stop(**es_options)
  end
end
Why the run_es_test_cluster flag? Why not always run the in-memory Elasticsearch test cluster?
2 s vs 12 s. If a spec file takes 2 s against the system Elasticsearch, spinning up a test cluster adds about 10 s.
That's very expensive in a continuous testing loop, where you run single spec files rather than the whole suite.
I assume you already have the Elasticsearch service installed and running.
If not:
Linux: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/deb.html#deb-repo
Mac: https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html
To get the Elasticsearch test cluster running, you need a few more things:
Add gem 'elasticsearch-extensions' to the :test group in your Gemfile.
Download and unpack Elasticsearch, since you need the binary. I haven't found a better way, and I learned that you don't want to mess with the Elasticsearch installed by apt or brew.
Linux:
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/targz.html
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.6.1-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.6.1-linux-x86_64.tar.gz
Mac:
Try without this step first.
I've been told that a brew install of Elasticsearch works. If the test cluster can't find Elasticsearch, download and unpack it like this:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-darwin-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-darwin-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.6.1-darwin-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.6.1-darwin-x86_64.tar.gz
Add elasticsearch-7.6.1/ to your .gitignore so you don't accidentally commit it to git. You can of course unpack it somewhere else; just reference the path to the binary:
.env:
ELASTICSEARCH_BINARY=elasticsearch-7.6.1/bin/elasticsearch
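A wrong path only shows up once the test cluster tries to start, so it may be worth checking the value up front. A minimal sketch; elasticsearch_binary is a hypothetical helper whose result you could pass as command: in es_options. Relative paths like the one above are resolved from the current working directory, which is normally the Rails root when you run the specs from there.

```ruby
# Hypothetical helper: resolve ELASTICSEARCH_BINARY and fail early with a
# readable message. Returns nil when the variable is unset, leaving the
# test cluster to look for elasticsearch on the PATH.
def elasticsearch_binary(env = ENV)
  path = env['ELASTICSEARCH_BINARY'].to_s
  return nil if path.empty?

  unless File.file?(path) && File.executable?(path)
    raise ArgumentError, "ELASTICSEARCH_BINARY=#{path} is not an executable file " \
                         "(relative paths are resolved from #{Dir.pwd})"
  end
  path
end
```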
NB: Reduce memory or you'll blow up your computer:
In elasticsearch-7.6.1/config/jvm.options, change the heap settings to:
-Xms128m
-Xmx128m
It defaults to 1g!
See how that blows up your computer if you spin up 16 of those?
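The arithmetic behind that warning, as a rough sketch: multiply the per-node heap by the number of processes, since each process runs its own single-node cluster. It reads -Xmx from the jvm.options text; heap_mb, max_heap_mb and total_heap_mb are hypothetical helpers, and only sizes with a k, m or g suffix are handled.

```ruby
# Parse a JVM heap size such as "1g", "128m" or "512k" into megabytes.
def heap_mb(size)
  match = size.to_s.strip.match(/\A(\d+)([kmg])\z/i)
  raise ArgumentError, "unsupported heap size: #{size.inspect}" unless match

  value = match[1].to_i
  case match[2].downcase
  when 'g' then value * 1024
  when 'm' then value
  when 'k' then value / 1024.0
  end
end

# Read the -Xmx setting from jvm.options text, ignoring comment lines.
def max_heap_mb(jvm_options_text)
  settings = jvm_options_text.lines.map(&:strip).reject { |l| l.start_with?('#') }
  line = settings.find { |l| l.start_with?('-Xmx') }
  raise ArgumentError, 'no -Xmx setting found' unless line

  heap_mb(line.delete_prefix('-Xmx'))
end

# Heap reserved by N single-node test clusters, one per parallel process.
def total_heap_mb(jvm_options_text, processes)
  max_heap_mb(jvm_options_text) * processes
end

# Usage:
#   total_heap_mb(File.read("elasticsearch-7.6.1/config/jvm.options"), 16)
```

With the default -Xmx1g and 16 processes that is 16384 MB (16 GB) of heap alone; with -Xmx128m and 4 processes it is 512 MB. The JVM also uses memory outside the heap, so treat these numbers as lower bounds.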
Put this in your project's .env and .env.sample:
.env.sample:
# Use a conservative number of parallel processes, as this is very memory intensive. If you hit memory or CPU limits, processes start congesting and the run finishes slower
PARALLEL_TEST_PROCESSORS=4
# Elasticsearch binary for the in-memory test clusters in specs. If this is not set, it looks for the binary on your PATH; the system-installed binary is not on the PATH by default and can't easily be run from the command line even if you find it.
ELASTICSEARCH_BINARY=elasticsearch-7.6.1/bin/elasticsearch
# This setting is ignored when running parallel_tests
# If set to true, check that the local elasticsearch is running:
# sudo service elasticsearch status
# sudo service elasticsearch start
# I recommend keeping this true, so individual test runs are faster: no new test cluster boots each time.
# 2s if true: rspec spec/models/meeting_spec.rb
# 12s if false: rspec spec/models/meeting_spec.rb
USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD=true
Congrats!
You've got a build that runs parallel_tests with a DB and elasticsearch :-)
... I hope :D