March, 2020

Rails RSpec Elasticsearch Parallel Test Suite

Ole Morten Amundsen

Rails RSpec Parallel Test Suite

A straightforward suite with parallel_tests is simple and clever. Simply append ENV['TEST_ENV_NUMBER'] to your test db name and it will create 16 unique databases if you run with 16 threads.

This is step one

database.yml:

test:
  <<: *default
  database: myapp_test<%= ENV['TEST_ENV_NUMBER'] %>
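To see what that ERB line expands to, here is a minimal sketch that renders the same template for two processes. It assumes parallel_tests hands the first process a blank TEST_ENV_NUMBER and the second one "2"; the helper name database_name_for is made up for this illustration.

```ruby
require 'erb'

# Same ERB expression as in database.yml above
DATABASE_TEMPLATE = "myapp_test<%= ENV['TEST_ENV_NUMBER'] %>"

# Hypothetical helper: render the template as a given process would see it
def database_name_for(test_env_number)
  previous = ENV['TEST_ENV_NUMBER']
  ENV['TEST_ENV_NUMBER'] = test_env_number
  ERB.new(DATABASE_TEMPLATE).result
ensure
  # Restore whatever was set before (assigning nil removes the variable)
  ENV['TEST_ENV_NUMBER'] = previous
end

puts database_name_for('')  # first process (assumed blank): myapp_test
puts database_name_for('2') # second process: myapp_test2
```

So the first process keeps using plain myapp_test, and the others get a numbered copy.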

Add the gem 'parallel_tests' to your Gemfile and use a meaningful .rspec_parallel file.

.rspec_parallel:

--format progress
--format ParallelTests::RSpec::RuntimeLogger --out tmp/parallel_runtime_rspec.log
--format ParallelTests::RSpec::SummaryLogger --out tmp/spec_summary.log
--format ParallelTests::RSpec::FailuresLogger --out tmp/failing_specs.log

Then get ready for the spec run:

RAILS_ENV=test rails parallel:drop parallel:create db:migrate parallel:prepare

Run the specs in parallel:

RAILS_ENV=test rails parallel:spec

Elasticsearch and parallel_tests strategies

Enter elasticsearch. Now your parallel spec threads all sabotage each other, sending stuff from your 16 unique databases to the same elasticsearch, possibly at the same time.

Boom! Tests fail.

Googling the error takes you to few places, with little detail, but you should find two approaches:

A - Simulated Mutex

https://github.com/grosser/parallel_tests/wiki#disable-parallel-run-for-certain-cases-cucumber

What you are trying to do is have all 16 parallel threads look for the same file and check whether they are cleared to go ahead. If any other thread is holding the file, the others have to wait.

Hence only one parallel thread at a time interacts with elasticsearch.


config.around(:each, search: true) do |example|
  if ENV['PARALLEL_TEST_GROUPS']
    # Simulating a mutex to avoid parallel runs of elasticsearch
    # (File.exist? replaces File.exists?, which newer Rubies removed)
    while File.exist?("tmp/elasticsearch.tmp")
      sleep(0.2)
    end
    File.open("tmp/elasticsearch.tmp", "w") {}
  end

  Searchkick.callbacks(true) do
    example.run
  end

  if ENV['PARALLEL_TEST_GROUPS']
    File.delete("tmp/elasticsearch.tmp")
  end
end

Notice I use ENV['PARALLEL_TEST_GROUPS']?

That's because ENV['TEST_ENV_NUMBER'] is blank for the first parallel process. A huge gotcha; I wish it would follow the Liskov principle and always be a number.
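One way to stop tripping over that gotcha is to wrap both checks in small helpers. This is only a sketch: process_number and parallel_run? are hypothetical names, not something parallel_tests provides, and it assumes a blank TEST_ENV_NUMBER should count as process 1.

```ruby
# Hypothetical helper: always returns a number, treating blank as process 1
def process_number(env = ENV)
  raw = env['TEST_ENV_NUMBER'].to_s
  raw.empty? ? 1 : Integer(raw)
end

# Hypothetical helper: true only when PARALLEL_TEST_GROUPS is set,
# which is the check used in the around hook above
def parallel_run?(env = ENV)
  !env['PARALLEL_TEST_GROUPS'].to_s.empty?
end

puts process_number({})                           # 1
puts process_number('TEST_ENV_NUMBER' => '')      # 1
puts process_number('TEST_ENV_NUMBER' => '3')     # 3
puts parallel_run?('PARALLEL_TEST_GROUPS' => '4') # true
```

Both helpers take the environment as an argument so they can be tried out with a plain hash.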

Anyway, the simulated mutex... I seem to get sporadic errors. This is really a weak mutex: the threads work fast, and two of them may both see that the file isn't there and both try to move forward.

I guess I would choose this approach if the specs needing the mutex are few. If there are many, you have threads waiting in line too much.
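A possible way to close that race, which I have not run against the suite in this post, is to swap the "check the file, then create it" dance for an exclusive lock via Ruby's File#flock. The operating system then makes sure only one holder gets through at a time. The helper name with_elasticsearch_lock and the lock file path are made up for this sketch.

```ruby
require 'fileutils'

# Hypothetical helper: run the block while holding an exclusive lock on a file.
# Other processes calling this with the same path wait until the lock is free.
def with_elasticsearch_lock(path = "tmp/elasticsearch.lock")
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, File::RDWR | File::CREAT, 0o644) do |file|
    file.flock(File::LOCK_EX) # blocks until no one else holds the lock
    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

# Possible use inside the around hook (sketch only):
#   with_elasticsearch_lock do
#     Searchkick.callbacks(true) { example.run }
#   end
```

The waiting-in-line cost stays the same, so this still only makes sense when few specs need elasticsearch.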

B - Spin up X in-memory elasticsearch test clusters

This is how I ended up solving it, but I don't really love this either.

Still, it cuts the spec time from 8 min on a single thread to 3 min 2 sec on 5 threads on my 8-core (16-thread) machine.

In the process, I blew up my computer several times; only a forced shutdown with the power button helped.

The problem is memory: it both preassigns too much and uses too much.

rails_helper.rb:

require 'rspec/rails'
# Insert right after this, before the rest of the app is loaded
require 'elasticsearch/extensions/test/cluster'

# Checking for PARALLEL_TEST_GROUPS env var as ENV['TEST_ENV_NUMBER'] is nil in first parallel thread (!)
run_es_test_cluster = ENV['USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD'].blank? || ENV['PARALLEL_TEST_GROUPS'].present?

if run_es_test_cluster
  port = 9250 + ENV['TEST_ENV_NUMBER'].to_i
  ENV['ELASTICSEARCH_URL'] = "http://localhost:#{port}" # required by searchkick
  es_options = {
    port: port,
    cluster_name: "cluster#{ENV['TEST_ENV_NUMBER']}",
    path_data: "tmp/elasticsearch_test#{ENV['TEST_ENV_NUMBER']}",
    number_of_nodes: 1,
    timeout: 120,
    command: ENV['ELASTICSEARCH_BINARY']
  }
end
Elasticsearch::Extensions::Test::Cluster.start(**es_options) if run_es_test_cluster

# ... then, in the config part:

  # stop elasticsearch cluster after test run
  if run_es_test_cluster
    config.after :suite do
      Elasticsearch::Extensions::Test::Cluster.stop(**es_options)
    end
  end
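To make the per-process numbering concrete, here is a small sketch that only computes the settings and starts nothing. es_cluster_settings is a hypothetical helper that mirrors the port and naming scheme from the rails_helper.rb snippet above.

```ruby
ES_BASE_PORT = 9250 # same base port as in rails_helper.rb above

# Hypothetical helper: derive the per-process settings from TEST_ENV_NUMBER
def es_cluster_settings(test_env_number)
  suffix = test_env_number.to_s
  port = ES_BASE_PORT + suffix.to_i # nil and "" both count as 0
  {
    port: port,
    url: "http://localhost:#{port}",
    cluster_name: "cluster#{suffix}",
    path_data: "tmp/elasticsearch_test#{suffix}"
  }
end

puts es_cluster_settings('')[:url]  # http://localhost:9250
puts es_cluster_settings('2')[:url] # http://localhost:9252
puts es_cluster_settings('2')[:cluster_name] # cluster2
```

Each process gets its own port, cluster name, and data directory, so nothing is shared between them. A blank TEST_ENV_NUMBER maps to the base port 9250.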

Why the run_es_test_cluster flag? Why not always run the in-memory elasticsearch test cluster?

2 sec vs 12 sec. If a spec takes 2 sec, spinning up the test cluster adds 10 sec.

That's very expensive in your continuous testing process where you don't want to run whole suites.
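The decision behind run_es_test_cluster can be written out as a small truth table. The sketch below uses plain Ruby instead of ActiveSupport's blank? and present?, and the method names are made up for the illustration.

```ruby
# Plain-Ruby stand-in for ActiveSupport's blank? on strings and nil
def blank_value?(value)
  value.nil? || value.strip.empty?
end

# Hypothetical helper mirroring the run_es_test_cluster expression
def run_es_test_cluster?(env)
  blank_value?(env['USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD']) ||
    !blank_value?(env['PARALLEL_TEST_GROUPS'])
end

# Single spec, flag unset: boot a test cluster (the slow 12 sec path)
puts run_es_test_cluster?({}) # true

# Single spec, flag set: reuse the system elasticsearch (the fast 2 sec path)
puts run_es_test_cluster?('USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD' => 'true') # false

# Parallel run: always boot a test cluster, whatever the flag says
puts run_es_test_cluster?('USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD' => 'true',
                          'PARALLEL_TEST_GROUPS' => '4') # true
```

Note that the check only looks at whether the flag is blank, so setting it to "false" behaves the same as "true".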

I assume you already have the elasticsearch service installed and running.

If not:

Linux: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/deb.html#deb-repo

Mac: https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html

To get the elasticsearch test cluster running you need a few more things:

Add gem 'elasticsearch-extensions' to your group :test in the Gemfile.

Download and unpack elasticsearch, as you need the binary. I haven't found a better way, and I learned that you don't want to mess with the elasticsearch installed by apt or brew.

https://www.elastic.co/guide/en/elasticsearch/reference/7.6/targz.html

Linux:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.6.1-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.6.1-linux-x86_64.tar.gz

Mac:

Try without this step first.

I've been told that a brew install of elasticsearch worked. If parallel_tests can't find elasticsearch, then download and install like this:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-darwin-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-darwin-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.6.1-darwin-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.6.1-darwin-x86_64.tar.gz

Add elasticsearch-7.6.1/ to your .gitignore so you don't do anything stupid like committing it to git. You may of course download it somewhere else; just reference the path to it:

.env:

ELASTICSEARCH_BINARY=elasticsearch-7.6.1/bin/elasticsearch

NB: Reduce memory or you'll blow up your computer:

Change the heap settings in jvm.options:

elasticsearch-7.6.1/config/jvm.options:

-Xms128m
-Xmx128m

It defaults to 1g!

See how that blows up your computer if you spin up 16 of those?
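The arithmetic, as a quick sketch. It counts the configured JVM heap only; a real elasticsearch process uses more than its heap, so treat these numbers as a lower bound. total_heap_mb is a made-up helper.

```ruby
# Hypothetical helper: heap reserved across all test clusters, in MB
def total_heap_mb(processes, heap_mb_per_node)
  processes * heap_mb_per_node
end

puts total_heap_mb(16, 1024) # 16384 MB with the 1g default
puts total_heap_mb(16, 128)  # 2048 MB with -Xms128m -Xmx128m
puts total_heap_mb(4, 128)   # 512 MB with 4 processes
```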

Put this in your .env and .env.sample for this project.

.env.sample:

# Use a conservative number of parallel threads as this is very memory intense.
# If you hit mem or cpu limits, you start congesting and finish slower
PARALLEL_TEST_PROCESSORS=4

# Elasticsearch for in-memory testdb in specs.
# If this is not set it'll try to look for the binary in path, which by default is not in path
# nor accessible through command line even if you find the location of the binary.
ELASTICSEARCH_BINARY=elasticsearch-7.6.1/bin/elasticsearch

# This setting is ignored when running parallel_tests
# If set true, check that the local elasticsearch is running
#   sudo service elasticsearch status
#   sudo service elasticsearch start
# Recommendation is to keep this true, so individual test runs are faster, no booting new test clusters each time.
#   2s if true: rspec spec/models/meeting_spec.rb
#   12s if false: rspec spec/models/meeting_spec.rb
USE_SYSTEM_ELASTICSEARCH_IN_SINGLE_THREAD=true

Congrats!

You've got a build that runs parallel_tests with a DB and elasticsearch :-)
... I hope :D