Testing hell.

I'm working on a ticket that touches a lot of areas of the codebase, and impacts everything that creates a ... really common kind of object.

This means changes throughout the codebase and lots of failing specs. Ofc sometimes the code needs changing, and sometimes the specs do. It's tedious.

What makes this incredibly challenging is that different specs fail depending on how I run them. On Jenkins, I'm currently at 160 failing tests. If I run the same specs from the terminal, I get 132. If I run them from RubyMine... well, I can't run them all at once because RubyMine sucks, but I'm guessing it's around 90 failures based on spot-checking some of the files.

But seriously, how can I determine what "fixed" even means if the specs arbitrarily pass or fail in different environments? I don't even know how the CLI and RubyMine *can* differ, if I'm being honest.

I asked my boss about this and he said he's never seen the issue in the ten years he's worked there. So now I'm doubly confused.

Update: I used a copy of his DB (the same one Jenkins is using), and now RSpec reports 137 failures from the terminal, and a similar ~90 (again, a guess) from RubyMine based on more spot-checking. I am so confused. The DB dump has the same structure, and RSpec clears the actual data between tests, so wtf is even going on? Maybe the encoding differs? But the failing specs are mostly testing logic.

None of this makes any sense.
I'm so confused.

It feels like I'm being asked to build a machine when the laws of physics change with locality. I can make it work here just fine, but it misbehaves a little at my neighbor's house, and outright explodes at the testing ground.
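Comparing failure counts (160 vs 132 vs ~90) says little about which specs differ. A minimal sketch for comparing the actual failing examples between two runs, assuming each run was saved with RSpec's JSON formatter (for example `rspec --format json --out terminal.json`). The helper names `failed_ids` and `compare_failures` are made up for illustration:

```ruby
require "json"
require "set"

# Returns the set of failing example IDs from one RSpec JSON report.
# Falls back to the full description if an example has no "id" field.
def failed_ids(rspec_json)
  JSON.parse(rspec_json)
      .fetch("examples", [])
      .select { |example| example["status"] == "failed" }
      .map { |example| example["id"] || example["full_description"] }
      .to_set
end

# Splits two failure sets into what is unique to each run and what is shared.
def compare_failures(first, second)
  {
    only_in_first: (first - second).sort,
    only_in_second: (second - first).sort,
    in_both: (first & second).sort
  }
end

# Usage (hypothetical file names):
#   result = compare_failures(failed_ids(File.read("jenkins.json")),
#                             failed_ids(File.read("terminal.json")))
#   puts result[:only_in_first]
```

The specs that fail in only one environment are probably the best starting points, since whatever differs between the environments has to be involved in them.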

  • 6
    When everything is confusing, it might be lupus...

    Aka: Use cross diagnosis from Dr House.

    Start with one failing test, examine it, write down every fucking shit it does, then try to find out what and why it changes based on the data you have.

    From experience, it sounds like something small with a lot of influence...

    If I were to start, I'd examine the environment first... Depending on what the code does, a sysctl / rlimit setting might be different.

    Then climb up the ladder: timezone / encoding / software versions...

    Everything that influences the test suite. Mostly it's, like Jenkins, a Linux bash environment, so everything from exports to locales to whatever else I find.

    Then the services and their configuration, everything the test touches.

    Last but not least the test itself.

    Finally, I usually cross-reference everything and double-check I haven't missed anything.

    It's tedious but amazing what you learn on the way.
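One way to make the "climb the ladder" approach above less manual is to dump the same fingerprint from Jenkins, the terminal, and RubyMine, then diff the results. A rough sketch in plain Ruby; the function names and the list of environment variable prefixes are guesses at what might matter, not an exhaustive list:

```ruby
require "rbconfig"
require "json"

# Environment variables that commonly change behaviour between runners.
# This prefix list is an assumption; extend it for your own setup.
INTERESTING_ENV = /\A(LANG|LC_|TZ|RAILS_|RACK_|RUBY|BUNDLE_|GEM_|PATH\z)/

# Collects interpreter, encoding, timezone and environment settings
# into a flat hash that is easy to diff.
def environment_fingerprint(env = ENV)
  info = {
    "ruby_version"      => RUBY_VERSION,
    "ruby_platform"     => RUBY_PLATFORM,
    "ruby_executable"   => RbConfig.ruby,
    "external_encoding" => Encoding.default_external.name,
    "internal_encoding" => Encoding.default_internal&.name,
    "utc_offset"        => Time.now.utc_offset,
    "working_dir"       => Dir.pwd
  }
  env.to_h.each do |key, value|
    info["env.#{key}"] = value if key.match?(INTERESTING_ENV)
  end
  info
end

# Returns only the keys whose values differ, as [first, second] pairs.
def fingerprint_diff(first, second)
  (first.keys | second.keys).sort.each_with_object({}) do |key, diff|
    diff[key] = [first[key], second[key]] unless first[key] == second[key]
  end
end

# When run directly, print the fingerprint as JSON so it can be saved per runner.
puts JSON.pretty_generate(environment_fingerprint) if __FILE__ == $PROGRAM_NAME
```

Calling `environment_fingerprint` from inside the test suite (for example from a spec helper) rather than from a separate shell is likely to be more telling, because it captures what the specs actually see under each runner.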
  • 0

    I think it's because of an external service, since you get different test results every time you run them. Maybe the external service that the tests are using is failing?
  • 0
    @IntrusionCM what this guy says.

    I've encountered such issues when dependencies get mashed up. A development codebase written in, let's say, Ruby 2.5 and its gems then goes through the pipeline with Ruby 2.4 and some older gems, and so on.

    In a Java project, a dependency for a library somehow got hardcoded into a meta_inf file while the Maven project had a different one. When parallel building was used, you got a race condition where the first loaded lib got used, and that caused varying results in code coverage and consequently failed builds.

    On another occasion I had to check for system settings influencing the database engine, which can fuck up a lot.

    A lot of areas, but in my experience such randomness is often due to a combination of mismatched deps and the nature of multithreaded building and testing.
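To check for the kind of dependency mismatch described above, one option is to record which gem versions are actually loaded in each environment and compare them. A small sketch using `Gem.loaded_specs` from RubyGems; `loaded_gem_versions` and `version_mismatches` are illustrative names, not part of any library:

```ruby
# Maps each currently activated gem to its version string.
def loaded_gem_versions
  Gem.loaded_specs.transform_values { |spec| spec.version.to_s }
end

# Compares two name => version hashes and lists every gem whose
# version differs or that is present in only one of them.
def version_mismatches(local, remote)
  (local.keys | remote.keys).sort.each_with_object([]) do |name, rows|
    next if local[name] == remote[name]
    rows << [name, local[name] || "(missing)", remote[name] || "(missing)"]
  end
end

# Usage idea: dump loaded_gem_versions on each machine (e.g. as JSON),
# then feed both hashes to version_mismatches.
```

Comparing the `Gemfile.lock` and the Ruby version used by each runner covers similar ground. The snippet only adds a view of what was really loaded at runtime.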
  • 0
    If you consistently get the same number of failing tests for each way of running them, you have some environment issue.

    Either this is a good indication of code that fails in some environments, or the tests should be more isolated from the environment.

    If the number of failing tests differs between runs, you have an unstable test, either from the environment, race conditions, or because it uses random data.

    For example, if you hit the database, things can be affected there. If anything is time or date based, make sure you either use a mocked date provider or generate test cases based on now.

    We have a similar problem with some manual tests that are dependent on data in a third party system, which can fail the test if the third party data is out of bounds of the test :/

    Ideally you have full control over all environment related info for a test.
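The "mocked date provider" idea from the comment above could look roughly like this in plain Ruby. `FixedClock` and `Subscription` are hypothetical classes invented for the example. The point is only that the code under test asks an injected clock for the time instead of calling `Time.now` directly:

```ruby
# A clock that always returns the same instant, for use in tests.
class FixedClock
  def initialize(time)
    @time = time
  end

  def now
    @time
  end
end

# Hypothetical domain object. It defaults to the real Time class
# (which responds to .now) but accepts any object with a #now method.
class Subscription
  def initialize(expires_at, clock: Time)
    @expires_at = expires_at
    @clock = clock
  end

  def expired?
    @clock.now >= @expires_at
  end
end

# In a spec, the result no longer depends on when or where it runs:
clock = FixedClock.new(Time.utc(2020, 1, 1, 12, 0, 0))
subscription = Subscription.new(Time.utc(2020, 1, 2), clock: clock)
puts subscription.expired?
```

With the clock pinned, the same spec should give the same answer on Jenkins, in the terminal, and in RubyMine, regardless of system time or timezone.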