Unstable unit test failures for iOS builds on macOS

Jenkins setup:
Our Jenkins runs in an Ubuntu container in a Proxmox environment. The agent hosts are 3 identical macOS hosts. Each agent host has 4 agent nodes, which are set up as “Launch agents via SSH”.
The agent is:

    Unix agent, version 3248.3250.v3277a_8e88c9b_
    Using JDK 19

vm_stat output for the agent host:

Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                                1464.
Pages active:                           1252895.
Pages inactive:                         1247197.
Pages speculative:                         4308.
Pages throttled:                              0.
Pages wired down:                        194045.
Pages purgeable:                             16.
"Translation faults":              125089871673.
Pages copy-on-write:                 3808674686.
Pages zero filled:                  42643370187.
Pages reactivated:                  12866462578.
Pages purged:                          29602958.
File-backed pages:                       529090.
Anonymous pages:                        1975310.
Pages stored in compressor:             2795315.
Pages occupied by compressor:           1444460.
Decompressions:                      7350162254.
Compressions:                        7488429512.
Pageins:                             1804827000.
Pageouts:                               3643868.
Swapins:                                      0.
Swapouts:                                     0.

Software:

    System Software Overview:

      System Version: macOS 14.5 (23F79)
      Kernel Version: Darwin 23.5.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: Machost1
      User Name: ios (ios)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 76 days, 23 hours, 29 minutes

Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac13,1
      Model Number: Z14J000JLD/A
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 64 GB
      System Firmware Version: 10151.121.1
      OS Loader Version: 10151.121.1
      Activation Lock Status: Disabled

We start the unit tests and several other tests using fastlane with the command below:

fastlane test deployment_target_version:17.5

lane :integration_test do |options|
  ios_test(
    scheme: "IntegrationTests",
    deployment_target_version: options[:deployment_target_version]
  )
end

So it uses the Xcode schemes that are defined in the repository.

When we run the tests in Xcode or with fastlane on our local M1 Macs, they pass without any error. When they run in the Jenkins pipeline, the probability of success is about 50%: sometimes the run fails, sometimes it does not.
Here is an example of a test result:

Failing tests:
  -[VSFileNameGeneratorTests testWhenCurrentTimeIsZeroThenClientIdContainsZero]
  -[NSObject_SwizzlingsTests testWhenTryingToDisablePredictiveTextThenPrivateMethodForSpellingPreferenceExists]

I would really appreciate any idea about what the problem can be and where I should look.

Hello and welcome to this community, @canbuyukburc. :wave:

Given the intermittent nature of the test failures on Jenkins, there are several potential areas to investigate:

  1. Resource Constraints
    The vm_stat output indicates that the system has limited free memory. Please verify that the agents have sufficient resources (CPU, memory) to run the tests reliably. You might want to monitor the resource usage during the test runs.
  2. Environment Differences
    There might be subtle differences between the local environment and the Jenkins environment. Check that the Jenkins agents have the same versions of Xcode, Fastlane, and other dependencies as your local machines.
  3. Concurrency Issues
    Running multiple agents on the same host might lead to resource contention. Try reducing the number of concurrent builds or isolating the tests to see if it improves stability.
  4. Network Issues
    If the tests rely on network resources, intermittent network issues could cause failures. Ensure that the network is stable and that there are no connectivity issues.
  5. Disk I/O
    Ensure that the disk I/O performance is adequate. Slow or overloaded disks can cause timeouts and other issues.
  6. Test Flakiness
    Some tests might be inherently flaky (been there, done that). Review the failing tests to see if they have any timing dependencies or other conditions that could cause intermittent failures.
  7. Logs and Debugging
    Check the Jenkins logs and the test logs for any additional information that might indicate the cause of the failures. Enable verbose logging for Fastlane and Xcode to get more detailed output.
  8. Jenkins Configuration
    Ensure that the Jenkins configuration is optimal for running iOS tests. This includes proper setup of the SSH agents, environment variables, and any other relevant settings. :person_shrugging:
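
For point 1, a rough way to put the posted vm_stat figures into perspective is to convert the page counts into GiB. This is only a sketch: the helper names `parse_vm_stat` and `gib` are made up for illustration, and the sample is an excerpt of the output posted above. As far as I know, a small "Pages free" value on its own is not necessarily alarming on macOS, because the kernel keeps memory in use for caches; the amount held by the compressor is often the more telling sign of memory pressure.

```ruby
# Parse `vm_stat` text into the page size (from the header line)
# and a hash of counter name => page count.
def parse_vm_stat(text)
  page_size = text[/page size of (\d+) bytes/, 1].to_i
  pages = {}
  text.each_line do |line|
    # Counter lines look like: Pages free:      1464.
    match = line.match(/^"?([^":]+)"?:\s+(\d+)\.\s*$/)
    pages[match[1].strip] = match[2].to_i if match
  end
  [page_size, pages]
end

# Convert a page count into GiB, rounded to two decimals.
def gib(page_count, page_size)
  (page_count * page_size / (1024.0**3)).round(2)
end

# Excerpt of the figures posted at the top of this thread.
sample = <<~VMSTAT
  Mach Virtual Memory Statistics: (page size of 16384 bytes)
  Pages free:                                1464.
  Pages active:                           1252895.
  Pages inactive:                         1247197.
  Pages wired down:                        194045.
  Pages occupied by compressor:           1444460.
VMSTAT

page_size, pages = parse_vm_stat(sample)

summary = {
  free_gib: gib(pages["Pages free"], page_size),
  active_gib: gib(pages["Pages active"], page_size),
  inactive_gib: gib(pages["Pages inactive"], page_size),
  wired_gib: gib(pages["Pages wired down"], page_size),
  compressor_gib: gib(pages["Pages occupied by compressor"], page_size)
}

summary.each { |name, value| puts "#{name}: #{value}" }
```

With the posted numbers this reports about 22 GiB occupied by the compressor on a 64 GB host, which is worth watching while a build is running. On an agent you could feed it live data with `parse_vm_stat(`vm_stat`)` and log the summary at the start and end of each test stage.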

Hi @poddingue,
thank you for such a detailed answer.
I went through your answer and here are my findings:

  1. I have now assigned min 2 GB, max 10 GB of RAM to the Jenkins agents as a JVM option.
  2. The Xcode version is fixed for the project, which is why the Jenkins agents and the developers use the same version. But I found out that the fastlane version is not fixed, and maybe macOS on the build nodes needs updating. That requires coordination across large teams, so it will take time.
  3. Normally we do not have concurrent builds of the same type. One agent might be running unit tests while another is doing an upload, but they use similar resources such as Xcode or fastlane.
  4. Network is not an issue; Git and the macOS hosts are colocated locally.
  5. Disk I/O might be a problem, because I have seen that on the macOS nodes the “fseventsd” process uses a lot of resources, which points to a problem with the file system. So a reboot and hopefully an update might help.
  6. Because of possible flakiness we run the tests twice using the pipeline retry option, but they fail at the same point, whereas they do not fail locally.
  7. I will start another test run today with verbose logging. Right now I am waiting for the result of another run.
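
On point 2, one common way to stop the fastlane version from drifting between developer machines and build nodes is to pin it in a Gemfile and run fastlane through Bundler. A minimal sketch, assuming the repository does not already have a Gemfile; the version number is only a placeholder, not a recommendation:

```ruby
# Gemfile (assumed to live at the repository root, next to the fastlane/ directory)
source "https://rubygems.org"

# Placeholder version: replace with whichever fastlane release
# has been verified on the developer machines.
gem "fastlane", "2.220.0"
```

After a one-time `bundle install` on each node, the pipeline would call `bundle exec fastlane test deployment_target_version:17.5` instead of plain `fastlane`. Committing the generated Gemfile.lock keeps the transitive gem versions identical everywhere as well.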

Thanks a lot for your feedback. :crossed_fingers:

Increasing the VM size helped.
Updating fastlane also helped.


Good to hear, thanks a lot for the feedback. :+1: