How to process a list of files across a group of agents using pipeline?

@mawinter69 I’m sorry but there is still a problem. The filenames get extracted ok, but there is an exception further down.

parameters {
    text defaultValue: 'file1.txt\nfile2.txt\nfile3.txt\nfile4.txt',
         name: 'filenames'
}

stages {

    stage('exp') {

        steps {
            script {
                def files = params.filenames.split('\n')
                def p = [:]

                for (file in files) {
                    echo "Preparing $file"
                    p[file] = {
                        node(params.agents) {
                            def nextFile = files.pop()
                            echo "$nextFile"
                            sleep 10
                        }
                    }
                }
                parallel p
            }
        }
    }
}

Output:

[Pipeline] { (Branch: file1.txt)
[Pipeline] { (Branch: file2.txt)
[Pipeline] { (Branch: file3.txt)
[Pipeline] { (Branch: file4.txt)

[Pipeline] }
Failed in branch file1.txt
[Pipeline] }
Failed in branch file2.txt
[Pipeline] }
Failed in branch file3.txt
[Pipeline] }
Failed in branch file4.txt

Also:   hudson.remoting.ProxyException: groovy.lang.MissingMethodException: No signature of method: [Ljava.lang.String;.pop() is applicable for argument types: () values: []
Possible solutions: drop(int), min(), dump(), max(), grep(), sort()
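
For readers hitting the same error: split() returns a Java array (String[], which the message prints as [Ljava.lang.String;), and arrays have no pop() method. Below is a minimal sketch of one possible way around it, assuming the array is simply copied into a mutable list first. This is not necessarily the fix that was applied in this thread, and under the script sandbox the toList() call may need to be permitted.

```groovy
script {
    // split() returns String[]; toList() copies it into a mutable ArrayList, which has pop()
    def files = params.filenames.split('\n').toList()
    def p = [:]

    for (file in files) {
        echo "Preparing $file"
        p[file] = {
            node(params.agents) {
                // Assumption: on the Groovy 2.4 runtime bundled with Jenkins,
                // pop() removes the LAST element of the list
                def nextFile = files.pop()
                echo "$nextFile"
                sleep 10
            }
        }
    }
    parallel p
}
```

Each branch takes whatever filename is next in the shared list, so the branch label and the file it processes need not match.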

Thank you Markus, working now.

@mawinter69 Hi Markus, I’m sorry but I’ve realized I don’t fully understand your example. In this snippet:

p[file] = {
    node('a||b') {
        def nextFile = files.pop()
        echo "$nextFile"
        sleep 10
    }
}

I assume that ‘echo’ and ‘sleep’ are bash commands, is that correct?

If I add ‘ls -l’ as an example additional command:

p[file] = {
    node('a||b') {
        def nextFile = files.pop()
        echo "$nextFile"
        ls -l
        sleep 10
    }
}

I get an exception:

groovy.lang.MissingPropertyException: No such property: ls for class: groovy.lang.Binding
Possible solutions: class

What I actually want to do is to run a bash command with the Groovy variable:

ls -l "$nextFile"

Could you help me with this please?

Here sleep and echo are pipeline steps, not bash commands.
If you want to run a shell command, use the sh step:

sh """
  ls -la "$nextFile"
"""

This assumes, of course, that the file is in the current working directory, unless nextFile is an absolute path.
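
One detail worth knowing about the sh snippet above: inside triple double quotes, Groovy substitutes $nextFile before the shell ever sees the script. The following is a minimal sketch of an alternative, assuming filenames might contain spaces or shell metacharacters: pass the value through the environment with withEnv and single-quote the script so that the shell does the expansion. NEXT_FILE and the sample filename are invented for this illustration.

```groovy
node('a||b') {
    def nextFile = 'file1.txt'   // placeholder value for the illustration

    // Groovy interpolation: the shell receives the literal text  ls -la "file1.txt"
    sh """
      ls -la "$nextFile"
    """

    // Shell expansion: Groovy leaves the single-quoted script untouched,
    // and the shell reads the value from the environment variable NEXT_FILE
    withEnv(["NEXT_FILE=${nextFile}"]) {
        sh 'ls -la "$NEXT_FILE"'
    }
}
```

Both forms list the same file here; the second one just keeps the filename out of the script text itself.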


@mawinter69 Hi Markus, may we revisit this code, please? It is working fine, but the reporting in the Jenkins Pipeline Console output is strange.

For this example code:

pipeline {

    agent { label "jenkins-ubuntu22-2" }

    stages {
        stage('example_2') {
            steps {
                script {
                    def my_agent = 'jenkins-ubuntu22-2'

                    def files_list = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt', 'file6.txt', 'file7.txt', 'file8.txt', 'file9.txt', 'file10.txt']

                    // Reverse the list so that pop() will pop filenames starting with the first filename in the parameter string.
                    def files_list_reverse = files_list.reverse()

                    // Create a map that will contain info for the parallel stages.
                    def p = [:]
                    for (file in files_list_reverse) {
                        echo "Preparing $file"
                        p[file] = {
                            node(my_agent) {

                                def nextFile = files_list_reverse.pop()
                                
                                sh """
                                    echo $nextFile
                                """

                            }
                        }
                    }

                    // Execute the parallel stages
                    parallel p
                }   
            }
        }              
    }
}

The Pipeline Console shows (screenshot not included):


The image shows stage ‘file2.txt’ selected but the console output is for ‘file9.txt’.
So the mapping between the parallel stages and the console output is incorrect.
Can you comment on what is wrong please?

The stages are executed in parallel, and p is a hash, which means the order in which the parallel stages are started is not guaranteed to be the order in which they were added. So the branch for file2.txt can start after the one for file9.txt, and each branch pops whatever filename is next in the shared list at that moment. If execution order is important, then parallel is the wrong approach.

Thanks, the execution order is not critical. It just means that the pipeline console is hard to navigate, but we can live with that. Thanks for your answer.

Maybe don’t name the stages after the files, but use a simple counter instead.
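
A minimal sketch of what the counter suggestion could look like, assuming the same pop()-based approach as in the code above. The worker-N branch names and the queue variable are invented for this illustration, and the list is shortened.

```groovy
script {
    def files = ['file1.txt', 'file2.txt', 'file3.txt']

    // Reversed copy, so that pop() (which takes from the end on Groovy 2.4)
    // hands out file1.txt first
    def queue = files.reverse()

    def p = [:]
    for (int i = 0; i < files.size(); i++) {
        // Plain String key built by concatenation; the branch is named by position, not by file
        def branchName = 'worker-' + (i + 1)
        p[branchName] = {
            node('a||b') {
                def nextFile = queue.pop()
                // The file actually processed is visible in the branch's own log
                echo "Processing ${nextFile}"
            }
        }
    }
    parallel p
}
```

The console then shows worker-1, worker-2, and so on, so a branch label no longer suggests a file that the branch did not process.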


@mawinter69 May I ask for another hint regarding this parallel execution pipeline please?

As a reminder, the code you suggested was:

script {
   def files = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt', 'file6.txt', 'file7.txt', 'file8.txt', 'file9.txt', 'file10.txt']
   def p = [:]
   for (file in files) {
       echo "Preparing $file"
       p[file] = {
           node('a||b') {
               def nextFile = files.pop()
               echo "$nextFile"
               sleep 10
           }
       }
   }
   parallel p
}

I have evolved the code for my application but the concept is working very well. Instead of the echo I execute an application:

sh """
    ./myApp $nextFile
"""

At the end of the pipeline myApp will have been executed once for each file in list files.
I want to include in the final email notification the list of files, together with an indication of whether myApp succeeded (exit code 0) or failed (exit code 1) for each file.

file1.txt  Passed
file2.txt  Failed
etc.

I would be grateful for a hint for how to do this.

Would it be best to do this by somehow providing a post processing section for each parallel stage? I am wondering if that section could add a text entry to a global results list, which could be included in the final email notification.

I would be grateful for a hint as to how to organize the post processing sections.

Thanks in advance.

The sh step can return the exit status of the command (returnStatus: true), which lets you collect the results:

script {
   def results = [:]
   def files = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt', 'file6.txt', 'file7.txt', 'file8.txt', 'file9.txt', 'file10.txt']
   def p = [:]
   for (file in files) {
       echo "Preparing $file"
       p[file] = {
           node('a||b') {
               def nextFile = files.pop()
               results[nextFile] = sh returnStatus: true, script: "./myApp $nextFile"

               sleep 10
           }
       }
   }
   parallel p

   // construct the mail from the results map
}
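
A minimal sketch of how the mail could be built from the results map, assuming it runs after parallel p inside the same script block. The sample map, the recipient address, and the subject line are placeholders invented for this illustration, and the mail step is assumed to be installed and configured on the controller.

```groovy
script {
    // Stand-in for the map filled by the parallel branches (filename -> exit status)
    def results = ['file1.txt': 0, 'file2.txt': 1]

    // One line per file: exit status 0 counts as Passed, anything else as Failed
    def summary = ''
    for (name in results.keySet()) {
        def verdict = (results[name] == 0) ? 'Passed' : 'Failed'
        summary += name + '  ' + verdict + '\n'
    }
    echo summary

    // Assumption: the mail step is available; the address is a placeholder
    mail to: 'team@example.com',
         subject: "myApp results for ${env.JOB_NAME} #${env.BUILD_NUMBER}",
         body: summary
}
```

The loop reads from results rather than files, because files has been emptied by the pop() calls by the time the parallel branches finish.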

Thanks for this. What is the syntax for a multi-line shell script please?

For example, how to handle this as a script?

sh """
    mkdir -p $results_path
    cd $results_path
    ./myApp $nextFile
"""
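
A minimal sketch of the multi-line form, assuming the goal is still to capture the exit status: the script: argument accepts the same triple-quoted string as the short form of sh. The two variable values are placeholders for this illustration.

```groovy
node('a||b') {
    def results_path = 'results'   // placeholder
    def nextFile = 'file1.txt'     // placeholder

    // returnStatus: true makes sh return the exit status of the whole script
    // instead of failing the build on a non-zero status
    def status = sh returnStatus: true, script: """
        mkdir -p $results_path
        cd $results_path
        ./myApp $nextFile
    """
    echo "myApp exit status for ${nextFile}: ${status}"
}
```

As far as I know, Jenkins runs such scripts with the shell's -e option by default, so a failing mkdir or cd would stop the script and its status would be returned rather than that of myApp. Note also that after the cd, ./myApp is looked up inside the results directory.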