AWS Lambda vs Windmill
This benchmark compares the performance of AWS Lambda and Windmill for executing a simple but computationally intensive task - calculating the 33rd Fibonacci number using a recursive algorithm.
We chose this task because:
- It's CPU-bound rather than I/O-bound, making it a good test of raw compute performance.
- The recursive implementation creates significant computational overhead.
- It's simple enough to implement identically in both platforms.
The goal is to measure and compare:
- Cold start latency.
- Warm execution time.
- Consistency of response times.
- Cost implications at scale.
Both services were configured similarly, using Python 3.11 as the runtime. The same Fibonacci calculation code was deployed to each platform to ensure a fair comparison.
Setup
Windmill
The setup is the exact same as for the other benchmarks . We used the same EC2 m4-large instance and deployed Windmill on docker using the docker-compose.yml in Windmill's official GitHub repo (with the same adjustment, i.e. 1 worker only, even though for this use case it would not make a difference).
We created a script in Windmill computing a Fibonacci number in Python:
N_FIBO = 33
# WIMDMILL script: `u/benchmarkuser/fibo_script`
def fibo(n: int):
    if n <= 1:
        return n
    else:
        return fibo(n - 1) + fibo(n - 2)
def main():
    return fibo(N_FIBO)
which we called multiple times using its webhook. We used siege benchmark tool to trigger the jobs multiple times using its webhook:
siege -r500 -c1 -v -H "Cookie: token=$WM_TOKEN" "http://$WM_HOST/api/w/benchmarks/jobs/run_wait_result/p/u%2Fbenchmarksuser%2Ffibo_script"
AWS Lambda
We set up a Lambda running Python 3.11 with the following simple script:
import json
N_FIBO = 33
def fibo(n: int):
    if n <= 1:
        return n
    else:
        return fibo(n - 1) + fibo(n - 2)
def lambda_handler(event, context):
    result = fibo(N_FIBO)
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
We gave the Lambda 2048MB of memory, but according to the logs the memory never exceeded 50MB. On AWS, vCPU is proportional to the memory so we can assume it got a decent vCPU.
We exposed a trigger through AWS API Gateway and from our EC2 instance, we called it using the same siege benchmark tool:
siege -r500 -c1 -v -H "x-api-key: $AWS_API_KEY" "https://$AWS_LAMBDA_HOST/default/fibo_lambda"
Results
In the same vein as the other benchmarks, we ran fibonacci(10) 500 times (--reps=500 as siege argument) and fibonacci(33) 100 times.
The results were the following:
| (in seconds) | # reps | AWS Lambda (sec) | Windmill Normal (sec) | Windmill Dedicated Worker (sec) | 
|---|---|---|---|---|
| fibo(10) | 500 | 36.56 | 55.36 | 26.81 | 
| fibo(33) | 100 | 93.95 | 109.06 | 104.5 | 
Which gives an average duration per job in milliseconds:
| (in milliseconds) | AWS Lambda (sec) | Windmill Normal (sec) | Windmill Dedicated Worker (sec) | 
|---|---|---|---|
| fibo(10) | 73 | 111 | 54 | 
| fibo(33) | 940 | 1091 | 1045 | 
Visually, we have the following graphs:
Conclusion
For running a high number of lightweight tasks (fibonacci(10)), we can see that Windmill in dedicated worker mode is the fastest. Windmill in normal mode is slower because it runs a cold start for each task.
For long running tasks (fibonacci(33)), Windmill in normal mode and dedicated worker mode is almost equivalent because the warm-up time needed in normal mode is negligible compared to the duration of tasks. AWS Lambda has slightly better performance for those kind of tasks, likely because it is able to run the core of the Python logic faster than Windmill.