
Specifying ode solver options to speed up compute time

Computational Science Asked on July 29, 2021

I’m specifying 'JPattern', sparsity_pattern in the ODE options to speed up the compute time of my actual system. I am sharing a sample code below to show how I set up the system using a toy example. Specifying the JPattern helped me reduce the compute time of my real system from 2 hours to 7 minutes. I’d like to know whether there are options (in addition to JPattern) that I can specify to decrease the compute time further. I found the Jacobian option, but I am not sure how to compute the Jacobian easily for my real system.

global mat1 mat2
mat1=[ 
       1    -2     1     0     0     0     0     0     0     0;
       0     1    -2     1     0     0     0     0     0     0;
       0     0     1    -2     1     0     0     0     0     0;
       0     0     0     1    -2     1     0     0     0     0;
       0     0     0     0     1    -2     1     0     0     0;
       0     0     0     0     0     1    -2     1     0     0;
       0     0     0     0     0     0     1    -2     1     0;
       0     0     0     0     0     0     0     1    -2     1;
       ];

mat2 = [
        1    -1     0     0     0     0     0     0     0     0;
        0     1    -1     0     0     0     0     0     0     0;
        0     0     1    -1     0     0     0     0     0     0;
        0     0     0     1    -1     0     0     0     0     0;
        0     0     0     0     1    -1     0     0     0     0;
        0     0     0     0     0     1    -1     0     0     0;
        0     0     0     0     0     0     1    -1     0     0;
        0     0     0     0     0     0     0     1    -1     0;
        ];



x0 = [1 0 0 0 0 0 0 0 0 0]';
tspan = 0:0.01:5;

f0 = fun(0, x0);
joptions = struct('diffvar', 2, 'vectvars', [], 'thresh', 1e-8, 'fac', []);
J = odenumjac(@fun,{0 x0}, f0, joptions); 
sparsity_pattern = sparse(J~=0.);
options = odeset('Stats', 'on', 'Vectorized', 'on', 'JPattern', sparsity_pattern);

ttic = tic();
[t, sol] = ode15s(@(t,x) fun(t,x), tspan, x0, options);
ttoc = toc(ttic)
fprintf('runtime %f seconds ...\n', ttoc)
plot(t, sol)

function f = fun(t,x)
global mat1 mat2
f = zeros(size(x), 'like', x);
f(1,:) = 0;
f(2:9,:) = mat1*x + mat2*x;
f(10,:) = 2*(x(end-1) - x(end));
end

Are there built-in options available for computing the Jacobian?
I tried something like the following:

x = sym('x', [10 1]);
s = mat1*x + mat2*x;
J1 = jacobian(s, x)

But this takes a very long time for a large system.
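
For the toy problem above the right-hand side is linear, so the Jacobian is a constant matrix that can be assembled by hand with no symbolic work at all. A minimal sketch (assuming the first and last rows mirror the boundary handling in fun):

Jconst = sparse([zeros(1, 10);           % row 1:    f(1,:)  = 0
                 mat1 + mat2;            % rows 2-9: (mat1 + mat2)*x
                 zeros(1, 8), 2, -2]);   % row 10:   2*(x(9) - x(10))
options = odeset('Stats', 'on', 'Vectorized', 'on', ...
                 'JPattern', Jconst ~= 0, 'Jacobian', Jconst);
[t, sol] = ode15s(@(t,x) fun(t,x), tspan, x0, options);

Since the system is linear, the J returned by odenumjac above is also the exact Jacobian up to roundoff, so odeset('Jacobian', J) would presumably work as well.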

Suggestions will be really appreciated.

EDIT:

global advMat diffMat

gridSize = 10000;
x0 = [1; zeros(gridSize - 1, 1)];
tspan = 0:0.01:5;

% Specify the finite difference matrices including boundary conditions
% You'll probably need to add some 1/dx or 1/dx^2 scaling to the matrices
e = ones(gridSize, 1);
advMat = spdiags([e, -e], 0:1, gridSize, gridSize);
diffMat = spdiags([e, -2*e, e], -1:1, gridSize, gridSize);

f0 = f(0, x0);
joptions = struct('diffvar', 2, 'vectvars', 2, 'thresh', 1e-8, 'fac', []);
J = odenumjac(@f,{0 x0}, f0, joptions); 
jpattern = sparse(J~=0.);

% The problem is linear so the Jacobian is the sum of linear operators in RHS
jacobian = advMat + diffMat;


ttic = tic();
% [t, sol] = ode15s(@(t, y) f(t, y), tspan, x0, odeset('Jacobian', jacobian));
% [t, sol] = ode15s(@(t, y) f(t, y), tspan, x0, odeset('JPattern', jpattern));
[t, sol] = ode15s(@(t, y) f(t, y), tspan, x0, odeset('Jacobian', jacobian, 'JPattern', jpattern));

ttoc = toc(ttic);
fprintf('runtime %f seconds ...\n', ttoc);
plot(sol);

function dy = f(~, y)
global advMat diffMat
dy = advMat * y + diffMat * y;
end

Result:
For a grid size of 10000

Jacobian: runtime 15.508172 seconds ...
JPattern: runtime 57.470325 seconds ...
Jacobian + JPattern: runtime 30.028399 seconds ...

For a grid size of 5000

Jacobian: runtime 0.203265 seconds ...
JPattern: runtime 0.650601 seconds ...
Jacobian + JPattern: runtime 0.256899 seconds ...

For a grid size of 50000

Out of memory. Type "help memory" for your options.

:/

2 Answers

It looks like you have an advection-diffusion PDE discretized with finite differences. This gives an ODE of the form $$ y' = f(y) = A y + D y, $$ where $A$ is the discretized advection operator and $D$ is the discretized diffusion operator. In your case, it seems you have zero Dirichlet and Neumann boundary conditions, which can be encoded in the $A$ and $D$ coefficients (correct me if I'm wrong). Note that $$ \frac{\partial f}{\partial y} = A + D, $$ so that constant matrix is your Jacobian.

Here is how I would implement this sort of system. For simplicity, I have assumed zero Dirichlet boundary conditions on both sides. You'll have to modify this based on your boundary conditions.

gridSize = 100;

% Specify the finite difference matrices including boundary conditions
% You'll probably need to add some 1/dx or 1/dx^2 scaling to the matrices
e = ones(gridSize, 1);
advMat = spdiags([e, -e], 0:1, gridSize, gridSize);
diffMat = spdiags([e, -2*e, e], -1:1, gridSize, gridSize);

% The problem is linear so the Jacobian is the sum of linear operators in RHS
jacobian = advMat + diffMat;

x0 = [1; zeros(gridSize - 1, 1)];
tspan = 0:0.01:5;

ttic = tic();
[t, sol] = ode15s(@(t, y) f(t, y, advMat, diffMat), tspan, x0, odeset('Jacobian', jacobian));
ttoc = toc(ttic);
fprintf('runtime %f seconds ...\n', ttoc);
plot(sol);

function dy = f(~, y, advMat, diffMat)

dy = advMat * y + diffMat * y;

end

The use of sparse matrices and the exact Jacobian (not just finite difference approximation) helps performance a lot.

--- Edit ---

The timing results in the OP's edit are not consistent with what I get on my computer, although they do confirm that using the Jacobian option leads to the fastest result. The out-of-memory error probably comes from the odenumjac function; you can use the MATLAB debugger to track down the exact source. If you insist on having JPattern, use

jacobian = advMat + diffMat;
jpattern = jacobian ~= 0;
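
which likely sidesteps the dense intermediate matrix that odenumjac builds. A minimal sketch of plugging both options in (reusing f, tspan, and x0 from your edit):

opts = odeset('Jacobian', jacobian, 'JPattern', jpattern);
[t, sol] = ode15s(@(t, y) f(t, y), tspan, x0, opts);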

Answered by Steven Roberts on July 29, 2021

Julia's DifferentialEquations.jl has a lot of tooling for automatically deriving (sparse) Jacobian matrices. For more information, see the JuliaCon 2020 video on Auto-Optimization and Parallelism in DifferentialEquations.jl. Combined with the orders-of-magnitude acceleration commonly seen over the MATLAB solvers, this might be a good option for you, and it is a quick translation.

Here I show a few things, with a best time of 16.700 μs on your problem. That best time comes from simply automating the optimization and using an explicit RK method, but I also show how to automatically derive the Jacobian in sparse and Tridiagonal forms, which I assume might be useful for your full problem. Your MATLAB code took 0.456539 seconds on my computer, and Steven's took 0.054520 seconds, so this gives an overall speedup of 27,000x over your original code and 3,000x over the other poster's. Here it is:

using DifferentialEquations, BenchmarkTools

mat1=[
       1    -2     1     0     0     0     0     0     0     0;
       0     1    -2     1     0     0     0     0     0     0;
       0     0     1    -2     1     0     0     0     0     0;
       0     0     0     1    -2     1     0     0     0     0;
       0     0     0     0     1    -2     1     0     0     0;
       0     0     0     0     0     1    -2     1     0     0;
       0     0     0     0     0     0     1    -2     1     0;
       0     0     0     0     0     0     0     1    -2     1;
       ];

mat2 = [
        1    -1     0     0     0     0     0     0     0     0;
        0     1    -1     0     0     0     0     0     0     0;
        0     0     1    -1     0     0     0     0     0     0;
        0     0     0     1    -1     0     0     0     0     0;
        0     0     0     0     1    -1     0     0     0     0;
        0     0     0     0     0     1    -1     0     0     0;
        0     0     0     0     0     0     1    -1     0     0;
        0     0     0     0     0     0     0     1    -1     0;
        ];



x0 = [1.0,0,0,0,0,0,0,0,0,0]
saveat = 0:0.01:5

function fun(dx,x,p,t)
    dx[1,:] .= 0
    dx[2:9,:] .= mat1*x + mat2*x
    dx[10,:] .= 2*(x[end-1] - x[end])
end

prob = ODEProblem(fun,x0,(0.0,5.0))
sys = modelingtoolkitize(prob)
fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true)

# Explicit RK Methods
@btime sol = solve(fastprob,Tsit5()) # 16.700 μs (245 allocations: 40.28 KiB)
@btime sol = solve(fastprob,BS3()) # 19.800 μs (231 allocations: 33.70 KiB)
@btime sol = solve(fastprob,Vern7()) # 18.400 μs (266 allocations: 49.62 KiB)

# Stabilized-Explicit RK Methods
@btime sol = solve(fastprob,ROCK2()) # 173.300 μs (831 allocations: 159.59 KiB)
@btime sol = solve(fastprob,ROCK4()) # 237.100 μs (1958 allocations: 191.64 KiB)

# Implicit and Semi-Implicit Methods
@btime sol = solve(fastprob,Rosenbrock23()) # 83.200 μs (541 allocations: 53.50 KiB)
@btime sol = solve(fastprob,TRBDF2()) # 72.400 μs (297 allocations: 31.72 KiB)
@btime sol = solve(fastprob,KenCarp47()) # 110.500 μs (444 allocations: 33.02 KiB)

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 μs (3505 allocations: 1.22 MiB)
@btime sol = solve(sparseprob,TRBDF2()) # 254.000 μs (1332 allocations: 414.91 KiB)
@btime sol = solve(sparseprob,KenCarp47()) # 346.400 μs (1757 allocations: 525.05 KiB)

using Setfield, LinearAlgebra
f = fastprob.f
newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype)
newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity)
tridiagprob = ODEProblem(newf,x0,(0.0,5.0))
@btime sol = solve(tridiagprob,Rosenbrock23()) # 188.000 μs (556 allocations: 66.19 KiB)
@btime sol = solve(tridiagprob,TRBDF2()) # 87.800 μs (338 allocations: 40.31 KiB)
@btime sol = solve(tridiagprob,KenCarp47()) # 133.400 μs (482 allocations: 42.16 KiB)

Now, as you scale this problem you might also want to make use of parallelism in the sparse/tridiagonal Jacobian generation (the performance differences will undoubtedly change as well; the 27,000x will fall a bit). You can set the type of parallelism in the code-generation step, which produces optimized sparse filling functions. Here's an example on this code:

using ModelingToolkit
sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true,parallel=ModelingToolkit.MultithreadedForm())

It won't help at the size of the example problem, but it will on larger-scale problems.

Finally, the other answer brings up a good point that you have a constant Jacobian here. If you have this in your real problem, you'll want to specialize on this property. In DifferentialEquations.jl you can do this by using the DiffEqArrayOperator to explicitly tell the solver that your system can be represented by a linear operator. This opens you up to new solvers, such as Lie Group and Magnus methods, and can be exploited in IMEX and exponential integrators if there's a separate nonlinear portion. However, in your case you just have a linear operator $u' = Au$, so we can make use of the LinearExponential solver which uses optimized Krylov expm techniques.

using ForwardDiff
# Use automatic differentiation to accurately derive the Jacobian since I'm lazy
dx = zeros(10); x = zeros(10)
J = ForwardDiff.jacobian((dx,x)->fun(dx,x,nothing,0.0),dx,x)
A = DiffEqArrayOperator(Tridiagonal(J))
prob = ODEProblem(A,x0,(0.0,5.0))

# Solve with different variations on the Krylov Exponential method
@btime sol = solve(prob,LinearExponential()) # 64.700 μs (82 allocations: 23.27 KiB)
@btime sol = solve(prob,LinearExponential(krylov = :off)) # 68.700 μs (82 allocations: 23.27 KiB)
@btime sol = solve(prob,LinearExponential(krylov = :simple)) # 74.300 μs (77 allocations: 13.47 KiB)

Again, at this size you can't beat the explicit RK method because the problem is too small for these approaches to overcome their overhead, but they should scale extremely well, as evidenced in some of the benchmarks.

Of course you can save a bit of time by not doing the automatic symbolic generation of Jacobians and such, but I was lazy and wanted to show that the translation and all of these methods could be tried in under 5 minutes. Hopefully this shows a variety of techniques you can employ beyond the basics and gets you started on the path to acceleration.

P.S. I haven't employed the really modern techniques here, like Symbolics.jl's ability to generate code with optimized floating-point characteristics via e-graphs, or the automated model order reduction and surrogatization with JuliaSim; I wanted to keep the answer to just the basics. Of course, the results here still leave roughly another order of magnitude on the table if you factor in all of those extras.

Answered by Chris Rackauckas on July 29, 2021
