As a proof-of-concept, I did a quick and dirty C++ implementation of bvp4c.
One of the main purposes was to test the Sundials/Kinsol nonlinear algebraic
solver for this application. In that regard, I would say my experience has been
mixed. My impression is that, even with line search enabled, it is not as effective
at coping with poor initial guesses as MATLAB bvp4c.
This implementation computes a dense jacobian by finite difference (FD). I had hoped to
try FD computation of a sparse jacobian but have not done that yet. But even this
dense jacobian code has better performance than I expected.
The main limitation (that I know of) of my little test code is that it doesn't compute
discretization errors and refine the mesh; it simply computes a solution with the
input mesh.
If you have any interest in looking at the code, it is here:
Bill Greene