For now, the new algorithm hasn’t actually been used to solve any logistical problems, since it would take too much work updating today’s programs to make use of it. But for Rothvoss, that’s beside the point. “It’s about the theoretical understanding of a problem that has fundamental applications,” he said.
I don't see how "it would take too much work updating today's programs". Most domain-specific models call out to Gurobi, CPLEX, or FICO solvers for large problems, and to open-source ones like SCIP for the small ones. There is a standard MPS format that lets you exchange models between all of these solvers, and the formulation of the problem shouldn't change, just the solving approach inside the solver.
Can someone enlighten me? I could see it if they are arguing that this will require a new implementation; if so, there is a ton of benefit the world would see from doing so.
The new algorithm of R&R would need to replace the algorithms at the core of Gurobi, CPLEX, etc. These tools are marvels of engineering, extremely complex, the result of decades of incremental improvements. It would likely take significant research effort to even figure out a way to incorporate the new discoveries into these engines.
Why would it need to replace them? From the article, they claim they have found a way to reduce the upper bound faster when searching large integer problems. I don't see how that affects the current search process. In all of these solvers you can enter an upper bound yourself if you have knowledge of the problem and know a previous solution. So if this is just a programmatic way of reducing the upper bound, it should fit right in with current approaches. What am I missing?
It's a research paper. You can write a theoretical paper and let others apply it practically; others can figure out the practical aspects and report benchmark results, or build on the theory.
This paper only has 2 authors. The existing solvers are probably applying technique-specific tricks and speedups, and you're working with approximate optimization; it's not that easy to move everything over.
So? I don't get the relevance of the author count.
It's quite easy to go tell other people what they should do with their time.
These researchers are in the business of improving algorithms. Implementing them in large industrial (or open-source) code bases in a maintainable way, and then actually maintaining that code, is a different skillset, a different set of interests, and, as was pointed out, beside the point.
Either you believe their results, in which case be grateful; someone (you!) can implement this. Or you don't, in which case feel free to move on.
Your tone comes off as entitled.
You do realize that the solver companies are in exactly the same boat, right?
And given how much the licenses cost, I'd love a new player to show up and bring them down to a reasonable level.
Since version 8.0.3, SCIP is available under Apache 2.0 License:
So the new player to show up is here. :-)
No, they're not. They're in the business of making their customers' problems solve fast and well. That's of course strongly related, but it is _not_ the same. An algorithm may well be (and this is what OP might be hinting at) more elegant and efficient, but execute worse on actually existing hardware.
You're making a very general point about how algorithm research and software development are two different things, which is of course true. However, OP's question is genuine: a lot of research in OR is very practical, and researchers often hack solvers to demonstrate that some idea offers a benefit over existing solving techniques. There is no reason to believe that a good new idea like this one couldn't be demonstrated and incorporated into new solvers quickly (especially given the competition).
So the quoted sentence is indeed a bit mysterious. I think it was just meant to head off comments such as "if it's so good why isn't it used in CPLEX?".
Honestly?
The search for the 'exactly optimal solution' is way overrated
I think you can get a reasonably good solution using heuristics in 1/10 of the time or less
Not to mention developer time, and the time spent figuring out which constraints make your problem infeasible, especially as models get more complicated because you want to make everything linear.
I agree, especially when considering that a model is also not reality.
However, what folks often do is find a linear (LP relaxation) solution quickly, then optimize toward the integer solution, which gives you a gap you can use to decide when to terminate.
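For anyone who hasn't seen what that looks like in code, here is a minimal sketch using the open-source python-mip package; the toy knapsack model and the 1% gap tolerance are made up purely for illustration:

    # Minimal sketch: stop once the incumbent is provably within 1% of the bound,
    # instead of waiting for an exactly optimal solution. Toy knapsack instance.
    from mip import Model, MAXIMIZE, BINARY, xsum

    values = [10, 13, 18, 31, 7, 15]
    weights = [11, 15, 20, 35, 10, 33]
    capacity = 47

    m = Model(sense=MAXIMIZE)
    x = [m.add_var(var_type=BINARY) for _ in values]
    m.objective = xsum(v * xi for v, xi in zip(values, x))
    m += xsum(w * xi for w, xi in zip(weights, x)) <= capacity

    m.max_mip_gap = 0.01              # accept any solution within 1% of the dual bound
    status = m.optimize(max_seconds=60)
    print(status, m.objective_value, m.objective_bound)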
The vast majority of the United States power grid (many thousands of power plants) are optimized in auctions every hour for the next day and every 5 minutes on the operating day. Finding the globally optimal solution is pretty important for both fairness and not wasting billions of dollars each year. I'd agree with you for a lot of problems though, but keep in mind there are plenty where they need full optimality or within a tiny percentage from it.
Every time an integer-feasible point is found during the iterative process these algorithms use (branch and bound), you get a new upper bound on the global minimum. It’s not clear to me how these dynamically generated upper bounds, which are highly specific to the particular problem, relate to the upper bounds of a more general nature that R&R produce.
If it's an upper bound, it should be pretty easy to plug into the existing machinery under the hood in these solvers. Can you give me any insight into how the R&R "upper bound" is different and "more general in nature"?
I don't think they're talking about a bound for the optimum objective value, but a theoretical upper bound for a covering radius related to a convex body and a lattice. The bound would be useful in a lattice-based algorithm for integer linear programming. I don't think there exists an implementation of a lattice algorithm that is practical for non-toy integer linear programming problems, let alone one that is competitive with commercial ILP solvers.
They prove a new upper bound to a combinatorial quantity that controls the worst-case running time of an algorithm of Dadush, not an upper bound to the optimal value of a given ILP instance.
If they wanted to see their ideas work in practice, they could implement Dadush's algorithm in light of these new bounds, but this would be unlikely to outperform something like CPLEX or Gurobi with all their heuristics and engineering optimizations developed over decades.
Otherwise, and this is the sense of the quoted sentence, they could go deep into the bowels of CPLEX or Gurobi to see if their ideas could yield some new speed-up on top of all the existing tricks, but this is not something that makes sense for the authors to do, though maybe someone else should.
What? Have you ever used a solver before? The actual APIs exposed to the user are very simple interfaces that should allow swapping out the backend regardless of its complexity. The idea is that a new algorithm, short of something like "updating the solution to adjust to a change in data", should not require any sort of research to slot in as an implementation behind the existing interface.
The interface is simple, but modern solvers apply a ton of heuristics that often dramatically reduce problem size, so a naive implementation of a better algorithm that isn't hooked deeply into the core of an existing ILP solver is likely to be very slow.
Why is this exposed to the user? If it isn't exposed to the user, what on earth are you talking about?
From what I gather, the parent post is saying that it is easy to make a naive implementation of this improvement, but due to the naivety of the implementation it will be slower in practice. Hence it is a lot of work (and thus difficult) to actually put this improvement into practice.
Why would the API expose the heuristics to the user? Because an intelligent user can make minor adjustments and turn certain features on/off to sometimes dramatically increase performance depending on the problem.
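To make the "turn certain features on/off" point concrete, here is a hedged sketch with gurobipy; the parameters are real Gurobi parameters, but the file name and the particular values are arbitrary examples:

    # Sketch only: tuning a few solver features from the user-facing API.
    # Requires a Gurobi installation and license.
    import gurobipy as gp

    m = gp.read("my_model.mps")   # hypothetical model exported from a modeling layer

    m.Params.Presolve = 0         # switch presolve reductions off
    m.Params.Heuristics = 0.2     # spend ~20% of effort on primal heuristics
    m.Params.MIPFocus = 1         # bias the search toward finding feasible solutions
    m.Params.Cuts = 2             # be more aggressive with cutting planes

    m.optimize()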
The API is simple, but the change would impact the code underneath. Since these are branch-and-bound algorithms, it would really depend on how often the worst-case runtime actually occurred. If it only happened in 2% of use cases, it might not make a huge difference, for example.
Gurobi was only founded in 2008. I don't doubt the optimizer was the result of "decades of incremental improvements", but the actual implementation must have been started relatively recently.
It was founded by some of the key people behind CPLEX (another solver, founded in 1987). In fact, one of the cofounders of Gurobi was a cofounder of CPLEX prior. They brought decades of knowledge with them.
Yep. They were also able to catch up because CPLEX was bought by IBM, and I think IBM typically keeps a pretty small staff after an acquisition.
These solvers get faster every year; how exactly are they supposed to stay the world's fastest if people invent better algorithms all the time that never get implemented by the commercial offerings?
You seem to be confusing the problem formulation with the problem solution. It is true there is a standard way to exchange the problem formulation through something like MPS (though it seems AMLs like AMPL etc. have taken over). All this format gives you is a standard mathematical formulation of the problem.
However, the solution is something very specific to the individual solver and they have their own data structures, algorithms and heuristic techniques to solve the problem. None of these are interchangeable or public (by design) and you cannot just insert some outside numbers in the middle of the solver process without being part of the solver code and having knowledge of the entire process.
All these solvers use branch and bound to explore the solution space and "fathom" (i.e. eliminate candidate subtrees whose lowest possible value is worse than an already found solution). The upper bound that the solver calculates via pre-solve heuristics and other techniques does vary from solver to solver. However, they all have a place for an "upper bound", and there are mechanisms in all of these solvers for updating that value during a solve.
If this paper is completely orthogonal to everything that exists in these solvers today, and it can produce a new upper bound faster than other techniques, it should be fairly plug-and-play.
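For what it's worth, the "plug in a known upper bound" step really is a one-liner in the user-facing APIs. A hedged sketch with gurobipy for a minimization model (the bound value is hypothetical):

    # Sketch only: hand the solver an externally obtained bound so branch and bound
    # can fathom nodes against it from the very first node.
    import gurobipy as gp

    m = gp.read("my_model.mps")           # hypothetical minimization instance
    known_upper_bound = 1234.5            # objective of a previously known solution
    m.Params.Cutoff = known_upper_bound   # discard nodes that cannot beat this value
    m.optimize()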
I have an undergrad OR degree and have been a practitioner on LP/MIP problems for 18 years, so I understand the current capabilities of these solvers and have familiarity with these problems. However, I am out of my depth trying to understand the specifics of this paper, and would love to be corrected where I am missing something.
What is OR?
Operations Research: https://en.m.wikipedia.org/wiki/Operations_research
Wouldn't the "open source ones like SCIP" be public by design?
In many cases you can actually insert outside numbers in the middle of the solver process via callbacks. For example see IncumbentUpdater at https://python-mip.readthedocs.io/en/latest/classes.html
And various C APIs for solvers have other callbacks
It's generally quite limited of course, for the reasons you mentioned.
The math programming languages AMPL, AIMMS, GAMS, etc. are dying in my industry and being replaced by general-purpose languages like Python/Java plus a solver API.
The randomized algorithm that Reis & Rothvoss [1] present at the end of their paper will not be implemented in Gurobi/CPLEX/XPRESS. It remains a fantastic result regardless (see below). But first let me explain.
In terms of theoretical computational complexity, the best algorithms for "integer linear programming" [2] (whether the variables are binary or general integers, as in the case tackled by the paper) are based on lattices. They have the best worst-case big-O complexity. Unfortunately, all current implementations need (1) arbitrary-size rational arithmetic (like that provided by gmplib [3]), which is memory hungry and a bit slow in practice, and (2) some LLL-type lattice reduction step [4], which does not take advantage of matrix sparsity. As a result, those algorithms cannot even start tackling problems with matrices larger than 1000x1000, because they typically don't fit in memory... and even if they did, they would be prohibitively slow.
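To give a feel for point (1), here is a tiny self-contained illustration (not tied to any lattice code) of how exact rational arithmetic blows up, using plain Gaussian elimination over Python Fractions on a random matrix of small integers:

    # Exact rational Gaussian elimination on a random 60x60 matrix with entries in [-9, 9].
    # After elimination, entries are ratios of large minors: the exact numerators
    # typically grow to dozens of digits, which is the memory/speed price exact codes pay.
    import random
    from fractions import Fraction

    n = 60
    A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]

    for k in range(n - 1):
        if A[k][k] == 0:                  # ignore pivoting issues in this toy demo
            continue
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]

    print(max(len(str(a.numerator)) for row in A for a in row))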
In practice instead, integer programming solvers are based on branch-and-bound, a type of backtracking algorithm (as used in SAT solving), and at every iteration they solve a "linear programming" problem (the same as the original problem, but with all variables continuous). Each linear programming problem could be solved in polynomial time (with algorithms called interior-point methods), but instead they use the simplex method, which is exponential in the worst case! The reason is that all those linear programming problems are very similar to each other, and the simplex method can take advantage of that in practice. Moreover, all the algorithms involved greatly exploit sparsity in any vector or matrix involved. As a result, some people routinely solve integer programming problems with millions of variables within days or even hours.
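For readers who have never looked inside such a solver, here is a stripped-down sketch of that branch-and-bound loop, using scipy's LP solver for the relaxations. It is purely illustrative: no presolve, no cuts, no warm starts, naive node selection, and nothing like what the commercial codes actually do internally.

    # Toy branch and bound for: min c^T x, A_ub x <= b_ub, 0 <= x <= 1, x integer.
    import math
    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, n, tol=1e-6):
        best_val, best_x = math.inf, None
        stack = [[(0.0, 1.0)] * n]                   # each node is a set of variable bounds
        while stack:
            bounds = stack.pop()
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
            if not res.success or res.fun >= best_val - tol:
                continue                             # infeasible, or fathomed by the incumbent
            frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > tol]
            if not frac:                             # integer feasible: new upper bound
                best_val, best_x = res.fun, res.x.round()
                continue
            i = frac[0]                              # branch on the first fractional variable
            lo, hi = bounds[i]
            down, up = list(bounds), list(bounds)
            down[i] = (lo, math.floor(res.x[i]))
            up[i] = (math.ceil(res.x[i]), hi)
            stack.extend([down, up])
        return best_val, best_x

    # Tiny knapsack-style instance (maximize value = minimize its negation).
    values = np.array([10.0, 13.0, 18.0, 31.0, 7.0, 15.0])
    weights = np.array([[11.0, 15.0, 20.0, 35.0, 10.0, 33.0]])
    print(branch_and_bound(-values, weights, np.array([47.0]), n=6))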
As you can see, the solver implementers are not chasing the absolute best theoretical complexity. One could say that the theory and practice of discrete optimization has somewhat diverged.
That said, the Reis & Rothvoss paper [1] is deep mathematical work. It is extremely impressive on its own to anyone with an interest in discrete maths. It settles a 10-year-old conjecture by Dadush (the length of time a conjecture remains open is a rough heuristic many mathematicians use to evaluate how hard it is to (dis)prove). Last November, it was presented at FOCS, one of the two top conferences in theoretical computer science (together with STOC). Direct practical applicability is beside the point; the authors will readily confess as much if asked in an informal setting (they will of course insist otherwise in grant applications -- that's part of the game). That does not mean it is useless: in addition to the work having tremendous value in itself because it advances our mathematical knowledge, one can imagine that practical algorithms based on its ideas could push the state of the art of solvers a few generations of researchers down the line.
At the end of the day, all those algorithms are exponential in the worst case anyway. In theory, one tries to slightly shrink the polynomial in the exponent of the worst-case complexity. Practitioners instead typically want to solve one big optimization problem, not a family of problems of increasing size n. They don't care about the growth rate of the solving-time trend line. They care about solving their one big instance, which typically has structure that does not make it a "worst-case" instance for its size. This leads to distinct engineering decisions.
[1] https://arxiv.org/abs/2303.14605
[2] min { c^T x : A x >= b, x in R^n, some components of x in Z }
[3] https://gmplib.org/
[4] https://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1982f/art.p...
Thanks for these resources and comments.
Would you say that the following is a good summary? -> This is an important theoretical result, but most real-world problems are far from worst-case scenarios, so improving the worst case currently has little practical use.
This statement is probably mostly correct, but I think that in one way it could be misleading: I would not want to imply that real-world problem instances are somehow easier than the worst-case, in terms of computational complexity. They still very much exhibit exponential increase in computational cost as you scale them up.
Instead, most real-world instances have structure. Some of that structure is well understood (for example, 99% of optimization problems involve extremely sparse matrices), some is not. But sometimes we can exploit structure even without understanding it fully (some algorithmic techniques work wonders on some instances, and we don't fully know why).
It could be argued that by exploiting structure, it is the constant factor in the big-O computational complexity that gets dramatically decreased. If that is the case, the theory and practice do not really contradict each other. It is just that in practice, we are willing to accept a larger exponent in exchange for a smaller constant factor. Asymptotically it is a losing bargain. But for a given instance, it could be extremely beneficial.
No, they're saying theoretical improvements do not directly lead to practical ones, because theory and practice have diverged due to how computers work. Instead, theoretical work will most likely lead to indirect gains, as its techniques feed the next generation of practical improvements.
Thanks for your information. I think it really bridges the gap between the people who are interested in this algorithm and MILP "users". I have two more questions.
1. Usually we deal with models with both integer and continuous variables (MILP). Conceptually, B&B tackles ILP and MILP in similar ways. Is there any difficulty in extending lattice-based methods to solve MILP?
2. How likely do you think it is that this lattice-type algorithm will overcome the difficulties you mentioned and eventually replace B&B, totally or partly (like barrier vs. simplex methods)?
I don't think that continuous variables are an issue. Even when all the explicit variables are integer, we have implicit continuous variables as soon as we have an inequality: the slack of that inequality. There is probably some linear algebra trick one can use to transform any problem into a form that is convenient for lattice-based algorithms.
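Concretely, and just as an illustration (nothing specific to the R&R paper), an inequality-constrained integer program

    min { c^T x : A x >= b, x in Z^n }

can be rewritten with explicit continuous slacks as

    min { c^T x : A x - s = b, s >= 0, x in Z^n, s in R^m },

so even a "pure" integer program already carries continuous variables once you make the slacks explicit.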
Very unlikely in the next 5 years. Beyond that, they could be the next small revolution, maybe. "Cutting planes" were another tool that had some good theory but were thought to be impractical. Then 25 years ago, people found a way to make them work, and they were a huge boost to solvers. We may be due for another big jump.
Lattice-based methods are already effective in some niches. Branch-and-bound solvers are horrible at cryptography and number theory problems (those problems are bad fits for floating-point arithmetic in general), and lattice-based methods shine there. There are also some rare dense optimization problems that benefit from lattice-based methods (typically, one would use lattices in a pre-processing step, then pass the reformulated problem to a regular branch-and-bound solver [1]).
[1] https://link.springer.com/chapter/10.1007/3-540-48777-8_1
I honestly think that's just journalism for "no one has implemented it in production yet". Which is not surprising for an algorithm less than a year old. I don't think it's worth expanding on and explaining "too much work".
That being said, sometimes if an algorithm isn't the fastest but it's fast and cheap enough, it is hard to argue to spend money on replacing it. Which just means that will happen later.
Furthermore, you might not even see improvements until you implement an optimized version of the new algorithm. Even if big-O notation says it scales better, the old version may be optimized to use memory efficiently and to make good use of SIMD or other low-level techniques. Sometimes getting an optimized implementation of a new algorithm takes time.
As other commenters here have mentioned, in discrete optimization there can be a very large gap between efficient in theory and efficient in practice, and it is very likely that this is the case here too. Linear programming, for example, is known to be solvable in polynomial time, but the algorithm which established that (the ellipsoid method) is not used in practice because it is prohibitively slow. Instead, people use the (worst-case exponential time) simplex method.
Modern ILP solvers have a huge number of heuristics and engineering in them, and it is really difficult to beat them in practice after they have optimized their branch-and-cut codes for 30 years. As the top comment mentions, the software improvements alone are estimated to have improved the solving time of practical ILP instances by a factor of 870'000 since 1990.
I thought there were other interior-point methods now besides the ellipsoid algorithm that performed better. Some of these are useful in convex nonlinear programming, and I believe one is used (with a code generator from Stanford to make it faster) in the guidance software for landing the Falcon 9 first stage. There, as the stage descends, it repeatedly solves the problem of reaching the landing point at zero velocity with minimum fuel use, subject to various constraints.
Yes, there are other interior point methods besides the ellipsoid method, and virtually all of them perform better for linear programming. Sometimes, the solvers will use these at the root node for very large models, as they can beat out the simplex algorithm. However, I am unsure if any of them has been proven to run in polynomial time, and if so, if the proof is significantly different from the proof for the ellipsoid method. The point I was mainly trying to make is that there can be a significant gap between practice and theory for ILP. Even 40 years after LP was proven to be polytime solvable, simplex remains the most widely used method, and it is very hard for other methods to catch up.
Karmarkar's algorithm, for example, has been proved to run in polynomial time.
https://en.wikipedia.org/wiki/Karmarkar%27s_algorithm
It was also (in)famous as an algorithm that was patented (the patent expired in 2006).
The open-source solvers are a mess of 30 years of PhD students' random contributions. It's amazing they work at all. If you can possibly avoid actually implementing anything using them, you will.
Can others chime in? To what extent is the above a fair summary?
I would hope there have been some code reorganizations and maybe even rewrites? Perhaps as the underlying theory advances? Perhaps as the ecosystem of tools borrows from each other?
But I don't know the state of these solvers. In many ways, the above narrative wouldn't surprise me. I can be rather harsh (but justifiably so, I feel) when evaluating scientific tooling. I worked at one national lab with a "prestigious" reputation that nonetheless seemed incapable of blending competent software architecture with its domain area. I'm not saying any ideal solution was reachable; the problem arguably had to do with an overzealous scope combined with budgetary limits and cultural disconnects. Many good people working with a flawed plan, it seems to me.
I think what this work does is establish a new, and lower, upper bound on the number of points that need to be explored in order to find an exact solution.
From some of your other replies it looks to me like you're confusing that with an improved bound on the value of the solution itself.
It's a little unclear to me whether this is even a new solution algorithm, or just a better bound on the run time of an existing algorithm.
I will say I agree with you that I don't buy the reason given for the lack of practical impact. If there were a breakthrough in practical solver performance, people would migrate to a new solver over time. Either there's no practical impact of this work, or the follow-on work to turn the mathematical insights here into a working solver just hasn't been done yet.
I think some peeps are not reading this sentence the way you meant it to be read.
It seems to me you meant "I don't know what part of this research makes it especially hard to integrate into current solvers (and I would like to understand) ".
But people seem to be interpreting "why didn't they just integrate this into existing solvers? Should be easy (what lazy authors)".
Just trying to clear up some misunderstanding.
Maybe what they mean is that, despite an asymptotic advantage, the new algorithm performs worse for many use cases than the older ones. This might be due to the many heuristics that solvers apply to make problems tractable as others have mentioned, as well as good old software engineering optimization.
So the work that's required is for someone to take this algorithm and implement it in a way that levels the playing field with the older ones.