For now, the new algorithm hasn’t actually been used to solve any logistical problems, since it would take too much work updating today’s programs to make use of it. But for Rothvoss, that’s beside the point. “It’s about the theoretical understanding of a problem that has fundamental applications,” he said.
I don't see how "it would take too much work updating today's programs". Most domain-specific models call out to Gurobi, CPLEX, or FICO solvers for large problems, and to open-source ones like SCIP for the small ones. There is a standard MPS format that lets you exchange models between all of these solvers, and the formulation of the problem shouldn't change, just the solving approach inside the solver.
Can someone enlighten me? I could see it if they are arguing that this will require a new implementation; if so, there is a ton of benefit the world would see from doing so.
The new algorithm of R&R would need to replace the algorithms at the core of Gurobi, CPLEX, etc. These tools are marvels of engineering, extremely complex, the result of decades of incremental improvements. It would likely take significant research effort to even figure out a way to incorporate the new discoveries into these engines.
Why would it need to replace them? From the article, they claim they have found a way to reduce the upper bound faster when searching large integer problems. I don't see how that affects the current search process. In all of these solvers you can enter an upper bound yourself if you have knowledge of the problem and know a previous solution. So if this is just a programmatic way of reducing the upper bound, it should fit right in with current approaches. What am I missing?
It's a research paper. You can write a theoretical paper and let others apply it practically; others can figure out the practical aspects and report benchmark results, or build on the theory.
This paper only has 2 authors. The existing solvers are probably applying technique-specific tricks and speedups, and you're working with approximate optimization; it's not that easy to move everything over.
So? I don't get the relevance of the author count.
It's quite easy to go tell other people what they should do with their time.
These researchers are in the business of improving algorithms. Implementing them in large industrial (or open-source) code bases in a maintainable way, and then actually maintaining that code, is a different skillset, a different set of interests, and, as was pointed out, beside the point.
Either you believe their results, in which case be grateful; someone (you!) can implement this. Or you don't, in which case feel free to move on.
Your tone comes off as entitled.
You do realize that the solver companies are in exactly the same boat, right?
And given how much the licenses cost, I'd love a new player to show up and bring them down to a reasonable level.
Since version 8.0.3, SCIP is available under Apache 2.0 License:
So the new player to show up is here. :-)
No, they're not. They're in the business of making their customers' problems solve fast and well. That's of course strongly related, but it is _not_ the same. An algorithm may well be (and this is what OP might be hinting at) more elegant and efficient, but execute worse on actually existing hardware.
You're making a very general point about how algorithm research and software development are two different things, which is of course true. However, OP's question is genuine: a lot of research in OR is very practical, and researchers often hack solvers to demonstrate that some idea offers a benefit over existing solving techniques. There is no reason to believe that a good new idea like this one couldn't be demonstrated and incorporated into new solvers quickly (especially given the competition).
So the quoted sentence is indeed a bit mysterious. I think it was just meant to head off comments such as "if it's so good why isn't it used in CPLEX?".
Honestly?
The search for the 'exactly optimal solution' is way overrated
I think you can get a reasonably good solution using heuristics in 1/10 of the time or less
Not to mention developer time, and the time spent figuring out which constraints make your problem infeasible, especially as models get more complicated because you want to make everything linear.
I agree, especially when considering that a model is also not reality.
However, what folks often do is find a linear (LP relaxation) solution quickly, then optimize toward the integer solution, which gives you a gap you can use to decide when to terminate.
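For anyone who hasn't seen what that looks like in code, here is a minimal sketch using the open-source python-mip package; the toy knapsack model and the 1% gap tolerance are made up purely for illustration:

    # Minimal sketch: stop once the incumbent is provably within 1% of the bound,
    # instead of waiting for an exactly optimal solution. Toy knapsack instance.
    from mip import Model, MAXIMIZE, BINARY, xsum

    values = [10, 13, 18, 31, 7, 15]
    weights = [11, 15, 20, 35, 10, 33]
    capacity = 47

    m = Model(sense=MAXIMIZE)
    x = [m.add_var(var_type=BINARY) for _ in values]
    m.objective = xsum(v * xi for v, xi in zip(values, x))
    m += xsum(w * xi for w, xi in zip(weights, x)) <= capacity

    m.max_mip_gap = 0.01              # accept any solution within 1% of the dual bound
    status = m.optimize(max_seconds=60)
    print(status, m.objective_value, m.objective_bound)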
The vast majority of the United States power grid (many thousands of power plants) are optimized in auctions every hour for the next day and every 5 minutes on the operating day. Finding the globally optimal solution is pretty important for both fairness and not wasting billions of dollars each year. I'd agree with you for a lot of problems though, but keep in mind there are plenty where they need full optimality or within a tiny percentage from it.
Every time an integer-feasible point is found during the iterative process these algorithms use (branch and bound), you get a new upper bound on the global minimum. It’s not clear to me how these dynamically generated upper bounds, which are highly specific to the particular problem, relate to the upper bounds of a more general nature that R&R produce.
If it's an upper bound, it should be pretty easy to plug into the existing machinery under the hood in these solvers. Can you give me any insight into how the R&R "upper bound" is different and "more general in nature"?
I don't think they're talking about a bound for the optimum objective value, but a theoretical upper bound for a covering radius related to a convex body and a lattice. The bound would be useful in a lattice-based algorithm for integer linear programming. I don't think there exists an implementation of a lattice algorithm that is practical for non-toy integer linear programming problems, let alone one that is competitive with commercial ILP solvers.
They prove a new upper bound to a combinatorial quantity that controls the worst-case running time of an algorithm of Dadush, not an upper bound to the optimal value of a given ILP instance.
If they wanted to see their ideas work in practice, they could implement Dadush's algorithm in light of these new bounds, but this would be unlikely to outperform something like CPLEX or Gurobi with all their heuristics and engineering optimizations developed over decades.
Otherwise, and this is the sense of the quoted sentence, they could go deep into the bowels of CPLEX or Gurobi to see if their ideas could yield some new speed-up on top of all the existing tricks, but this is not something that makes sense for the authors to do, though maybe someone else should.
What? Have you ever used a solver before? The actual APIs exposed to the user are very simple interfaces that should allow swapping out the backend regardless of its complexity. The idea is that a new algorithm, short of something like "updating the solution to adjust to a change in data", should not require any sort of research to slot in as an implementation behind the existing interface.
The interface is simple, but modern solvers apply a ton of heuristics that often dramatically reduce problem size, so a naive implementation of a better algorithm that isn't hooked deeply into the core of an existing ILP solver is likely to be very slow.
Why is this exposed to the user? If it isn't exposed to the user, what on earth are you talking about?
From what I gather, the parent post is saying that it is easy to make a naive implementation of this improvement, but due to the naivety of the implementation it will be slower in practice. Hence it is a lot of work (and thus difficult) to actually put this improvement into practice.
Why would the API expose the heuristics to the user? Because an intelligent user can make minor adjustments and turn certain features on/off to sometimes dramatically increase performance depending on the problem.
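To make the "turn certain features on/off" point concrete, here is a hedged sketch with gurobipy; the parameters are real Gurobi parameters, but the file name and the particular values are arbitrary examples:

    # Sketch only: tuning a few solver features from the user-facing API.
    # Requires a Gurobi installation and license.
    import gurobipy as gp

    m = gp.read("my_model.mps")   # hypothetical model exported from a modeling layer

    m.Params.Presolve = 0         # switch presolve reductions off
    m.Params.Heuristics = 0.2     # spend ~20% of effort on primal heuristics
    m.Params.MIPFocus = 1         # bias the search toward finding feasible solutions
    m.Params.Cuts = 2             # be more aggressive with cutting planes

    m.optimize()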
The API is simple, but the change would impact the code underneath. Since these are branch-and-bound algorithms, it would really depend on how often the worst-case runtime actually occurred. If it only happened in 2% of use cases, it might not make a huge difference, for example.
Gurobi was only founded in 2008. I don't doubt the optimizer was the result of "decades of incremental improvements", but the actual implementation must have been started relatively recently.
It was founded by some of the key people behind CPLEX (another solver, founded in 1987). In fact, one of the cofounders of Gurobi was a cofounder of CPLEX prior. They brought decades of knowledge with them.
Yep. They were also able to catch up because CPLEX was bought by IBM, and I think IBM typically keeps a pretty small staff after an acquisition.
These solvers get faster every year; how exactly are they supposed to stay the world's fastest if people invent better algorithms all the time that never get implemented by the commercial offerings?
You seem to be confusing the problem formulation with the problem solution. It is true there is a standard way to exchange the problem formulation through something like MPS (though it seems AMLs like AMPL etc. have taken over). All this format gives you is a standard mathematical formulation of the problem.
However, the solution is something very specific to the individual solver and they have their own data structures, algorithms and heuristic techniques to solve the problem. None of these are interchangeable or public (by design) and you cannot just insert some outside numbers in the middle of the solver process without being part of the solver code and having knowledge of the entire process.
All these solvers use branch and bound to explore the solution space and "fathom" (i.e. eliminate candidate subtrees whose lowest possible value is worse than an already found solution). The upper bound that the solver calculates via pre-solve heuristics and other techniques does vary from solver to solver. However, they all have a place for an "upper bound", and there are mechanisms in all of these solvers for updating that value during a solve.
If this paper is completely orthogonal to everything that exists in these solvers today, and it can produce a new upper bound faster than other techniques, it should be fairly plug-and-play.
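For what it's worth, the "plug in a known upper bound" step really is a one-liner in the user-facing APIs. A hedged sketch with gurobipy for a minimization model (the bound value is hypothetical):

    # Sketch only: hand the solver an externally obtained bound so branch and bound
    # can fathom nodes against it from the very first node.
    import gurobipy as gp

    m = gp.read("my_model.mps")           # hypothetical minimization instance
    known_upper_bound = 1234.5            # objective of a previously known solution
    m.Params.Cutoff = known_upper_bound   # discard nodes that cannot beat this value
    m.optimize()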
I have an undergrad OR degree and have been a practitioner on LP/MIP problems for 18 years, so I understand the current capabilities of these solvers and have familiarity with these problems. However, I am out of my depth trying to understand the specifics of this paper, and would love to be corrected where I am missing something.
What is OR?
Operations Research: https://en.m.wikipedia.org/wiki/Operations_research
Wouldn't the "open source ones like SCIP" be public by design?
In many cases you can actually insert outside numbers in the middle of the solver process via callbacks. For example see IncumbentUpdater at https://python-mip.readthedocs.io/en/latest/classes.html
And various C APIs for solvers have other callbacks
It's generally quite limited of course, for the reasons you mentioned.
The math programming languages AMPL, AIMMS, GAMS, etc. are dying in my industry and being replaced by general-purpose languages like Python/Java plus a solver API.
The randomized algorithm that Reis & Rothvoss [1] present at the end of their paper will not be implemented in Gurobi/CPLEX/XPRESS. It remains a fantastic result regardless (see below). But first let me explain.
In terms of theoretical computational complexity, the best algorithms for "integer linear programming" [2] (whether the variables are binary or general integers, as in the case tackled by the paper) are based on lattices. They have the best worst-case big-O complexity. Unfortunately, all current implementations need (1) arbitrary-size rational arithmetic (like that provided by gmplib [3]), which is memory hungry and a bit slow in practice, and (2) some LLL-type lattice reduction step [4], which does not take advantage of matrix sparsity. As a result, those algorithms cannot even start tackling problems with matrices larger than 1000x1000, because they typically don't fit in memory... and even if they did, they would be prohibitively slow.
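To give a feel for point (1), here is a tiny self-contained illustration (not tied to any lattice code) of how exact rational arithmetic blows up, using plain Gaussian elimination over Python Fractions on a random matrix of small integers:

    # Exact rational Gaussian elimination on a random 60x60 matrix with entries in [-9, 9].
    # After elimination, entries are ratios of large minors: the exact numerators
    # typically grow to dozens of digits, which is the memory/speed price exact codes pay.
    import random
    from fractions import Fraction

    n = 60
    A = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]

    for k in range(n - 1):
        if A[k][k] == 0:                  # ignore pivoting issues in this toy demo
            continue
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]

    print(max(len(str(a.numerator)) for row in A for a in row))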
In practice instead, integer programming solvers are based on branch-and-bound, a type of backtracking algorithm (as used in SAT solving), and at every iteration they solve a "linear programming" problem (the same as the original problem, but with all variables continuous). Each linear programming problem could be solved in polynomial time (with algorithms called interior-point methods), but instead they use the simplex method, which is exponential in the worst case! The reason is that all those linear programming problems are very similar to each other, and the simplex method can take advantage of that in practice. Moreover, all the algorithms involved greatly exploit sparsity in any vector or matrix involved. As a result, some people routinely solve integer programming problems with millions of variables within days or even hours.
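For readers who have never looked inside such a solver, here is a stripped-down sketch of that branch-and-bound loop, using scipy's LP solver for the relaxations. It is purely illustrative: no presolve, no cuts, no warm starts, naive node selection, and nothing like what the commercial codes actually do internally.

    # Toy branch and bound for: min c^T x, A_ub x <= b_ub, 0 <= x <= 1, x integer.
    import math
    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, n, tol=1e-6):
        best_val, best_x = math.inf, None
        stack = [[(0.0, 1.0)] * n]                   # each node is a set of variable bounds
        while stack:
            bounds = stack.pop()
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
            if not res.success or res.fun >= best_val - tol:
                continue                             # infeasible, or fathomed by the incumbent
            frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > tol]
            if not frac:                             # integer feasible: new upper bound
                best_val, best_x = res.fun, res.x.round()
                continue
            i = frac[0]                              # branch on the first fractional variable
            lo, hi = bounds[i]
            down, up = list(bounds), list(bounds)
            down[i] = (lo, math.floor(res.x[i]))
            up[i] = (math.ceil(res.x[i]), hi)
            stack.extend([down, up])
        return best_val, best_x

    # Tiny knapsack-style instance (maximize value = minimize its negation).
    values = np.array([10.0, 13.0, 18.0, 31.0, 7.0, 15.0])
    weights = np.array([[11.0, 15.0, 20.0, 35.0, 10.0, 33.0]])
    print(branch_and_bound(-values, weights, np.array([47.0]), n=6))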
As you can see, the solver implementers are not chasing the absolute best theoretical complexity. One could say that the theory and practice of discrete optimization has somewhat diverged.
That said, the Reis & Rothvoss paper [1] is deep mathematical work. It is extremely impressive on its own to anyone with an interest in discrete maths. It settles a 10-year-old conjecture by Dadush (the length of time a conjecture remains open is a rough heuristic many mathematicians use to evaluate how hard it is to (dis)prove). Last November, it was presented at FOCS, one of the two top conferences in theoretical computer science (together with STOC). Direct practical applicability is beside the point; the authors will readily confess as much if asked in an informal setting (they will of course insist otherwise in grant applications -- that's part of the game). That does not mean it is useless: in addition to the work having tremendous value in itself because it advances our mathematical knowledge, one can imagine that practical algorithms based on its ideas could push the state of the art of solvers a few generations of researchers down the line.
At the end of the day, all those algorithms are exponential in the worst case anyway. In theory, one tries to slightly shrink the polynomial in the exponent of the worst-case complexity. Practitioners instead typically want to solve one big optimization problem, not a family of problems of increasing size n. They don't care about the growth rate of the solving-time trend line. They care about solving their one big instance, which typically has structure that does not make it a "worst-case" instance for its size. This leads to distinct engineering decisions.
[1] https://arxiv.org/abs/2303.14605
[2] min { c^T x : A x >= b, x in R^n, some components of x in Z }
[3] https://gmplib.org/
[4] https://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1982f/art.p...
Thanks for these resources and comments.
Would you say that the following is a good summary? -> This is an important theoretical result, but most real-world problems are far from worst-case scenarios, so improving the worst case currently has little practical use.
This statement is probably mostly correct, but I think that in one way it could be misleading: I would not want to imply that real-world problem instances are somehow easier than the worst-case, in terms of computational complexity. They still very much exhibit exponential increase in computational cost as you scale them up.
Instead, most real-world instances have structure. Some of that structure is well understood (for example, 99% of optimization problems involve extremely sparse matrices), some is not. But sometimes we can exploit structure even without understanding it fully (some algorithmic techniques work wonders on some instances, and we don't fully know why).
It could be argued that by exploiting structure, it is the constant factor in the big-O computational complexity that gets dramatically decreased. If that is the case, the theory and practice do not really contradict each other. It is just that in practice, we are willing to accept a larger exponent in exchange for a smaller constant factor. Asymptotically it is a losing bargain. But for a given instance, it could be extremely beneficial.
No, they're saying theoretical improvements do not directly lead to practical ones, because theory and practice have diverged due to how computers work. Instead, theoretical work will most likely lead to indirect gains, as its techniques feed the next generation of practical improvements.
Thanks for your information. I think it really bridges the gap between the people who are interested in this algorithm and MILP "users". I have two more questions.
1. Usually we deal with models with both integer and continuous variables (MILP). Conceptually, B&B tackles ILP and MILP in similar ways. Is there any difficulty in extending lattice-based methods to solve MILP?
2. How likely do you think it is that this lattice-type algorithm will overcome the difficulties you mentioned and eventually replace B&B, totally or partly (like barrier vs. simplex methods)?
I don't think that continuous variables are an issue. Even when all the explicit variables are integer, we have implicit continuous variables as soon as we have an inequality: the slack of that inequality. There is probably some linear algebra trick one can use to transform any problem into a form that is convenient for lattice-based algorithms.
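Concretely, and just as an illustration (nothing specific to the R&R paper), an inequality-constrained integer program

    min { c^T x : A x >= b, x in Z^n }

can be rewritten with explicit continuous slacks as

    min { c^T x : A x - s = b, s >= 0, x in Z^n, s in R^m },

so even a "pure" integer program already carries continuous variables once you make the slacks explicit.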
Very unlikely in the next 5 years. Beyond that, they could be the next small revolution, maybe. "Cutting planes" were another tool that had some good theory but were thought to be impractical. Then 25 years ago, people found a way to make them work, and they were a huge boost to solvers. We may be due for another big jump.
Lattice-based methods are already effective in some niches. Branch-and-bound solvers are horrible at cryptography and number theory problems (those problems are bad fits for floating-point arithmetic in general), and lattice-based methods shine there. There are also some rare dense optimization problems that benefit from lattice-based methods (typically, one would use lattices in a pre-processing step, then pass the reformulated problem to a regular branch-and-bound solver [1]).
[1] https://link.springer.com/chapter/10.1007/3-540-48777-8_1
I honestly think that's just journalism for "no one has implemented it in production yet". Which is not surprising for an algorithm less than a year old. I don't think it's worth expanding on and explaining "too much work".
That being said, sometimes if an algorithm isn't the fastest but it's fast and cheap enough, it is hard to argue to spend money on replacing it. Which just means that will happen later.
Furthermore, you might not even see improvements until you implement an optimized version of the new algorithm. Even if big-O notation says it scales better, the old version may be optimized to use memory efficiently and to make good use of SIMD or other low-level techniques. Sometimes getting an optimized implementation of a new algorithm takes time.
As other commenters here have mentioned, in discrete optimization there can be a very large gap between efficient in theory and efficient in practice, and it is very likely that this is the case here too. Linear programming, for example, is known to be solvable in polynomial time, but the algorithm which established that (the ellipsoid method) is not used in practice because it is prohibitively slow. Instead, people use the (worst-case exponential time) simplex method.
Modern ILP solvers have a huge number of heuristics and engineering in them, and it is really difficult to beat them in practice after they have optimized their branch-and-cut codes for 30 years. As the top comment mentions, the software improvements alone are estimated to have improved the solving time of practical ILP instances by a factor of 870'000 since 1990.
I thought there were other interior-point methods now besides the ellipsoid algorithm that performed better. Some of these are useful in convex nonlinear programming, and I believe one is used (with a code generator from Stanford to make it faster) in the guidance software for landing the Falcon 9 first stage. There, as the stage descends, it repeatedly solves the problem of reaching the landing point at zero velocity with minimum fuel use, subject to various constraints.
Yes, there are other interior point methods besides the ellipsoid method, and virtually all of them perform better for linear programming. Sometimes, the solvers will use these at the root node for very large models, as they can beat out the simplex algorithm. However, I am unsure if any of them has been proven to run in polynomial time, and if so, if the proof is significantly different from the proof for the ellipsoid method. The point I was mainly trying to make is that there can be a significant gap between practice and theory for ILP. Even 40 years after LP was proven to be polytime solvable, simplex remains the most widely used method, and it is very hard for other methods to catch up.
Karmarkar's algorithm, for example, has been proved to run in polynomial time.
https://en.wikipedia.org/wiki/Karmarkar%27s_algorithm
It was also (in)famous as an algorithm that was patented (the patent expired in 2006).
The open-source solvers are a mess of 30 years of PhD students' random contributions. It's amazing they work at all. If you can possibly avoid actually implementing anything using them, you will.
Can others chime in? To what extent is the above a fair summary?
I would hope there have been some code reorganizations and maybe even rewrites? Perhaps as the underlying theory advances? Perhaps as the ecosystem of tools borrows from each other?
But I don't know the state of these solvers. In many ways, the above narrative wouldn't surprise me. I can be rather harsh (but justifiably so, I feel) when evaluating scientific tooling. I worked at one national lab with a "prestigious" reputation that nonetheless seemed incapable of blending competent software architecture with its domain area. I'm not saying any ideal solution was reachable; the problem arguably had to do with an overzealous scope combined with budgetary limits and cultural disconnects. Many good people working with a flawed plan, it seems to me.
I think what this work does is establish a new, and lower, upper bound on the number of points that need to be explored in order to find an exact solution.
From some of your other replies it looks to me like you're confusing that with an improved bound on the value of the solution itself.
It's a little unclear to me whether this is even a new solution algorithm, or just a better bound on the run time of an existing algorithm.
I will say I agree with you that I don't buy the reason given for the lack of practical impact. If there were a breakthrough in practical solver performance, people would migrate to a new solver over time. Either there's no practical impact of this work, or the follow-on work to turn the mathematical insights here into a working solver just hasn't been done yet.
I think some peeps are not reading this sentence the way you meant it to be read.
It seems to me you meant "I don't know what part of this research makes it especially hard to integrate into current solvers (and I would like to understand) ".
But people seem to be interpreting "why didn't they just integrate this into existing solvers? Should be easy (what lazy authors)".
Just trying to clear up some misunderstanding.
Maybe what they mean is that, despite an asymptotic advantage, the new algorithm performs worse for many use cases than the older ones. This might be due to the many heuristics that solvers apply to make problems tractable as others have mentioned, as well as good old software engineering optimization.
So the work that's required is for someone to take this algorithm and implement it in a way that levels the playing field with the older ones.