I suppose this is kind of a hot take, and in some ways less relevant than it was (say) 10 years ago, but I believe the theoretical CS community has spent a lot of time analyzing the performance of various algorithms over the years while kind of pointedly not inventing algorithms that accomplish some result at all, even if it is only known to be computable in a pathologically inefficient way. This means a lot of breakthroughs that could have been TCS breakthroughs have instead been invented, usually in worse form, by other communities.
I am not the only one who thinks so, here[1] is an obscure talk where Michael Mitzenmacher (a well-known CS theory prof at Harvard) makes more or less this specific complaint.
One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least in theory, it seems to me) precisely enumerating how many "steps" an algorithm will take on a given input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds on an algorithm with a very specific quantification of the work it will take, and that this in turn would free TCS to focus on things that are more germane to the field as a whole.
With that all said I'm open to the argument that TCS has fundamentally changed and this is no longer such a big issue. But perusing the various TCS conferences I sense it isn't.
Check out the Art of Computer Programming, by Knuth. He writes all of the algorithms in a hypothetical machine code called MIX (MMIX in later editions). He does go through algorithmic analysis to the degree you describe, showing exactly how many cycles a particular routine may take to complete, based on different variables. This is actually fairly complex even for simple programs. Consider the simple case of computing the maximum in a list:
let current_max := 0   // assuming the numbers are non-negative
for x in array_of_numbers {
    if x > current_max {
        current_max := x;
    }
}
return current_max;
Anyone can tell you this runs in O(N) time. But exactly how many cycles would this take? Ignoring the architectural details, you would need to know how often the condition "x > current_max" is true. If the list of numbers is in ascending order, this will be true every time; if it is in descending order, it will only be true once. The difference in the number of cycles between those two cases is significant. I believe Knuth works through this example and shows that, for a randomly-ordered array, the expected number of times it is true is a harmonic number, roughly ln(n).
I suppose my question is why you think TCS people would do this analysis and development better than non-TCS people. Once you leave the warm cocoon of big-O, the actual practical value of an algorithm depends hugely on specific hardware details. Similarly, once you stop dealing with worst-case or naive average-case complexity, you have to try to define a data distribution relevant for specific real-world problems. My (relatively uninformed) sense is that the skill set required to, say, implement transformer attention tailored to the specific hierarchical memory layout of NVIDIA datacenter GPUs, or to evaluate evolutionary optimization algorithms on a specific real-world problem domain, isn't necessarily something you gain in TCS itself.
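Going back to the counting question for a moment, here's a quick empirical check (my own sketch, not Knuth's analysis or anything from the post): it counts the updates over every permutation of [1..n] and compares the exact average with the harmonic number.
-- Rough sketch (my addition): count how often the "x > current_max" branch
-- fires, and compare the exact average over all permutations of [1..n]
-- with the harmonic number H_n, which is roughly ln n.
import Data.List (permutations)

-- number of updates to the running maximum while scanning the list,
-- starting from 0 as in the pseudocode above
updates :: [Int] -> Int
updates = go 0 0
  where
    go count _    []     = count
    go count best (x:xs)
      | x > best  = go (count + 1) x xs
      | otherwise = go count best xs

-- exact average update count over all n! orderings of [1..n]
averageUpdates :: Int -> Double
averageUpdates n =
  let perms = permutations [1 .. n]
  in fromIntegral (sum (map updates perms)) / fromIntegral (length perms)

main :: IO ()
main = do
  let n = 8
      harmonic = sum [1 / fromIntegral k | k <- [1 .. n]] :: Double  -- H_n
  print (averageUpdates n, harmonic)  -- both come out around 2.72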
When you can connect theory to the real world, it's fantastic, but my sense is that such connections are often desired and rarely found. At the very least, I'd expect that to often be a response to applied CS and not coming first from TCS: it's observed empirically that the simplex algorithm works well in practice, and then that encourages people to revisit the asymptotic analysis and refine it. I'd worry that TCS work trying to project onto applications from the blackboard would lead to less rigorous presentations and a lot of work that's only good on paper.
Overall, it's annoying to tell whether an NP-hard problem is always really hard, or if ~all practical instances can be solved with a clever heuristic. It doesn't help that most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.
Since I know quite a few people who work on exactly this kind of problem, finding special classes that can be solved in polynomial time, my impression is of course different.
But I think it can be said that if some NP-hard problem is very important in practice and there is no easy way to get around it (i.e. it will still be practically relevant in, say, 15 years), the problem will certainly get a lot more attention.
This is also the reason why some NP-hard problems are much more researched than others.
Sometimes it's also helpful to look into approximation algorithms, e.g., a good LLL (Lenstra-Lenstra-Lovász lattice reduction) implementation can work wonders for certain problems. But heaven forbid someone obtains an inapproximability result for your problem; then you're really in trouble.
This is not (necessarily) true:
For example, there exists a great approximation algorithm (Goemans-Williamson algorithm) for MAXCUT in graphs with non-negative edge weights.
On the other hand, when negative weights do occur, one can show (unless P=NP) that there exists no polynomial-time approximation algorithm for which the approximation guarantee is a constant factor times the optimal solution.
But since the Goemans-Williamson algorithm is a great algorithm (if the Unique Games Conjecture holds and P != NP, it has the best approximation guarantee that any polynomial-time approximation algorithm for MAXCUT with non-negative weights can achieve), nobody "forbids" you from using it in situations where negative edge weights can also occur. You will lose the approximation guarantee, but in practice the algorithm nevertheless gives good results in this situation, just not certified good results.
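For concreteness, the constant behind that guarantee can be recovered numerically; here's a tiny sketch (mine, not from the comment above) that brute-forces the Goemans-Williamson ratio, the minimum over 0 < theta <= pi of (2/pi) * theta / (1 - cos theta), which is about 0.878.
-- Sketch (my addition): numerically recover the Goemans-Williamson
-- approximation constant by minimizing the ratio over a fine grid.
gwRatio :: Double -> Double
gwRatio theta = (2 / pi) * theta / (1 - cos theta)

gwConstant :: Double
gwConstant = minimum [gwRatio t | t <- [0.001, 0.002 .. pi]]

main :: IO ()
main = print gwConstant  -- roughly 0.8786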
I'm not saying that TCS people would be "better" at discovering this or anything like that; I'm just saying that the opportunity for contribution from TCS people is very high, and because the fields are not well-integrated you often have to wait years to uncover what ML concepts "really" are. I think working together would benefit both ML and TCS!
I think you're getting this backwards. The (good) point you and Mitzenmacher are making is that our analysis standards are _too strict_, and hence we avoid good algorithms that are just too hard to analyze.
If we instead require exact combinatorics, e.g., using Flajolet/Sedgewick, that will be _even harder_ than if we _just_ try to do asymptotic analysis. Remember that asymptotics are a way to _approximate_ the exact number of steps. It's supposed to make it _easier_ for you.
(Also touched upon briefly in https://sedgewick.io/talks/#lasting-legacy-of-flajolet = https://sedgewick.io/wp-content/uploads/2022/03/2013-14Flajo...)
His point is exactly that, often researchers care only about worst-case analysis and asymptotics, which is not necessarily a useful scientific analysis of how long an algorithm is expected to take on a machine as a function of the input size (which was the original motivation for Knuth etc to develop the field).
The same authors (but ordered as Sedgewick and Flajolet!) also have a book called An Introduction to the Analysis of Algorithms; there's a course website at https://aofa.cs.princeton.edu/home/ and videos on Coursera (https://www.coursera.org/learn/analysis-of-algorithms) — it is advanced mathematics but less advanced than the Analytic Combinatorics book.
For example, we assume that arbitrary arithmetic takes constant time. Good. Except that if you take that seriously, then P=PSPACE. So we might say that bit operations on fixed registers are constant time. But nobody does that, because then arithmetic is O(lg n) and it's too much trouble to keep track of everything (and if you do, then hashtables aren't O(1) any longer!). If we are more sophisticated we might take a transdichotomous model, but that's really weird and we have results like sorting in less than n lg n time. Not to mention that the mathematical foundations get a bit shaky with more than one variable.
I think the hard problem underneath is that it's hard to define a computational model that (a) can be reasoned about and isn't extremely weird, (b) has solid mathematical foundations, (c) is reasonably predictive of real world performance, and (d) is not linked to a specific architecture. Asymptotic analysis is kind of a compromise. It can be formulated in very elementary terms, it makes mathematical sense and if you plot the performance of a sorting algorithm on a chart it kind of looks like the graph of n log n if you squint a bit.
Knuth does something similar to what you're suggesting IIRC, albeit without the analytic combinatorics machinery. For some programs he derives a precise expression as a function of the size of input that describes how many operations of each kind the underlying machine will perform. In numerical analysis there is a similar concept, e.g. evaluating a polynomial with Horner has the same asymptotic complexity as the naive method, but we say it's better because it takes exactly the optimal amount of floating point operations (and, in practice, because it uses fused multiply-adds, but that's another story...)
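As a tiny illustration of the Horner point (my own sketch, not from the comment): both evaluations below are O(n) asymptotically, but Horner needs only one multiplication and one addition per coefficient.
-- Sketch (my addition): Horner's rule vs. naive term-by-term evaluation.
-- Coefficients are listed from the constant term up: [a0, a1, a2, ...]
-- represents a0 + a1*x + a2*x^2 + ...
horner :: Num a => a -> [a] -> a
horner x = foldr (\c acc -> c + x * acc) 0

-- naive evaluation, computing each power of x separately
naive :: Num a => a -> [a] -> a
naive x cs = sum [c * x ^ k | (c, k) <- zip cs [0 :: Integer ..]]

main :: IO ()
main = print (horner 2 [1, 2, 3], naive 2 [1, 2, 3])  -- both 17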
A research program along those lines would be certainly fascinating, but the combination of "needs to be reasonably predictive" and "needs to be portable" is a tough nut to crack.
Arithmetic on fixed-width number types is constant. It's very much not constant on arbitrary precision integers, something that underpins much of cryptography. Indeed, if you try to e.g. use Gaussian elimination (which uses O(n^3) arithmetic operations) to compute the determinant of a large integer matrix, you'll run into trouble, which is why there is another algorithm for this use case: https://en.m.wikipedia.org/wiki/Bareiss_algorithm
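For the curious, here's a compact sketch of that fraction-free idea (my own simplified take on Bareiss elimination, with no pivoting, so it simply rejects a zero pivot rather than handling it):
-- Sketch (my addition): fraction-free (Bareiss-style) elimination for a
-- square integer matrix given as a list of rows. Each division by the
-- previous pivot is exact, so intermediate entries stay integers and their
-- bit size stays under control, unlike naive elimination with fractions.
bareissDet :: [[Integer]] -> Integer
bareissDet = go 1
  where
    go _    [[x]] = x
    go prev ((p : prow) : rows)
      | p == 0    = error "zero pivot: this sketch does no pivoting"
      | otherwise =
          let reduce (r : rrow) = [ (p * x - r * y) `div` prev | (x, y) <- zip rrow prow ]
              reduce []         = error "ragged matrix"
          in go p (map reduce rows)
    go _ _ = error "matrix must be square and non-empty"

main :: IO ()
main = print (bareissDet [[2, 3, 1], [4, 7, 5], [6, 18, 22]])  -- -16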
(Disclaimer: Also not a researcher)
Arithmetic on fixed-width registers in O(1) is equivalent to saying that bit operations are O(1), which intuitively makes the most sense, after all adding two uint64 is not the same as adding two arbitrary precision integers. But that's not always the model we pick, often we just assume that arithmetic is O(1) and sweep length considerations under the rug.
Also, what does fixed mean? Is it like the word size in an actual computer architecture, where registers are 64 bits and that's all you'll ever get, or does it depend on the size of the input? Because the latter is the transdichotomous model, yet another option. Unsurprisingly, you can get different results with different computational models.
This is not to say that asymptotic analysis is wrong, obviously, but it's a bit of a mess and it's important to remember what computational model the analysis was performed under.
So, I only have rudimentary undergraduate knowledge about complexity theory, but my materials (e.g. in theoretical CS, cryptography and computational algebra) were always incredibly clear about the fact that arbitrary integer arithmetic isn't constant - this is why complexity classes such as weakly vs. strongly NP-complete exist. Do you have any examples of where people would treat arbitrary length integer arithmetic as constant? (I think most people just never have to use arbitrary precision arithmetic, so that's why they'd not talk about it explicitly.)
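(For anyone who hasn't seen the weak/strong distinction: the classic illustration is subset sum, which is NP-complete yet solvable in time polynomial in the target value, i.e. exponential only in the number of bits of that value. A rough sketch of the pseudo-polynomial approach, my own illustration rather than anything from the materials mentioned above:)
-- Sketch (my addition): subset sum is "weakly" NP-complete because a simple
-- dynamic program over achievable sums runs in time polynomial in the target
-- value s, but exponential in the number of bits needed to write s down.
-- Treating arithmetic on arbitrarily large numbers as unit cost is exactly
-- what blurs this distinction.
import Data.List (foldl')
import qualified Data.Set as Set

subsetSum :: [Int] -> Int -> Bool
subsetSum xs s = s `Set.member` sums
  where
    -- all achievable subset sums, capped at s, built one element at a time
    sums = foldl' step (Set.singleton 0) xs
    step acc x = acc `Set.union` Set.filter (<= s) (Set.map (+ x) acc)

main :: IO ()
main = print (subsetSum [3, 34, 4, 12, 5, 2] 9, subsetSum [3, 34, 4, 12, 5, 2] 30)  -- (True,False)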
You're right that computational complexity depends on the model and that this makes it nontrivial to use in practice. The theoreticians would sweep this under the rug with "these models are all polynomially equivalent to Turing machines" (which lets us formulate a conjecture such as P=NP precisely), but of course that doesn't in practice help you distinguish between a constant, linear or quadratic algorithm.
My understanding is that it's pretty much how it started, but it proved to be impractical. Meaning, we absolutely can quantify the work any algorithm will take, but for that we need to define the architecture first, i.e., to specify what the unit of work is and what operations are available to us. And, sure, we can do that for any given task (and we always could), but it's usually too messy to include in general discussions about algorithms that aren't specifically concerned with the architecture.
What's meant by "it’s already too much to ask for a closed form for fibonacci numbers"? Binet's formula is usually called a closed form in my experience. Is "closed form" here supposed to mean "closed form we can evaluate without needing arbitrary-precision arithmetic"?
You don't need arbitrary-precision real arithmetic (i.e. high-precision approximations of sqrt(5)) to evaluate Binet's formula - exact big-integer arithmetic suffices (simply do your calculations in the ring Z[X]/(X^2-X-1) ).
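To spell that out with a sketch (mine, not the parent's code): represent an element a + b*X of Z[X]/(X^2-X-1) as the pair (a, b); then X^n = F(n-1) + F(n)*X, so exponentiation in that ring yields Fibonacci numbers exactly, with no square roots or rounding anywhere.
-- Sketch (my addition): Binet's idea evaluated exactly in Z[X]/(X^2 - X - 1).
-- An element a + b*X is stored as the pair (a, b); multiplication reduces
-- X^2 to X + 1.
newtype Golden = Golden (Integer, Integer) deriving Show

instance Num Golden where
  Golden (a, b) + Golden (c, d) = Golden (a + c, b + d)
  Golden (a, b) * Golden (c, d) =
    -- (a + bX)(c + dX) = ac + (ad + bc)X + bd*X^2, and X^2 = X + 1
    Golden (a * c + b * d, a * d + b * c + b * d)
  negate (Golden (a, b)) = Golden (negate a, negate b)
  fromInteger n = Golden (n, 0)
  abs    = id  -- not meaningful here, just to satisfy the class
  signum = id

-- X^n = F(n-1) + F(n)*X, and (^) uses repeated squaring, so this takes
-- O(log n) multiplications in the ring.
fib :: Integer -> Integer
fib n = let Golden (_, b) = Golden (0, 1) ^ n in b

main :: IO ()
main = print (map fib [1 .. 10], fib 100)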
What makes you believe that the coefficient will be transcendental? I'd rather expect some non-trivial algebraic number (i.e. root of some polynomial with rational coefficients).
He confuses the characteristic equation with the generating function. This: T(z) = z + zT + zT^2 + zT^3 is the characteristic equation.
There is no simple generating function, as it's not possible to easily invert a cubic in closed form.
Successive derivatives of the generating function are taken to find the coefficients of the infinite series that encodes the sought values. It's hard enough explaining this with a quadratic; a cubic is so much more involved.
This is so far beyond the scope of a blog post. You need to spend at least a few weeks working on this or take some college courses.
Yes, there is no "simple" generating function, i.e. there's no simple closed-form expression for T(z). Nevertheless, T(z) (as further explained in another comment here in this thread) satisfies the equation T(z) = z + zT(z) + zT(z)^2 + zT(z)^3, and this fact can be used to find the coefficients of T (i.e. the numbers t_n) via Lagrange inversion, and the code in the post does so and gets the same numbers as https://oeis.org/A036765.
(You use the term "characteristic equation", but that is a term only used in the context of linear recurrence relations AFAIK, and not applicable here.)
(Also IMO the post does not overcomplicate anything; what it presents is in fact the easiest way of solving the problem — that of finding the asymptotics of the number of unordered rooted ternary trees, up to isomorphism. If you think you have a simpler method even for counting small cases, feel free to post an answer at https://math.stackexchange.com/questions/5050941/counting-tr....)
It extends the original framework with a lot of useful new primitives, like Hadamard products. Super useful!
But more generally, why do we want to analyze combinatorics and algorithms? I suppose it gives us some confidence that we are actually making progress.
Seems like combinatorics comes in second only to statistics.
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
[0,1,1,2,5,13,36,104,309,939,2905,9118]
Using the Num instance for power series (very cute application of laziness expounded by Karczmarczuk and then McIlroy https://dl.acm.org/doi/abs/10.1017/s0956796899003299)
$ ghci
GHCi, version 9.12.2: https://www.haskell.org/ghc/ :? for help
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
<interactive>:2:1: error: [GHC-39999]
• No instance for ‘Num [Integer]’ arising from a use of ‘it’
• In the first argument of ‘print’, namely ‘it’
In a stmt of an interactive GHCi command: print it
Edit: After reading Bentley's paper that you linked, figured it out:
$ cat > bentley.hs
ps0:: Num a => [a]
ps0 = 0 : ps0
-- (.*):: Num a => a->[a]->[a]
c .* (f:fs) = c*f : c.*fs
instance Num a => Num [a] where
  -- negate (f:fs) = (negate f) : (negate fs)
  (f:fs) + (g:gs) = f+g : fs+gs
  (f:fs) * (g:gs) = f*g : (f.*gs + fs*(g:gs))
  fromInteger c = fromInteger c : ps0
^D
$ ghci bentley.hs
GHCi, version 9.12.2: https://www.haskell.org/ghc/ :? for help
[1 of 2] Compiling Main ( bentley.hs, interpreted )
bentley.hs:7:10: warning: [GHC-06201] [-Wmissing-methods]
• No explicit implementation for
‘abs’, ‘signum’, and (either ‘negate’ or ‘-’)
• In the instance declaration for ‘Num [a]’
|
7 | instance Num a => Num [a] where
| ^^^^^^^^^^^^^^^^
Ok, one module loaded.
ghci> ts=0:(1+ts+ts^2+ts^3)
ghci> take 12 ts
[0,1,1,2,5,13,36,104,309,939,2905,9118]
Very informally, let T(z) be the generating function where the coefficient of the z^n term is the number of the trees we are counting that have exactly n nodes.
We know from the functional description that such a tree can be:
- a leaf;
- a node with 1 child;
- a node with 2 children;
- a node with 3 children.
A leaf is a tree with one node and that's it, so it "contributes" a single tree of size 1, i.e. the term z.
How many trees of size n fall in the second case? Exactly as many as the total number of trees of size n-1, hence this case "contributes" zT(z).
How many trees of size n fall in the third case? That's a bit harder to see, because now you have n-1 nodes in total, split between the two branches. So you can have 1 in the left branch and n-2 in the right branch, 2 left and n-3 right, and so on. With a bit of basic combinatorics you can notice that this is exactly the convolution product of the series with itself, in analytic terms T^2(z). Hence the third case "contributes" zT^2(z).
The fourth case is similar.
At this point we have constructed the inverse of the generating function we are looking for, i.e. z as a function of T(z) rather than the reverse, but luckily there's a standard trick for this: Lagrange inversion.
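If it helps, here's a small sketch (my addition, not code from the post) of that last Lagrange inversion step done numerically: with T = z*phi(T) and phi(t) = 1 + t + t^2 + t^3, Lagrange inversion gives t_n = (1/n) * [t^(n-1)] phi(t)^n, and computing that reproduces the numbers above.
-- Sketch (my addition): Lagrange inversion for T = z*(1 + T + T^2 + T^3).
import Data.List (foldl')

-- multiply two polynomials (coefficient lists, constant term first),
-- truncated to the first `limit` coefficients
polyMulTrunc :: Int -> [Integer] -> [Integer] -> [Integer]
polyMulTrunc limit p q =
  [ sum [ p !! i * q !! (k - i) | i <- [0 .. k], i < length p, k - i < length q ]
  | k <- [0 .. limit - 1] ]

-- t_n = (1/n) * coefficient of t^(n-1) in (1 + t + t^2 + t^3)^n
tn :: Int -> Integer
tn n = coeffs !! (n - 1) `div` fromIntegral n
  where
    phi    = [1, 1, 1, 1]
    coeffs = foldl' (polyMulTrunc n) [1] (replicate n phi)

main :: IO ()
main = print (map tn [1 .. 11])  -- [1,1,2,5,13,36,104,309,939,2905,9118]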
Your explanation was exactly what I needed, thanks.
The introduction to generating functions in this well known text on the subject (at the beginning of Chapter 1) might be more helpful than the Wikipedia article linked in the original post:
https://www2.math.upenn.edu/~wilf/gfology2.pdf
From that text:
> Although giving a simple formula for the members of the sequence may be out of the question, we might be able to give a simple formula for the sum of a power series, whose coefficients are the sequence that we’re looking for.
A worked example of the quadratic formula would be solving a particular equation by plugging in values of a, b, c into the formula.
A worked example of analytic combinatorics is solving a combinatorics problem with techniques from real/complex analysis.