As someone who did a lot of work with graphs, "why don't programming languages have a built-in graph data type?" is a question I’ve been asked a million times.
I'm thrilled I'll be able to point folks to a much more in-depth analysis like the one here, instead of saying some variation of "it's really hard to do it well" and having them just take my word for it.
This is a super naive take. but I would consider the pointer the be the native graph type. What is wanted is not a graph type but the tooling to traverse graphs.
Generalized pointers (RAM addresses, disk storage locations, network locations, etc.) would be the general way to implement explicit literal graphs.
Other graphs, that are partly or wholly computed or recomputed as needed from other relationships, could be considered "implicit" graphs and can be implemented many other ways.
The fact that graphs can be any combinations of literally or dependently defined, static or dynamic defined, would add even more complexity to any truly general graph library.
You could say the same of Lists.
Well, of course! Lists, trees, arrays, structures, etc. are specialized graphs.
Maybe a pointer is better described as equivalent to a half-edge?
I think this might be kinda right.
Take rust. Tree edges use Box, and DAG edges use Arc or Rc. Rust doesn’t really help with building non-DAG graphs as native data types - you can drop back to using “handles” for more generic graphs.
And I actually think that is kinda fair. Languages and standard libraries should support trees and DAGs out of the box. Perhaps Hillel is right that more complex graphs should be in a user-space library or something.
It's a bit like saying that a byte is a native number type, providing comparison to zero and addition as the only operations, and asking to figure the rest yourself.
I mean, it's technically sufficient, but I'd not call it proper support on a language level.
Period should probably be a comma; I read this comment a couple times before it made sense, rather than calling the grand parent super naive.
Hmm, think you're taking the title of the article a little too literally, and focusing on just the "data type" aspect. Yeah, the article itself indirectly agrees with you that pointers are often how graphs are implemented: "Graphs are a generalization of linked lists, binary trees, and hash tables," which are data-structures often implemented by pointers.
Yeah, the article is really saying what you're saying, that building a graph library ("tooling") is very complicated.
But, maybe I'm misunderstanding what you're saying??
I actually brought this hot take up in my conversation with Hillel -- something along the lines of "void * * is the canonical native graph representation in C."
I used to think that since graphs are such a broad data structure that can be represented in different ways depending on requirements that it just made more sense to implement them at a domain-ish level (the article mentions this in the "There are too many implementation choices" section).
Then I saw Petgraph [0] which is the first time I had really looked at a generic graph library. It's very interesting, but I still have implemented graphs at a domain level.
[0] https://github.com/petgraph/petgraph
Why wouldn't an abstract `Graph` type and specific implementations of that work? Like Java has for `Set` and various implementations?
For importing and exporting data, it often makes more sense to use something like a table or a tree rather than a graph. (Like a CSV file or a JSON file.)
So it's not clear what the interface would do. What methods should there be? Again, there are too many choices, and a Graph interface often isn't the best way to represent a view of some subset of a graph.
To add to this, recursively enumerable is the same as semi-decidable, and that only gets you to finite time.
One of the big reasons to fight to make dependencies DAG like is because exhaustive search gets you to exponential time.
NP-complete, NP-hard are easy to run into with graphs.
Graph k-colorability, finding Hamiltonian cycles, max cliques, max independent sets, and vertex cover on (n)vertex graphs aren't just NP, they have no sub-exponential time algorithms.
Finding those subgraphs is often impractical.
even at that level of abstraction there are many flavors of "graphs". what some people call graph are actually hypergraphs, or attributed graphs.
You might say graphs are an abstract concept, solving our problems requires specialized graphs, and so we just use or build the kind we need.
Yeah this is exactly how I think of it. I think "graphs" are just at a different level of the abstraction hierarchy than the data structures they're often grouped together with.
This is even further up the abstraction hierarchy, but to illustrate the point, nobody really wonders why languages don't ship with a built-in database implementation. And it's the same basic reason as with graphs; one size doesn't fit most.
More importantly, there are a lot of tradeoffs.
Virtually every language offers a hash map. You can roll your own to outperform in an individual circumstance, the default works pretty well.
You can't really do that with a graph. Maybe if you offered a bunch of graph types.
---
PS. Bit of trivia: Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.
And so why isn't this the solution?
Most languages support both hash map (fast lookup) and balanced tree (ordered entries) primitives, even though they both implement the "associative map" container type.
Can't we have 2, 3, 5, or even 8 different graph types?
That’s what graph libraries do, look at petgraph. You’ve got a bunch of graph implementations, and a bunch of algorithms over them.
It's also a bit different in that every insertion entails at least one allocation, and every lookup entails pointer-chasing through at least 3 dereferences. Java's lack of proper value types really kneecaps their hashmap implementation in many practical performance scenarios...
I'm not sure that's particularly unusual. For example, C++ supports this too:
https://en.cppreference.com/w/cpp/container/unordered_map/ma...
It's also different in that each hash bucket can use a Red-Black tree when there's many entries.
What I find a little funny about that question is that people miss the fact that there isn't even a tree data structure in most languages. Most languages have static arrays, dynamic arrays, linked lists, and... that's it as far as structural types go. Everything else (BSTs, hashtables, etc.) is some semantic abstraction that hides some capabilities of the underlying structure, not a purely structural representation.
Typed functional languages like Haskell (data)/ML (type) do have a built-in way to define new tree types, and so does Rust (enum). It's one of the biggest things I miss when I'm not using these languages, especially when combined with some metaprogramming (deriving/derive) to get some functions defined for these new types very quickly.
I think it's because you have to think it through when they get too big for one machine. A lot of those so-called "NoSQL" databases are really meant to represent graph databases (think DynamoDB) in an adjacency list. I've had success with Gremlin and JanusGraph, but it's a really messy space. It's not a programming language problem in my opinion, but a distributed systems one.
I had an opposite experience. I was doing some new-to-me graph work in tcl and had just assumed the tcl standard library wouldn't have any graph algorithms. Come to find out I was wrong, and ended up saving myself a bunch of time not having to re-invent the wheel:
https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tclli...
Puthon has one, as mentioned, but as a non CS-person I never encountered any problem in my programming-life that so fundamentally called for graphs that I needed one, not did I had so much fascination for graphs that it was a tool I wanted to force onto my problems.
I guess it is just that there are many problems that can be solved incredibly well without graphs and not that many where graphs outshine everything else so clearly that people would use them.
That being said, convince me of the opposite and I am happy.