The semantic web standards are sorely lacking (for decades now) a killer application. Not in a theoretical universe of decentralized philosopher-computer-scientists but in the dumbed down, swipe-the-next-30sec-video, adtech oligopolized digital landscape of walled gardens. Providing better search metadata is hardly that killer app. Not in 2024.
The lack of adoption has, imho, two components.
1. bad luck: the Web got worse, a lot worse. There hasn't been a Wikipedia-like event for many decades. This was not pre-ordained. Bad stuff happens to societies when they don't pay attention. In a parallel universe where the good Web won, the semantic path would have been much more traveled and developed.
2. incompleteness of vision: if you dig to their nuclear core, semantic apps offer things like SPARQL queries and reasoners. Great, these functionalities are both unique and have definite utility, but there is a reason (pun) that the excellent Protege project [1] is not the new spreadsheet. The calculus of cognitive cost versus tangible benefit to the average user is not favorable. One thing that is missing is a set of abstractions that would help bridge that divide.
Still, if we aspire to a better Web, the semantic web direction (if not current state) is our friend. The original visionaries of the semantic web were not out of their minds, they just did not account for the complex socio-economics of digital technology adoption.
Over on lobste.rs, someone cited another article retracing the history of the Semantic Web: https://twobithistory.org/2018/05/27/semantic-web.html
An interesting read in itself, and it also points to Cory Doctorow giving seven reasons why the Semantic Web will never work: https://people.well.com/user/doctorow/metacrap.htm. They are all good reasons and are unfortunately still valid (although one of his observations towards the end of the text has turned out to be comically wrong; I'll let you read what it is).
Your comment and the two above links point to the same conclusion: again and again, Worse is Better (https://en.wikipedia.org/wiki/Worse_is_better)
Every time I read a post like this I'm inclined to post Doctorow's Metacrap piece in response. You got there ahead of me. His reasoning is still valid and continues to make sense to me. Where do you think he's "comically wrong"?
The implicit metrics of quality and pedigree he believed were superior to human judgement have since been gamified into obsolescence by bots.
I think that the jury is still out on that one. Human judgement is too often colored by human incentives. I still think there's an opportunity for mechanical assessments of quality and pedigree to excel, and exceed what humans can do; at least, at scale. But, it'll always be an arms race and I'm not convinced that bots are in it except in the sense of lying through metadata, which brings us back to the assessment of quality and pedigree - right/wrong, good/bad, relevant/garbage.
Link counting being reliable for search. After going through people's not-so-noble qualities and how they make the semantic web impossible, he declares counting links as an exception. It was to a comical degree not an exception.
Yes. There is that. Ignobility wins out again.
item 2.6 kneecapped item 3
Indeed a good read, thanks for the link!
I think his context is the narrower "Web of individuals" where many of his seven challenges are real (and ongoing).
The elephant in the digital room is the "Web of organizations", whether that is companies, the public sector, civil society etc. If you revisit his objections in that light they are less true or even relevant. E.g.,
Yes. But public companies are increasingly reporting their audited financials online via standards like iXBRL and prescribed taxonomies. Increasingly they need to report environmental impact etc. I mentioned in another comment the common EU public procurement ontologies. Think also of the millions of education and medical institutions and their online content. In institutional contexts lies do happen, but at a slightly deeper level :-)
This only raises the stakes. As somebody mentioned already, the cost of navigating random APIs is high. The reason we still talk about the semantic web despite decades of no-show is precisely the persistent need to overcome this friction.
We are who we are individually, but again this ignores the collective intelligence of groups. Besides the hordes of helpless individuals and a handful of "big techs" (i.e., the random entities that figured out digital technology ahead of others) there is a vast universe of interests. They are not stupid, but there is a learning curve. For the vast part of society the so-called digital transformation is only at its beginning.
You have a very charitable view of this whole thing and I want to believe like you. Perhaps there is a virtuous cycle to be built where infrastructure that relies on people being more honest helps change the culture to actually be more honest, which makes the infrastructure better. You don't wait for people to be nice before you create the GPL; the GPL changes mindsets towards opening up, which fosters a better culture for creating more.
It's also very important to think in macro systems and societies, as you point out, rather than at the individual level
Thanks for sharing that Doctorow post, I had not seen it before. While the specific examples are of course dated (hello AltaVista and Napster), it still rings mostly true.
One major problem RDF has is that people hate anything with namespaces. It's a "freedom is slavery" kind of thing. People will accept it grudgingly if Google says it will help their search rankings or if you absolutely have to deal with them to code Java but 80% of people will automatically avoid anything if it has namespaces. (See namespaces in XML)
Another problem is that it's always ignored the basic requirements of most applications like:
1. Getting the list of authors in a publication as references to authority records in the right order (Dublin Core makes the 1970 MARC standard look like something from the Starship Enterprise); see the sketch after this list
2. Updating a data record reliably and transactionally
3. Efficiently unioning graphs for inference so you can combine a domain database with a few database records relevant to a problem + a schema easily
4. Inference involving arithmetic (Gödel warned you about first-order logic plus arithmetic, but for boring fields like finance, business, and logistics that is the lingua franca; OWL comes across as too heavyweight yet completely deficient at the same time, and nobody wants to talk about it)
things like that. Try to build an application and you have to invent a lot of that stuff. You have the tools to do it, and it's not that hard if you understand the math inside and out, but if you don't, oh boy.
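To make point 1 concrete, here's a minimal rdflib sketch (URIs made up) of why author order is awkward: dc:creator triples are an unordered set, so ordering has to be bolted on with something like an rdf:List that every consumer then has to know to walk.

    # Minimal sketch, invented URIs: authors as an unordered set vs. an rdf:List.
    from rdflib import Graph, URIRef

    turtle = """
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix ex: <http://example.org/> .

    # Unordered: nothing in RDF says which creator comes first.
    ex:paper1 dc:creator ex:alice, ex:bob .

    # Workaround: an ordered rdf:List hanging off a custom property.
    ex:paper1 ex:creatorList ( ex:alice ex:bob ) .
    """

    g = Graph()
    g.parse(data=turtle, format="turtle")

    # Walk the rdf:List to recover the author order.
    head = g.value(URIRef("http://example.org/paper1"),
                   URIRef("http://example.org/creatorList"))
    print([str(author) for author in g.items(head)])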
If RDF got a few more features it would catch up with where JSON-based tools like N1QL (https://www.couchbase.com/products/n1ql/) were 10 years ago.
I'll give you two examples: Internet Archive. Let's Encrypt.
Hardly a good reference, Internet Archive is older than Wikipedia.
Wikipedia itself is only a little over two decades old. I don't think anyone would parse "many decades" as "two decades".
There's also OpenStreetMap, exactly two decades old and thus about three years younger than Wikipedia.
The world wide web (but not the internet) is only 3 decades old!
Not true: Wikidata, OpenAlex, Europeana, ... and many smaller projects making use of all that data, such as my project Conzept (https://conze.pt)
Let's Encrypt is very good but it's not exactly a web app, semantic-web or otherwise.
The semantic web has been, in my opinion, a category error. Semantics means meaning and computers/automated systems don't really do meaning very well and certainly don't do intention very well.
Mapping the incredible success of The Web onto automated systems hasn't worked because the defining and unique characteristic of The Web is REST and, in particular, the uniform interface of REST. This uniform interface is wasted on non-intentional beings like software (that I'm aware of):
https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...
Maybe this all changes when AI takes over, but AI seems to do fine without us defining ontologies, etc.
It just hasn't worked out the way that people expected, and that's OK.
I take the other side of this trade, and have since c. 1980. I say that semantics is a delusion our brains create. It doesn't really exist. Or conversely, it is not the magical thing we think it is.
man
How are you oblivious of the performative contradiction that is that statement?
Please tell me you're not an eliminativist. There is nothing respectable about eliminativism. Self-refuting, and Procrustean in its methodology, denying observation it cannot explain or reconcile. Eliminativism is what you get when a materialist refuses or is unable to revise his worldview despite the crushing weight of contradiction and incoherence. It is obstinate ideology.
Hard agree.
I think about it as:
- Hypermedia controls have been deemphasized, leading to a ton of workarounds bolted onto REST
- REST is a perfectly suitable interface for AI Agents, especially to audit for governance
- AI is well suited to the task of mapping the web as it exists today to REST
- AI is well suited to mapping this layout ontologically
The semantic web is less interesting than what is traversable and actionable via REST, which may expose some higher level, reusable structures.
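As a rough illustration (the response shape here is invented, vaguely HAL-ish): the representation carries its own next actions, so an agent can just follow the uniform interface instead of needing a shared ontology up front.

    # Invented, HAL-ish response shape: resource state plus the links/actions
    # an agent (or human) is currently allowed to take from here.
    import json

    response = {
        "order": {"id": 42, "status": "open"},
        "_links": {
            "self":   {"href": "/orders/42"},
            "items":  {"href": "/orders/42/items"},
            "cancel": {"href": "/orders/42/cancel", "method": "POST"},
        },
    }
    print(json.dumps(response, indent=2))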
The first thing I can think of is `User` as a PKI-type structure that allows us to build things that are more actionable for agents while still allowing humans to grok what they're authorized to do.
There's another element, trusting the data.
Often that may require some web-scale data, like PageRank, but also any other authority/trust metric where you can say "this data is probably quality data".
A rather basic example: published/last-modified dates. It's well known in SEO circles, at least in the recent past, that changing them helps pages rank in Google, because Google prefers fresh content. Unless you're Google or have a non-trivial way of measuring page changes yourself, the data may be less than trustworthy.
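To make the dates concrete, this is roughly the schema.org metadata in question, and nothing about it is verified anywhere. A small sketch assuming rdflib 6+ (which bundles a JSON-LD parser) and invented article data:

    from rdflib import Graph

    jsonld = """
    {
      "@context": { "schema": "https://schema.org/" },
      "@id": "https://example.org/post",
      "@type": "schema:Article",
      "schema:headline": "A ten-year-old post",
      "schema:datePublished": "2024-06-01",
      "schema:dateModified": "2024-06-01"
    }
    """

    g = Graph()
    g.parse(data=jsonld, format="json-ld")  # nothing here verifies the dates
    for s, p, o in g:
        print(p, o)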
Not even Google seems to be making use of that capability, if they even have it in the first place. I'm regularly annoyed by results claiming to be from this year, only to find that it's a years-old article with fake metadata.
They are quite good at near-duplicate content detection, so I imagine it's within their capabilities. Whether they care about recency, maybe not, as long as the user metrics say the page is useful. Maybe the preference for recency is a fallacy anyway.
You don't see many GeoCities-style sites nowadays, even though there are many older sites with quality (and original) content. Maybe mobile friendliness plays into that, though.
Yeah, dates in Google results have become all but useless. It's just another meaningless knob for SEO spammers to abuse.
At TU Delft, I was supposed to do my PhD on the semantic web, specifically in shipping logistics. It was funded by the Port of Rotterdam 10 years ago. The idea was to theorize and build various concepts around discrete data sharing, data discovery, classification, ontology building, query optimization, automation, and similar use cases. I decided not to pursue the PhD a month into it.
I believe in the semantic web. The biggest problem is that, due to the lack of tooling and ease of use, it takes a lot of effort and time to see value in building something like that across various parties etc. You don't see the value right away.
Funny you bring up logistics and (data) ontologies. I'm a PM at a logistics software company, and I'd say the lack of proper ontologies and standardized data exchange formats is the biggest effort driver for integrating 3rd-party carrier/delivery services such as DHL, FedEx etc.
It starts with the lack of a common terminology. For tool A a "booking" might be a reservation e.g. of a dock at a warehouse. For tool B the same word means a movement of goods between two accounts.
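To make that concrete, here's a toy sketch (all URIs invented) of how a shared ontology layer could at least record that the two "bookings" are different concepts, rather than letting the word collision leak into every integration:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS, SKOS

    A = Namespace("http://toolA.example/terms#")          # tool A's vocabulary
    B = Namespace("http://toolB.example/terms#")          # tool B's vocabulary
    LOG = Namespace("http://shared.example/logistics#")   # shared ontology

    g = Graph()
    g.add((A.Booking, RDFS.subClassOf, LOG.DockReservation))  # a slot at a dock
    g.add((B.Booking, RDFS.subClassOf, LOG.GoodsMovement))    # goods between accounts
    g.add((A.Booking, SKOS.related, B.Booking))  # same word, different concept

    print(g.serialize(format="turtle"))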
In terms of data integration things have gotten A LOT worse now that EDIFACT is de facto deprecated. Every carrier in the parcel business is cooking up their own API, but with insufficient means. I've come across things like Polish-language endpoint names/error messages, or country organisations of big parcel couriers using different APIs.
IMHO the EU has to step in here because integration costs are skyrocketing. They forced cellphone manufacturers to use USB-C for charging, so why can't they force carriers to use a common API?
The EU is doing its part in some domains. There is, e.g., the eProcurement ontology [1] that aims to harmonize public procurement data flows. But I suppose it helped a lot that (by EU law) everybody is obliged to submit to a central repository.
[1] https://docs.ted.europa.eu/epo-home/index.html
Good choice. The semantic web really brought me to the brink.
The community has its head in the sand about... just about everything.
Document databases and SQL are popular because of all the affordances around "records". That is, instead of deleting, inserting, and updating facts you get primitives that let you update records in a transaction, even if you don't explicitly use transactions.
It's very possible to define rules that will cut out a small piece of a graph that defines an individual "record" pertaining to some "subject" in the world even when blank nodes are in use. I've done it. You would go 3-4 years into your PhD and probably not find it in the literature, not get told about it by your prof, or your other grad students. (boy I went through the phase where I discovered most semantic web academics couldn't write hard SPARQL queries or do anything interesting with OWL)
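To illustrate the record-update point, here's roughly what you end up hand-rolling (vocabulary invented): a single SPARQL 1.1 Update that swaps out everything about one subject, which is the part a SQL row UPDATE gives you for free; the transactional guarantees are still on you.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:order42 ex:status "open" ; ex:total 10 .
    """, format="turtle")

    # Replace the whole "record" for ex:order42 in one update.
    # (No transactionality here; that's the part you still have to invent.)
    g.update("""
    PREFIX ex: <http://example.org/>
    DELETE { ex:order42 ?p ?o }
    INSERT { ex:order42 ex:status "shipped" ; ex:total 12 }
    WHERE  { ex:order42 ?p ?o }
    """)

    print(g.serialize(format="turtle"))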
Meanwhile people who take a bootcamp can be productive with SQL in just a few days because SQL was developed long ago to give the run-of-the-mill developer superpowers. (imagine how lost people were trying to develop airline reservation systems in the 1960s!)
A killer app is still not enough.
People can't get HTML right for basic accessibility, so something like the semantic web would be super science that people will go out of their way to intentionally ignore, whatever the profit in it, so long as they can indulge their laziness, even at the cost of raising their class-action lawsuit liability.
I see RDF as a basis to build on. If I think RDF is pretty good but needs a way to keep track of provenance or temporality or something I can probably build something augmented that does that.
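For instance (names invented), one common way to bolt provenance and temporality onto plain RDF is to put the facts in a named graph and describe that graph in the default graph:

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/")
    batch = URIRef("urn:import:2024-06-01")

    ds = Dataset()

    # The facts themselves live in a named graph...
    facts = ds.graph(batch)
    facts.add((EX.order42, EX.status, Literal("shipped")))

    # ...and the provenance/temporality is stated about the graph name.
    ds.add((batch, EX.importedFrom, EX.carrierFeed))
    ds.add((batch, EX.importedOn, Literal("2024-06-01", datatype=XSD.date)))

    print(ds.serialize(format="trig"))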
If it really works for my company and it is a competitive advantage I would keep quiet about it and I know of more than one company that's done exactly that. The standards process is so exhausting and you have to fight with so many systems programmers who never wrote an application that it's just suicide to go down that road.
BTW, RSS 1.0 is an RDF application that nobody knows about
https://web.resource.org/rss/1.0/spec
you can totally parse RSS 1.0 feeds with an RDF/XML parser and do SPARQL and other things with them.
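For example, something along these lines should work against any RSS 1.0 feed (the feed URL here is just a placeholder):

    from rdflib import Graph

    g = Graph()
    g.parse("https://example.org/feed.rdf", format="xml")  # any RSS 1.0 feed

    # RSS 1.0 items are ordinary RDF resources, so plain SPARQL applies.
    for item, title in g.query("""
        PREFIX rss: <http://purl.org/rss/1.0/>
        SELECT ?item ?title WHERE { ?item a rss:item ; rss:title ?title }
    """):
        print(title, "-", item)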
99% of the time you'll get an RSS 2.0 feed, which is a plain XML format. Of course you can convert, but RSS 1.0 seems, like you said, forgotten by the world.
Not only has this gotten much worse; even when you put in stopgaps for developers, such as linters or other plugins, they willfully ignore them and will actually implement code they know is detrimental to accessibility.
I think the problem with any ontology-type approach is that the problem isn't solved once you have defined the one ontology to rule them all after many years of wrangling between experts.
What you have done is spend many years generating a shared understanding of what that ontology means between the experts. Once that's done you have the much harder task of pushing that shared understanding to the rest of the world.
i.e. the problem isn't defining a tag for a cat; it's having a globally shared vision of what a cat is.
I mean, we can't even agree on what a man or a woman is.
You point out a real problem, but it does not feel like an insurmountable and terminal one. By that argument we would never have had human languages unless everybody spoke the same one. It turns out that once you have well-developed languages (and you do, because they are useful even when not universal) you can translate between them. Not perfectly, but generally well enough.
Developing such linking tools between ontologies would be worthwhile if there are multiple ontologies covering the same domain, provided they are actually used (i.e., there are large datasets for each). Alas, instead of a bottom-up, organic approach people try to solve this with top-down, formal (upper-level) ontologies [1] and Leibnizian dreams of an underlying universality [2], which only adds to the cognitive load.
[1] https://en.wikipedia.org/wiki/Formal_ontology
[2] https://en.wikipedia.org/wiki/Characteristica_universalis
In our spoken language the agents doing the parsing are human AIs (actual intelligences), able to deal with most of the finer nuances of semantics, and still making numerous errors in many contexts that lead to misunderstanding, i.e. parse errors.
There was this hand-waving promise in the semantic web movement that "if only we make everything machine-readable, then.." magic would happen. Undoubtedly unlocking numerous killer apps, if only we had these (increasingly complex) linked data standards and the related tools to define and parse 'universal meaning'.
An overreach, imho. The semantic web was always overpromising yet underdelivering. There may be new use cases in combining the semantic web with ML/LLMs, but I don't think they'll be a vNext of the web anytime soon.
Killer applications solve real problems. What is the biggest real problem on the web today? The noise flood. Can semantic web standards help with that? Maybe! Something about trust, integrity, and lineage, perhaps.
The Semantic Web doesn't help with the most basic thing: how do you get information? If I want to know when The Matrix was shot, where do I go? Today we have for-profit centralized points to get all information, because it's the only way this can be sustainable. The Semantic Web might make it more feasible by instead having lots of small interconnected agents that trust each other, much like... a Web of Trust. Except we know where the last experiment went (nowhere).
Off the top of my head...
OpenStreetMap launched in 2004. Mastodon and the associated spec-thingy were around 2016. One or two decades is not the same as many decades.
Oh, and what about asm.js? Sure, archive.org is many decades old. But suddenly I'm using it to play every retro game under the sun on my browser. And we can try out a lot of FOSS software in the browser without installing things. Didn't someone post a blog to explain X11 where the examples were running a javascript implementation of the X window system?
Seems to me the entire web-o-sphere leveled up over the past decade. I mean, it's so good in fact that I can run an LLM clientside in the browser. (Granted, it's probably trained in part on your public musing that the web is worse.)
And all this while still rendering the Berkshire Hathaway website correctly for many decades. How many times would the Gnome devs have broken it by now? How many times would Apple have forced an "iWeb" upgrade in that time?
Say what you want, but Macromedia Dreamweaver came pretty close to being "that killer app". Microsoft attempted the same with FrontPage, but abandoned it pretty quickly, as they always do.
I think that Web Browsers need to change what they are. They need to be able to understand content, correlate it, and distribute it. If a Browser sees itself not as a consuming app, but as a _contributing_ and _seeding_ app, it could influence the semantic web pretty quickly, and make it much more awesome.
Beaker Browser came pretty close to that idea (but it was abandoned, too).
Humans won't give a damn about hand-written semantic code, so you need to make the tools better that produce that code.
Search and ontologies weren't the only goals. Microformats enabled standardized data markup that lots of applications could consume and understand.
RSS and Atom were semantic web formats. They had a ton of applications built to publish and consume them, and people found the formats incredibly useful.
The idea was that if you ran into ingestible semantic content, your browser, a plugin, or another application could use that data in a specialized way. It worked because it was a standardized and portable data layer as opposed to a soup of meaningless HTML tags.
There were ideas for a distributed P2P social network built on the semantic web, standardized ways to write articles and blog posts, and much more.
If that had caught on, we might have saved ourselves a lot of trouble continually reinventing the wheel. And perhaps we would be in a world without walled gardens.
I think you're confused. The killer app is everyone following the same format so that capitalists can extract all that information and sell LLMs that no one wants in place of more deterministic search and data products.
Graph-based RAG systems look promising: https://www.ontotext.com/knowledgehub/fundamentals/what-is-g...