We take the typical blog url design (/2024/08/14/slug) for granted but back in the very early 2000s pretty much every blog tool had its own URL design. Matthew Thomas back then took an inventory:
https://web.archive.org/web/20030810201315/http://mpt.phrase...
He was on the search for his ultimate blogging system, where this "cruft-free" URL structure should be used:
https://web.archive.org/web/20051107103030/http://mpt.phrase...
I could have sworn there was a changeset in which Matt Mullenweg was implementing those cruft-free URLs in his new fork called Wordpress, but trying to google for something with "Wordpress" from the early 2000s is basically impossible in 2024.
Update: I found this: https://ma.tt/2004/08/mike-on-uris/
He calls file extensions cruft, but I've come to value them. They are a simple way to indicate file type - desired or offered - which is easily understood by machines and people.
I currently work with an API which does a bit of content negotiation using the Accept header, so clients can request data in various formats - application/json for a snapshot, text/event-stream for an updating feed, or text/html for an interactive dashboard. I wish it didn't. I wish we'd just used file extensions. Trivial to use in a browser or via curl, trivial to implement on either side.
That's fine (and already common) for images, JSON, etc.
But nobody wants webpage URLs that randomly end in .php, .htm, .html, .aspx, and so forth. That's just noise that is both gibberish and entirely irrelevant to the user.
.htm and .html are relevant, just like .pdf, .zip, etc.
But I agree about .php, .aspx and other extensions that are telling something about the server side. That’s irrelevant for the user.
it's _kind of_ relevant, if it weren't for the fact that the absence of any extension implies .html >99% of the time
Wouldn't that also include JSON for the other 47% of the time?
I think that refers mostly to the .php and .asp of the time. Those don't tell a thing to the user.
I want users to know I use PHP! :D
And I judge people who use ASP, lmao.
For _APIs_ I prefer to use both - the only downside is that resource names need to be restricted to _not_ include trailing `.{EXT}`s (either at all or limiting EXT to things that aren't valid content types).
E.g. `/books` - looks at the `Accept` header. `/books.json` - sets the `Accept` header to `application/json`. `/books.xml` - `application/xml`, and so on.
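A minimal sketch of that "extension overrides `Accept`" idea, in Python. The `EXTENSION_TYPES` table and the `resolve_format` helper are hypothetical names made up for illustration, not part of any particular framework. Unknown extensions are left as part of the resource name, which is where the naming restriction mentioned above comes from.

```python
import posixpath

# Hypothetical mapping from URL extension to media type (an assumption,
# extend it with whatever formats the API actually offers).
EXTENSION_TYPES = {
    ".json": "application/json",
    ".xml": "application/xml",
}

def resolve_format(path, accept_header="*/*"):
    """Return (resource_path, effective_accept) for a request.

    A recognised trailing extension is stripped from the path and replaces
    whatever the client sent in the Accept header. Otherwise the path and
    the Accept header are passed through unchanged.
    """
    root, ext = posixpath.splitext(path)
    media_type = EXTENSION_TYPES.get(ext.lower())
    if media_type is not None:
        return root, media_type
    return path, accept_header
```

With this, `/books.json` is handled as `/books` with `application/json`, while a resource that really is named `/books.v2` keeps its name because `.v2` is not in the table.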
I guess this reflects a view of blogging that maybe is more what people today would use twitter or mastodon for, with lots of blogposts with the same title like "open thread" or "links for sunday". Today people mostly use blogs to publish essays, and then a slug based on the title should be sufficient, since you're not going to publish two essays with the same title. That's what substack uses.
I think the date is still extremely valuable. Knowing whether something is from last month or a decade ago makes a huge difference. It's also useful so that URLs can be sorted by date.
Also, "you're not going to publish two essays with the same title" feels false. If you write 1,000 pieces and use short titles and tend to write about the same subjects, it feels extremely likely that you'll wind up repeating titles.
I agree, but I think it's important to note that the date in the URL can also be misleading. For example, it's often assigned at time of creation. If that page or post gets updated years later, even if almost entirely rewritten, it still has the original date in the URL.
If we're talking about blogs/news, they don't ever get almost entirely rewritten. The original publication is the only date that matters, and it matters a lot.
If we're talking about evergreen content like documentation, then of course you don't put dates in the URL. A small "last updated" on the page itself is appropriate there.
Unfortunately, this isn't the case. It should be the case IMHO, but it (currently) isn't. The SEO/marketing people nowadays (ab)use popular pages for the search rankings and update them regularly to keep the content fresh and highly ranked (since search engines give much preference to new content).
Also, even for strict blogs/news, it's not unusual for a particular post to be a draft for many months before publishing. Most serious blogs will fix the date to match the publish date, but that isn't what happens by default, especially in Wordpress (which is the most important platform for blogs).
And it's sad how often one needs to use the URL to find the date, since many authors just don't put it on the page (corporate sites are particularly scared of dating their stuff).
Others seem to think just day and month is fine, as if the year isn't the most significant part. And if both numbers are <=12 then you have to go and find out what locale the author formats their dates in...
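To make the ambiguity concrete, here is a small sketch using Python's standard `datetime.strptime`. The helper name `possible_dates` and the slash-separated input format are assumptions chosen for illustration. It tries both day-first and month-first readings and returns every date that parses.

```python
from datetime import datetime

def possible_dates(text):
    """Return every calendar date that a numeric a/b/YYYY string could mean."""
    results = set()
    # Day-first (common in Europe) and month-first (common in the US).
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            # This reading is impossible (e.g. month 25), so skip it.
            pass
    return sorted(results)
```

A string like `03/04/2024` yields two candidates (4 March and 3 April), while `25/12/2024` yields only one. A year-first form such as 2024-04-03 avoids the guesswork.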
In fact, on my own blog, I have some recurring posts, e.g.,
https://www.dahosek.com/the-big-countdown/
https://www.dahosek.com/the-big-countdown-2/
⋮
https://www.dahosek.com/the-big-countdown-11/
Alas, the default URL scheme in Wordpress doesn’t include the date.
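For illustration, the numbered suffixes above are what you get from any scheme that disambiguates by slug alone. The following is a rough Python sketch of that behaviour, not Wordpress's actual implementation; `slugify` and `unique_slug` are hypothetical helpers, and `taken` stands in for whatever store holds the existing slugs.

```python
import re

def slugify(title):
    """Lowercase the title and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def unique_slug(title, taken):
    """Return a slug not yet in `taken`, appending -2, -3, ... on collision."""
    base = slugify(title)
    candidate = base
    n = 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)  # remember it so the next call sees the collision
    return candidate
```

Publishing "The Big Countdown" three times against the same `taken` set gives `the-big-countdown`, `the-big-countdown-2`, and `the-big-countdown-3`. Putting a date or year in the path would shrink the namespace in which such collisions can occur.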
A useful midpoint is to use just the year. That way you get a fresh namespace on January 1st.
I use that for static files on my blog and it’s worked great for 20+ years:
https://static.simonwillison.net/static/2024/mlx-whisper-gpu...
https://static.simonwillison.net/static/2003/getElementsBySe...
Disambiguation is one thing, but as a reader, I really like having the date indicated in the URL for informational purposes. It's very helpful.
Search engines usually have a date filter. Here's DDG searching "wordpress" between 2000 and 2005.
https://duckduckgo.com/?q=wordpress&t=h_&df=2000-01-01..2005...
I'm guessing you were looking for:
https://wordpress.org/news/2004/01/cruft-free-uris-in-wp-10/
I assume Google supports something similar, but I've stopped using it.
How do search engines figure out the date of webpages that don't contain it in the metadata?
Poorly.
I have a blog so old I titled it an "online diary." It pre-dates search engines, so they tend to date the diary entries (blog posts) based on first crawl. Which means a lot of the dates presented by the search engines are off by several years.
The simplest version is recording the date they noticed a change in the page.
Well, arguably both Movable Type and Radio Userland's URLs were already pretty cruft-free. The success of Wordpress was mostly due to other factors (free, php, great feeds, great markup in default templates, great support for plugins).