LibreOffice opens it right up. Its support for old document file formats is really excellent. I keep it around for just this purpose. https://imgur.com/a/JENgq6V
But I also love using BasiliskII and InfiniteMac emulators!
I am deeply disappointed that a company like Microsoft doesn't make a point of Microsoft Word being able to open any document created by any version of Word, no matter how ancient. I think they have a social, historical, and economic responsibility to do so.
If they are worried about vulnerabilities in the old parsing code, move it to an external process, run it under isolation in a sandbox to spit out a newer readable version on the fly, but don't eliminate this capability from the software.
EDIT: zokier pointed out to me that the desktop version of Word opens the file fine, it is only the web version that doesn't. So, consider this post void.
EDIT 2: Well it opens the document, but is not able to display or print the embedded graphics, it seems.
You missed the fact that the real Word does open this file just fine; it's just the toy web version that has issues (and maybe Mac too, but eh)
Yes, it opens it and throws away the graphics, so not "just fine".
If we're going to split hairs, it doesn't really throw the graphics away; it simply lacks the "filter" to display them, but they are still there, in that it recognizes the graphics object correctly and lays it out on the page. Based on the error message, I suppose you could hypothetically even make a custom filter to handle the object.
But this really comes down to the fact that Office files allowed embedding pretty much anything, relying on this "filter" system (OLE, I guess) to handle embedded objects. So while the DOC file itself is getting parsed and rendered pretty much perfectly, the embedded objects are another story.
In the same sense, I'd say a browser might open some HTML page "fine" even if it doesn't know how to handle some image format used on the page; it'd still handle the HTML correctly.
Makes me wonder if the graphics are in PICT format
I think they are. You can even find some PICT files inside the ODT in the GitHub repo from TFA
if you read the blog, the main point of OP’s project was to get at the diagrams, so hardly “splitting hairs”.
Oh, really? I stand corrected. Thanks for pointing this out.
No, you're not wrong, another commenter points out that latest Word opens the document but doesn't display the graphics.
This is expected with the web versions of Office. They can read (certain) binary Office formats but not edit them. The web version of Office is designed for OpenXml file formats.
The Office 365 Mac version refuses to open it.
You can recover text but the result is horrible. No graphics and all formatting lost.
Old file formats have security vulnerabilities. The online version of Word is designed for docx only, although it can open certain binary documents.
Fundamentally, a data file format can't have vulnerabilities. At most it can be prone to vulnerabilities, but more often it's just that popular implementations are bad.
Sorry, the Word parser does, and Microsoft did not feel it important enough to fix, as their focus is on OpenXml formats.
Then that's on Microsoft. There's no fundamental reason why a secure parser can't be written for old formats.
Why would Microsoft do that? It makes zero financial sense to continue with a parser that may need to be rewritten from scratch for a ~30 year old format.
they can do what they want, and i'll continue on my 2 decade long decision to never give microsoft money, for anything. Same way i'll never give propellerhead another dime, or Plex[0], or any of these other consumer-hostile companies.
I don't trust MS to maintain software, even though as far as that goes, they're better than a lot of companies that have been writing software for decades. "time marches on" is silly when we have millions of times the compute, storage, and transit speeds available to us. I also don't see why people see the need to shill for multi-billion dollar companies.
What microsoft should have done is trademark a new name for their word processor the second they made the decision to not open word .doc from older versions. That way there's no confusion.
[0] having a hard time remembering the name/company of the software i purchased for in-house streaming over a decade ago. Plex is still a hassle to use for in-house streaming compared to the "service" or whatever they're selling. Unfortunately Synology seems to have grown weary of releasing a version of their client for every newfangled device that comes to market, so i'm stuck with plex on my TV; that is, unless i want to use a stick/set-top/computer attached to it.
I don't trust MS to maintain software
Then you should champion removal of any "old" software they have that is under maintenance-only status. You wouldn't want security vulnerabilities to go unfixed, would you?
What microsoft should have done is trademark a new name for their word processor the second they made the decision to not open word .doc from older versions. That way there's no confusion.
That makes zero sense. Word is still Word. It performs the same tasks (and more) as Word 1.0 did.
And Word today still reads/writes .doc, just not versions that are that old.
No they don't. Parsers can have security vulnerabilities, but you can fix those, and there's little reason why a parser for an old format would have more vulnerabilities than for a new format. Some formats can also have certain (intended) features that have security implications, but parsers can choose to disable them if they are concerned.
Many old formats were essentially just binary dumps of memory, or something not far removed. Documenting the format was not standard practice. Yes, I agree that there is a social responsibility, but having worked in digital archiving I can tell you that the olden days were really, really messy. No, really.
This is the point that many of the commenters who criticize Microsoft are missing, and it's why the old formats are not enabled by default (security vulnerabilities) and why it's not as simple as creating a parser.
Microsoft still deserves criticism for designing their old word formats so badly. It was a design choice to turn documents of mostly text into obscure binary formats that were badly standardized and maintained.
no they don't.
They were effectively working at embedded scale, trying to capture state within tremendously limiting constraints.
This is a case of judging past decisions by current criteria, when the conditions of the time would have made modern methods impossible to implement.
Not true at all. Some of Microsoft's best minds created extremely ingenious methods that allowed early word processors to be usable on files that were dramatically larger than what would fit in memory. OSes didn't support suitable performance via VM infrastructure at the time. It was clever, outside of the box thinking that got MS to be able to beat WordPerfect (a worthy competitor) and the many other also-rans.
There was (contrary to popular belief) not a deliberate strategy to limit interoperability. It was simply the reality of the approaches utilized that made them tightly coupled to the MS Word codebase and less standardizable than would have otherwise been ideal.
Source: one of the guys who worked on it at MS.
Word 4.0 ran from floppy disks on PC XTs (8088 CPU) with 320 KB of RAM. You can't afford an elaborate parser in such limited memory, or you'd have to swap out its implementation on floppy on every load and save. Just running the parser would have slowed down document loading significantly. The floppy disk capacity also wasn't much larger. You already had to swap the disks for doing spell checking or similar. For comparison, the first web browser (WorldWideWeb) was an executable of about 1 MB and ran on a much faster 32-bit NeXT computer with 8 MB of RAM and a hard drive.
Microsoft still deserves criticism for designing their old word formats so badly.
I would love to see some modern devs try to write software for a 68000 system with only 512K of memory
You don't have to go anywhere near 1990 to find issues with modern Microsoft (especially cloud) apps opening documents created in older ones!
Indeed. If I ever end up in the cloud version of Word (or indeed any other app) my first instinct is to click 'Open in App'.
Is there any commercial software development company with better backwards compatibility creds than Microsoft? I'm genuinely curious.
Great cautionary tale about how quickly formats become obsolete, especially closed-source ones.
I use markdown, plaintext and png for all the documents I need to store long term.
Even if these formats disappear, I could trivially reimplement my own parser.
implementing a markdown parser is far from trivial
implementing a parser that tricks people into believing it parses markdown because it acts like a markdown parser in simple cases is what is trivial
it's likely that your markdown data will indeed be recoverable, but if you're generating it yourself, html is probably safer
But the Markdown document doesn't actually need a parser to still be usable. Markdown as a whole imitates the conventions of typed text. The table formats would even be usable on an old typewriter.
markdown doesn't have tables, although you can include html <table> tags in it.
perhaps you mean
indented fixed-width blocks
you can use for ascii art
or typewriter-style tables?
Sure it does. It may not be in the original standard, but many/most parsers support tables that use pipe characters to separate columns.
And regardless, markdown documents -- including the table extension -- are readable without a parser.
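Case in point, a pipe table (in the GitHub-flavored extension) stays legible even with no parser at all; a made-up example:

```
| Year | Format        | Readable today? |
|------|---------------|-----------------|
| 1989 | Word 4.0 .doc | with effort     |
| 2024 | Markdown      | yes             |
```

Strip away every markdown tool in existence and that's still a table to any human reader.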
extensions to markdown aren't markdown; that's why commonmark is called commonmark
not being able to tell which variant of a language is in use is one of the biggest problems for archival, and in particular various extensions to the microsoft word format (all made by the same company!) were what made jgc's archival work so difficult in this case
language extensions are an especially bad problem when there's no extension mechanism—because sometimes a pipe is just a pipe. but unfortunately markdown's only extension mechanism is html
It's called CommonMark because Gruber insisted. Not because extensions to markdown aren't Markdown®, which no one cares about, and not because it isn't markdown in the ways that matter.
Ironically, his objection was to the idea of a single and rigorous standard; you'll note that GitHub-flavored markdown never drew his wrath. And yet you're treating Gruber and Swartz's implementation as if it were such a standard. Which it is not.
Parsing markdown is multiple orders of magnitude easier than parsing Microsoft Word's formats, especially the ones before docx.
And it has the merit to be human-readable in plaintext!
that's probably true
Or org-mode format. Then you even get tables properly.
The (only) issue is that Markdown isn't a format, it's a loose family of formats with many extensions. Implementing a parser for CommonMark is not an especially difficult task in the grand scheme of things; it's quite well specified and has an extensive test suite.
Although I find myself wondering what this "parsing Markdown" business is even about. It's perfectly legible as plain text, that was the main design principle behind it. If the goal is to have your data accessible in future, if you can read it now, and you don't go blind, you'll be able to read it later as well.
strictly speaking, markdown is a superset of html
Isn’t markdown plaintext? (I didn’t downvote.)
Isn’t HTML plaintext?
;)
Yes, but by contrast it's not intended to be directly human-readable.
If it wasn't intended to be human readable it would have been a binary format.
It may have been intended to be human readable, but it failed dismally in that goal.
Even before the web turned into the javascript infested swamp that is now, the tags having the same visual weight as the text they enclose made it tiring to read.
Markdown's genius is in the formatting tags being almost no hindrance to readability.
I definitely agree that Markdown is more readable than markup, but personally I abhor what some frameworks do to HTML. I make sure my HTML is legible! There is even a benefit when it comes to hyperlinks in that you can see the URL!
As a society we should have been thinking more about digital preservation since the time we started eschewing archiving hard copies in paper.
People who don't know history are doomed to repeat it, but how can our future generations learn from our mistakes if all our documents are unreadable or lost by their time?
Are you just casually dismissing all the work that digital archivists have done over the past couple of decades?
https://www.loc.gov/librarians/standards
https://www.loc.gov/preservation/digital/
https://www.loc.gov/programs/digital-collections-management/...
and that's just Library of Congress, they are hardly alone in this field
For such reasons, I think it is a good idea to use plain ASCII text format to document protocols and file formats as much as possible. (It is especially a problem if the documentation of a more complicated format or protocol requires use of that format or protocol itself.)
There is also the Just Solve The File Format Problem wiki (which I have added stuff to); although it uses HTML, does not include full specifications for all file formats (it does for some), and in some cases only links to external files, it is helpful for finding information about file formats anyway.
The problem with markdown is that if you want to convert it to a formatted set of pages, the output will differ based on the version of your markdown converter. Similarly for HTML, and to an extent also for plaintext. A PDF should remain exactly the same forever, but AFAIK the only properly editable document type that really keeps exactly the same formatting across updated software releases is TeX/LaTeX. In fact, that is a guarantee: if a LaTeX version doesn't produce exactly the same layout as a previous version for the same input document, it's officially a bug.
As a testament to Microsoft's backwards compatibility: the file opened mostly fine in the Windows version of Word (version 2401), and the layout seems to be identical to the PDF of the article. It did block the file format by default but that was easy enough to allow.
The graphics did not open, however, due to a missing graphics filter for the Microsoft Word Picture format. Seems it's been deprecated for a while now, but Word 2003 should be able to open it? Which is old, but not so old that it won't run on modern systems.
Installed a copy of Word 2003, document opened flawlessly immediately with default settings. Saving it from there converted it to a modern .doc which I could open with Office 365 and convert to PDF etc.
I think the moral of the story is that the Windows Office team seems to spend a bit more time on backwards compatibility.
I would be interested to see a PDF generated from Office 365 to understand how flawless it really is.
Here you go, exported from desktop Word to PDF.
https://drive.google.com/file/d/1lnaSr22l3kQbmFHnxg3Ggd3-46v...
Full version string:
Microsoft® Word for Microsoft 365 MSO (Version 2311 Build 16.0.17029.20140) 64-bit
Right. So all the images are missing. LibreOffice still gives the best conversion I think.
Yeah, that’s why you need Word 2003 for the images, it’s a deprecated format full of security holes I guess.
Ah… yeah I was wondering why they would deprecate an image format at all. My understanding is that Word in the old days serialized what was in memory, maybe that was a little too exploitable with images?
Not sure, just curious; not even sure where to look that one up, honestly.
Digging through the files a bit, I think the images are in PICT format, which is very specific to (the original) Macs. It's not surprising that modern Word doesn't support those that well, as they're actually a somewhat complicated kinda-vector image format. I am surprised that even Word 2003 implemented PICT on Windows.
It's not “kinda-vector”, it's a metafile format for QuickDraw operations (Windows did the same later with WMF, which was a list of GDI operations).
http://fileformats.archiveteam.org/wiki/PICT
Imagemagick supports it. What's more important, QuickDraw source is available, so not only we can have “some” conversion, we can also reason about its correctness (to some extent — according to comments, it's from 1982-1985).
https://computerhistory.org/blog/macpaint-and-quickdraw-sour...
Extracting raw embedded PICT files from the document and working with them would be the best way to get proper charts. To see what appeared on paper, we can direct emulated system output to an emulated printer, or capture the PostScript commands and rasterize them at the resolution that was used by device available to the author. It is well known that Word for Windows stored last used printer settings in the document, so it could be the same for files produced by Mac version.
(M-hm, it says “Laserwriter” at 0x10097. Maybe they all do.)
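For anyone who wants to poke at the file the same way, here's a quick sketch (my own, nothing from the article) of scanning a binary document for runs of printable ASCII, which is how you'd spot strings like "Laserwriter" at an offset:

```python
# Toy string-scanner: find runs of printable ASCII in a binary blob,
# reporting the offset of each run (similar in spirit to strings(1)).
import re

def ascii_strings(data: bytes, min_len: int = 6):
    """Yield (offset, text) for each run of printable ASCII of min_len or more."""
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield m.start(), m.group().decode("ascii")

# Hypothetical usage against the document (filename assumed):
# with open("proposal.doc", "rb") as f:
#     for off, s in ascii_strings(f.read()):
#         if "laserwriter" in s.lower():
#             print(hex(off), s)
```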
Because Microsoft made the most popular document editor for both Windows and Mac, they had to deal with interoperability of two versions of their own software. Supporting WMF/EMF on Mac meant they had to drag GDI implementation along with Office (luckily, the reference could be grabbed from their colleagues). Supporting PICT on Windows meant they had to re-implement QuickDraw primitives.
https://en.wikipedia.org/wiki/History_of_Microsoft_Word
https://news.microsoft.com/1999/04/26/office-98-built-for-th...
It is totally possible that Office applications used built-in PICT parser even on Mac to make things simple, and not rely on 15 years of compatibility layers in the system.
Probably the best approach would be to use LO for the images and Word otherwise... needs some manual twiddling, but I suspect that way you can get pretty much perfect layout and images.
Office applications up to (and probably including) version 2010 break and crash on latest Windows versions. That behavior varies based on Office service packs and updates installed. You were lucky to be able to just save the document.
Unless, of course, you've found some portable version on the net that packs ThinApp and an assortment of old system libraries under the hood.
I had no problems installing a vanilla Office 2003 on Windows 11 23H2. Got the iso from archive.org and it installed without a hitch.
This has not been my experience, I'm wondering where you heard this information from?
I have Office 2003 (or maybe it's 2007?) installed on my work computer, no problems. It even happily coexists with whatever modern Office version I have installed on there too.
I also have Office 2010 installed on my home computer and my husband uses it all the time. No issues.
Both computers are running Windows 10, so I guess it's not technically "the latest version."
I think they spend extra time creating those backward compatibility problems just to make it harder to create a perfect third-party tool.[1]
[1] https://www.infoworld.com/article/2618153/how-microsoft-was-...
For anyone interested, here's the document in modern Word format, with all vector artwork and fonts intact:
https://jasomill.at/proposal.docx
To convert it, I first opened and re-saved using Word 98[1] running on a QEMU-emulated Power Mac, at which point it opened in modern Word for Mac (viz., version 16.82).
The pictures were missing, however, with Word claiming "There is not enough memory or disk space to display or print the picture." (given 64 GB RAM with 30+ GB free at the time, I assume the actual problem is that Word no longer supports the PICT image format).
To restore the images, I used Acrobat (5.0.10) print-to-PDF in Word 98 to create a PDF, then extracted the three images to separate PDFs using (modern) Adobe Illustrator, preserving the original fonts, vector artwork, size, and exact bounding box of each image.
At this point, restoring the images was a simple matter of deleting the original images and dragging and dropping the PDF replacements from the Finder.
For comparison, here's the PDF created by Acrobat from Word 98 on the Power Mac
https://jasomill.at/proposal-Word98.pdf
and here's a PDF created by modern Word running on macOS Sonoma
Did you attempt to extract the pictures so they could be converted directly by another program? Archive Team says that LibreOffice can read vector PICT files[1], which could then be saved as SVG. Of course you still have the font problem if there's text. I hadn't thought of using PDF to preserve vectors, but of course it does, as well as embedding the fonts.
Good question. I saved the original document as RTF and extracted what I believe is the raw PICT binary data, but quickly decided on the Acrobat route when I realized I didn't know of any software that could easily convert PICT to a more modern vector format (other than by printing the PICT to Acrobat PDF, but that's essentially what I did in Word with extra steps).
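For the curious, pulling the \pict payload out of RTF by hand looks roughly like this (a hedged sketch; real RTF has more variations than this regex handles, but it matches the \pict\macpict groups named below):

```python
# Sketch: RTF embeds pictures as ASCII hex inside {\pict...} groups.
# This finds each group's hex payload and decodes it back to raw bytes.
import re

def extract_pict_hex(rtf_text):
    """Return raw bytes for the hex payload of each \\pict group."""
    pics = []
    # \pict followed by control words (\macpict, \picw513, ...),
    # then whitespace, then hex digits up to the closing brace.
    for m in re.finditer(r"\\pict(?:\\[a-z]+-?\d*)*\s+([0-9a-fA-F\s]+)\}", rtf_text):
        pics.append(bytes.fromhex(re.sub(r"\s+", "", m.group(1))))
    return pics
```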
If you want to give it a go, here's the raw PICT data from the RTF:
https://jasomill.at/Picture1.PICT
(extracted from RTF tag \pict\macpict\picw513\pich459)
https://jasomill.at/Picture2.PICT
(\pict\macpict\picw410\pich327)
https://jasomill.at/Picture3.PICT
(\pict\macpict\picw420\pich291)
and here are MacBinary-encoded[1] PICT files containing the same data:
https://jasomill.at/Picture1.bin
https://jasomill.at/Picture2.bin
https://jasomill.at/Picture3.bin
[1] https://en.wikipedia.org/wiki/MacBinary
Encoding is required because the PICT file format stores image data in the file's resource fork[2].
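If anyone wants to split those .bin files apart programmatically, the MacBinary header is a simple fixed big-endian layout; a rough sketch (field offsets per the MacBinary spec, not tested against these exact files):

```python
# Sketch of a MacBinary header parser: the first 128 bytes describe the
# filename, four-char type/creator codes, and the lengths of the data and
# resource forks that follow (each fork padded to a 128-byte boundary).
import struct

def parse_macbinary_header(header):
    """Parse the fields needed to locate both forks of a MacBinary file."""
    assert len(header) >= 128 and header[0] == 0
    name_len = header[1]
    return {
        "name": header[2:2 + name_len].decode("mac_roman"),
        "type": header[65:69].decode("ascii"),      # e.g. 'PICT'
        "creator": header[69:73].decode("ascii"),
        "data_len": struct.unpack(">I", header[83:87])[0],
        "rsrc_len": struct.unpack(">I", header[87:91])[0],
    }
```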
As an aside, MacClippy 98 knew the score:
MacClippy seems like a useful bot. Similar to AI chat windows on websites without the second guessing.
The sci-fi job of digital archaeologist is becoming real!
any time you dig through layers of git commit history to answer a question, you are performing archaeology
I did not expect to read about the LHC in such an 'old' document. I couldn't find (in the time I was willing to spend during work) when the LHC project started, but for it to already be relevant in 1990, 20 years before the LHC started up, is a longer lead time than I would have guessed.
Marvellous. Thank you!
Somehow the author doesn't recognize that emulation is a legitimate answer to this question. Yes, he was able to open the document, by using the original software on a highly accurate emulation of the original system. Everything beyond that point is a different question: can we get it into a modern word processor.
Emulation is starting to get gaps too... for example, running Windows 95 in an emulator on a modern machine is getting harder and harder (emulators like VMware and VirtualBox don't emulate the CPU speed accurately, which causes the system not to boot, and they also don't emulate various paging behaviours of old Intel CPUs accurately, which causes Windows applications to crash within a few seconds of starting).
There are binary patches to Windows 95 to fix these issues, but as the system gets older it's less likely people will put effort into binary patching it for compatibility with modern systems. And if it were more obscure, you'd be SOL.
PCem is far, far better for Win95 emulation - it can handle a P2 233 and a Voodoo3 fairly accurately - and tons and tons of hardware on top of that.
It’s amazing. I keep a 95 / 98 and some other vintage machines around as a hobby, but being able to play Unreal in an emulator with 3D acceleration blows my mind
How have you found the Voodoo 3 emulation? I have found it a bit ropey in 86box/PCem - but I find voodoo 1 or 2 works really well.
running Windows 95 in an emulator on a modern machine is getting harder and harder (emulators like VMware and VirtualBox don't emulate the CPU speed accurately, which causes the system not to boot, and they also don't emulate various paging behaviours of old Intel CPUs accurately, which causes Windows applications to crash within a few seconds of starting)
I thought the normal way to run Windows 95 was in dosbox?
Whole system emulation like 86box does a much better job of emulating older hardware and OSes - I use it quite a bit for DOS/Win3.11/Win9x era stuff.
Those are virtual machines, not emulators. If you use a proper emulator like PCem or 86box, Windows 95 works fine.
Sort of. What I wanted was to be able to get a PDF version of it. I was hoping that a modern word processor would read the file format, and LibreOffice did. But it's also true that using emulation I was able to get a PDF (albeit one that has different fonts).
it's also true that using emulation I was able to get a PDF (albeit one that has different fonts).
Maybe you needed to have the right fonts installed in your emulated Mac? Another comment in this thread pointed this out.
That way I can see actual fonts, font sizes and layout to confirm how the document should have looked.
Or you would if you had the original fonts. Word 4.0 was released for System 6 with support as far back as System 3.2. Fonts at that time had separate screen and printer files for the different output resolutions. If you're missing the printer font it'll print a scaled (using nearest-neighbor) rendering of the screen font. If you're missing the screen font it'll substitute the system font. (Geneva by default, as seen in the screenshot.)
In this case, only the well-known Palatino and Courier typefaces are needed. But LibreOffice substituted Times New Roman even though I have Palatino Linotype installed.
That may go some way to explaining some of the differences I see, but the main thing I was looking for in the emulation was the font sizes.
Doesn't the font matter almost as much as the font-size setting for font sizes, given that different font families can have wildly different metrics at the same font size?
I bet it does. I should redo the final part after installing the required fonts.
This is probably because the (internal) name of Palatino Linotype is "PalatinoLinotype" (for the version shipped with Windows) or "PalatinoLTStd" (for the Adobe OpenType version).
In the absence of a hard-coded special case, font matching based on common prefixes could easily match something inappropriate, such as — taking the first example I see on my machine — mapping "Lucida" to "LucidaConsole", when almost any proportional sans-serif font would arguably be a better match for the document author's design intent.
Then again, even exact name matches provide no guarantees. For example, Apple has shipped two fonts (internally) named NewYork: the TrueType conversion of Susan Kare's 1983 bitmap design for the original Macintosh, and an unrelated design released in 2019.
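A toy illustration (my own invention, not Word's or any OS's actual algorithm) of how naive prefix matching picks "LucidaConsole" for "Lucida":

```python
# Naive font matcher: pick the shortest installed font whose normalized
# name starts with the normalized requested name. This is exactly the
# kind of heuristic that maps a proportional "Lucida" to a monospace font.
def match_font(requested, installed):
    key = requested.replace(" ", "").lower()
    candidates = [f for f in installed
                  if f.replace(" ", "").lower().startswith(key)]
    return min(candidates, key=len) if candidates else None

installed = ["LucidaConsole", "PalatinoLinotype", "TimesNewRoman"]
```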
It's more that I half-expected well-known mappings to be baked in. Like "Times" -> "Times New Roman".
Didn't they also name one of their new fonts "SanFrancisco", much to the ire of Susan Kare fans?
Yes, but the current OpenType San Francisco fonts use "SF" in their (display and internal) names, so no naming conflict exists with the original "ransom note" bitmap font.
Also, as far as I know, of the original Mac fonts, Apple only ever shipped TrueType versions of Chicago, Geneva, Monaco, and New York. And I'm not aware of any OS with native support for both OpenType and classic Mac bitmap fonts (conversions are always possible, of course).
One underappreciated (though mentioned) hero in this little saga is the venerable file(1) command.
proposal: Microsoft Word for Macintosh 4.0
It's so incredibly useful and so easily overlooked. I almost reflexively reach for it when I'm curious about a file, and the information it returns is just sufficient to satiate my curiosity and be useful. I agree, file is such a great tool.
I have cursed so many times in the past when I sat in front of a work computer that ran Windows and didn’t have this tool easily available. (Later on, WSL made life easier, but now I’m luckily nearly Windows-free.)
One might even say that file has a lot of magic in it.
file has a lot of magic, but a file typically has only one magic.
I'd say it has a number of magic.
Definitely uses magic to do its work.
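At its core, file(1)'s magic is just a table of known leading-byte signatures; a miniature sketch (the signatures shown are well-known ones; the old Mac Word signature is omitted since I don't know it offhand):

```python
# Minimal magic-number sniffer in the spirit of file(1): compare the
# start of a file against a table of known signatures.
MAGIC = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"{\\rtf", "Rich Text Format"),
]

def sniff(data):
    """Return a type name for the first matching signature, else 'data'."""
    for sig, name in MAGIC:
        if data.startswith(sig):
            return name
    return "data"
```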
If you wanted exactly what would have been printed, on the emulator running Word for Mac 4.0 you should be able to install a print queue that can generate a .ps (PostScript) file, which could then be converted to PDF.
Or Acrobat may be available for that old of an OS and would have a virtual print driver to go directly to PDF.
I know I have running Macs with Word 5.1a which I consider the last Word version needed. I'm sure I opened Word 4.0 files.
Yes, a few years ago I helped a friend recover a bunch of old documents. The solution was to use Mac Word 5 to open the Word 4 files and save them as something newer versions could read.
Ah. Great suggestion! I just used Print2PDF to make a PDF from Word. Will update the blog.
https://web.mit.edu/ghostscript/www/Ps2pdf.htm
Or, if you prefer to do more tweaking yourself, dive into the Ghostscript deep end :)
This raises a potential problem, often underrated by companies: some have backups with infinite retention.
It is common to have backups with retention of 10 years, some may have 20 years for legal reasons… but the majority of people don't understand the difference between "readable" and "usable".
Of course, it depends on the data… And there are companies backing up whole virtual machines with infinite retention, believing they'll be able to run them: it is hard enough to restore a vSphere 5.x machine on a brand new vSphere 8. I really don't understand this waste of space.
If you back up everything, you can sort later, or possibly never. It costs 1 USD per month at Google Cloud to store 1 TB of data.
At this price it's not worth sorting, when a single devops engineer costs 100+ USD per hour, not including the opportunity cost of not working on something more productive (and less boring for the developer).
Then X years after the company is acquired, or sufficient time has elapsed, you can delete or drop the data without sorting.
Regarding virtual machines, if it's VMDK for example, you can read the raw disks without booting it, and again, it's not worth risking data loss to save perhaps 10 USD per month, which is about the cost of one extra beer for one developer at a team event.
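Putting the comment's numbers together (both figures are the commenter's assumptions, not quotes from any current price list), the arithmetic is stark:

```python
# Back-of-the-envelope: hours of engineer time whose cost equals
# storing `tb` terabytes for `months` months at archive pricing.
STORAGE_USD_PER_TB_MONTH = 1.0   # commenter's Google Cloud archive figure
ENGINEER_USD_PER_HOUR = 100.0    # commenter's devops rate

def breakeven_hours(tb, months):
    return tb * months * STORAGE_USD_PER_TB_MONTH / ENGINEER_USD_PER_HOUR
```

By these figures, a full decade of storing 1 TB costs about as much as one hour of someone sorting through it.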
if it's VMDK for example, you can read the raw disks without booting it
Yes, but that's the difference between "readable" and "usable". Many companies don't realize the technical difficulties to be able to run the VMs. They just expect that it will work, if needed.
Often an old file or disk image is tiny compared to modern file sizes.
So the waste of space is more of an administrative character than a waste of disk space.
Tragically, PostScript support has been largely removed from macOS now. Apparently the language was weird enough that supporting it made some (in)security hacks possible. I guess I'm old! I remember first finding out about it in 1986 when it was very "leet". PostScript printers were big $.
I say tragically because PostScript was pretty key in making DTP as compelling as it used to be, and DTP was the "killer app" that kind of saved the Mac.
I think you may be able to get some kind of PostScript support in some tool from Adobe, or via Ghostscript. And the newer software is probably better, but it's sad that you can't view a PostScript file on macOS out of the box now.
While I agree — my first exposure to PostScript as a programming language was playing around with examples from the Adobe "blue book"[1] over a bidirectional serial connection to a LaserWriter sometime in the '80s — nothing in this document requires PostScript.
The embedded images are in PICT format, and TrueType versions of the three fonts used (Courier, Helvetica, and Palatino) have shipped with all versions of the Mac OS since System 7 in 1991.
And while Word 4.0 shipped in 1989, so did Adobe Type Manager[2], which supported Type 1 fonts onscreen and on non-PostScript printers, though to get a Type 1 version of Palatino for ATM at that time you'd have also needed the Adobe Plus Pack[3] (or possibly acquiring Palatino by other means; I don't recall when Adobe started selling individual fonts and the Font Folio).
[1] https://archive.org/details/postscriptlangua00adobrich
[2] https://www.nytimes.com/1989/12/19/science/personal-computer...
Your information is much more detailed and specific. I was just giving an example of the loss of support for old software/formats. I didn’t mean that postscript support was involved in this particular case.
or possibly acquiring Palatino by other means
Relevant: The Palatino FAQ (1998)
https://web.archive.org/web/19990202052926/http://www.mindsp... https://news.ycombinator.com/item?id=24005172
libreoffice opened it.
Sure, but the layout was screwed up and the fonts and sizes were wrong.
Certainly this is helpful: it's better to be able to open a document and then have to manually fix those issues than to be unable to open it at all. But it was far from perfect.
agreed, but you could probably export as rich text or something.
It's orders of magnitude better than "I can't open this file at all, -1"?
That Mac Word screenshot gives me claustrophobic flashbacks to trying to work on those tiny screens in middle school computer lab, writing science fair papers.
It wasn't so bad. It's better now, but it was fine back then.
I consider it more of not knowing how much better we could have had it. Small monitors were "normal." But I imagine people who got to work with the Portrait Display[1] (an impressive 640x870 resolution!) felt then as we do now when they had to switch back to the internal screen.
[1] https://wiki.preterhuman.net/Apple_Macintosh_Portrait_Displa...
Heh, that screenshot is relatively high-resolution for the time in question, too. 800x600 maybe? The compact Macs were 512x342: https://www.betalogue.com/images/uploads/microsoft/pce-mac-w... (The toolbars, rulers, etc., could be hidden in the settings.)
"Here's a 4000 year old letter from a merchant to his partners describing how to avoid taxes by smuggling goods in their underwear." ( https://www.britishmuseum.org/blog/trade-and-contraband-anci... )
vs
"Not sure if it's possible to read this 30 year old file!"
I get the point you're trying to make, but your former example is rare. While there are more exceedingly-old paper records that are still around and have been preserved than we might expect, we've lost so, so much. Paper and ink (and variations on that) are both fragile.
Digital documents are otherwise easy to preserve indefinitely, if care is taken up-front to choose a simple document format that is likely to remain parseable (or at least documented) for a long time. And even when you don't do that, there's always the possibility of writing a parser later (assuming documentation is around) or reverse-engineering the format.
And in this case, the 30-year-old file did end up getting opened, albeit not as trivially easily as one might hope.
but your former example is rare. While there are more exceedingly-old paper records that are still around and have been preserved than we might expect, we've lost so, so much. Paper and ink (and variations on that) are both fragile.
Depends what you mean by "rare". Ancient Near Eastern correspondence isn't rare at all, precisely because they didn't use paper. (And they went to war a lot.) You seem to be writing as if that letter was a paper document, but it isn't. Paper records that old only exist in Egypt.
Digital documents are otherwise easy to preserve indefinitely, if care is taken up-front to choose a simple document format that is likely to remain parseable (or at least documented) for a long time.
This isn't a good match to the example either; Ancient Near Eastern records had to be deciphered. (The Semitic ones had to be deciphered. The Sumerian ones benefited from surviving documentation, but we had to find that and learn how to read it.)
The original example isn't particularly apt; reading this 30-year-old file, or a similar one, is a task that one guy can do in less than a week using existing tools and know that he's done it correctly. Reading a 4000-year-old cuneiform letter was a much larger project than that.
Until they find a storage medium that doesn't deteriorate over time, nope, digital storage is still worse than plain paper or clay: it loses its contents over time, and a single bad bit can be enough to ruin a file.
I was able to download and transfer the proposal document to a Mini vMac emulator, set the Finder's type and creator to those of a Microsoft Word 5 document i.e. respectively WDBN and MSWD, and finally open the document with Microsoft Word 5 for Mac to export it as a RTF document.
Here you have it: https://neko.melomac.net/tmp/proposal.rtf
I certainly agree opening a document from this Macintosh era should be, by far, easier than the process I detailed above, but this is how it is ¯\_(ツ)_/¯
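For anyone curious what "setting the Finder's type and creator" looks like outside the emulator: on a modern macOS host, the classic type/creator codes live in the first 8 bytes of the `com.apple.FinderInfo` extended attribute, so you can stamp them with `xattr` before importing the file into Mini vMac. This is a hedged sketch, not exactly what the commenter did (they may have used ResEdit inside the emulated system); the filename is a placeholder.

```shell
# macOS-only sketch: write 'WDBN' (type) + 'MSWD' (creator) into FinderInfo.
# 'WDBN' = 57 44 42 4E, 'MSWD' = 4D 53 57 44; the remaining 24 bytes are zero.
[ "$(uname)" = "Darwin" ] || { echo "not macOS; skipping"; exit 0; }
touch proposal.doc   # placeholder standing in for the downloaded file
xattr -wx com.apple.FinderInfo \
  "57 44 42 4E 4D 53 57 44 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00" \
  proposal.doc
# Read it back as hex to confirm the codes took.
xattr -px com.apple.FinderInfo proposal.doc
```

Word 5 on the emulated Mac then recognizes the file as its own document and opens it directly.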
Thanks. Unfortunately, the images are all missing.
It is even more frustrating because the images are in the document, and Microsoft Word for Mac would still display them accurately.
And LibreOffice would display the images in the RTF document in a different size (a tiny block).
If my old Mac's display still worked, I would have been able to send the document over to CUPS via Netatalk and make a PDF out of it. Unfortunately Mini vMac can't connect to that VM on the LAN...
Anyhow, it is scandalous that opening legacy documents became such a PITA.
Is there a way to make a PS or PDF file using the actual Word for Macintosh 4? I'd think that would be the definitive render.
Keep reading…he did that. But it’s not clear he had the right PS fonts installed.
I probably did not as I did it really fast after someone suggested it.
ITT: people repeatedly making the same mistakes, misunderstanding archival and also ignoring glaring problems with converted output
Just ask The Neural Net to draw something appropriate to illustrate the given text. There's little noticeable difference.
(ducks and runs away)
Interestingly, the latest and greatest version (desktop app via Office365) of Microsoft Word on Mac appears to know what it is but refuses to open it.
If you drag the file onto Word, it launches a dialogue box telling you "proposal uses a file type that is blocked from opening in this version" along with a link to the supporting page on the Microsoft website[1].
[1] https://support.microsoft.com/en-us/office/error-filename-us...
telling you "proposal uses a file type that is blocked from opening in this version"
"blocked"?
That sounds like Microsoft has some IP problems with their old software.
Extremely interesting, and thank you for doing this. I feel strongly that this goes to show just how important preserving historical software and emulation is. I have dabbled myself with old Windows 3.1 software for this very reason. We really, truly are going to have a period where web-application-driven software just disappears, and a short time from now we won't easily have this retro-computing view of these decades.
I also think it is important to show the importance of open formats or open source in general if we want future generations to read our documents or run/compile/understand our software.
I have a few Wang WP documents from decades ago. I could not open them at all. LibreOffice thought they were corrupted Word docs.
So the concern about some document formats being unreadable is still valid. Who knows what obscure proprietary formats exist out there.
Wasn't Multimate a Wang clone? Of course, finding an 8" floppy drive might be difficult.
I've been collecting notes about this file for a few years.
Some of the information in this post was previously covered right here in the comments on HN a few years back: <https://news.ycombinator.com/item?id=12793157>
The top reply there links to an online file(1)-like tool that identified it as a MacWrite II document. Last time I checked, the tool was updated and identifies the file as "Word for the Macintosh document (v4.0)" (pretty much what my system's file(1) says about it).
We actually have a scan of Robert Cailliau's copy with his handwritten notes (including the infamous, "Vague but exciting..." remark). It's neither 20 nor 24 pages but instead 16 and differs in several respects: <https://cds.cern.ch/record/1405411>; the version linked in the post and described erroneously as "the original" on w3.org clearly isn't the original and has been changed in several ways besides just "the date added in May 1990". Rather, the May 1990 version here is the second revision of the original that was first passed to Cailliau, and by November 1990 Berners-Lee and Cailliau were calling this second revision "HyperText and CERN"[1][2].
That is, "Information Management: A Proposal" is the one authored solely by TBL and given to Cailliau. It's not the version that appears here. "HyperText and CERN" from May 1990 is what we're looking at here, but was mistakenly also published as "Information Management: A Proposal". Later, TBL and Cailliau coauthored a joint work called "WorldWideWeb: Proposal for a Hypertext Project"[1][3] that referenced "HyperText and CERN" by name.
TBL is also known to have used WriteNow—there are lots of .wn files littering w3.org. I now believe (since last summer) that it's likely that TBL authored this revision of the proposal in WriteNow (even if he didn't save it in the WriteNow format) or used WriteNow at least for the RTF export. Refer again to [2].
1. <https://cds.cern.ch/record/2639699/files/Proposal_Nov-1990.p...>
We actually have a scan of Robert Cailliau's copy with his handwritten notes (including the infamous, "Vague but exciting..." remark).
Sorry, it was late when I wrote this. That was actually Mike Sendall (though TBL and Cailliau did collaborate on the others).
I'm surprised he didn't try an intermediate version of Word -- not the original Word 4.0 for Mac, but not the current online version of Word either.
I had a lot of old Word 4.0 for Mac files at one point, and remember some point in the late 1990's or early 2000's opening them all up in a version of Word for Windows, and then re-saving them in a more up-to-date Word format. I believe there was an official converter tool Microsoft provided as a free add-on or an optional install component -- it wouldn't open the "ancient" Word formats otherwise.
There's definitely going to be a chain here of 1 or 2 intermediate versions of Word that should be able to open the document perfectly and get it into a modern Word format, I should think -- and I'm curious what the exact versions are. (Although as other people point out, if you don't need to edit it, then exporting it as PostScript in Word 4.0 and converting it to PDF works fine too.)
As I've discovered while playing with this document and reading this thread:
Current Word for Mac blocks opening the file under discussion, with no obvious workarounds.
Current Word for Windows will only open the file with non-default security settings, and won't render the images at all.
Per Microsoft, PICT image support was removed from all versions of Word for Windows in August 2019[1].
The current version of Word for Mac fails to render the images with a misleading error message ("There is not enough memory or disk space to display or print the picture.").
As for fonts, they should render fine assuming you have matching fonts, where "matching" is defined by some application- or OS-specific algorithm, e.g., a post above indicates LibreOffice (on Linux?) substituting Times New Roman for Palatino when Palatino Linotype was available, whereas current Word on Windows 11 has no problem rendering Palatino as Palatino, presumably using the copy of Palatino Linotype installed with the OS.
Finally, if matching spacing (character, word, and line), line breaks, and page breaks is important, you should definitely open the document using as close a version of Word as possible with the exact fonts used when creating the document installed.
Oh, and hope the original author didn't rely on printer fonts without matching scalable screen fonts available, or else you're probably SOL unless your goal is printing to a sufficiently similar printer.
[1] https://support.microsoft.com/en-gb/office/support-for-pict-...
Yet another example of why Apache needs to take OpenOffice behind the barn.
You mean retire it to a nice farm upstate, little Jimmy might hear the shotgun blast!
This does an “okay” job at converting the document: https://archive.org/details/KeyViewPro
Here is the converted PDF: https://smallpdf.com/result#r=091f20f23de353fac21376a3a49a60...
Not sure that's really true. It did something but the images are a mess and a lot of formatting is gone. I think LibreOffice is still the winner here.
[silly pre-coffee post deleted]
Word is already available on the Infinite Mac as it's under Productivity inside the Infinite HD. No need to install it.
MS Word for Mac 16.16 opens it with the diagrams intact in "compatibility mode". The only issue is the text is indented slightly too far on the left.
Libre Office opens it with the same quality, but has some weird gray ghost lines around tables.
I downloaded the latest Apache OpenOffice and it did open the file
The last decade of Apache OpenOffice can VERY generously be described as "maintenance mode". Most of the pull requests are grammar and dictionary tweaks.
This is good.
It would be good to get some feature requests into LibreOffice to fix the remaining mismatches in the formatting.
This reminds me of my own screed about a much simpler document (an ASCII table generated as a printer test back in the late 1980s) that could not be rendered correctly some years later - https://bsdly.blogspot.com/2013/11/compatibility-is-hard-cha... - it also contains a link to a further rant about other document formats that were supposed to be "standard" and "portable".
Amazing that you can just pop up an emulator in a browser window. Retro Mac emulation used to be such a pain in the ass.
https://www.ebay.com/itm/235033043066
The original Word for Mac software seems more than available.
There’s a System 7.1 Mac SE/30 sitting 2ft to my right with Word 5 on it. Send it to me. I’ve got you. Using a combination of LocalTalk and two other computers on that shelf I should get it up to Office 2001 in no time.
It's an interesting problem we have with file formats. Emulation saves us, but at what point will we need to run emulators inside emulators to reach the documents? I suppose it's still somewhat easier than trying to understand some symbols on a cave wall..
LibreOffice is amazing. Besides being able to open many document formats, it can run headless, and its command-line options allow automating tasks such as format conversion that would not be possible otherwise.
https://help.libreoffice.org/latest/en-US/text/shared/guide/...
https://opensource.com/article/21/3/libreoffice-command-line
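The headless conversion mentioned above can be sketched like this (assuming LibreOffice's `soffice` binary is on your PATH; here a throwaway `.txt` stands in for a real legacy document):

```shell
# Headless batch conversion with LibreOffice:
#   --headless    run without a GUI
#   --convert-to  choose the output format/filter (pdf, odt, docx, ...)
#   --outdir      where converted files land
command -v soffice >/dev/null 2>&1 || { echo "soffice not found; skipping"; exit 0; }
echo "hello from 1990" > sample.txt
soffice --headless --convert-to pdf --outdir ./converted sample.txt
```

For the proposal itself you'd point the same command at the old `.doc` file; LibreOffice picks an import filter automatically based on the file's contents.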
I wonder if it would be a viable business to keep running versions of computers going back say 40 years and offering to recover and convert files for people. (Just getting stuff off floppy disks and Zip drives might be useful)
Today's historic working documents will mostly be SaaS hosted documents in systems like Google Docs, Notion, etc. In the future nobody will be able to open them. They won't exist, and the software won't exist, and there will be no way to restore it since the software is SaaS that can't be emulated or even installed anywhere.
WordPerfect claims the ability to open MS Word 4.0 files. The standard edition is currently $175. I'm not buying it, but if you're willing to spend $175 it might be something to try.
Somewhat off-topic, but I remember Word for Windows 6.0 would take considerable time (like a minute for a 10 page document on my AM386DX/40) to reflow paragraphs across page-breaks (trying to handle widows, orphans &c). If I made an edit to the first page and hit print before it was done, I would end up with a printed document that contained either duplicated or dropped lines at page boundaries.
Normally I have good success with abiword, but it completely barfs on this file; it seems to be falling back on its RTF support.
Now do one with Google Docs
Props to LibreOffice
Recently I was asked to locate an old form document, which I found was written in WriteNow for Macintosh. LibreOffice opened it up easily (even without a filename extension), and except for some font substitutions the tables seemed to be all correct. Very impressive.
See also: "How to hire Guillaume Portes" [1]
(also "autoSpaceLikeWord95" in case anyone shares that specific brainworm with me and is Ctrl+Fing for it)
[1] https://www.robweir.com/blog/2007/01/how-to-hire-guillaume-p...
LibreOffice was the first thing I tried, and it worked with no problem.
Well, except for all the problems I outlined in the post.
headline says "open" and libreoffice opened it with no problem.
I simply opened the file with my hex editor. Problem solved. (sarcasm)
In the past, I have in all seriousness read Microsoft Word documents on Linux using less. I might have had LibreOffice installed, but it can’t run over SSH.
It works okay with most old school (pre-XML) ones, since the document text is in the file in plain ASCII amidst all the binary formatting stuff. For the new XML formats, less by itself doesn’t do anything useful, but unzip them and you can read the XML containing the document text.
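The post-2007 trick can be sketched as follows — a `.docx` is just a zip archive with the text in `word/document.xml`, so `unzip -p` streams it to stdout where `less` (or anything else) can read it. This builds a toy `.docx` with python3 so the example is self-contained; real files have much more XML noise around the text.

```shell
# .docx files are zip archives; the body text lives in word/document.xml.
# Build a minimal stand-in, then stream the XML out with unzip -p.
command -v unzip >/dev/null 2>&1 || { echo "unzip not found; skipping"; exit 0; }
python3 -c "
import zipfile
with zipfile.ZipFile('demo.docx', 'w') as z:
    z.writestr('word/document.xml', '<w:document>the actual text</w:document>')
"
# Pipe through less when reading interactively over SSH (fall back to cat).
PAGER=$(command -v less || echo cat)
unzip -p demo.docx word/document.xml | "$PAGER"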
Word supported a mode where, in order to speed up saving, changes were appended to the file in a diff-like format. How could you know you were reading the right content if it could be overwritten later on in the file?
I once negotiated a higher offer for a job because the company sent out an offer letter they'd done this with, where the deleted details for another offer gave me info about another role that made me (correctly) guess there was room to ask for more.
Sometimes “reading the right content” isn’t that important - e.g. “what is this random doc document about?” “oh, it is a design doc for the XYZ subsystem”. Unless the changes completely rewrote the document into a completely different document, which I expect would be rare
If I was going to use the document in anger, I would open it with something proper, of course
I know it's being pedantic, but it absolutely can, libreoffice will happily run over a ssh -X tunnelled X display.
Oh yeah, but that would require me to start an X server. Which I could do, but why bother when less does the job?
Also, less starts a lot faster than LibreOffice does
I actually opened it in emacs in hexl-mode before I ran the file command!
Yeah, we read the article, which matches your screenshot.
This is for all the TL;DR folks.
I think your summary is a bit short. Sure, LibreOffice opens the file but there are multiple problems with the formatting that need correcting. Your screenshot shows at least one of them (there shouldn't be any headers on the first page and the page layout should be different).
The question was "can we open it?"
The question is: is there a bug report?
Yes, the OP also mentions that LibreOffice opens it.
...but they also point out with LibreOffice that "Although there's something weird about the margins and there are other formatting problems." - which is also apparent in your screenshot? Certainly that level support for such an old proprietary format is pretty good, but I'm not sure I'd class it as "really excellent" with those issues.
I should have been clearer: what I meant was that its support for very many different old document formats is excellent. Atari ST, Amiga, Macintosh, and so on. The OP and you are quite right that it won't open the documents with exactly the right formatting, but it's good enough in a pinch so you don't have to learn how to use 40 year old computers. It's a good tool to have.
7zip has similar support for a wide range of compressed file formats, exes, data files, cabinets, and so on. Another good tool to save time and keep you on your modern operating system.
7zfm.exe (7-Zip File Manager) anyway, which I agree is very useful. I've wanted it in Linux multiple times to avoid creating loopback devices but seem to always find it's Windows only.
I was referring to 7z on the command line.
Yes, LibreOffice opened it right up with the wrong font sizes, headers and footers messed up, incorrect gutter and margins, and a bunch of other problems. But they were all fixable.
So does Word 2019 for Windows.
Is the formatting correct? Are the images visible? Because others report (see other comments) that Word opens the file but the images are missing. See the Word generated PDF here: https://news.ycombinator.com/item?id=39359079
Yes, you are right, apologies. I thought it wouldn't open at all, like in the screenshot in that blog post.
Give QEMU a try — current versions do a great job emulating a Power Mac, able to run the most recent PowerPC versions of both classic Mac OS (9.2.2) and Mac OS X (10.5).
With what command line?
Figuring out what to ask qemu to do (without libvirt!) is half the battle.
(Thanks though, I have something to play with tonight)
On macOS, I typically run it from an .app bundle containing a one-line shell script that execs the following script with the "-monitor vc" option (to enable access to the QEMU monitor via a menu command in the Cocoa GUI; when actively using the monitor, I run the script directly with the "-monitor stdio" option instead, as opening the monitor in the Cocoa GUI hides the emulated Mac's display):
Paths are (obviously) site-specific, realpath is the GNU version — used here to ensure nice-looking absolute paths in light of my heavily symlinked filesystem — and specific details (options supplied in no particular order, $workdir vs $here, etc.) are artifacts of hours of fiddling and not cleaning up afterwards. I'm currently running a version of QEMU recently built from Git, though I haven't changed this script in years.
For networking, I'm currently using the notarized tap kext bundled with Tunnelblick[1].
Finally, I'm currently using an Intel Mac, so YMMV with Apple Silicon or Linux, though I have no particular reason to believe any command-line changes would be necessary, other than the obvious -display change to something other than cocoa for Linux.
[1] https://www.tunnelblick.net/downloads.html
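For anyone who doesn't want to reverse-engineer a full site-specific script, a minimal invocation looks roughly like this — a hedged sketch, not the commenter's actual setup; the image and ISO paths are placeholders you'd supply yourself:

```shell
# Minimal qemu-system-ppc sketch for classic Mac OS 9 on an emulated Power Mac.
#   -M mac99,via=pmu : "New World" Power Mac with PMU-based input (needed for OS 9)
#   -boot d          : boot from the CD for the initial install; drop it afterwards
command -v qemu-system-ppc >/dev/null 2>&1 || { echo "qemu-system-ppc not found; skipping"; exit 0; }
[ -f macos9.img ] || { echo "no disk image; skipping"; exit 0; }
qemu-system-ppc -M mac99,via=pmu -m 512 \
  -hda macos9.img -cdrom "MacOS922.iso" -boot d
```

Create the blank disk image first with `qemu-img create -f qcow2 macos9.img 2G`, and add `-monitor stdio` if you want the QEMU monitor on your terminal as described above.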
Well, StarOffice already existed back then. Now I wonder whether LibreOffice still has some early-'90s third-party format parsing code inside, or whether compatibility and conversion code reverse-engineered for much later Word versions actually does the job.
Yeah, I stopped reading the article and downloaded the file; the only word processor I have is LibreOffice. It seemed to work fine, so I didn't know what the issue was. Then I read the article and kept scrolling to the end, where the author finally uses LibreOffice and it opens mostly okay.