Yeah, well... Just don't assume that the doc/docx file you are working on, with multiple revisions and comments, which you are dutifully saving every 5 minutes, will open the next time you try to load it... LibreOffice will eat your document sooner or later if you edit a Word file [0].
Switch to the LibreOffice file format, then convert to doc/docx and send back that doc/docx file.
[0] It's sneaky: basically, LibreOffice will show you your edits and let you manipulate and modify the document as you wish, but what it saves into the file is corrupted, and you will only notice the next time you open that file. Sometimes it's a mismatched tag in the internal XML representation (can be fixed); sometimes a huge chunk of the doc will be missing (can't be fixed).
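For the "mismatched tag" case, a rough way to catch it right after saving instead of at the next open: a .docx is a zip archive of XML parts, so each part can be run through an XML parser. This is only a sketch. The helper names (`find_malformed_parts`, `make_archive`) are made up for illustration, the archives below are tiny stand-ins rather than real Word files, and a well-formedness check will not notice the other failure mode, where a chunk of the document is simply gone.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_bytes):
    """Return {part name: parser error} for XML parts that fail to parse."""
    problems = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            # Only the XML parts are checked; images and other media are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                problems[name] = str(err)
    return problems


def make_archive(parts):
    """Build a tiny in-memory zip standing in for a .docx (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in parts.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# One well-formed part, and one where <p> is never closed.
good = make_archive({"word/document.xml": "<doc><p>hello</p></doc>"})
bad = make_archive({"word/document.xml": "<doc><p>hello</doc>"})

print(find_malformed_parts(good))
print(find_malformed_parts(bad))
```

For a real file you would pass `open(path, "rb").read()` instead of the stand-in archives.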
I can't imagine working on a large document in opaque WYSIWYG formats anymore. Plain text file formats with version control are the only scalable toolset that won't lose your work.
And there are no file formats that support it, unless you go to something like Markdown.
LaTeX seems like it fits the bill.
I have used LaTeX for a lot of things and it is a pain to use. It is like everything is stuck in the 70s, from the syntax to the toolchain to package management.
I don't use LaTeX for anything these days but Typst popped up recently and seems like a decent alternative: https://github.com/typst/typst
I don't like the idea of embedding a scripting language in my documents at all.
I mean, LaTeX is Turing complete as well. It is just that scripting is very clunky to use.
Well, sure, I think that's part of the problem with it.
I don't think either of them has to be Turing complete, though.
OTOH, I feel like it forces me to keep it simple. I have two or three templates I've been using for decades, occasionally needing an update because some package was deprecated. Documents are not like web design, where every two years someone comes up with completely new design guidelines based on the latest super duper awesome user testing and UX research, and everything we did before was inferior.
To be fair, printed documents used to be like that for the first one or two centuries of their existence.
Have a look at Typst then. It's a fresh take on the concept.
Another way of looking at it is that LaTeX is so good no one has managed to come up with something better.
I recently started using it because LyX does not work on my ARM-based Linux tablet, and it's not too bad when used with a GUI editor.
It is a lot more powerful than other things I use (Markdown, Sphinx) and a lot better for version tracking and multiple output formats than word processors.
Typically the big task of a large document is focused on content and organization, and the formatting is a separate concern.
XML is a good start to describing structured text.
Well, it depends.
If you're compiling a report for a college group project (or the workplace equivalent of that) you might well need things like equations and tables and suchlike.
For example if you've got a table of performance results with the best performer in each row highlighted in bold, a fact the text references - then separating content and formatting doesn't really make that much sense.
You can tag the text as "important": see <strong> vs. <b> in HTML.
Just an example. Obviously, the more formatting control you need, the fancier your language and tool have to be. TeX is the upper bound.
You could also use DocBook. My gut feeling is that if you have large documents that you work on collaboratively, there's probably a bit of a compilation process to be done and other stuff you want to do with the document.
So the overhead of semantic markup of the whole content (instead of just marking words as bold) might be worth it.
Markdown-in-git used to be used for tech specs at my company. Now we switched to Google Docs, which means we need to maintain a "change log" section at the bottom because Docs history is... not a commit log.
Markdown in git, reviewed as a pull request on GitHub, is the best way to do an RFC. I will die on this hill.
Alternatively, I wonder if an old school message board would work. Each RFC gets its own (sub)board, and within that board each thread is a discussion about some topic -- an individual review, debate about a section, etc. I wonder if such a thing already exists, _specifically_ for technical specification review.
I'm so sorry. Any rueful ideas on how another company might avoid that pitfall?
Not sure how to avoid. I think it was two things:
1. Some devs just didn't like having review discussions as PR comments. Maybe it's the way that the threads break up the markdown source. Or maybe WYSIWYG feels better for doc review. I also recall people saying "it's hard to know what's changed, could you add a changelog section?" When I replied "it's a git repo, and I leave detailed commit messages," there was no response. I think this means that they don't like that.
2. The managers who decide things don't use git. We use Google Docs for everything else, why not for this?
In the end, Google Docs works well enough. I do miss the commit log, though. It's tricky to link to a previous revision in the changelog table -- maybe I'll get into the habit of it.
Both GitHub PRs and Google Docs have this problem, though: There's no ready record of the review process -- the comments. They eventually disappear and are difficult or impossible to retrieve, and they lose context.
Not like anybody is bothering to dig.
What about a wiki, like XWiki or others? They have diffs for changes (a changelog), comments, and inline @user prompts.
I wonder if someone could build something clever with git-notes [0] to capture such discussions in the repo itself.
I'm sure others have thought about it already, so there must be some kind of pitfall... A little more searching finds projects like git-appraise. [1]
[0] https://git-scm.com/docs/git-notes
[1] https://github.com/google/git-appraise
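A minimal sketch of what that could look like, assuming review comments are kept under a separate notes ref so they don't mix with ordinary notes. The ref name `review`, the `author: text` comment layout, and the helper names are all made up here; git-appraise defines its own format, which this does not follow. One pitfall I'm fairly sure of: notes refs are not pushed or fetched by default, so everyone has to sync `refs/notes/*` explicitly.

```python
import subprocess

NOTES_REF = "review"  # hypothetical namespace, stored as refs/notes/review


def append_comment_cmd(commit, author, text):
    """Build the git command that appends one review comment to a commit's note."""
    return ["git", "notes", "--ref", NOTES_REF, "append",
            "-m", f"{author}: {text}", commit]


def show_comments_cmd(commit):
    """Build the git command that prints all comments recorded for a commit."""
    return ["git", "notes", "--ref", NOTES_REF, "show", commit]


def run(cmd):
    """Run a command inside a git working tree; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Only builds and prints the command; call run(cmd) inside a repository to execute it.
cmd = append_comment_cmd("HEAD", "alice", "Section 3 needs a rationale")
print(" ".join(cmd))
```

`append` rather than `add` is used so that a second comment extends the existing note instead of failing because one already exists.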
It's called Redmine.
Eg: https://redmine.pfsense.org/issues/14139
Trac is still around, too. 1.6 came out a few months ago.
Confluence does a great job of having a WYSIWYG editor and keeping a log of changes on a document.
Please do not work on docx.
Import docx if necessary, but work on the OASIS OpenDocument format.
If you need to send work to an MS Office user, just send OpenDocument. Office can open them. Should there be issues, it's more likely to be their bug, not LibreOffice's.
MS Office users need to get used to working with the standard document format, which is what OpenDocument is.
Ubuntu Bug #1 has been closed for over 10 years. Time to let it go.
Could you please elaborate?
I interpreted this as "LibreOffice is primarily used on Linux, while Office is almost exclusively used on Windows. Windows has much larger desktop market share than Linux, so it is not surprising that Office prefers the Office-specific format."
For what it's worth, [docx][1] is technically a standard. Not sure how that pans out in practice.
[1]: https://en.wikipedia.org/wiki/Office_Open_XML
I am not sure about the first part. Most LibreOffice users I have come across seem to be running it on Windows.
Of course MS Office is easily dominant on Windows.
Right you are! I stand corrected: https://stats.documentfoundation.org/downloads#week,os
Are those stats for direct downloads?
Linux is probably under-represented on this one because most Linux users will install from distro repos (or Flatpak, Snap or similar).
It's an "international standard" via both ECMA and the ISO/IEC JTC1. Although ISO in particular seems very pleased that JTC1 exists, this is a terrible way to do technical work. Basically, the idea is that countries get to agree the world's standards using a democratic process.
But why would countries be the right entities to do this work? They aren't, but there are a conveniently small number of them internationally and there were already bodies to represent them. Specifically they send representatives from their own national standards bodies to the relevant JTC1 sub-sub-committees. Yes that means Taiwan isn't represented.
For situations where there's just a matter of agreeing a few narrow specifics, such as the A-series paper standards, it doesn't really matter how it's done. For a huge problem like "Standardize Word processor application data" it's completely impractical and the results are all you'd expect. Microsoft basically leaned on national representatives from smaller countries to push their pointless vanity standard through both ECMA and subsequently JTC1.
After all that, it's basically futile because of course Microsoft can't magically make their "Office" suite and particularly Word behave in a documented internationally standard way, they don't even know how to describe much of the behaviour except "You know, that's how Word does it". And so, the "Office Open XML" standard has long sections where there's a magic escape hatch for "legacy" documents, which Word uses extensively, and it will always do that.
"Standard" except that yeah, all this non-standard stuff is critical and will be used forever. Futile.
It doesn't, basically: the spec isn't enough to render it, and Microsoft has stopped engaging with ISO in favour of publishing what is now nineteen major versions[1] of their own spec in step with Office updates. There was also a huge shitstorm[2] around the ratification of the original, unsurprisingly given how useless and patent-infested it was.
[1] https://learn.microsoft.com/en-us/openspecs/office_standards...
[2] http://www.groklaw.net/article.php?story=2007011720521698
There was the funny thing that MS published the standard and at the same time implemented a different one. Then, due to some stuff, MS had to implement their standard properly, and now you had two docx standards and had to "choose" which to use when setting up MS Word (back in ~2007 or so). Later they switched to always using the standardized docx, but they kept messing with it, to the point where you shouldn't expect an MS Word document to be readable by anything else even if it is supposedly saved in the standardized format (and, when nitpicking, it theoretically is, just not in a very useful way in practice).
Ubuntu bug 1 for context: https://bugs.launchpad.net/ubuntu/+bug/1
Context
Bug: Microsoft has a majority market share
https://bugs.launchpad.net/ubuntu/+bug/1
In the real world, we can't all afford to be this idealistic. When I need to send someone a document for work purposes and they can't open my .odt file in MS Word (something that frequently happens to me), I'm not going to say "You need to get used to working with the standard document format. It isn't my problem that you can't open it - it's more likely to be your bug, not LibreOffice's". I'm going to send them a .docx so that we can both get on with our work.
MS Office has officially supported ODT for years. Are you sending files to people using decade-old versions of MS Word, or are you using too recent a version of ODT, or is MS's ODT support incredibly buggy, or what?
MS Office should be able to open and display ODT correctly, but MS Office will produce "rainbow" ODT: ODT with extensions that diverge. I can't find the reference/source at the moment; I will update ASAP.
>MS Office has officially supported ODT for years. [...], or is MS's ODT support incredibly buggy, or what?
The bugs of programs trying to read others' file formats go both ways. LibreOffice has problems reading some *.docx files, and likewise Microsoft Word has problems reading some *.odt files.
Example thread of trying to keep "tracked changes" preserved in .odt files when co-workers open it in MS Word: https://forum.openoffice.org/en/forum/viewtopic.php?t=99962
That type of interoperability issue also happens with *.xlsx and *.ods spreadsheets that have non-trivial formatting or advanced functionality.
The advice of "just use the OpenDocument format" is not that simple because round-trip fidelity of the file may not be 100% preserved depending on what features of the software the collaborators use.
It's sadly not that rare.
Sometimes, _especially_ if it's the "web" Word 365, which in my experience is really good at messing up files, including, ironically, docx files (though I haven't used it in the last ~2 years, so maybe it got better).
And yet they will happily say to you: "just get software compatible with a shitty office suite from a crappy company determined to kill interoperability" and expect you to eat it.
I think that's reasonable though. I haven't used Windows or any Microsoft product for over 10 years. However, I accept that that means I use software which is relatively obscure and unpopular. I'm happy with my choice, but I don't expect to enforce it on others, and I think it's entirely reasonable (even if it doesn't personally please me) that when communicating with others for work purposes I should use a medium that is used by virtually everyone else rather than expecting them to adopt mine.
Question: how far does this logic go? Would you expect people to consider the environmental ramifications if they use a V12 20-liter car to go to the grocery store every day? Or whether it's proper to prepare A LOT of food and just throw out what you don't eat?
Why is responsibility for the software YOU decide to use somehow not something you get, but responsibility for the car you choose to drive is most certainly something you get?
Send a PDF.
Maybe, but it's probably more reliable for them to just grab LibreOffice to open it.
It's much easier to get LibreOffice than MS Office.
”MS Office users need to get used to working with the standard document format, which is what OpenDocument is.”
This is not how the world in which most of the Fortune 500 operate works. The software and format people use are usually the ones mandated by their working environment. If your org operates in Office, you operate in Office. The question 99.9% of all users are concerned with is ”I want to review/edit this document I got from Sarah/need to send to Jane”. Not ”does this conform to standard xyz”.
Standards are just paper that software may support. The real question is phenomenological: is there support, at what level, and is the software vendor incentivized to implement it?
There is no ”this format must be followed” convention in userspace software (sadly), unlike in, say, hardware drivers for a desktop OS. Even if the standard has an ISO label.
Got bug numbers on any of those? It's not clear what you mean. How recent was the LO?
I won't bother with providing more than this: https://ask.libreoffice.org/search?q=SAXParseException%3A%20...
Feel free to craft problematic docx files that trigger conversion errors and report them as bugs for the converter.
It's true for any non-native format (of some complexity) in any application.
Vendor A's development team spends all their time developing, testing, bug-squashing, updating, etc. Format A, and developing, etc. their application, and developing the two to work together. They also have all the bug data, data from users, etc.
Vendor B can't possibly keep up. They have their own application and format to develop. They lack the institutional knowledge, the data, etc. There's no way their application will correctly handle complex data in Format A nearly as reliably as Vendor A (which will have bugs itself).
And in this case, Microsoft's Office team probably has far greater resources overall than LibreOffice.
This. LibreOffice once erased all the footnotes in an article I was editing as a .docx. Never had such problems with .odt.
To be fair, Word won't eat it, but it will eventually shred it quite well, to the point where there is no practical difference.
Word's format is just something to avoid.
A similar effect can happen the other way around, and in fact I have seen that more often (albeit mostly 10 years ago):
- create a DOCX with change tracking in Word
- edit that in LibreOffice and save multiple times
- open it in Word, modify something, and save
- Word cannot load that file. Loading and saving in LibreOffice fixes that.
All cases I have seen involved change tracking, which is probably not that surprising as the internal representation of change tracking in DOCX is totally brain damaged (even more than the rest of that "dump RTF and random crap as XML" format).
I don't know if I would call LibreOffice sneaky here. They're the good guys IMO, enabling you to break out of the clutches of MS Office.