That is why emulation, when targeting 100% accuracy, is a craft in our industry. Not only do you need to know each and every quirk of the original hardware/software, you also need to replicate it, however peculiar it is. And if that by itself isn't challenging enough, consider the potential performance impact.
Interesting example of the kind of thing it's probably not worth caring about in software emulation. Emulating the bug would be considerably slower.
Some day replicating the PS2 on an FPGA will be feasible, and then figuring out how this worked will be a fun project for someone.
FPGA implementations are often implemented based on code or documentation from software emulation projects. An FPGA version of a PS2 has no guarantee of not implementing the same or similar bug.
The point here is that it is actually viable to reimplement such bugs without incurring significant performance penalties.
A software emulator has to be able to execute a single PS2 instruction in the same amount of wall time as it'd take on the original hardware. With a regular multiplication that's fairly easy: x86 also has multiplication, so you can do a 1:1 translation and be fairly certain it's within your time budget. With a bugged multiplication you need to do a regular x86 multiplication, and wrap that in a few dozen other instructions to add the buggy behaviour to it. There's a pretty decent chance it's simply too expensive!
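A minimal sketch of that cost difference. The quirk modeled here is hypothetical, purely for illustration, and not the actual PS2 multiplier bug:

```python
# A 1:1 translation just reuses the host's multiplier; replicating a
# hardware quirk wraps that multiply in extra work on every execution.
# The quirk below (perturb low bits when both operands are odd) is
# made up for illustration.

MASK32 = 0xFFFFFFFF

def mul_fast(a, b):
    """1:1 translation: just use the host's multiplier."""
    return (a * b) & MASK32

def mul_quirky(a, b):
    """Same multiply, wrapped in extra steps to reproduce a
    hypothetical hardware quirk."""
    result = (a * b) & MASK32
    if (a & 1) and (b & 1):          # extra test per multiply
        result ^= (a >> 16) & 0xFF   # extra shift/mask/xor per multiply
    return result
```

A couple of extra ALU ops per multiply is already measurable in a tight interpreter loop; a real bug model can need dozens.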
When you're writing an FPGA emulator you are able to recreate the buggy multiplication directly in hardware. There's no additional wrapping needed, so (beyond figuring out intended behaviour) it's not any more costly than emulating a non-buggy multiplication. It's far easier to do a cycle-accurate emulation because you have direct control over the transistors!
There's a pretty decent chance it's simply too expensive!
I doubt the 24-year-old, 300 MHz RISC, 32 MB RAM PS2 instruction set is too expensive to replicate cycle-perfectly.
You would probably need a terahertz CPU for cycle-accurate PS2 emulation.
Cycle-accurate PS2 emulation means emulating the state of the CPU, GPU, other interacting processors, and their various interconnecting busses at clock cycle granularity and possibly at sub-cycle granularity if the processors are running asynchronously.
The IOP (PS1 processor used for I/O), SPU2 (sound processor), IPU (MPEG decoding), EE (main CPU), VU0, VU1, DMA controller and finally the GS (GPU) all run asynchronously. It's naive to think perfect cycle accuracy for this machine is possible with any hardware that exists today. PCSX2 is lucky to have VU1 on its own thread. It still doesn't work perfectly IIRC; VU0 -> VU1 access is a little sketchy with it enabled.
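To see why asynchronous units make this hard, here is a toy sketch of cycle-stepped scheduling. The unit names and clock ratios are illustrative, not the real PS2 ratios:

```python
# Cycle-stepped scheduling: always advance the unit with the earliest
# pending clock tick, so cross-unit ordering is preserved at cycle
# granularity. Every step is one tiny unit of work -- this is where the
# performance goes.

class Unit:
    def __init__(self, name, period):
        self.name, self.period = name, period
        self.next_tick = 0      # absolute time of this unit's next cycle
        self.cycles_run = 0
    def step(self):
        self.cycles_run += 1
        self.next_tick += self.period

def run(units, until):
    while True:
        unit = min(units, key=lambda u: u.next_tick)
        if unit.next_tick >= until:
            break
        unit.step()

# Periods are made up: the fast unit ticks every cycle, the slow one
# every 8th cycle, the middle one every 2nd.
units = [Unit("EE", 1), Unit("IOP", 8), Unit("GS", 2)]
run(units, until=64)
```

Batching work per unit (running each ahead for many cycles) is much faster, but that is exactly the compromise that gives up sub-cycle ordering.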
Idk about PS2, but N64 (a generation older, much slower) still doesn't have a cycle perfect emulator that runs in real time.
Remember that you can't just perfectly emulate the CPU, you must also perfectly emulate the GPU, since they share the memory bus so one can slow the other.
since they share the memory bus so one can slow the other.
You can look more into this in this video by Kaze Emanuar, famous N64 romhacker.
Higan is a cycle-perfect SNES emulator, and it's very single-core CPU-intensive. This is what the FAQ [0] says:
Full-speed emulation for the Super Famicom base unit requires an Intel Core 2 Duo (or AMD equivalent), full-speed for games with the SuperFX chip requires an Intel Ivy Bridge (or equivalent), full-speed for the wireframe animations in Mega Man X2 requires an even faster computer. Low-power CPUs like ARM chips, or Intel Atom and Celeron CPUS generally aren’t fast enough to emulate the Super Famicom with higan, although other emulated consoles may work.
Work can't be split across cores (according to the FAQ) because that would compromise the accuracy of the timing.
It may be that the PS2 has similar problems while being more powerful than the SNES.
You must be young. A PS2 can run a modernish Linux with WindowMaker and decode MP3s, maybe FLAC and Opus, and some MP3+XviD/DivX videos. Not so far from a cycle-accurate emulation of a low-end Windows 98-era PC. A PS2 with Linux can comment on this page, for instance, with TLS and all, if you use Dillo as the web browser, as it uses Mbed TLS. The TLS handshake might last a few seconds, but that's it.
Try running that on a cycle accurate emulator without bringing an i7 to its knees.
A similar PC would be a Pentium II/III at 450-500 MHz with 64 MB of RAM running Damn Small Linux or NetBSD with a small Window Maker setup. Bear in mind that you could run a non-accurate SNES emulation on that machine, with sound output at 8000 Hz and no filters, enough for most common games such as Chrono Trigger and Super Mario World, with lots of hacks fixing the imprecise timing under ZSNES.
Why would you want to emulate an old crappy MIPS CPU using a relatively expensive FPGA? The whole idea of emulating old consoles is to be independent of the hardware so you can play your old games on your computer or phone.
For you, maybe. For others, having thing that does what this other thing did, with no regard to cost, is a fun adventure, in and of itself.
In that case I'm sure current FPGAs are already capable of emulating the MIPS CPU of a PS2.
The CPU, yes, but you want to emulate every piece of hardware in the console: the audio chips, the GPU, the way video memory works, and so on.
I second that. I wrote a NES emulator twenty years ago because it was fun, and not for any practical purpose. I had no idea what I was doing, but I remember being in awe of the NES after reading the detailed HW spec (found on Zophar's Domain, doc by Yoshi if memory serves me well). I promptly decided to write an emulator in whatever language I was learning at that time.
The result was terrible, but I had tremendous fun!
Because the implementation on your computer or phone behaves slightly differently from actual hardware, sometimes to the point of being unplayable. If you can't get your hands on a genuine working console, cycle-accurate FPGA implementation is the next best thing.
Often the games I’m excited to play run like shit and have timing issues that are difficult to measure but feel wrong and diminish the nostalgia.
Free time is scarce, so I’ll gladly pay a couple hundred to not have to spend time fiddling with settings only to ultimately capitulate.
Depends on whether the bug breaks games or not.
Fixing this bug would be part of fixing a bunch of other floating point bugs, more specifically rounding and clamping.
Yes, software floating point would be slower, but the general solution would probably follow the PS4s PS2 emulator. Where each game can have whitelisted sections of code for the software floating point path.
The simplest trick for detecting old ARM emulation (ISTR it was used in some Game Boy Advance copy protection): store a booby-trap instruction at PC+4 (i.e. the very next one). A real ARM has a pipeline that reads PC+8 while decoding PC+4 and executing at PC. So the newly-stored instruction should have no effect. An emulator which didn't emulate the hardware pipeline would execute it.
edit: described in more detail here, among other emulation-busting measures from 2004 https://mgba.io//2014/12/28/classic-nes/
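The effect is easy to see in a toy model. The ISA below is invented for illustration; the real trick works on ARM machine code:

```python
# A booby-trap instruction is written into the slot right after the
# writing instruction. A naive interpreter fetches at execute time and
# runs the trap; a model that prefetched ahead (like a real pipeline)
# executes the stale original instruction instead.

TRAP, NOP, STORE_TRAP_NEXT = "trap", "nop", "store_trap_next"

def run_naive(mem):
    trapped = False
    for pc in range(len(mem)):
        insn = mem[pc]                 # fetch right before executing
        if insn == STORE_TRAP_NEXT:
            mem[pc + 1] = TRAP         # self-modifying write to PC+1
        elif insn == TRAP:
            trapped = True
    return trapped

def run_pipelined(mem):
    # Crude pipeline model: fetch runs ahead of execute, so later
    # writes to instruction memory aren't seen.
    pipeline = list(mem)
    trapped = False
    for pc in range(len(pipeline)):
        insn = pipeline[pc]            # stale prefetched copy
        if insn == STORE_TRAP_NEXT:
            mem[pc + 1] = TRAP
        elif insn == TRAP:
            trapped = True
    return trapped
```

Real hardware only runs two instructions ahead, so copying the whole program is a deliberate oversimplification; the observable difference is the same.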
Is this why my ROM for Dragon Ball Z: The Legacy of Goku II didn't work sometimes with visualboy advance?
VisualBoyAdvance was (and is) absolute garbage in terms of accuracy and is loaded with delicious code execution exploits from boobytrapped roms
Use mGBA instead if you want to play Gameboy games in 2024
Mednafen is fine too. Although with VBA... there's VBA-M, which is much better than the original one, whose audio code was very bad and glitched a lot under GNU/Linux and BSD.
The Texas Instruments TMS320C40 digital signal processor had even weirder pipeline issues:
- Branch delay slots (https://en.wikipedia.org/wiki/Delay_slot), where one or more instruction(s) after a branch would be executed before the branch actually occurred.
- Load delay slots, where values stored into registers weren't guaranteed to appear until some later instruction. I believe the value in the register was undefined for several cycles?
Writing tightly-optimized assembly code for these chips was pretty horrible, sort of like playing an unusually tasteless Zachtronics clone.
It was also kinda awesome because, as long as you were willing to spend days to optimize one page of code, you could get so much performance out of it.
Deliberately using the fact that multiplies only write their results into a register ~6 cycles later means you can use that register for a bunch of other stuff in the meantime, and then on the 6th cycle the results would magically appear.
Basically, for those 6 cycles, you had no registers in-use for either the source operands or destination of the multiplication.
Obviously this is also pipelinable - you can start more multiplies while the first is running, using the same source and destination registers, but meanwhile you've used other instructions to load more data into the inputs and do something else with the outputs.
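A toy model of that delayed-writeback trick. The 6-cycle latency is taken from the comment above; everything else here is illustrative, not exact C40 behaviour:

```python
# Register file where multiply results land several cycles after the
# instruction issues. Until the result is due, the destination register
# still holds its old value and is free for other uses.

MUL_LATENCY = 6  # cycles until the product appears (illustrative)

class DelayedRegs:
    def __init__(self):
        self.regs = {}
        self.pending = []   # (cycle_due, reg, value)
        self.cycle = 0
    def tick(self):
        self.cycle += 1
        for due, reg, val in [p for p in self.pending if p[0] == self.cycle]:
            self.regs[reg] = val
            self.pending.remove((due, reg, val))
    def mul(self, dst, a, b):
        # Result is scheduled, not written: dst stays usable until then.
        self.pending.append((self.cycle + MUL_LATENCY, dst, a * b))

r = DelayedRegs()
r.regs["r2"] = 7
r.mul("r2", 6, 9)        # start a multiply targeting r2
for _ in range(5):
    r.tick()
stale = r.regs["r2"]     # still the old value: register free meanwhile
r.tick()                 # 6th cycle: product lands
fresh = r.regs["r2"]
```

Pipelining more multiplies just means more entries in the pending list, each due on its own cycle, exactly as the comment above describes.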
The MIPS-I chip also had load delay slots along with branch delay slots.
Some pipelined CPUs have retained compatibility with self-modifying code and detect when you overwrite an instruction that is on the pipeline and flush it.
x86 has that machinery, although I'm not sure if they eventually dropped it on the 64-bit variant.
The rules and mechanisms for SMC detection are essentially the same in both modes as far as I am aware.
Both Intel and AMD implement SMC detection that is a bit stronger than required by the specification as well.
Another sibling comment here references it obliquely, but on x86 the prefetch queue produced similar behaviour until Intel decided to detect SMC on the Pentium and newer CPUs, so that modifying the instruction about to be executed will always have an effect.
However, someone much later found another undetected edge-case: a self-overwriting repeated string instruction.
https://silviocesare.wordpress.com/2009/02/02/anti-debugging...
How do people get into emulation? It seems like an incredibly challenging niche within software.
You basically have to understand electronics and deep programming wizardry
Game Boy emulator tutorials are all over the web.
CPUs are emulated at a high level. If you can shift bits, you can understand a Z80 or a 6502 in no time.
I tried to get into it many moons ago, and the guidelines were to start from something very simple and well documented, and build your skills from there. I still have the itch but lack the time :(
It depends what you mean by emulation.
For instance, I ported a 6502 interpreter from UNIX to Classic Macintosh back in the day. This was to play SID music files. So long as it ran fast enough, clock cycle accuracy wasn’t important.
It worked by calling C code from the interpreter.
Terrence Howard was right!
I came for this comment and there you are! Bravo
what's the reference?
Terrence Howard (actor in Iron Man, the Empire TV show, etc.) believes he has discovered "a new math" where 1x1=2. I believe he has gained recent notoriety because he was on the Joe Rogan podcast, where he got a platform to broadcast his beliefs to many people, but he has held these beliefs for many years.
As far as I can tell, his reasoning is literally that 2x2=4, so if you divide both sides by 2, you get 1x1=2.
Thanks, there went my morning. That was quite the rabbit hole. I’m now confused and sad.
Was anyone else confused by the title, wondering why a mouse or a keyboard should do math? I spent too much time figuring out that this is about the PlayStation 2, not the Personal System/2 ports used to connect a mouse and keyboard.
No, one is PS2, the other is PS/2.
To all the downvoters: Just because something is obvious to you doesn't mean that it is obvious to everyone else. Three character abbreviations can make context discovery really difficult as simply plopping the abbreviation into Google will quite often produce mostly unrelated results.
Don't tell Terrence Howard...
Emulators have to be pragmatic about accuracy, when emulating more modern systems it's generally not feasible to target 100% hardware accuracy and usable performance, so they tend to accept compromises which are technically deviations from the real hardware but usually don't make any observable difference in practice. Anything that uses a JIT recompiler is never going to be perfectly cycle-accurate to the original hardware but it usually doesn't matter unless the game code is deliberately constructed to break emulators.
Dolphin had to reckon with that balance when a few commercial Wii games included such anti-emulator code, which abused details of the real Wii CPU's cache behavior. Technically they could have emulated the real CPU cache to make those games work seamlessly, but the performance overhead (likely a 10x slowdown) would have made them unplayable, so they hacked around it instead.
https://dolphin-emu.org/blog/2017/02/01/dolphin-progress-rep...
I once wrote something that would hard lock cortex-A8 but not the cortex-A9 we shipped on. To my knowledge, nobody tracked down why our app, once exfiltrated from our device, would crash slightly older phones.
So they'll patch around it.
You're just making your software worthless in the long run, probably within 5 years, or creating a fun problem for an emu hacker.
Most of the significant monetary losses to piracy aren't from emulation; they're from the chippers/mods that bypass cloned-media copy protection.
Which emulator authors have a lot more control over bypassing.
If it hardlocked an A8 but not an A9, chances are very high that an emulator would run it with no problem, because nobody deliberately tries to emulate the kind of CPU bug that lets an app hardlock the CPU. GP appears to have been interested in deterring people from running their code on non-authorised real hardware at the time, not targeting emulator users.
Bingo! Didn't want someone running new product's app on old product's hardware. Company was new to building non-RTOS devices which were tightly hardware bound, wanted similar type restrictions.
You assume an anti-piracy attempt when GP, from my reading, made no such statement. More of a mystery, but who cares because the problem hardware wasn’t what they shipped on.
They used the word exfiltrate, it's not a stretch.
Were you exploiting an A8 erratum, or detecting "this is an A8" somehow and then making it barf in a less processor specific way?
A8 erratum. This was ages ago, but if I recall, you could place a Thumb-2 instruction straddling two pages, only one of which was loaded in the TLBs. If you got everything right, the A8 would hang without trapping.
Edit: it was errata #657417, long since scrubbed from arm.com
The A8 errata doc is at https://developer.arm.com/documentation/prdc008070/latest/ these days and does have a description of 657417 with enough detail to make writing a reproducer possible. Instructions crossing page boundaries are tricky beasts :-)
Well look at that! I had searched Google and arm.com for that number. This was definitely it.
I wonder how mainframe emulators (that sometimes are used to run legacy, very critical software on modern hardware) manage to do it. Do they go for full complete emulation? As in, implementing the entire hardware in software?
Mainframes typically execute batch processes on a CPU. Much simpler than a game console with a GPU. Cycle-accurate emulation is less relevant for mainframes.
Most of those are JIT recompilers; mainframe code doesn't usually depend on instruction cycle timing the way that, say, beam-racing game code does.
When it comes to speedrunning: some speedrunners do care, though, to ensure their speedrun tech is reproducible on both emulators and real hardware.
That's true, the small differences between a pragmatic "accurate enough" emulator and real hardware can matter for speedrunners. The difference between real hardware running at 60fps and a principled cycle-accurate emulator running at <0.1fps would matter more, though.
For the SNES and earlier it's feasible to have exceptional accuracy and still usable performance, but for anything modern it's just not happening. Imagine trying to write a cycle-accurate emulator core for a modern CPU with instruction re-ordering, branch prediction, prefetching, asynchronous memory, etc, nevermind making it go fast.
I think the cutline can be moved to the original PlayStation now.
Which arguably explains a cultural rift in arcade emulation circles. MAME's philosophy is cycle accuracy, which might work for arcade hardware up to early 3D systems, whether bespoke (such as Namco's System 22) or console-derived (Namco's System 1x series, which all derive from the original PlayStation hardware). For newer arcade titles, which are just beefed-up period PCs, that kind of emulation philosophy would not suffice for gameplay.
beebjit [1] is a cycle-accurate JIT-based emulator for the BBC Micro. It can be done.
[1]: https://github.com/scarybeasts/beebjit
That is not perfectly cycle-accurate, but it is accurate enough to run almost anything without issues.
The good news is that modern systems are so unpredictable relative to each other that games can be relied upon to not require cycle-accuracy. IIRC the cycle timings can differ between different units of the same model.