I remember writing a tile-based game for Windows 98 using DirectX 5. I originally implemented something similar to what is described in this article. My case was not as complex because the screen didn't scroll, but large numbers of tiles could potentially need redrawing.
I ended up ripping out all of that code because when I tried to profile the cost of drawing the whole screen with my RIVA TnT card I literally couldn't measure the tiny amount of time it took.
I am glad I missed the EGA period of PC development, I never would have gotten anything done.
I was the opposite: I loved programming at such a low level, optimising by using bit shifts instead of multiplications, writing essential code in assembly, writing 16 or 32 bits at a time (as processors developed).
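On CPUs where an integer multiply cost far more than a shift, the trick looked roughly like this (a sketch in C; the 320-pixel-wide 8-bit mode is just the classic example, not something from the comment above):

```c
#include <stdint.h>

/* Offset of pixel (x, y) in a 320-pixel-wide, 8-bit framebuffer.
 * y * 320 is rewritten as (y << 8) + (y << 6), since 320 = 256 + 64.
 * A modern compiler does this for you; back then you wrote it by
 * hand, or kept a precomputed table of row offsets. */
static uint32_t pixel_offset(uint32_t x, uint32_t y)
{
    return (y << 8) + (y << 6) + x;
}
```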
Continuously learning from the scarce resources I had (books from the local library and a floppy disk with txt files).
I was too young (mid-teens) / inexperienced to get anything finished, but I did manage smooth vertical scrolling (and almost got the horizontal stuff working) and a double-buffered rotating 3D cube (z-buffering, the math, line-by-line polygon drawing).
Then came DirectX and you lost all touch with the machine; a rotating 3D cube was a breeze, and I didn't see the fun in that.
If you ignore the GPU acceleration APIs and program graphics the old-fashioned way, but on modern multi-core & SIMD machines, you can still have a very similar, rewarding experience.
Sure, it won't compete with the performance of GPUs, but I feel we should have a category of GPU-less PC graphics for things like the demo scene: strive to deliver interesting results at modern resolutions while restricted to the CPU, incentivizing parallelization techniques, SIMD optimization that strongly resembles the assembly coding of yesteryear, and good old-fashioned cleverness.
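You can get a taste of that even without intrinsics: the SIMD-within-a-register solid fill below stores 8 pixels per write, a modern echo of "writing 16 or 32 bits at a time". A sketch only, assuming an 8-bit paletted buffer; a real renderer would reach for SSE/AVX/NEON intrinsics or lean on the autovectorizer.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Solid fill, 8 pixels per store: replicate the 8-bit color into a
 * 64-bit word, write it out in 8-byte chunks, then finish the tail
 * one byte at a time. memcpy keeps the wide stores legal regardless
 * of the destination's alignment. */
static void fill8(uint8_t *dst, size_t n, uint8_t color)
{
    uint64_t word = color * 0x0101010101010101ULL;
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        memcpy(dst + i, &word, 8);
    for (; i < n; i++)
        dst[i] = color;
}
```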
* 'Many'-core CPU (say 8..16 relatively simple cores, e.g. RISC-V)
* A good but modest amount of RAM (e.g. a few dozen to a few hundred MB)
* Integrated graphics, but of a conceptually simple type, e.g. a plain framebuffer + _maybe_ some blitter / DMA / scrolling / hw sprite support. Simple enough that a single programmer can wrap their head around it (vs. a complex 3D GPU whose details hardly anyone knows in full).
* Graphics output to a common consumer display (HDMI, DisplayPort)
* Sound & some other common I/O
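For a sense of how simple "conceptually simple" can be: the blitter in the wishlist above does little more than this software model (names and parameters are illustrative, not any real device's registers; in hardware the point is that the copy runs without tying up the CPU):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Software model of a simple blitter: copy a w-by-h pixel rectangle
 * between two 8-bit linear buffers with independent row strides
 * (stride = bytes from the start of one row to the next). */
static void blit(uint8_t *dst, int dst_stride,
                 const uint8_t *src, int src_stride,
                 int w, int h)
{
    for (int row = 0; row < h; row++)
        memcpy(dst + (size_t)row * dst_stride,
               src + (size_t)row * src_stride,
               (size_t)w);
}
```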
Roughly Commodore Amiga-level complexity (at most! simpler = better), just with modern fabrication tech & faster clocks.
This would be a perfect platform for all sorts of homebrew games, hobby / retro-style OSes, educational and industrial uses & more.
But it doesn't exist, AFAIK. Microcontrollers usually don't have built-in graphics, have too little RAM, or otherwise aren't quite it.
A step up (a small SBC) and you get loads of peripherals, a 3D GPU, firmware & associated complexity thrown in 'for free'. Nice for running a full-blown OS like Linux, not so nice for wrapping one's head around the hardware.
Allwinner F133
Bouffalo Lab BL808
GreenWaves GAP8
(no native video hw on this one I think? targeted at IoT edge computing)
Surely there are more (or will be :-). The problem is usually the availability of suitable boards, documentation, software tools, or some combination thereof.
And these are often one-off parts. At some point the part goes away, and then what? As opposed to a sort-of-standard platform offered by multiple vendors that one could build a software ecosystem against (like, e.g., the Raspberry Pi).
You could of course design your own hardware. But then you quickly get into FPGA territory: a design-your-own-computer project, lower clock rates, higher cost, etc. A cheap, small, highly integrated SoC is much preferred here.
Maybe the Colour Maximite 2 would be interesting to both of you (I don't have any personal experience with it, but it looked rather nice once I heard about it).
I like the idea of software rasterization as an aesthetic (in the context of games, but demos are included). Caches today are larger than the entire system memory of the pre-GPU days. Best of all, if you can write a plot function, you're a graphics programmer!
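That really is the whole entry fee -- something like this (a minimal sketch; the 320x200 paletted buffer is just a nostalgic choice, and actually presenting it on a window or display is a separate problem):

```c
#include <stdint.h>

enum { W = 320, H = 200 };
static uint8_t screen[W * H];   /* one palette index per pixel */

/* The entire "graphics API": clip, then write one byte. The unsigned
 * compare rejects negative coordinates and overflow in one test.
 * Lines, polygons, sprites -- everything builds on this. */
static void plot(int x, int y, uint8_t color)
{
    if ((unsigned)x < W && (unsigned)y < H)
        screen[y * W + x] = color;
}
```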
I ended up working on software rendering (ray tracing) for film production for many years. Now I'm working for a hardware vendor, so I get to have fun with the machine way below the DirectX level.
That was a magical time to get into coding! The limitations of the hardware really made it worth learning low level wisdom.
I was trying to do similar things on a 486 with a slow SuperVGA card. Without VGA hardware-scrolling tricks it couldn't even manage 60 fps of full redraws at 320x200 in 8-bit color.
Even back then, a large gap was already widening between the slowest and fastest graphics cards. Between buses widening from 8-bit to 16/32-bit, graphics cards supporting posted writes so the CPU wouldn't have to wait as much, and faster CPUs, fast systems could write a 640x480x256 screen at 300 fps or faster. In your case, you may even have had a cached system-memory buffer with a DMA copy, or the drawing done by a blitter. But even in the Win98 era, some low-end 3D cards had CPU access so slow it barely beat an old ISA-bus card.
Screen resolutions have gotten so high that there's a bit of a reversal again -- even if you can redraw an entire 4K screen at 60 fps, it's not exactly efficient. Games typically still redraw everything, but desktop compositors definitely do some optimizations, including dynamically assigning layers to hardware planes to reduce redrawn area.