@cominu dunno your level of experience here, so basics first:
you usually want to store a dither pattern in a premade array of values instead of live-computing them. the wikipedia page for "ordered dithering" has some examples of these arrays for bayer dithering, like in the clip (they show them as matrices, you can flatten them). blue noise can be done the same, just uses a differently-shuffled array
and then it's really just checking each pixel against a value in the tiled dither pattern
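a minimal sketch of that idea, assuming the 4x4 Bayer matrix from the Wikipedia page, flattened row by row and rescaled to 0..255 thresholds. the names `bayer4x4` and `dither_pixel` are made up for illustration, they aren't from the renderer:

```c
#include <stdint.h>

/* 4x4 Bayer matrix (values 0..15), flattened row by row and rescaled
   to byte thresholds with value * 16 + 8. */
static const uint8_t bayer4x4[16] = {
      8, 136,  40, 168,
    200,  72, 232, 104,
     56, 184,  24, 152,
    248, 120, 216,  88,
};

/* Returns 1 (lit) if the pixel's intended brightness beats the tiled
   threshold at screen position (x, y), else 0. Assumes x, y >= 0. */
static int dither_pixel(int x, int y, uint8_t brightness)
{
    int index = (y & 3) * 4 + (x & 3);  /* wrap x and y to tile the pattern */
    return brightness > bayer4x4[index];
}
```

a blue-noise version would keep `dither_pixel` as-is and swap in a bigger array (with the wrap masks changed to match its size).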
@cominu my main performance insight in this renderer is applying some general cache-awareness to the wolf3D-raycaster situation:
these raycasters use vertical lines as their drawing primitives (one ray per vertical line on the screen). this is very rough for the cache: playdate's framebuffer is stored in "english reading order" (a row at a time, top to bottom), so stepping downward 1 pixel means jumping 52 bytes in the buffer...and a pixel is 1 bit, so it has to do bit-twiddling for each pixel
@cominu basically, drawing a bunch of 1-pixel-wide vertical lines is maybe the slowest way you could draw this
instead, when raycasting, i store the line-heights in a 400-item array (one for each column of the screen's resolution), and then afterward, i step over the framebuffer in reading order, writing 8-bit strips of pixels. so it writes the frame buffer 1 full byte at a time, and steps over the line-heights array 1 value at a time (with jumps at the end of a row). much better for the cache!
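roughly what that second pass could look like, as a sketch under assumptions rather than the actual renderer: walls are vertically centered, a set bit means "wall pixel", the leftmost pixel of each byte is its most significant bit, and dithering is left out. `draw_walls` and `wallHeight` are hypothetical names:

```c
#include <stdint.h>

#define SCREEN_W  400
#define SCREEN_H  240
#define ROW_BYTES 52   /* 50 bytes of pixels per row, stored with a 52-byte stride */

/* wallHeight[x] is the wall's on-screen height in pixels for column x,
   filled in by the raycasting pass. The framebuffer is visited in reading
   order and every byte is written exactly once. */
static void draw_walls(uint8_t *frame, const int wallHeight[SCREEN_W])
{
    for (int y = 0; y < SCREEN_H; y++) {
        uint8_t *row = frame + y * ROW_BYTES;
        for (int xByte = 0; xByte < SCREEN_W / 8; xByte++) {
            uint8_t bits = 0;
            for (int bit = 0; bit < 8; bit++) {
                int x = xByte * 8 + bit;
                int top = (SCREEN_H - wallHeight[x]) / 2;
                int inside = (y >= top) & (y < top + wallHeight[x]);
                bits |= (uint8_t)(inside << (7 - bit));  /* MSB = leftmost pixel */
            }
            row[xByte] = bits;  /* one full-byte store per 8 pixels */
        }
    }
}
```

the heights array gets read left to right on every row, and the framebuffer gets written front to back, so both access patterns are sequential.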
@cominu and with that in place, the actual dithering part is really just a comparison
(orderedDither is my dither-pattern as an array of bytes, ditherIndex is the tiled index into that array, and ditherValue is the intended brightness of the pixel. this is C, so dither will either become 0 or 1 depending on whether or not it passed the threshold, and that gets multiplied in with the pixel's baseline "is this part of a wall that needs to be filled in" value)
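a guess at the shape of that comparison, using the names from the parenthetical. the function wrapper, the `isWall` parameter, and the direction of the comparison are assumptions:

```c
#include <stdint.h>

/* isWall is the pixel's baseline 0-or-1 "should this be filled in" value.
   The comparison evaluates to 0 or 1 in C, so multiplying the two keeps
   the whole thing branch-free. */
static int shade_pixel(const uint8_t *orderedDither, int ditherIndex,
                       uint8_t ditherValue, int isWall)
{
    int dither = ditherValue > orderedDither[ditherIndex];
    return isWall * dither;
}
```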
@2DArray thanks, super clear explanation! I always thought there was some black magic behind the dithering algo and...in the end it's really, really simple 😅
@cominu yeah dithering is startlingly simple once you've got a handle on what it means! the really general form is "when you're gonna reduce the bit-depth of a value, add some random noise to it first"
i'm a chronic bayer-hater so i replaced the structured dithering with blue noise, and to me it looks a lot nicer. runs at full speed on the real device, too!
@fsouchu I'm honestly not sure lol. the latest clips are recorded with OBS at 720p, so I could capture above 30fps (which means I had to scale up the source when recording - maybe that's what helped?)
this drops my fps to 40 (max possible on playdate is 50) but i'm hoping/assuming i can find more stuff to optimize. worst case, locking the game to 30fps seems like it would be OKAY, and maybe even the more sane choice for a handheld (to save on battery power)
the real ideal thing would be if you could pick between 30fps (power-saver mode) and 50fps (try-hard mode)
@fsouchu that's the ticket! I really want to capture some of the drifting feel from TM2 (canyon) but I haven't gotten to that part yet since I've mostly been focusing on gfx stuff
starting to draw a proper car - i did this first version in a really basic way so i thought it'd tank my performance, but it turns out it still stays at like 45+ fps on the device. not bad!
i'm very bad at lowpoly modeling so the car accidentally looks like an old-timey beater, but i think that's sorta funny so i'll probably try to keep that vibe
made a fancier finish-line and now i'm gonna need to write my own triangle-rasterizer instead of relying on the playdate sdk's fillTriangle() helper - i'd like to get rasterized meshes to use the same dithering as the walls/floors/sky, and also i need the meshes to get occluded by walls
@fsouchu nope, lol. i still haven't tried the thing i was talking about at first, but i goofed around a bit with some related-but-less-drastic simd usage, and failed to get any actual speedup (which is what has always happened whenever i've tried to use simd in general)
@fsouchu i did get a nice speedup on the rasterizer by doing that "splat a slice of the dither pattern into an 8-or-32 bit scanline-strip all at once, instead of iterating individual pixels" thing (i already had start/end sub-indices inside each strip, so i did some bitwise shenanigans to create a mask with leading and trailing zeroes, then used that to splat the pattern)
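one way the mask trick could look, as a sketch only. it treats a 32-bit strip as "most significant bit = leftmost pixel", which ignores byte order (on a little-endian CPU a word loaded straight from the framebuffer would need a byte swap first). `span_mask` and `splat` are hypothetical names:

```c
#include <stdint.h>

/* Mask with 1s for pixel positions start..end-1 and 0s elsewhere, where
   position 0 is the most significant bit.
   Requires 0 <= start < end <= 32 so neither shift reaches 32. */
static uint32_t span_mask(int start, int end)
{
    uint32_t left  = 0xFFFFFFFFu >> start;        /* zeroes before the span */
    uint32_t right = 0xFFFFFFFFu << (32 - end);   /* zeroes after the span */
    return left & right;
}

/* Writes a 32-pixel slice of a blittable pattern into dest, touching only
   the pixels inside the span. */
static uint32_t splat(uint32_t dest, uint32_t pattern, int start, int end)
{
    uint32_t mask = span_mask(start, end);
    return (dest & ~mask) | (pattern & mask);
}
```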
unfortunately, i need per-pixel iteration for wall occlusion, so i can't actually use that 🙃
@2DArray @fsouchu
I might well not be following this in enough detail, but could you read the current 32 bits, then mask in your fill pattern and write back? Or do multiple walls make that impractical?
@twitonatrain the dither pattern is storing u8 thresholds (to compare against a pixel's intended brightness), so it's not currently a blittable sprite! I could convert it into several ready-to-go sprites for different grey levels, but I imagine it'd lose some smoothness (especially in the sky, where it's a lot of smooth fades)...but maybe that'll become necessary at some point
@twitonatrain thanks! and honestly your idea is a good one which i've been trying to find ways to incorporate...i might have an avenue to use it in the mesh rasterizer (which does use blittable patterns!), if it can detect that a certain 32-bit slice of a scanline has no chance of being occluded by a wall (which means the per-pixel checks aren't needed there)
working on replay recording/playback - yes, the clip looks the same as the others in the thread, but it's a pretty good run since i was retrying the map. if you think "i see places where the racing line could have been improved" then the game is working, lol
replay data is quite small: stores keyframes containing a bitmask of user-input states (one byte), along with how many frames you held that state for (one byte). a "normal" 30 second replay tends to be around 200 bytes
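a sketch of that keyframe format (two bytes each, so ~200 bytes works out to about 100 input changes over 30 seconds). the struct and function names are made up, and splitting a hold longer than 255 frames into a second keyframe is an assumption about how the one-byte counter would be handled:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t input;   /* bitmask of button states */
    uint8_t frames;  /* how many frames that state was held (1..255) */
} ReplayKey;

/* Records one frame of input and returns the new keyframe count.
   Extends the last keyframe when the input hasn't changed; a frame is
   silently dropped if the buffer is full. */
static size_t replay_record(ReplayKey *keys, size_t count, size_t capacity,
                            uint8_t input)
{
    if (count > 0 && keys[count - 1].input == input
                  && keys[count - 1].frames < 255) {
        keys[count - 1].frames++;
        return count;
    }
    if (count == capacity) return count;
    keys[count].input = input;
    keys[count].frames = 1;
    return count + 1;
}

/* Returns the recorded input for a frame number, or 0 past the end. */
static uint8_t replay_input_at(const ReplayKey *keys, size_t count,
                               unsigned frame)
{
    for (size_t i = 0; i < count; i++) {
        if (frame < keys[i].frames) return keys[i].input;
        frame -= keys[i].frames;
    }
    return 0;
}
```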
unfortunately i doubt there'll be a way to do any online-sharing because of playdate limitations (i'm gonna try to get libcurl to work anyway though - i think they use it for OS features, they just don't expose any networking stuff in their sdk at this point)
@isziaui does seem possible to export like that...but then you'd still have to get the data back into someone else's device!
best I've thought of so far is a companion app that you run on a PC, and they talk to each other over USB. not ideal, but it'd save you from needing to boot the playdate into data-drive mode, at least. an over-engineered version could talk to a webserver from there, lol
when it samples the floor tex, it raycasts the texelspace grid to see how many screen-pixels it'll step along the scanline before hitting the next texel, and it can reuse the latest sample until then - fewer samples, sharper result!
it also does some smooth LOD, where it starts using a bigger and bigger "screen-pixel steps per sample" ratio as pixels get farther away
@fsouchu yeeep the mesh rasterizer is using simpler patterns everywhere so it can do the 32-bit-blitting stuff (when it knows there's no risk of wall-occlusion)
that said, I could probably bake a few blittable blue noise patterns for this. I'm not quite happy with the look of the shadow yet, so maybe that'd do the trick!
@fsouchu walls are "wolfenstyle" so they start by doing a per-column raycast, and save the resulting screenspace wall-heights to an array. then, during the fullscreen per-pixel pass (walls/floor/sky), it just has to check if a pixel is above/below the wall-range for its screen-x position
the shadow is a copy of the car mesh with a weird/flattened local-to-world matrix - what are you imagining for wall-shadows? all i've thought of is "pretend the nearby wall is a plane, project a mesh onto that"
ended up doing the less-fancy "pretend the nearby wall is a flat plane, project the car-mesh onto that" idea...mostly because it seemed easier to implement
now in the worst case, i get down to like 33fps (player car, player floor-shadow, player wall-shadow, ghost car, giant "START" mesh) but hey i'm still above 30 lol and i might be out of ideas for more wacky gfx shit to add, so i might be getting away with it
@fsouchu i was thinking about drawing the sun...but i think at the moment i'm interested in getting the title card to look like its own thing, instead of just showing what the normal gameplay looks like (dunno if it's the right call, but that's where i'm at so far)
the motion here is temp, but it's partway to something more interesting...
been pretty busy for a few days, so not much progress here - but i've started getting into a track editor
initially this is just for me to make some built-in tracks, but i might polish it to make it user-facing if i can figure out a comfy way for people to share their maps
i can now save tracks as files, and then load them - the ghost-car in this clip is included in the track data. the spiral track file (including that ghost's replay) is 161 bytes
zipping the file actually makes it larger (241 bytes) so i think i did good. lol
conveniently, i already wrote a playdate serializer for C, for that previous music-maker app
here's the code for serializing/deserializing a track and its replay. serialization and deserialization use these same functions (the Serializer has a flag to tell it whether it's reading or writing) - so there's no need for annoying/error-prone "near duplicate" routines for input/output
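a stripped-down sketch of the "one function for both directions" pattern, binary format only. this is not the actual Serializer: the struct layout, the `TrackHeader` fields, and every function name here are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *buffer;
    size_t   position;
    size_t   capacity;
    int      isWriting;  /* 1 = values -> buffer, 0 = buffer -> values */
    int      failed;     /* set when a read or write runs past the end */
} Serializer;

/* Either stores *value into the buffer or loads it from the buffer,
   depending on the direction flag. */
static void serialize_u8(Serializer *s, uint8_t *value)
{
    if (s->position >= s->capacity) { s->failed = 1; return; }
    if (s->isWriting) s->buffer[s->position] = *value;
    else              *value = s->buffer[s->position];
    s->position++;
}

/* Hypothetical track fields, just to show the calling pattern. */
typedef struct { uint8_t width, height, startX, startY; } TrackHeader;

/* Field order is written once, so load and save can't drift apart. */
static void serialize_track_header(Serializer *s, TrackHeader *t)
{
    serialize_u8(s, &t->width);
    serialize_u8(s, &t->height);
    serialize_u8(s, &t->startX);
    serialize_u8(s, &t->startY);
}
```

saving and loading both call `serialize_track_header`; the only difference is how the `Serializer` was set up beforehand.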
it supports ascii format (more legible) and binary format (smaller file)
mostly boring backend stuff lately, but here's a clip of a track i made on the playdate and then loaded in the desktop simulator app - it includes four replays to compete against (bronze, silver, gold, author), and it seems like i can afford to show them all at once!
reduced the framerate to 30fps since it still beats that on the device, but in doing so i instantly radicalized myself into one of those "30fps is literally unplayable" people, so now i guess i have to optimize for 40 instead
(30 is fine with me in other games, i'm just used to 50 in this one)
whelp turns out i can't figure out how to get it back to 40 (particularly with 4 ghosts active) but 30 actually looks fine on the device - the lower framerate is only noticeable to me when testing in the desktop simulator
something something magic playdate screen. good enough!
@2DArray That blue noise works incredibly well, and it's so stable. I wonder how I never heard of it in 20 years of rendering work. Looks just like error-diffusion dithering. Do you know if there's any good sample code I could look at? I'd love to play with it and get to know how to use it.
implementation is dead simple: it's the exact same as what you'd do for Bayer dithering (dither pattern is a pixel-grid of brightness values, each final screen pixel becomes either black or white by comparing its intended brightness to a value in the pattern), but with a blue-noise texture as the pattern
@fsouchu yep! floating point is deterministic and the app is single-threaded, so (at least as far as i've seen...) the only thing that's unpredictable is the user-input. i wouldn't be surprised if the simulator and playdate produce replays which are incompatible with each other due to compiler differences...but i haven't serialized any replays, so i haven't seen that happen yet lol. if it turns out to be a problem, i can resort to a heftier format!
@fsouchu correct, all C! but yeah even when the performance is variable, the game just moves slower instead of using a variable deltaTime (I'm now assuming I'm gonna lock to 30fps at some point because I can't stop adding visual stuff, so I just need to make sure it's always able to run faster than that - but even if it was a problem, the sim is WAY faster than the rendering, so it could do a multi-stepping fixed-tick for the physics whenever it was dropping frames)
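for reference, the multi-stepping fallback mentioned there usually looks something like this. a generic sketch, not code from the game: it uses integer milliseconds, and `fixed_steps` plus all of its parameters are assumed names:

```c
/* Adds the real time of the last frame to an accumulator and returns how
   many fixed-size physics ticks should run to catch up. maxSteps caps the
   work per frame; any backlog beyond the cap is thrown away, so the game
   slows down instead of spiraling. */
static int fixed_steps(unsigned *accumulatorMs, unsigned frameMs,
                       unsigned tickMs, int maxSteps)
{
    int steps = 0;
    *accumulatorMs += frameMs;
    while (*accumulatorMs >= tickMs && steps < maxSteps) {
        *accumulatorMs -= tickMs;
        steps++;
    }
    if (*accumulatorMs >= tickMs) *accumulatorMs = 0;  /* drop the backlog */
    return steps;
}
```

each tick still advances the sim by the same fixed amount, so replays stay a pure function of the recorded inputs.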
@fsouchu yeah exactly! currently the walls are all either axis-aligned lines or quadrant-aligned circular arcs, and both of those allow reasonably-fast ray intersection tests
(the arc test is "ray vs circle" followed by checking the sign of the circle's local hit-x and hit-y to see if it's inside the arc's chosen quadrant)
@fsouchu that said...i might end up allowing arbitrarily-rotated lines and arbitrary arcs for more level design options - the arc test could replace that quadrant trick with:
dot(localHitPoint, arcBisectDirection)
and check that against some precomputed per-arc threshold, something like
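one plausible version of that check, as an assumption rather than the author's actual formula: with a unit-length bisect direction and a hit point on the circle, the hit is on the arc when the dot product is at least radius * cos(half the arc's angle), which can be baked per arc. the `Arc` struct and `hit_is_on_arc` are hypothetical:

```c
typedef struct {
    float bisectX, bisectY;  /* unit vector pointing at the middle of the arc */
    float threshold;         /* precomputed: radius * cos(arc angle / 2) */
} Arc;

/* localHit is the ray-vs-circle hit point relative to the circle's center,
   so it always sits at distance radius from the origin. */
static int hit_is_on_arc(const Arc *arc, float localHitX, float localHitY)
{
    float d = localHitX * arc->bisectX + localHitY * arc->bisectY;
    return d >= arc->threshold;
}
```

for a quarter-circle in the (+x, +y) quadrant with radius 1, that would be a bisect of roughly (0.7071, 0.7071) and a threshold of roughly 0.7071.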