
miki

@miki@dragonscave.space

blind coder / comp-sci student, working in automatic speech recognition for CLARIN. Polish. Libertarian leaning. Feel free to get in touch.

miki, to random

LaTeX is both the best thing that happened to the accessibility of math content online and the worst thing that happened to the accessibility of math content online.

miki, to random

The American obsession with libraries (and yes, it's almost uniquely American) is so weird to me.

vick21, to random

It’s so hard to wipe the keyboard on a MacBook these days without typing some gibberish while doing so. After powering off the machine, it restarts again the minute I touch the keyboard with a cloth. LOL

miki,

@vick21 I do this on the authentication screen with VO keyboard help active.

miki, to random

Tech people need an "apps that hijacked your start page and search engine" remembrance day, when we're all grateful for the ad tracking and sandboxing that replaced it.

miki, to random

I feel like there's no operating system or screen reader that would give me the sort of stability and productivity I had around 2019-ish, with Windows 7 and pre-2020.1 NVDA with its speech dictionaries. Honestly, the config I had back then made me feel like a god of doing things quickly.

There are a lot of good things to be said about macOS and VO, but that kind of productivity just isn't there. Windows is slightly better, but at the cost of much of that stability, and NVDA after the speech refactor is far from what it used to be.

At least Mac has eSpeak with rate boost support now.

Caoimhe, to random

How do you guys organise your clothes? Do you have any tricks or techniques to make managing them easier? Feel free to share.

miki,

@Caoimhe I have these sock thingies; no idea what they're called. They have two holes, one for each sock in a pair, and they help you make sure your socks stay paired when washing.

I also sort clothes by category: T-shirts in one pile, underpants in another, socks in a third, etc. I get everything in similar colors so that I don't have to wonder what goes with what. Darker stuff is usually better because it makes stains less visible.

bermudianbrit, to random

OK techy humans, final query, then I'm pulling the trigger on this build. All 3 of these graphics cards seem basically the same. Are any of them worth choosing over the others for AI workloads? I don't give a damn about visual performance, for obvious reasons:

  • 16GB Sparkle Intel Arc A770 Titan;
  • AMD Radeon RX 7800 XT; or
  • 16GB NVIDIA GeForce RTX 4060 Ti

Thanks

miki,

@bermudianbrit For anything AI, go with Nvidia. Software-wise, the competition just isn't there yet, even if the competitors' cards are theoretically capable of being just as good, if not better.

miki, to random

LaTeX pro tip:

If you need to write a simple fraction like 1/2, 1/4, 2/3, etc., where both the numerator and the denominator are a single digit, you can just write \fracxy (e.g. \frac12) instead of \frac{x}{y}. I personally find this form far more readable with a screen reader and wish I had discovered it sooner.
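
For illustration, a minimal compilable sketch (standard LaTeX; the surrounding document is mine, the trick is as described above):

    \documentclass{article}
    \begin{document}
    % the two lines below render identically: a digit is a single token,
    % so \frac grabs it as an argument without needing braces
    $\frac12 + \frac14$
    $\frac{1}{2} + \frac{1}{4}$
    \end{document}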

miki,

@wizzwizz4 I was aware of this, but the particulars of LaTeX control sequences were too much for one toot, and this is the most common case for me.

miki,

@x0 Yeah, that works, but \fracx1 doesn't. I know why, but it's not easy to explain in a single post if you don't already know the particulars of how TeX control sequences work.

miki,

@x0 Yes. I'd have to reread that chapter of The TeXbook to refresh my memory, but as far as I remember, there are two kinds of control sequences. \frac is a "control word" (I believe that's the exact term). Those consist of a control sequence character (the backslash) followed by one or more letters, and the name lasts until the first non-letter. This is why \frac1x is interpreted as \frac{1}{x}, but \fracx1 is read as the undefined control word \fracx followed by 1, which is invalid. Essentially, \frac accepts two "arguments". An argument is either a single character or a group of characters delimited by { and }; that's why the single-digit form works.
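
To make that concrete, a few annotated examples (standard LaTeX; the comments are mine):

    $\frac12$    % fine: the name \frac ends at the digit 1,
                 % so this parses as \frac{1}{2}
    $\frac1x$    % also fine: the arguments are 1 and x, i.e. \frac{1}{x}
    % $\fracx1$  % error: the x is absorbed into the name, so TeX
                 % looks for an undefined control word \fracx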

weirdwriter, to SmallWeb
miki,

@jackf723 @vick21 @weirdwriter This is how audio description on TV works, at least in most of Europe. That's how it has to be: terrestrial and satellite bandwidth is very limited, and wasting it on tracks that are used very infrequently is just unacceptable. As a broadcaster, you have a choice between overpaying for bandwidth for very little benefit, converting the AD mix to mono at some horrendously low bitrate, or overlaying a low-bitrate, mono AD track on top of the normal, high-quality audio. Most broadcasters go for the last option.
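
A minimal sketch of what that receiver-side overlay might look like (my own toy example; the -9 dB ducking figure and the crude activity mask are assumptions, not anything from a broadcast spec):

    import numpy as np

    def mix_ad(programme: np.ndarray, ad: np.ndarray, duck_db: float = -9.0) -> np.ndarray:
        """Overlay a mono AD track onto stereo programme audio.

        programme: shape (n, 2), floats in [-1, 1]; ad: shape (n,), mono,
        already resampled to the programme's rate.
        """
        gain = 10 ** (duck_db / 20)         # dB -> linear gain
        active = np.abs(ad) > 1e-3          # crude "narration present" mask
        out = programme.copy()
        out[active] *= gain                 # duck programme under narration
        out += ad[:, None]                  # add AD to both channels
        return np.clip(out, -1.0, 1.0)      # keep samples in range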

miki,

@jackf723 @vick21 @weirdwriter This isn't as easy as you think; movies from different sources may have different lengths, e.g. due to a PAL/NTSC difference, an extra Netflix logo, etc.

The only approach that makes sense here is the German Greta system and its derivatives. It's essentially Shazam for movies: you pick a movie you want to watch, give the app a short sample of its audio, and it syncs your audio description with the movie. The added benefit is that the AD is completely independent of the movie source; it works in cinemas and can be played through your own headphones when watching a movie with sighted friends or family.
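
To sketch the syncing idea (not Greta's actual algorithm; a production system would use robust audio fingerprints, as Shazam does, rather than raw cross-correlation):

    import numpy as np

    def find_offset(reference: np.ndarray, sample: np.ndarray, rate: int) -> float:
        """Return the offset, in seconds, where `sample` best matches
        `reference`. Both are mono waveforms at the same sample rate."""
        # normalize so microphone volume doesn't dominate the match
        ref = (reference - reference.mean()) / (reference.std() + 1e-9)
        smp = (sample - sample.mean()) / (sample.std() + 1e-9)
        # cross-correlate; the peak marks the best alignment
        corr = np.correlate(ref, smp, mode="valid")
        return int(np.argmax(corr)) / rate

    # usage sketch: start AD playback this many seconds into the track
    # offset = find_offset(movie_soundtrack, mic_sample, 16_000)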

miki,

@jscholes @jackf723 @vick21 @weirdwriter Huh, I assumed satellite systems worked the same way the terrestrial ones do. You might be right, though.

miki,

@jscholes @jackf723 @vick21 @weirdwriter Poland does this somewhat regularly. We get audio description for quite a few football (soccer) matches here. It's quite surprising, really, considering that we barely get it for anything else; soccer is the only thing that gets somewhat regular and consistent AD. The quality is quite crappy; I can't tell you the exact stream parameters, but I know who to ask.

jasongorman, to random

If you had 10 minutes to explain to a group of programmers who are new to software development how to do it better, what would you tell/show them?

What's the least a dev can know that would make the biggest difference?

What are the ABCs of software development?

miki,

@jasongorman Two words. Get Copilot.

miki, to random

I think the debacle with Google's AI reading The Onion is a symptom of a deeper problem.

I bet that some people who don't speak English that well, aren't familiar with English-language online culture, don't recognize The Onion brand, or have low cognitive ability could fall for the same trick and make the same mistake.

This is why I was never a big fan of The Onion: it is too easy to misinterpret and take at face value, especially in those situations. I think the problem here is The Onion publishing misinformation online, and all the people giving it good search engine rankings by linking to it, not Google's AI.

miki,

@shane Movies don't look like news articles. I guess you're sort of right when it comes to short stories that are specifically written to look like news articles, scientific papers, Wikipedia entries, etc. Like this one, for example: https://qntm.org/mmacevedo

chikim, to random

VOLlama v0.1.4-beta.1: System Prompt manager; Import Awesome ChatGPT Prompts; Partial support for GPT-4o (throws an error for the token counter in some cases, but just ignore it for now); Able to attach an entire document and feed it to a long-context model. https://chigkim.github.io/VOLlama/

miki,

@simon @chikim @jscholes They actually ban people for this, apparently.

miki, to random

Am I the only person who gets surprised when a C/C++ project actually builds successfully on the first try?

Caoimhe, to random

Is it possible to read the content of an Excel cell letter by letter with NVDA?

miki,

@Caoimhe Press F2, though that only works if the text comes directly from the cell, not from a formula. This trick is also pretty useful for advanced formula editing.

jscholes, to apple

Costco had AirTags on sale. Having never used one, I decided to buy a couple. I now have two smooth, round things on my desk that apparently don't stick or attach to anything without additional hardware, that I guess I can... put in a box that I might lose? Not really sure I understand this product.

miki,

@jscholes They're great to put in a bag if you're a bag-carrying person like I am. Wallets too. You can attach them to a keychain or keyring. People who own a vehicle often put one there, in case it gets stolen or even to find it when parked.

jcsteh, to random

As I understand it, with all current LLMs, having a conversation involves feeding the model the entire conversation up to this point. That is, there is no memory: the prompt you feed it just gets longer and longer. So how does that work with something like GPT-4o, which could be processing audio and/or video at a much faster rate? Surely the prompts must get very large very quickly with anything beyond a short interaction? Doesn't that mean the responses take longer and cost more as the conversation gets longer?

miki,

@chikim @jcsteh Also, there might be some caching involved. The expensive operation in LLMs is attention, which needs to be calculated for every pair of tokens, and that's O(n^2). However, when we're only adding m new tokens to an existing prompt of n tokens, we only need to calculate the pairs involving the new tokens, and that's just O(m*(n+m)), not O((n+m)^2). Most implementations throw all those calculations away after finishing each request. This makes sense: the cached attention vectors take up a lot of memory, and there's usually load balancing involved, so even if you make a request with the same prompt, it's probably going to hit another instance. If you have a persistent connection to a single server and it's easy to determine exactly when that connection starts and ends, it might make sense to cache, which lowers the cost considerably.
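
A toy single-head sketch of that caching idea (my own illustration with numpy; the names, shapes, and lack of masking are simplifications, not any particular engine's implementation):

    import numpy as np

    def attend(q, k, v):
        """Scaled dot-product attention, single head, no masking."""
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    class KVCache:
        """Keep keys/values of tokens already processed.

        Without the cache, n old + m new tokens cost O((n+m)^2);
        with it, only the m new queries attend: O(m * (n+m)).
        """
        def __init__(self, dim):
            self.k = np.empty((0, dim))
            self.v = np.empty((0, dim))

        def step(self, new_q, new_k, new_v):
            self.k = np.vstack([self.k, new_k])   # reuse old keys/values
            self.v = np.vstack([self.v, new_v])
            return attend(new_q, self.k, self.v)  # only the new queries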

TheQuinbox, to random

Interesting observation: almost all of the blind hackers in my friend circle are bookworms, me included. I mean, some of us like audio over epub or vice versa, same with genres, but we're all bookworms. Wonder why?

miki,

@TheQuinbox I feel like books for us are what movies are for others. At least for me, it takes a lot less mental effort to listen to a book (whether with TTS or an audiobook) than to listen to a movie with AD. In this day and age, there's also a lot less stigma about books than there is about TV / games / social media, so you can read as much as you want, guilt-free.

KathyReid, to microsoft

Why does Microsoft want to implement Recall? It's not about images. It's about modelling what workers do on Windows, and then replacing them.

The most expensive part of a computer is the fallible feelings-filled unpredictable meat sack that operates it.

Google has YouTube, Google Photos, Maps, and a bucket load of search data, Google Analytics, advertising, as well as its data (e.g. transcriptions). And a bunch of data from Android services. From this data they can model speech, model videos and model advertising systems, and how humans respond to them.

But they can't model what people do on computers.

Amazon has Prime data, and a bucket load of compute. But no operating system data. They can build models based around e-commerce and advertising systems.

But they can't model what people do on computers.

Meta has *waves hands* enough analytics to model human behaviour in the Metaverse.

But they can't model what people do on computers.

Microsoft has GitHub.
Microsoft has LinkedIn.
Microsoft has SharePoint.
Microsoft has Teams.
Microsoft has Dynamics.
Microsoft has O365.
Microsoft has Windows telemetry data.

Microsoft can model what people do on (Windows) computers. Like fill out spreadsheets. Write emails. Synthesize web pages of research. Interact with colleagues on Teams. Create and edit documents.

Microsoft wants data so they can model what people do with operating systems.

Then replace them.

Imagine a Copilot that doesn't just write buggy code. Imagine one that also does spreadsheets. That creates documents on SharePoint. That communicates with colleagues on Teams. That has a customer pipeline on Dynamics.

That's what Recall is about - 360-degree surveillance of the worker, to model their functions, make them fungible, replicable - and replaceable.

miki,

@KathyReid The fatal flaw in this argument is the fact that Recall data stays local and isn't sent to Microsoft.
