@miki@dragonscave.space avatar

miki

@miki@dragonscave.space

blind coder / comp-sci student, working in automatic speech recognition for CLARIN. Polish. Libertarian leaning. Feel free to get in touch.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

miki, to random

I feel like there's no operating system or screen reader that would give me the sort of stability and productivity I had around 2019-ish, with Windows 7 and pre-2020.1 NVDA with its speech dictionaries. Honestly the config I had back then made me feel like a God of doing things quickly.

There are a lot of good things to be said about macOS and VO, but that kind of productivity just isn't there. Windows gets slightly closer, at the cost of much of that stability, and NVDA post speech refactor is far from what it used to be.

At least Mac has eSpeak with rate boost support now.

Caoimhe, to random
@Caoimhe@dragonscave.space avatar

How do you guys organise your clothes? Do you have any tricks or techniques to make managing them easier? Feel free to share.

miki,

@Caoimhe I have these sock thingies, no idea what they're called. They have two holes, one for each sock in a pair, and they help you make sure your socks stay paired when washing.

I also sort clothes by category, T-shirts in one pile, underpants in another, socks in a third etc. I get everything in similar colors so that I don't have to wonder what goes with what. Darker stuff is usually better because it makes stains less visible.

bermudianbrit, to random

ok techy humans final query then I'm pulling the trigger on this build. All 3 of these graphics cards seem basically the same. Any of them worth choosing over any other for AI workloads, because I don't give a damn about visual performance for obvious reasons:

  • 16GB Sparkle Intel Arc A770 Titan
  • AMD Radeon RX 7800 XT
  • 16GB NVIDIA GeForce RTX 4060 Ti

Thanks

miki,

@bermudianbrit For anything AI, go with Nvidia. Software-wise, the competition just isn't there yet, even if their cards are theoretically capable of being just as good if not better.

miki, to random

LaTeX pro tip:

If you need to write a simple fraction like 1/2, 1/4, 2/3 etc., where both the numerator and denominator are a single digit, you can just write \frac12 instead of \frac{1}{2}. I personally find this form to be far more readable with a screen reader and wish I had discovered it sooner.
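Concretely (the point being that TeX reads each undelimited macro argument as a single token, so one-digit numerators and denominators need no braces, while letters would be absorbed into the command name):

```latex
% \frac takes its two arguments as single tokens,
% so one-digit fractions need no braces:
$\frac12$  % same output as $\frac{1}{2}$
$\frac34$  % same output as $\frac{3}{4}$

% Multi-digit or symbolic arguments still need braces:
% $\frac{11}{2}$, $\frac{x}{y}$ -- writing \fracxy would be
% parsed as a single (undefined) control sequence.
```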

miki,

@wizzwizz4 I was aware of this, but the particulars of LaTeX control sequences were too much for one toot, and this is the most common case for me.

weirdwriter, to SmallWeb
miki,

@jackf723 @vick21 @weirdwriter This is how audio description on TV works, at least in most of Europe. That's how it has to be: terrestrial and satellite bandwidth is very limited, and wasting it on tracks that are used very infrequently is just unacceptable. As a broadcaster, you have a choice between overpaying for bandwidth for very little benefit, converting the AD mix to mono at some horrendously low bitrate, or overlaying a low-bitrate, mono AD track on top of the normal, high-quality audio. Most broadcasters go for the last option.
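The last option amounts to a simple receiver-side mix. A toy numpy sketch (not any broadcaster's actual implementation; the gain values and activity threshold are made-up illustrations):

```python
import numpy as np

def mix_ad(main_stereo, ad_mono, duck=0.5, ad_gain=1.0):
    """Toy receiver mix: lower ("duck") the main stereo audio wherever
    the mono AD track is active, then overlay the AD on both channels.
    Arrays hold float samples in [-1, 1]; gains are illustrative."""
    active = np.abs(ad_mono) > 1e-4              # crude activity detection
    gain = np.where(active, duck, 1.0)[:, None]  # per-sample main gain
    out = main_stereo * gain + ad_gain * ad_mono[:, None]
    return np.clip(out, -1.0, 1.0)

# One second of quiet stereo programme audio at 48 kHz...
main = np.full((48000, 2), 0.2)
# ...with AD speech present only in the first half.
ad = np.zeros(48000)
ad[:24000] = 0.5

mixed = mix_ad(main, ad)
print(mixed[0], mixed[-1])  # ducked + AD at the start, untouched at the end
```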

miki,

@jackf723 @vick21 @weirdwriter This isn't as easy as you think; movies from different sources may have different lengths, e.g. due to a PAL/NTSC difference, an extra Netflix logo, etc.

The only approach which makes sense here is the German Greta system and its derivatives. It's essentially Shazam for movies: you pick a movie you want to watch, give it a short sample, and it syncs your audio description with the movie audio. The added benefit is that the AD is completely independent of the movie source, works in cinemas, and can be played through your own headphones when watching a movie with sighted friends or family.
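The sync step can be sketched with plain cross-correlation — a stand-in for the real audio fingerprinting such systems use; the sample rate and lengths here are arbitrary toy values:

```python
import numpy as np

def find_offset(movie_sample, ad_reference, sr=1000):
    """Estimate where a short recorded sample occurs inside the
    reference soundtrack, by sliding the sample over the reference
    and picking the position of maximum correlation."""
    corr = np.correlate(ad_reference, movie_sample, mode="valid")
    return int(np.argmax(corr)) / sr  # offset in seconds

# Toy data: a low sample rate keeps the demo fast.
sr = 1000
rng = np.random.default_rng(1)
soundtrack = rng.normal(size=10 * sr)        # 10 s reference track
sample = soundtrack[3 * sr : 4 * sr]         # 1 s sample taken at t = 3 s

print(find_offset(sample, soundtrack, sr))   # → 3.0
```

A real system would fingerprint both signals first, so the match survives microphone noise and room acoustics, but the idea is the same: find the offset, then start the AD track there.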

miki,

@jscholes @jackf723 @vick21 @weirdwriter huh, I assumed satellite systems worked in the same way that the terrestrial ones do. You might be right though.

miki,

@jscholes @jackf723 @vick21 @weirdwriter Poland does this somewhat regularly. We get audio description for quite a few football (soccer) matches here. It's quite surprising really, considering the fact that we barely get it for anything else; soccer is the only thing that gets somewhat regular and consistent AD. The quality is quite crappy. I can't tell you the exact stream parameters, but I know who to ask.

jasongorman, to random
@jasongorman@mastodon.cloud avatar

If you had 10 minutes to explain to a group of programmers who are new to software development how to do it better, what would you tell/show them?

What's the least a dev can know that would make the biggest difference?

What are the ABCs of software development?

miki,

@jasongorman Two words. Get Copilot.

miki, to random

I think the debacle with Google's AI reading The Onion is a symptom of a deeper problem.

I bet that some people who don't speak English that well, aren't familiar with English-language online culture, don't recognize the Onion brand, or who have a low cognitive ability, could fall for the same trick and make the same mistake.

This is why I was never a big fan of the Onion, it is too easy to misinterpret and take it at face value, especially in those situations. I think the problem here is with the Onion publishing misinformation online and all the people giving it good search engine rankings by linking to it, not Google's AI.

miki,

@shane Movies don't look like news articles. I guess you're sort of right when it comes to short stories that are specifically written to look like news articles, scientific papers, Wikipedia etc. Like this one for example https://qntm.org/mmacevedo

chikim, to random
@chikim@mastodon.social avatar

VOLlama v0.1.4-beta.1: System Prompt manager; Import Awesome ChatGPT Prompts; Partial support for GPT-4O (Throws an error for token counter in some cases but just ignore for now); Able to attach entire document and feed for long context model. https://chigkim.github.io/VOLlama/

miki,

@simon @chikim @jscholes They actually ban people for this, apparently.

miki, to random

Am I the only person who gets surprised when a C/C++ project actually builds successfully on the first try?

Caoimhe, to random

Is it possible to read the content of an Excel cell letter by letter with NVDA?

miki,

@Caoimhe Press F2, though that only works if the text comes directly from the cell, not a formula. This trick is also pretty useful for advanced formula editing.

jscholes, to apple
@jscholes@dragonscave.space avatar

Costco had on sale. Having never used one, I decided to buy a couple. I now have two smooth, round things on my desk that apparently don't stick or attach to anything without additional hardware, that I guess I can... put in a box that I might lose? Not really sure I understand this product.

miki,

@jscholes They're great to put in a bag if you're a bag-carrying person like I am. Wallets too. You can attach them to a keychain or keyring. People who own a vehicle often put one there, in case it gets stolen or even to find it when parked.

jcsteh, to random

As I understand it, with all current LLMs, having a conversation involves feeding the model the entire conversation up to this point. That is, there is no memory: the prompt you feed it just gets longer and longer. So how does that work with something like GPT-4O which could be processing audio and/or video at a much faster rate? Surely the prompts must get very large very quickly with anything beyond a short interaction? Doesn't that mean the responses take longer and cost more as the conversation gets longer?

miki,

@chikim @jcsteh Also, there might be some caching involved. The expensive operation in LLMs is attention, which needs to be calculated for every pair of tokens, and that's O(n²). However, when we're only adding m new tokens to an already existing prompt of n tokens, we only need to calculate the new pairs, and that's just O(n·m + m²), not O((n+m)²).

Most implementations throw all those calculations away after finishing every request. This makes sense: those attention vectors take up a lot of memory, and there's usually load balancing involved, so even if you make a request with the same prompt, it's probably going to hit another instance. If you have a persistent connection to a single server and it's easy to determine exactly when the conversation starts and ends, it might make sense to cache, which lowers the cost considerably.
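The caching idea can be sketched in a few lines of numpy — a toy single-head attention, not any particular inference engine's implementation, and the `KVCache` name is made up for illustration:

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention of queries q
    against keys K and values V."""
    scores = q @ K.T / np.sqrt(K.shape[1])       # (m, n+m) score matrix
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

class KVCache:
    """Toy KV cache: keep the keys/values for the existing prompt so a
    follow-up only computes attention rows for its m new tokens
    (O(m*(n+m)) new work) instead of all (n+m)^2 pairs."""
    def __init__(self, dim):
        self.K = np.empty((0, dim))
        self.V = np.empty((0, dim))

    def step(self, new_K, new_V, new_Q):
        # Append the new tokens' keys/values to the cache...
        self.K = np.vstack([self.K, new_K])
        self.V = np.vstack([self.V, new_V])
        # ...and compute attention only for the new queries.
        return attend(new_Q, self.K, self.V)

# Usage: the first call pays for the n-token prompt; the second call
# only pays for the m new tokens against the cached n+m positions.
rng = np.random.default_rng(0)
dim, n, m = 8, 100, 3
cache = KVCache(dim)
cache.step(rng.normal(size=(n, dim)), rng.normal(size=(n, dim)),
           rng.normal(size=(n, dim)))
out = cache.step(rng.normal(size=(m, dim)), rng.normal(size=(m, dim)),
                 rng.normal(size=(m, dim)))
print(out.shape)  # (3, 8): one output row per new token
```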

TheQuinbox, to random

Interesting observation: almost all of the blind hackers in my friend circle are bookworms, me included. I mean, some of us like audio over epub or vice versa, same with genres, but we're all bookworms. Wonder why?

miki,

@TheQuinbox I feel like books for us are what movies are for others. At least for me, it takes a lot less mental effort to listen to a book (whether that be with TTS or audio) than to listen to a movie with AD. In this day and age, there's also a lot less stigma about books than there is about TV / games / social media, so you can read as much as you want, guilt-free.

miki, to random

This whole Microsoft Recall thing makes me want to return to my "permanent storage of speech history" idea. Annotate it with some metadata like timestamps, app name and window title, stick it in a vector database for RAG, and some really interesting possibilities start to emerge.
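The idea above can be sketched end to end. This is a toy, with a hashed bag-of-words stand-in for a real embedding model and a plain list instead of a vector database; every name in it is hypothetical:

```python
import hashlib
import numpy as np

DIM = 256

def embed(text):
    """Stand-in embedding: hashed bag-of-words, L2-normalised.
    A real setup would use a local embedding model instead."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class SpeechHistory:
    """Toy store of spoken-text snippets plus metadata,
    searchable by cosine similarity for a RAG step."""
    def __init__(self):
        self.entries = []  # (embedding, text, metadata) triples

    def add(self, text, timestamp, app, title):
        meta = {"timestamp": timestamp, "app": app, "title": title}
        self.entries.append((embed(text), text, meta))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: -float(e[0] @ q))
        return [(text, meta) for _, text, meta in ranked[:k]]

hist = SpeechHistory()
hist.add("Meeting moved to Friday at 3pm",
         "2024-05-21T10:00", "Thunderbird", "Inbox")
hist.add("Build failed: missing semicolon",
         "2024-05-21T10:05", "Terminal", "make")
results = hist.search("meeting friday")
print(results[0][0])  # → "Meeting moved to Friday at 3pm"
```

The retrieved snippets, with their timestamps and window titles, would then be pasted into an LLM prompt as context.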

miki,

@zersiax Yeah, this would all need to be local, maybe using OpenAI for the RAG step itself, if even that.

KathyReid, to microsoft
@KathyReid@aus.social avatar

Why does Microsoft want to implement Recall? It's not about images. It's about modelling what workers do on Windows, and then replacing them.

The most expensive part of a computer is the fallible feelings-filled unpredictable meat sack that operates it.

Google has YouTube, Google Photos, Maps, and a bucket load of search data, Google Analytics, advertising, as well as its data (e.g. transcriptions). And a bunch of data from Android services. From this data they can model speech, model videos and model advertising systems, and how humans respond to them.

But they can't model what people do on computers.

Amazon has Prime data, and a bucket load of compute. But no operating system data. They can build models based around e-commerce and advertising systems.

But they can't model what people do on computers.

Meta has *waves hands* enough analytics to model human behaviour in the Metaverse.

But they can't model what people do on computers.

Microsoft has GitHub.
Microsoft has LinkedIn.
Microsoft has SharePoint.
Microsoft has Teams.
Microsoft has Dynamics.
Microsoft has O365.
Microsoft has Windows telemetry data.

Microsoft can model what people do on (Windows) computers. Like fill out spreadsheets. Write emails. Synthesize web pages of research. Interact with colleagues on Teams. Create and edit documents.

Microsoft wants data so they can model what people do with operating systems.

Then replace them.

Imagine a CoPilot that doesn't just write buggy code. Imagine one that also does spreadsheets. That creates documents on SharePoint. That communicates with colleagues on Teams. That has a customer pipeline on Dynamics.

That's what Recall is about - 360 degree surveillance of the worker, to model their functions, make them fungible, replicable - and replaceable.

miki,

@KathyReid The fatal flaw in this argument is the fact that Recall data stays local and isn't sent to Microsoft.

capital, to random
@capital@scalie.zone avatar

Microsoft recall is fucking insane.

Recall snapshots are kept on Copilot+ PCs themselves, on the local hard disk, and are protected using data encryption on your device and (if you have Windows 11 Pro or an enterprise Windows 11 SKU) BitLocker.

You're doing what? Microsoft wh-

Recall uses Copilot+ PC advanced processing capabilities to take images of your active screen every few seconds. [...]

[...] The default allocation for Recall on a device with 256 GB will be 25 GB, which can store approximately 3 months of snapshots. [...]

WHAT WHY NO ST-

Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers. That data may be in snapshots that are stored on your device, especially when sites do not follow standard internet protocols like cloaking password entry.

Microsoft please... th-the tech support scams... think about what happens if this gets bre-

Recall also does not take snapshots of certain kinds of content, including InPrivate web browsing sessions...

Oh, okay I guess that's san-

...in Microsoft Edge.

AAAAAAAAAAAAAAAAAAAAAAAA

It treats material protected with digital rights management (DRM) similarly; like other Windows apps such as the Snipping Tool, Recall will not store DRM content.

Ah, but of course. The DRM is protected...

miki,

@capital The fact that DRM is protected isn't some kind of evil / malicious scheme by Microsoft, it's just how Windows (and literally all other systems, Linux included) works. No app, whether Microsoft or third-party, is allowed to touch that data.

miki,

@minneyar @capital Passwords are already protected; how would they detect credit card numbers?
