@miki@dragonscave.space avatar

miki

@miki@dragonscave.space

blind coder / comp-sci student, working in automatic speech recognition for CLARIN. Polish. Libertarian leaning. Feel free to get in touch.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

miki, to random

I feel like there's no operating system or screen reader that would give me the sort of stability and productivity I had around 2019-ish, with Windows 7 and pre-2020.1 NVDA with its speech dictionaries. Honestly the config I had back then made me feel like a God of doing things quickly.

There are a lot of good things to be said about macOS and VO, but that kind of productivity just isn't there. Windows gets slightly closer, at the cost of much of that stability, and NVDA post speech refactor is far from what it used to be.

At least Mac has eSpeak with rate boost support now.

Caoimhe, to random
@Caoimhe@dragonscave.space avatar

How do you guys organise your clothes? Do you have any tricks or techniques to make managing them easier? Feel free to share.

miki,

@Caoimhe I have these sock thingies, no idea what they're called. They have two holes, one for each sock in a pair, and they help you make sure your socks stay paired when washing.

I also sort clothes by category, T-shirts in one pile, underpants in another, socks in a third etc. I get everything in similar colors so that I don't have to wonder what goes with what. Darker stuff is usually better because it makes stains less visible.

bermudianbrit, to random

ok techy humans final query then I'm pulling the trigger on this build. All 3 of these graphics cards seem basically the same. Any of them worth choosing over any other for AI workloads, because I don't give a damn about visual performance for obvious reasons:

  • 16GB Sparkle Intel Arc A770 Titan
  • AMD Radeon RX 7800 XT
  • 16GB NVIDIA GeForce RTX 4060 Ti

Thanks

miki,

@bermudianbrit For anything AI, go with Nvidia. Software-wise, the competition just isn't there yet, even if their cards are theoretically capable of being just as good if not better.

miki, to random

LaTeX pro tip:

If you need to write a simple fraction like 1/2, 1/4, 2/3 etc., where both the numerator and denominator are a single digit, you can just write \frac12 instead of \frac{1}{2}. I personally find this form to be far more readable with a screen reader and wish I had discovered it sooner.
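Concretely (the point being that TeX reads each undelimited macro argument as a single token, so one-digit numerators and denominators need no braces, while letters would be absorbed into the command name):

```latex
% \frac takes its two arguments as single tokens,
% so one-digit fractions need no braces:
$\frac12$  % same output as $\frac{1}{2}$
$\frac34$  % same output as $\frac{3}{4}$

% Multi-digit or symbolic arguments still need braces:
% $\frac{11}{2}$, $\frac{x}{y}$ -- writing \fracxy would be
% parsed as a single (undefined) control sequence.
```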

miki,

@wizzwizz4 I was aware of this, but the particulars of LaTeX control sequences were too much for one toot, and this is the most common case for me.

weirdwriter, to SmallWeb
miki,

@jackf723 @vick21 @weirdwriter This is how audio description on TV works, at least in most of Europe. That's how it has to be: terrestrial and satellite bandwidth is very limited, and wasting it on tracks that are used very infrequently is just unacceptable. As a broadcaster, you have a choice between overpaying for bandwidth for very little benefit, converting the AD mix to mono at some horrendously low bitrate, or overlaying a low-bitrate, mono AD track on top of the normal, high-quality audio. Most broadcasters go for the last option.
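The last option amounts to a simple receiver-side mix. A toy numpy sketch (not any broadcaster's actual implementation; the gain values and activity threshold are made-up illustrations):

```python
import numpy as np

def mix_ad(main_stereo, ad_mono, duck=0.5, ad_gain=1.0):
    """Toy receiver mix: lower ("duck") the main stereo audio wherever
    the mono AD track is active, then overlay the AD on both channels.
    Arrays hold float samples in [-1, 1]; gains are illustrative."""
    active = np.abs(ad_mono) > 1e-4              # crude activity detection
    gain = np.where(active, duck, 1.0)[:, None]  # per-sample main gain
    out = main_stereo * gain + ad_gain * ad_mono[:, None]
    return np.clip(out, -1.0, 1.0)

# One second of quiet stereo programme audio at 48 kHz...
main = np.full((48000, 2), 0.2)
# ...with AD speech present only in the first half.
ad = np.zeros(48000)
ad[:24000] = 0.5

mixed = mix_ad(main, ad)
print(mixed[0], mixed[-1])  # ducked + AD at the start, untouched at the end
```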

miki,

@jackf723 @vick21 @weirdwriter This isn't as easy as you think; movies from different sources may have different lengths, e.g. due to a PAL/NTSC difference, an extra Netflix logo, etc.

The only approach which makes sense here is the German Greta system and its derivatives. It's essentially Shazam for movies: you pick a movie you want to watch, give it a short sample, and it syncs your audio description with the movie audio. The added benefit is that the AD is completely independent of the movie source, works in cinemas, and can be played through your own headphones when watching a movie with sighted friends or family.
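The sync step can be sketched with plain cross-correlation — a stand-in for the real audio fingerprinting such systems use; the sample rate and lengths here are arbitrary toy values:

```python
import numpy as np

def find_offset(movie_sample, ad_reference, sr=1000):
    """Estimate where a short recorded sample occurs inside the
    reference soundtrack, by sliding the sample over the reference
    and picking the position of maximum correlation."""
    corr = np.correlate(ad_reference, movie_sample, mode="valid")
    return int(np.argmax(corr)) / sr  # offset in seconds

# Toy data: a low sample rate keeps the demo fast.
sr = 1000
rng = np.random.default_rng(1)
soundtrack = rng.normal(size=10 * sr)        # 10 s reference track
sample = soundtrack[3 * sr : 4 * sr]         # 1 s sample taken at t = 3 s

print(find_offset(sample, soundtrack, sr))   # → 3.0
```

A real system would fingerprint both signals first, so the match survives microphone noise and room acoustics, but the idea is the same: find the offset, then start the AD track there.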

miki,

@jscholes @jackf723 @vick21 @weirdwriter huh, I assumed satellite systems worked in the same way that the terrestrial ones do. You might be right though.

miki,

@jscholes @jackf723 @vick21 @weirdwriter Poland does this somewhat regularly. We get audio description for quite a few football (soccer) matches here. It's quite surprising really, considering the fact that we barely get it for anything else; soccer is the only thing that gets somewhat regular and consistent AD. The quality is quite crappy. I can't tell you the exact stream parameters, but I know who to ask.

jasongorman, to random
@jasongorman@mastodon.cloud avatar

If you had 10 minutes to explain to a group of programmers who are new to software development how to do it better, what would you tell/show them?

What's the least a dev can know that would make the biggest difference?

What are the ABCs of software development?

miki,

@jasongorman Two words. Get Copilot.

miki, to random

I think the debacle with Google's AI reading The Onion is a symptom of a deeper problem.

I bet that some people who don't speak English that well, aren't familiar with English-language online culture, don't recognize the Onion brand, or who have a low cognitive ability, could fall for the same trick and make the same mistake.

This is why I was never a big fan of the Onion, it is too easy to misinterpret and take it at face value, especially in those situations. I think the problem here is with the Onion publishing misinformation online and all the people giving it good search engine rankings by linking to it, not Google's AI.

miki,

@shane Movies don't look like news articles. I guess you're sort of right when it comes to short stories that are specifically written to look like news articles, scientific papers, Wikipedia etc. Like this one for example https://qntm.org/mmacevedo

chikim, to random
@chikim@mastodon.social avatar

VOLlama v0.1.4-beta.1: System Prompt manager; Import Awesome ChatGPT Prompts; Partial support for GPT-4O (Throws an error for token counter in some cases but just ignore for now); Able to attach entire document and feed for long context model. https://chigkim.github.io/VOLlama/

miki,

@simon @chikim @jscholes They actually ban people for this, apparently.

miki, to random

Am I the only person who gets surprised when a C/C++ project actually builds successfully on the first try?

Caoimhe, to random

Is it possible to read the content of an Excel cell letter by letter with NVDA?

miki,

@Caoimhe Press F2, though that only works if the text comes directly from the cell, not a formula. This trick is also pretty useful for advanced formula editing.

jscholes, to apple
@jscholes@dragonscave.space avatar

Costco had on sale. Having never used one, I decided to buy a couple. I now have two smooth, round things on my desk that apparently don't stick or attach to anything without additional hardware, that I guess I can... put in a box that I might lose? Not really sure I understand this product.

miki,

@jscholes They're great to put in a bag if you're a bag-carrying person like I am. Wallets too. You can attach them to a keychain or keyring. People who own a vehicle often put one there, in case it gets stolen or even to find it when parked.

jcsteh, to random

As I understand it, with all current LLMs, having a conversation involves feeding the model the entire conversation up to this point. That is, there is no memory: the prompt you feed it just gets longer and longer. So how does that work with something like GPT-4O which could be processing audio and/or video at a much faster rate? Surely the prompts must get very large very quickly with anything beyond a short interaction? Doesn't that mean the responses take longer and cost more as the conversation gets longer?

miki,

@chikim @jcsteh Also, there might be some caching involved. The expensive operation in LLMs is attention, which needs to be calculated for every pair of tokens, and that's O(n²). However, when we're only adding m new tokens to an already existing prompt of n tokens, we only need to calculate the new pairs, and that's just O(n·m + m²), not O((n+m)²).

Most implementations throw all those calculations away after finishing every request. This makes sense: those attention vectors take up a lot of memory, and there's usually load balancing involved, so even if you make a request with the same prompt, it's probably going to hit another instance. If you have a persistent connection to a single server and it's easy to determine exactly when the conversation starts and ends, it might make sense to cache, which lowers the cost considerably.
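The caching idea can be sketched in a few lines of numpy — a toy single-head attention, not any particular inference engine's implementation, and the `KVCache` name is made up for illustration:

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention of queries q
    against keys K and values V."""
    scores = q @ K.T / np.sqrt(K.shape[1])       # (m, n+m) score matrix
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

class KVCache:
    """Toy KV cache: keep the keys/values for the existing prompt so a
    follow-up only computes attention rows for its m new tokens
    (O(m*(n+m)) new work) instead of all (n+m)^2 pairs."""
    def __init__(self, dim):
        self.K = np.empty((0, dim))
        self.V = np.empty((0, dim))

    def step(self, new_K, new_V, new_Q):
        # Append the new tokens' keys/values to the cache...
        self.K = np.vstack([self.K, new_K])
        self.V = np.vstack([self.V, new_V])
        # ...and compute attention only for the new queries.
        return attend(new_Q, self.K, self.V)

# Usage: the first call pays for the n-token prompt; the second call
# only pays for the m new tokens against the cached n+m positions.
rng = np.random.default_rng(0)
dim, n, m = 8, 100, 3
cache = KVCache(dim)
cache.step(rng.normal(size=(n, dim)), rng.normal(size=(n, dim)),
           rng.normal(size=(n, dim)))
out = cache.step(rng.normal(size=(m, dim)), rng.normal(size=(m, dim)),
                 rng.normal(size=(m, dim)))
print(out.shape)  # (3, 8): one output row per new token
```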

TheQuinbox, to random

Interesting observation: almost all of the blind hackers in my friend circle are bookworms, me included. I mean, some of us like audio over epub or vice versa, same with genres, but we're all bookworms. Wonder why?

miki,

@TheQuinbox I feel like books for us are what movies are for others. At least for me, it takes a lot less mental effort to listen to a book (whether that be with TTS or audio) than to listen to a movie with AD. In this day and age, there's also a lot less stigma about books than there is about TV / games / social media, so you can read as much as you want, guilt-free.

miki, to random

This whole Microsoft Recall thing makes me want to return to my "permanent storage of speech history" idea. Annotate it with some metadata like timestamps, app name and window title, stick it in a vector database for RAG, and some really interesting possibilities start to emerge.
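The idea above can be sketched end to end. This is a toy, with a hashed bag-of-words stand-in for a real embedding model and a plain list instead of a vector database; every name in it is hypothetical:

```python
import hashlib
import numpy as np

DIM = 256

def embed(text):
    """Stand-in embedding: hashed bag-of-words, L2-normalised.
    A real setup would use a local embedding model instead."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class SpeechHistory:
    """Toy store of spoken-text snippets plus metadata,
    searchable by cosine similarity for a RAG step."""
    def __init__(self):
        self.entries = []  # (embedding, text, metadata) triples

    def add(self, text, timestamp, app, title):
        meta = {"timestamp": timestamp, "app": app, "title": title}
        self.entries.append((embed(text), text, meta))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: -float(e[0] @ q))
        return [(text, meta) for _, text, meta in ranked[:k]]

hist = SpeechHistory()
hist.add("Meeting moved to Friday at 3pm",
         "2024-05-21T10:00", "Thunderbird", "Inbox")
hist.add("Build failed: missing semicolon",
         "2024-05-21T10:05", "Terminal", "make")
results = hist.search("meeting friday")
print(results[0][0])  # → "Meeting moved to Friday at 3pm"
```

The retrieved snippets, with their timestamps and window titles, would then be pasted into an LLM prompt as context.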

miki,

@zersiax Yeah, this would all need to be local, maybe using OpenAI for the RAG step itself, if even that.

KathyReid, to microsoft
@KathyReid@aus.social avatar

Why does Microsoft want to implement Recall? It's not about images. It's about modelling what workers do on Windows, and then replacing them.

The most expensive part of a computer is the fallible feelings-filled unpredictable meat sack that operates it.

Google has YouTube, Google Photos, Maps, and a bucket load of search data, Google Analytics, advertising, as well as its data (e.g. transcriptions). And a bunch of data from Android services. From this data they can model speech, model videos and model advertising systems, and how humans respond to them.

But they can't model what people do on computers.

Amazon has Prime data, and a bucket load of compute. But no operating system data. They can build models based around e-commerce and advertising systems.

But they can't model what people do on computers.

Meta has *waves hands* enough analytics to model human behaviour in the Metaverse.

But they can't model what people do on computers.

Microsoft has GitHub.
Microsoft has LinkedIn.
Microsoft has SharePoint.
Microsoft has Teams.
Microsoft has Dynamics.
Microsoft has O365.
Microsoft has Windows telemetry data.

Microsoft can model what people do on (Windows) computers. Like fill out spreadsheets. Write emails. Synthesize web pages of research. Interact with colleagues on Teams. Create and edit documents.

Microsoft wants data so they can model what people do with operating systems.

Then replace them.

Imagine a CoPilot that doesn't just write buggy code. Imagine one that also does spreadsheets. That creates documents on SharePoint. That communicates with colleagues on Teams. That has a customer pipeline on Dynamics.

That's what Recall is about - 360 degree surveillance of the worker, to model their functions, make them fungible, replicable - and replaceable.

miki,

@KathyReid The fatal flaw in this argument is the fact that Recall data stays local and isn't sent to Microsoft.

capital, to random
@capital@scalie.zone avatar

Microsoft recall is fucking insane.

Recall snapshots are kept on Copilot+ PCs themselves, on the local hard disk, and are protected using data encryption on your device and (if you have Windows 11 Pro or an enterprise Windows 11 SKU) BitLocker.

You're doing what? Microsoft wh-

Recall uses Copilot+ PC advanced processing capabilities to take images of your active screen every few seconds. [...]

[...] The default allocation for Recall on a device with 256 GB will be 25 GB, which can store approximately 3 months of snapshots. [...]

WHAT WHY NO ST-

Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers. That data may be in snapshots that are stored on your device, especially when sites do not follow standard internet protocols like cloaking password entry.

Microsoft please... th-the tech support scams... think about what happens if this gets bre-

Recall also does not take snapshots of certain kinds of content, including InPrivate web browsing sessions...

Oh, okay I guess that's san-

...in Microsoft Edge.

AAAAAAAAAAAAAAAAAAAAAAAA

It treats material protected with digital rights management (DRM) similarly; like other Windows apps such as the Snipping Tool, Recall will not store DRM content.

Ah, but of course. The DRM is protected...

miki,

@capital The fact that DRM is protected isn't some kind of evil / malicious scheme by Microsoft, it's just how Windows (and literally all other systems, Linux included) works. No app, whether Microsoft or third-party, is allowed to touch that data.

miki,

@minneyar @capital Passwords are already protected; how would they detect credit card numbers?
