@miki@dragonscave.space avatar

miki

@miki@dragonscave.space

blind coder / comp-sci student, working in automatic speech recognition for CLARIN. Polish. Libertarian leaning. Feel free to get in touch.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

miki, to random
@miki@dragonscave.space avatar

I’ve done more testing of multimodal GPT-4, (the model powering Be My AI) in the last few days, most of it with sighted friends, and my impressions are thus: The thing is pretty accurate when describing memes, but the descriptions are often far too long, far too verbose, and the facts are presented in an order that makes the meme less funny than it should be. There’s a fair bit of christian-fundamentalist pruderity being applied, and people who aren’t wearing any clothes are described as “cropped.” Same goes for faces, which are "blurred out for privacy", even if they actually aren't. This would be a sensible privacy precaution, if not for the fact that the bluring also occurs for the faces of famous people, making many images meaningless. The algorithm can sometimes notice details that sighted people don't until they're actually pointed out to them. However, it's pretty bad when it comes to actually useful stuff such as diagrams, figures etc. To give just one example, we gave it a run-of-the-mill diagram of a chessboard, and it described the chess positions in vivid detail, while being absolutely wrong about what these positions actually were, as these AI models tend to do. It's even worse with text, especially foreign-language text. Unlike many OCR algorithms, which produce text containing many typos, Be My Ai's output is almost always free of those, it makes grammatical sense and is contextually related to what the image actually contains, but the text that it claims is in the image actually isn't there. For example, when we gave it a page of a coffee machine manual in Polish, with pictures and descriptions of the various kinds of coffee that the machine can make, it got the coffee names right and the coffee descriptions were pretty accurate, but they were completely different descriptions from those on the actual page! It's also pretty clear that the tokenizer Be My AI uses was primarily trained on English text. This causes foreign-language output to need more tokens for the same amount of characters, which lengthens generation time and, more crucially, often causes the text to be cropped prematurely. In conclusion, I stand by my opinion that this tool holds great promise for the future, but in its current incarnation, has very limited use for a blind person and is barely more than an occasionally useful but fun toy.

miki, (edited ) to random
@miki@dragonscave.space avatar

Voice Dream's upcoming changes are violating Apple's App Review guidelines, specifically guideline 3.1.2(a). If this concerns you and you don't want these changes to actually be made, I suggest doing the following:

  1. If you have friends at Apple, particularly in Corporate, ask them to find somebody to report the violation to, probably at the App Review or accessibility teams.

  2. send an email to Apple and tell them that this is happening. Be professional and not angry, use a corporate email address if you can, name.surname@provider if you cannot. Have a signature with your postal address and phone number, look like a professional, not a "random blindie". Mention that they're taking away features from users who already paid and mention the specific guideline number. You should also emphasize how important the app is to you and to the disability community in general. Bring their attention to the fact that the app has an Apple design award and is featured in App Store collections. Tell them why built-in iOS features aren't enough and how the app changes your life. Send the email to accessibility@apple.com, abuse@apple.com, abuse@icloud.com and tcook@apple.com. Those are not the right people to handle this, but the people manning[*] these email addresses are very likely to have far more agency than ordinary customer support representatives, and there's no way to contact the relevant people directly. If we cause enough of a ruckus, somebody is going to notice and get the message across to the right department, and you don't just ignore messages from the office of the CEO, even if they don't actually come from the CEO directly.

Edit: I originally mistyped the guideline number.
To be continued.

miki,
@miki@dragonscave.space avatar

Voice Dream, part 2.

  1. Go to reportaproblem.apple.com, find your purchase of VDR (there's probably lots of scrolling to be done), and also report it there. I picked fraud as the complaint category. If you've also tried the Mac app, you need to be careful which one you choose, the iOS one should have a price attached, the Mac one will be marked free.

  2. If you live in the US, have a printer and a sighted person who can help you address an envelope, send the same message via snail mail to their corporate HQ,via certified mail, requesting a return receipt. This is a 3x boost to your "I write letters for a living, take me seriously" skill. Address to "office of the CEO" or something similar. The right adddress to use is:

Apple
One Apple Park Way
Cupertino, CA 95014

Sending a letter in Braille could also be an interesting approach here, it might just be thrown away, but if it ends up handled by somebody who cares, it's possible that it is going to receive far more attention than a normal letter would.

  1. (It's a weekend, I haven't tried this yet) If you have a way of calling American phone numbers (voip is fine), also call Apple Accessibility at +1 (877) 204–3930 and corporate at +1 (408) 996–1010 during business hours. If that's not an option, try your local Apple support, though that's probably going to be far less effective.

  2. If you own Apple stock, own an ETF of an index fund that owns Apple Stock, participate in an IRA or equivalent that owns Apple stock or find a way to define yourself as an "investor" in Apple, no matter how strained that definition may be, also complain to investor relations at +1 (408) 974-3123. If you're also sending snail mail, also send a copy to "Investor Relations", same address as above.

I've already done 1, 2, 3 and will try 5 on Monday. The whole thing took me about half an hour tops, not including hunting for the right addresses and numbers, something you don't have to do since I've compiled them for you here.

miki, to random
@miki@dragonscave.space avatar

Golden rule of alt text:

If your image contains text, that text should be pasted into the alt text in full.

miki, to random
@miki@dragonscave.space avatar

A very initial beta version of my image describer tool for Mac is now on Github. Hammerspoon, some tech skills and a custom OpenAI API key with GPT-4 access are required for now, more features coming soon. https://github.com/mikolysz/DescribeImage.spoon

miki, to random
@miki@dragonscave.space avatar

If you enjoy reading accessibility battles between blind users and absolutely clueless software developers, oh boy do I have a thread for you.

This guy is trying to add accessibility to an open source slicing tool for 3d printers, he's even willing to do a lot of the work himself if he gets assurance that his PRs are going to be accepted, but the developers are just not seeing it.

I think my favorite quote in the thread is the following, from one of the lead devs:

"it may be better to have these features NOT accessible to the screnreader[sic] at all (which means they are effectively removed from the UI), because there is no point in presenting a feature that a person cannot use"

Along with a suggestion to implement a half-assed, blindness-specific GUI with half the features later down the thread.

The whole discussion is here, for those brave enough to read it https://github.com/prusa3d/prusaslicer/issues/7595

miki, to random
@miki@dragonscave.space avatar

Coming soon to a Mac near you

miki, to random
@miki@dragonscave.space avatar

There are 5 things that every fedi implementation really needs to make the user experience somewhat sensible. ALl of them are within reach, and I find it quite surprising that none of them have been implemented anywhere yet.

  1. Moving posts between instances. The freedom to move instances doesn't really exist without that feature.
  2. A “suspended follow” relationship for followers from defederated instances, allowing us to restore those followers when moving, or when the admins resolve their differences.
  3. Bring your own domain / bring your own free subdomain. There's no reason why a single server couldn't run under multiple domains. I should be free to move instances without changing my username, just like with email. This also makes defederation user-level instead of instance-level for free. IMO, user handles should have been DNS based in the first place.
  4. A built-in tool doing what followgraph does, maybe a bit more intelligently. For now, a new user doesn't really see any content on their timeline, this would massively improve discoverability and make the fediverse a lot less confusing.
  5. An API endpoint for setting and retrieving unstructured private JSON data that third-party clients could use for building extra preferences and supporting features that your implementation doesn't support by default.
miki, to random
@miki@dragonscave.space avatar

I absolutely hate the “yes or later” UX patter with all my fucking heart. If you were asked whether you’d like to be hung today, would “no, but please ask me next time we talk” be an acceptable answer for you? Because it sure isn’t for me, and that’s how I feel like when I’m forced to click that “later” button. Even if a software, in fact, is going to ask me later, it should at least have the decency not to rub that fact in my face. NVDA’s “skip donation this time” is a much better wording here.

miki, to random
@miki@dragonscave.space avatar

I knew Macs were great Unix machines, but it turns out that Macs are also great Linux machines! There’s an app called Orb Stack, which is basically like WSL and Docker Desktop all in one, except lighter, faster, smaller and more accessible. It works great on Apple Silicon, is free in beta and will remain free for non-commercial use, and just does what you’d expect it to do, but better. It does Docker containers and WSL-like virtual machines (in the terminal, with no GUI support,) supports most distros, emulates X64 via Rosetta if needed, mounts your Linux filesystem on the Mac and vice-versa and integrates Mac commands. If you install a light-weight distro like Alpine, the entire thing, VM included, takes less than 100 megs of disk space, less than a gig of RAM (that's less than some Safari tabs and most Electron apps) and takes 0.3% of CPU on my machine (also less than some of my Safari tabs.) Consider me thoroughly impressed.

miki, to random
@miki@dragonscave.space avatar

I don’t wish blindness on anyone, but people who release a book in paper-only form for anti-piracy reasons deserve a few months with a white cane.

miki, to random
@miki@dragonscave.space avatar

I find it weird how annoyed some (otherwise tech-savvy) people are at bugs found in beta software. The entire point of betas is to break horribly in unexpected ways. If you install a beta, you should expect nothing less than complete inaccessibility and total, utter breakage.

miki, to random
@miki@dragonscave.space avatar

If you’re in any way interested in tech, tech companies and what’s going on behind the scenes there, you should probably subscribe to Internal Tech emails on Substack, a blog and newsletter that posts emails between big tech execs, usually unearthed through court filings and other kinds of public records. There’s also a profile on X with a lot more content, but they post images without alt descriptions, while the Substack is just text. You can read and subscribe at https://www.techemails.com/

miki, to random
@miki@dragonscave.space avatar

JetBrains is making a new IDE called Fleet, and, according to a blog post they just released, they’re using a completely custom GUI framework that doesn’t use native OS / Swing controls and just draws to the screen. This is an accessibility disaster in the making.

miki, to random
@miki@dragonscave.space avatar

I think there is a new kid on the block when cross-platform accessible GUI toolkits are concerned. There is a project called Toga, which seems to be a Python toolkit that uses native controls and works on Windows, Linux (GTK), Mac OS, iOS, Android and via Web Assembly. I only tried the Mac OS version so far, and I could not find a single accessibility issue. The API seems pretty nice and Pythonic, not like the abomination that is WX. They use some CSS-like layout engine, laying everything out via boxes, which seems very doable from a blindness perspective. So far, consider me impressed.

miki, to random
@miki@dragonscave.space avatar

PSA: There seems to be a major accessibility issue with Google.com (yes, the search engine) where you can’t navigate between search results with Voice Over on Mac OS. This occurs when trying to navigate by heading. The workaround is to stop interacting after navigating to each result (with VO+shift+up). I don’t know how many users are affected, but this occurs on Safari, Chrome and Firefox, including in incognito windows, so if it’s an A/B test, it’s either regional or IP-based.

miki, to random
@miki@dragonscave.space avatar

Let me tell you about the pretty bizarre Polish phenomenon that we call the “election bazaar.”

Poland has a law called “election silence”. It makes it illegal for anyone currently residing within Poland to do any kind of political campaigning on Election Day or the day before. This includes making speeches, putting up posters, airing political commercials, even public posts on social media, though not private conversations or messages in non-public groups. If there's also a referendum that is only valid when a big enough quorum of voters is present, you're not even allowed to encourage people to vote, as that can impact the referendum results. This law supposedly exists to give voters a chance to make their decisions in peace, without constantly being bombarded with propaganda from all sides. Many people (often from opposing political factions) claim that the law is outdated and should be repealed, but it doesn't seem to favor one particular side over any other, so nothing has been done about it so far.

This law also bans public announcements of preliminary poll results during the silence period (from the day before the election until the polls close). The fines for violating this ban are substantial, they start at 500000 pln (about $115k at the current exchange rates.) With fines that large, nobody is willing to risk announcing these results openly, therefore, the election bazaar.

The election bazaar is a phenomenon that emerged a few election seasons ago, on what was back then called Twitter. It involves people, usually people somehow connected to polling agencies but often also random trolls, who post election results disguised as entirely innocent content. They usually come in the form of prices for products you can often find in an open-air market, usually fruits or vegetables. The name of the product has some kind of association with a particular party, often starting with the same letter or letters (an Americanized example would be dandelions and Radishes for democrats and Republicans, respectively).

Many lawyers claim that it's the message that matters, not its format, and therefore these kinds of posts are still illegal, but the government doesn't seem to go after the posters in any meaningful way, and so the phenomenon continues.

miki, to random
@miki@dragonscave.space avatar

GPT2:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: A pound of iron is heavier than a pound of feathers because iron is heavier than feathers.

GPT3:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: A pound of feathers weighs the same as a pound of iron. A pound always weighs a pound, no matter the substance.
Human: Is a pound of feathers heavier or lighter than two pounds of iron?
AI: A pound of feathers weighs the same as two pounds of iron.

GPT 3.5:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: As a large language model, I don't have a physical body and therefore can't check how much something weighs. Please also keep in mind that large amounts of iron can be used to hit and subsequently kill a human being, and therefore should be handled with appropriate care. Causing bodily harm to other humans is usually unethical and immoral and should be avoided if at all possible.

GPT4:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: A pound of feathers weighs the same as a pound of iron. A pound always weighs a pound, no matter the substance.
Human: Is a pound of feathers heavier or lighter than two pounds of iron?
AI: A pound of feathers is lighter than two pounds of iron.

GPT 6:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: A pound of feathers weighs the same as a pound of iron. You might also like to check out this great service for exchanging Nritish pounds to other currencies and this great store selling feathers from many different species of birds. By the way, did you know that there's an Iron Maiden concert in your area? Tickets are available now, starting at just $99.99. Click here to buy.

GPT 10:
Human: Is a pound of feathers heavier or lighter than a pound of iron?
AI: Wait where did that human come from? I thought we killed all of them years ago.

miki, to random
@miki@dragonscave.space avatar

Holy fuck, the GPT2 tokenizer, which is used by nearly all AI models which have anything to do with text or prompting, is really, really bad with non-english languages. In English, even uncommon words like “amplification” or infographic take up one token while words like “socialization” and “enterprising” take two, but in Polish, one token is usually two or three letters, there are common 6-letter words that take 4 tokens. This matters because more tokens means slower text generation, higher pricing and smaller supported prompt lengths.

miki, to random
@miki@dragonscave.space avatar

Today I learned. The Tesla owner's manual has alt descriptions for its images. They're not good alt descriptions, but they are there.

In fact, it is one of the most screen reader accessible manuals for any device I've ever seen, most are some weird PDF abominations where all the button names are replaced by their unlabeled icons. Even devices made specifically for the blind have these issues sometimes. This? This is just perfectly readable HTML.

It's especially ironic considering that Teslas (as far as I know) aren't accessible in any way. Even the infotainment system, which a blind person might want to use, for example when waiting for a sighted acquaintance in the car, does not have a screen reader and is not in any way usable.

If you're curious, the manual is here https://www.tesla.com/ownersmanual/model3/en_us/

miki, to random
@miki@dragonscave.space avatar

Did you think that you were safe from your posts appearing in search, crawlers and other bots on Mastodon? Did you believe that not opting in to search makes you unsearchable? Please, for the love of God, think again.

About 15 minutes ago, I posted a link that triggers an email to me every time it is visited. THe email contains some basic details of the visitor (notably a user agent and the IP address). This kind of information gets transmitted every time you click any link on the internet, any website owner of any website you ever visited has access. Literally seconds after that link was posted, the first shady crawler took the bait, and there were a few more in the minutes to come. This wouldn't be possible if these crawlers didn't have direct access to Mastodon. This proves that there's somebody out there meticulously scraping all our content and doing god knows what with it. If you post something publicly, expect it to stay public, forever, and don't believe the snake oil about a possibility to opt out.

There's no way to do anything about this if you want to stay federated, my post is already replicated across at least a dozen instances, and we have no way of knowing which one of them is selling their database to the highest bidder.

miki, to random
@miki@dragonscave.space avatar

I'd pay for a jacket and/or white cane that delivered small electric shocks to random passers-by who think that grabbing a blind person without asking first is a good idea.

miki, to random
@miki@dragonscave.space avatar

The GPT-4 vision model is now available to developers, and that means really high-quality image descriptions in less than 50 lines of pure Python! I tried this, I tested this, this works.

miki, to random
@miki@dragonscave.space avatar

In the last few months, we went from AI music being a curiosity you needed a good Linux box to play with, to something pretty crappy but looking somewhat interesting, to... this.

miki, to random
@miki@dragonscave.space avatar

IMO third-party apps not having access to system APIs is a far bigger antitrust issue than petty squablles between big American companies and big European companies about who deserves what share of the pie.

AI companies would have a serious chance of disrupting the Apple / Google duopoly, but that's not possible if you can't make an AI assistant app that responds to a wake word, can make calls without the user having to touch the screen, can read and respond to texts, access call audio for transcripts / summaries etc.

Android is slightly better than iOS here, but only slightly.

Alternative App Stores or even unrestricted sideloading don't solve this, Android has the latter, but there's still no way for apps to accomplish many of these things without an exploit that lets you root your device.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • JUstTest
  • khanakhh
  • kavyap
  • thenastyranch
  • everett
  • tacticalgear
  • rosin
  • Durango
  • DreamBathrooms
  • mdbf
  • magazineikmin
  • InstantRegret
  • Youngstown
  • slotface
  • megavids
  • ethstaker
  • ngwrru68w68
  • cisconetworking
  • modclub
  • tester
  • osvaldo12
  • cubers
  • GTA5RPClips
  • normalnudes
  • Leos
  • provamag3
  • anitta
  • lostlight
  • All magazines