@dmwyatt@techhub.social

dmwyatt

Experienced in software development, team coordination, and product strategy. Engaged in the intersection of technology, science, and philosophy. Looking to connect over innovative ideas and thoughtful discussions. Enjoys the balance of nature and tech.

Tekchip, to llm

I've noticed lots of sites disabling copy/paste, I presume in an attempt to prevent scraping to feed LLM training.

The fediverse doesn't seem to be doing this.

Does that mean the fediverse could have an outsized impact on LLM results as the places available to LLM scrapers dwindle?

Is there a sort of fedi-consensus on whether we want to help or hurt LLMs? How might we go about doing one or the other if we have more influence?

dmwyatt,

@Tekchip

Unfortunately, copy/paste protection has zero effect on LLM scraping: scrapers do not use copy/paste, and copy/paste protection is trivial to bypass anyway.

dmwyatt, to python

collections.ChainMap from the Python standard library is pretty cool. One place I use it is when I want to merge configuration from CLI arguments, environment vars, and config files. You can do that like this...
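
The code attached to the original post did not survive here, so what follows is only a minimal sketch of the idea, not the author's snippet. The `MYAPP_` environment prefix, the option names, and the `file_config` dict (standing in for a parsed config file) are all assumptions made for illustration. ChainMap looks keys up left to right, so the first mapping that has a key wins.

```python
import argparse
import os
from collections import ChainMap

# Hypothetical values standing in for whatever a config file would provide.
file_config = {"host": "localhost", "port": "8000", "debug": "false"}


def load_config(argv=None, environ=None):
    """Merge CLI args, environment variables, and file config (in that priority)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port")
    parser.add_argument("--debug")
    args = parser.parse_args(argv)

    # Keep only options the user actually passed, so unset ones fall through.
    cli = {key: value for key, value in vars(args).items() if value is not None}

    # Assumed convention: MYAPP_PORT=7000 becomes {"port": "7000"}.
    environ = os.environ if environ is None else environ
    prefix = "MYAPP_"
    env = {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }

    # Lookups search cli first, then env, then file_config.
    return ChainMap(cli, env, file_config)


config = load_config(
    ["--port", "9000"],
    {"MYAPP_HOST": "example.org", "MYAPP_PORT": "7000"},
)
print(config["port"])   # from the CLI: 9000
print(config["host"])   # from the environment: example.org
print(config["debug"])  # from the file defaults: false
```

Nothing is copied when the layers are combined: the ChainMap holds references to the three dicts, so a later change to `file_config` is visible through `config`.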

dmwyatt,

@isagalaev

That might be! The first time I ever used it extensively was when I was writing a template engine (think Jinja2 or Django).
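
For anyone wondering why a template engine is a natural fit: nested blocks need nested variable scopes, and `ChainMap.new_child()` gives you exactly that. This is a rough sketch under my own assumptions, not code from the engine mentioned above. `render_loop` is a hypothetical helper, and `str.format_map` stands in for real template rendering.

```python
from collections import ChainMap

# Outermost template context.
context = ChainMap({"title": "Inbox", "user": "sam"})


def render_loop(context, var_name, items, body):
    """Render `body` once per item, binding the item in a fresh child scope."""
    rendered = []
    for item in items:
        # new_child() puts a new dict in front of the existing scopes; the
        # loop variable lives only there and never leaks into `context`.
        scope = context.new_child({var_name: item})
        rendered.append(body.format_map(scope))
    return "".join(rendered)


print(render_loop(context, "msg", ["hi", "bye"], "{user}: {msg}\n"))
```

An inner scope can shadow an outer name (a loop variable called `user`, say) without touching the outer value, which is the behaviour you want from a `for` block in a template.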

mjohanning, to ilaughed

Does anyone know of a good (preferably free) tool / script that can sort photos into folders for year and month automatically by using the photos’ metadata? Seems like something that should be relatively simple.

dmwyatt,

@mjohanning This is exactly the kind of thing that's in the wheelhouse of ChatGPT or a similar tool.

This is something I could easily write myself, but also something I'd rather just offload to someone else.

https://chat.openai.com/share/efdc46a6-6bdd-4c8f-af96-ab7df9c3fd03
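
The contents of the linked conversation are not reproduced here, so the following is an independent, minimal sketch of what such a script could look like rather than the one behind the link. It assumes the optional third-party Pillow library for reading EXIF dates and falls back to the file's modification time when Pillow is missing or the file has no usable EXIF data. The function names and the `YYYY/MM` folder layout are my own choices.

```python
import shutil
from datetime import datetime
from pathlib import Path

try:
    from PIL import Image  # optional third-party dependency (Pillow)
except ImportError:
    Image = None

EXIF_IFD = 0x8769           # pointer to the Exif sub-IFD
DATETIME_ORIGINAL = 0x9003  # when the photo was taken
DATETIME = 0x0132           # generic date/time tag in the main IFD


def photo_date(path):
    """Best-effort capture date: EXIF if readable, else file modification time."""
    if Image is not None:
        try:
            with Image.open(path) as img:
                exif = img.getexif()
                raw = exif.get_ifd(EXIF_IFD).get(DATETIME_ORIGINAL) or exif.get(DATETIME)
            if raw:
                return datetime.strptime(str(raw).strip(), "%Y:%m:%d %H:%M:%S")
        except Exception:
            pass  # not an image, or damaged/odd metadata: use the fallback
    return datetime.fromtimestamp(path.stat().st_mtime)


def sort_photos(source, dest):
    """Move each file in `source` into dest/YYYY/MM/ and return the new paths."""
    source, dest = Path(source), Path(dest)
    moved = []
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue
        taken = photo_date(path)
        folder = dest / f"{taken:%Y}" / f"{taken:%m}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / path.name
        if target.exists():
            continue  # never overwrite an existing file
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Calling `sort_photos("unsorted", "sorted")` would move files into folders such as `sorted/2021/03/`. Try it on a copy of your photos first, since it moves files rather than copying them.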

simon, to random

A pattern I'm leaning more into at the moment is having code in the browser handle parsing difficult file formats, then that browser code sends the modified version back to an API on the server

Examples:

  • parse a huge CSV/TSV in the browser, send to backend in batches
  • ditto for .xlsx
  • load a PDF in the browser, extract text with PDF.js, send just the text to the server
  • similar trick, but turn the PDF into a JPEG for each page and submit those images
  • convert SVG to PNG in the browser
dmwyatt,

@simon That's what I was thinking when I used tiktoken in the browser via WASM for a little project I whipped up that concatenates all the text files in a GitHub repo for sharing with LLMs: https://gh-repo-dl.cottonash.com/ (https://github.com/dmwyatt/gh_repo_download)

Let the user's machine do that work so my DigitalOcean droplet, which runs a ton of different services, isn't spending time on it.
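
For a sense of what the concatenation step involves, here is a rough Python sketch. It is not the project's code (which runs in the browser): the `--- path ---` header format and the "skip anything that isn't valid UTF-8" rule are assumptions made for illustration, and token counting is left out.

```python
from pathlib import Path


def concatenate_text_files(root):
    """Join every UTF-8 text file under `root` into one string with path headers."""
    root = Path(root)
    parts = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # skip repository internals
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # treat undecodable files as binary and skip them
        parts.append(f"--- {rel.as_posix()} ---\n{text}\n")
    return "".join(parts)
```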