@embiggenData@c.im

embiggenData

Data Science is my thing. Fan of FOSS. A good day is being able to write code in R with no meetings and nary a Microsoft product in sight. Introvert, perfectionist, lover of excellence.

stevensanderson, to Glue
@stevensanderson@mstdn.social

Someone just reached out looking to extract values from a cell produced by the

For example, given a cell value like 251 (13%), they just want the 251, so I did something like this:

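library(tidyverse)

tibble(
  value = glue::glue("{11:20} ({1:10}%)"),
  # Keep the first run of digits followed by a non-word character or the end
  # of the string; backslashes must be doubled inside R string literals
  reged_val = str_extract(value, "\\d+(?=\\W|$)") |>
    as.numeric()
)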

# A tibble: 10 × 2
   value    reged_val
   <glue>       <dbl>
 1 11 (1%)         11

Thoughts?

#R
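A base-R alternative, offered only as a sketch: if the count always comes first in the cell, you can anchor on the leading digits with sub() and skip the lookahead entirely. The helper extract_count() and the sample values are hypothetical, and the sketch assumes every cell starts with the number you want.

```r
# Hypothetical helper: return the leading count from strings like "251 (13%)".
# Assumes the number of interest is the first run of digits in each string.
extract_count <- function(x) {
  # Capture the leading digits (after optional whitespace) and drop the rest
  as.numeric(sub("^\\s*(\\d+).*$", "\\1", x, perl = TRUE))
}

vals <- c("251 (13%)", "11 (1%)", "20 (10%)")
extract_count(vals)
# returns c(251, 11, 20)
```

Strings without a leading number come back as NA (with a coercion warning), which is usually easier to spot than a silently wrong value.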

embiggenData,
RickiTarr, to random
@RickiTarr@beige.party

If you have to do something you really don't want to do, how do you get yourself to do it?

embiggenData,

@RickiTarr I pretend I'm the manager of the person putting off Doing the Thing. From that perspective the excuses usually sound pathetic, and manager me tells whiny baby me to just get it done. Then I still put it off, forget any of this ever happened, and enter the "find out" phase.

ramikrispin, to python
@ramikrispin@mstdn.social

Me last night, at the moment I realized (six months late) that HuggingFace supports deploying Shiny apps (both R and Python) 🤗

Image credit: Giphy

embiggenData,

@ramikrispin Thanks, I wasn't aware that you can deploy Shiny apps on Hugging Face either! https://huggingface.co/docs/hub/spaces-sdks-docker-shiny

embiggenData, to HashtagGames

Killers of the Flower Moonpie

carnage4life, to random
@carnage4life@mas.to

When newspapers say they are “owed money” from Google and Facebook, what they are really arguing is that they’d be making more money if these sites didn’t exist so they should get a cut of their revenue.

The problem is no one is owed a business model. Makers of camcorders & digital cameras aren’t lobbying that Apple owes them money because 17% of iPhone usage is taking photos & videos.

Newspapers are making a ridiculously self-serving argument because most readers don’t want to pay for news.

embiggenData,

@carnage4life
@pluralistic has written about this several times and has provided ideas for remediation: https://www.eff.org/deeplinks/2023/04/saving-news-big-tech

scheidegger, to random
@scheidegger@mastodon.social

let's say you had absolutely zero common sense and a modicum of spare money, and wanted to put together a personal compute farm.

What's the right move? A bunch of ec2 instances? An 8u rack in the garage (or some closet, or a local colo facility like it's 2002)?

Have any of you done something like this? Under what budget?

embiggenData,

@scheidegger Sounds like you need Grandson of Anton

dgar, to random
@dgar@aus.social

I’m putting on my glasses.

embiggenData,
Cmastication, to random
@Cmastication@mastodon.social

Every time I touch Power BI I walk away flabbergasted that this is an enterprise tool that actual adults use in the real world. It is absolutely unintelligible to me. The most trivial things like “group this date/time field by day before plotting” seem impossible. Googling is horrible because of the false positives and vague language. I’m going to end up coding a Voila notebook instead, I guess.

embiggenData, (edited)
Cmastication, to random
@Cmastication@mastodon.social

Yeah, my kids out there on the street reppin’ the brands she loves…

embiggenData,

@Cmastication Allow me to introduce my son, "Cheetos Man" (doing his best Blue Steel)

malwaretech, to random

Anyone know what software is used to make these kinds of graphs?

embiggenData,

@malwaretech I can't tell for sure what software created those specific graphs, but Lucidchart is a great tool for this. https://www.lucidchart.com/pages/product

Some_Emo_Chick, to random
@Some_Emo_Chick@mastodon.social

Load bearing tuna

embiggenData,

@Some_Emo_Chick Chicken of the sea-ment

Alice, to random
@Alice@beige.party

This season always reminds me that, at some point in the past, someone bit into a pumpkin and somehow decided it was food and subsequently convinced a bunch of other people to believe the same.

embiggenData,

@Alice An English-speaking coworker and I were eating with Chinese colleagues in China. For each course, we Americans asked the names of the ingredients. Usually someone would think briefly and provide the English translation. We had some laughs at more exotic cuisine, like Chicken Knuckles.

Then came a big stumper. The Chinese contingent huddled together, engaging in a serious debate for several minutes. Every so often they would look our way with a confounded expression, and go back to their discussion.

Finally, their spokesperson came back with a verdict. I was nervous--I didn't want to offend them but wasn't sure I wanted to nosh a mystery food that required this much deliberation to explain.

He started out hesitantly. "In America, you... just LOOK at this." Uh oh, this can't be good! Then he got a huge grin. "But in China, we EAT it!!!" Now we were really puzzled / afraid.

But after a few more minutes of inquiry, we finally understood what it was. Pumpkin! They knew about American Jack o' Lanterns (try translating/explaining that name) but weren't familiar with pumpkin pie.

khalidabuhakmeh, to random
@khalidabuhakmeh@mastodon.social

I’m running an experiment. Please boost for reach.

When I say VS Code, what is the first word that comes to mind?

Please reply, but don’t peek at other people’s responses until you’ve done yours.

embiggenData,
mrworthington, to random

Recently used arrow + duckdb to get some SQL practice in and blogged about it. Was blown away that this doc rendered even though the dataset was originally 10GB in size.

On a side note: does anyone know if you can use arrow::open_dataset() on a pins parquet or arrow object?

https://www.mrworthington.com/articles/rstats/gnarly-data-arrow-sql-duckdb/

embiggenData,

@grrrck @mrworthington Yes, works great with . I'd also highly recommend checking out https://docs.rilldata.com/, which uses DuckDB.

alternativeto, to random
@alternativeto@mas.to

Junior devs writing comments be like

embiggenData,

@alternativeto The LLM trainers thank them

embiggenData,

@eliocamp @rstats ?factor could be clarified and streamlined. It would also help to mention why factors can be beneficial: they let you specify the sort order of text values in tables and plots; converting a low-cardinality character column in a large data.frame to a factor can save memory; etc.
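To illustrate the sort-order point, here is a small sketch with made-up values: a character vector sorts alphabetically, while a factor sorts (and tabulates) in whatever level order you declare.

```r
sizes <- c("medium", "small", "large", "small")

# Character vectors sort alphabetically
sort(sizes)
# returns c("large", "medium", "small", "small")

# A factor with explicit levels sorts in the declared order instead
sizes_f <- factor(sizes, levels = c("small", "medium", "large"))
sort(sizes_f)
# small, small, medium, large (levels: small < medium < large)

# Tables follow the level order too, as do discrete axes in ggplot2
table(sizes_f)
```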

embiggenData,

@bbolker @eliocamp @rstats @tslumley Oh that's interesting. I wasn't aware of the CHARSXP cache; thanks for the info!

On my machine this outputs 2.4GB:

print(object.size(rep(rownames(mtcars), 1e7)), units="Gb")

But the factor version only requires 1.2GB:

print(object.size(rep(as.factor(rownames(mtcars)), 1e7)), units="Gb")

This SO answer explains the 2X memory consumption: each character element takes 64 bits (a pointer to the cached string), while each factor element is a 32-bit integer code. https://stackoverflow.com/a/34865113/1344789

When I increase the repetitions to 4e7 for both, the character version gives "vector memory exhausted" but the factor version works. Maybe that's not a fair comparison though?

I guess what I'm trying to say is that I've noticed storing text as factor has helped me "in the real world" work with datasets that are at the memory limits.
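A scaled-down version of the comparison above, sketched with 1e5 repetitions instead of 1e7 so it runs quickly. Exact byte counts will vary a little by R version and platform, but on a 64-bit build the ratio should land close to 2.

```r
# 1e5 repetitions instead of 1e7 so this fits comfortably in memory
n_rep <- 1e5
chr <- rep(rownames(mtcars), n_rep)            # 32 * 1e5 = 3.2 million strings
fct <- rep(as.factor(rownames(mtcars)), n_rep) # same values stored as a factor

chr_bytes <- as.numeric(object.size(chr))
fct_bytes <- as.numeric(object.size(fct))

# On a 64-bit build each character element is an 8-byte pointer into R's
# global string cache, while each factor element is a 4-byte integer code,
# so the ratio should come out close to 2
chr_bytes / fct_bytes
```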
