yjunechoe

@yjunechoe@fosstodon.org

PhD candidate in Linguistics at the University of Pennsylvania studying psycholinguistics, language acquisition, and pragmatics. Sometimes writing R packages (ggtrace, jlmerclusterperm).

Mostly here for :rstats: #rstats, :julia: #JuliaLang, and 📊 #dataviz. Interested in statistical computing & graphics, metaprogramming, and reproducible reports. I'm also active on the https://fosstodon.org/@R4DSCommunity slack.

yjunechoe, 9 days ago to random

Just passed my dissertation proposal defense - now officially ABD!!! 🥳🥳🥳

Time to take a nap and enjoy my short break before I have to finish grading students' finals by tomorrow 🙃

yjunechoe, 9 days ago

@Mehrad Thanks! 1 more year to go! (fingers crossed)

yjunechoe, 1 month ago to random

Have there been any recent discussions / surveys on research software engineer (RSE) career paths via #rstats ? I keep coming back to this page in my searches (a focus group's meeting summary from useR2021) but nothing as comprehensive that's more recent:

https://user2021.r-project.org/blog/2021/09/04/role-of-r-in-research-software-engineering/

yjunechoe, 2 months ago to random

Fun with { in #rstats

Getting LISP-y:

`{` <- \(...) sys.call()[-1]
{sum; 1; 2; 3}
#> sum(1, 2, 3)

A "matrix literal":

`{` <- \(...) matrix(c(...), ncol = length(..1), byrow = TRUE)
{ 1:3;
  4:6 }
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

BTW - Anyone know how to overload the { operator? It doesn't have formals unlike + and friends, so IDK how to method dispatch on the first argument...
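
Redefining `{` in the global environment may also change how other code evaluated there behaves. One possible way to keep the experiment contained is to evaluate the braces in a throwaway environment that carries the override. This is only a sketch: `lispy()` is a hypothetical helper name, not something from the post or from any package.

```r
# Hypothetical helper: evaluate `expr` in a child environment where `{`
# is temporarily rebound, leaving the global `{` untouched.
lispy <- function(expr) {
  env <- new.env(parent = parent.frame())
  # Same override as in the post, but scoped to `env` only
  assign("{", function(...) sys.call()[-1], envir = env)
  eval(substitute(expr), env)
}

res <- lispy({sum; 1; 2; 3})
deparse(res)  # the rewritten call, as text
eval(res)     # evaluate the rewritten call

# The ordinary `{` still works outside of lispy()
outside <- {1; 2; 3}
```

Because the override lives in `env`, nothing needs to be cleaned up afterwards.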

yjunechoe, 2 months ago to random

Trashing a base R function whose behavior I don't like, with an entire blog post reducing it to a meme.

Low effort, unhinged, and somewhat rage-fueled. But what better way to spend the first day of spring break?

https://yjunechoe.github.io/posts/2024-03-04-args-args-args-args/

yjunechoe, 3 months ago to random

Just switched to rendering my #rstats pkg readme examples with {asciicast} to do justice to the countless hours I put into making pretty print methods with {cli} 😊😊😊

https://github.com/yjunechoe/jlmerclusterperm

[Image: README example using asciicast to capture cli list bullets with original formatting]

yjunechoe, 3 months ago

@jonthegeek Oh interesting! Let me re-render the whole site (and not just the home) to see if that goes away 🙃

FWIW it works if you swap out "/man" for "/reference" in the broken URL - https://yjunechoe.github.io/jlmerclusterperm/reference/figures/README-/setup-io-dark.svg

That's where the link should point to for dark mode imgs. Maybe the light/dark mode auto-switch is only designed to work with github out of the box, and need extra care for it to work in pkgdown... Will dig into it!

yjunechoe, 3 months ago

@gaborcsardi @jonthegeek Oh wow this is great!! Thanks for the pointers!

yjunechoe, 3 months ago to random

Things I wish base-R had vectorized versions of:

is.null()

isTRUE() & isFALSE()

identical()

switch()

Please tell me if I'm overlooking anything. I want to relive my lengths() discovery moment 😂
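
Until such versions exist, the usual workaround is to map the scalar function over a list. The sketch below uses only base R; the example data and variable names are made up for illustration, and these are elementwise wrappers rather than truly vectorized primitives.

```r
# Elementwise is.null(): which list elements are NULL?
x <- list(1, NULL, "a", NULL)
null_flags <- vapply(x, is.null, logical(1))

# Elementwise isTRUE(): only a length-1, non-NA TRUE counts
flags <- list(TRUE, NA, FALSE, c(TRUE, TRUE))
true_flags <- vapply(flags, isTRUE, logical(1))

# Elementwise identical(): compare two lists position by position
same <- mapply(identical, list(1, "a", 1:2), list(1, "b", 1:2))

# A "vectorized switch()" for simple cases: index a named lookup vector
lookup <- c(a = "apple", b = "banana")
switched <- unname(lookup[c("a", "b", "a")])
```

The lookup-vector trick only covers the value-mapping use of switch(); it does not handle fall-through or a default branch.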

eliocamp, 4 months ago to climate

I'm trying to find literature using mixed-effects models to evaluate #climate simulations and coming up empty. Maybe Google is failing me, but apparently it's not a thing? I'm surprised because mixed-effects seems like the perfect tool when analysing multiple climate models with multiple ensemble members each.

#AcademicChatter #RStats

yjunechoe, 4 months ago

@eliocamp Just forward search papers that cite lme4!

Oh wait, people rarely cite software in academic papers 🤦‍♂️

yjunechoe, 4 months ago

@mwfc @eliocamp Good call! I stopped myself too short at the cynical remark - I should've advertised OpenAlex!

I'm actually one of the developers of {openalexR}, an R interface to the OpenAlex API. You can run oa_snowball() on the original Bates et al. paper and prune the output dataframe.

https://ropensci.github.io/openalexR/reference/oa_snowball.html

Elendol, 5 months ago to random

I am currently trying the pointblank package to make data validation reports. I both hate it and love it... I can almost do what I want, but it's impossible to hack my way across the gap; I would need to re-implement too much of it. #rstats

yjunechoe, 5 months ago

@Elendol Do you have a specific pain point in mind? I do agree that there are some unique limitations to pointblank (partly due to its yaml round-tripping constraint), but I've found that it works pretty well in my use case!

(I also ask because I've been contributing to pointblank recently to selfishly "tame it" for my use cases, and I'm interested in hearing about other usability issues worth addressing while I'm at it!)

yjunechoe, 5 months ago

@Elendol Ah yeah I kinda get what you mean for serially/conjointly - you'd have to get into metaprogramming territory to dynamically generate those steps. I can investigate if you have an example but you'd need something like rlang::inject() to generate steps that need to be specified with the ~ formula.

As for agent reports in quarto, there are some hiccups partly due to some weird interactions between quarto and gt. By "tough to integrate", do you mean that the reports look mangled on render?

yjunechoe, 5 months ago

@Elendol Oh yeah targets is the gold standard for this kind of stuff. Composing more complex tests is definitely something pointblank could be better on for sure!

And in terms of scaling, do you find it slow to add multiple validations for big data? I actually recently merged a PR that resolved a pretty big performance issue in adding validation steps to a large agent. Perhaps you can give the dev version a try and see if it solves some of the headaches!

yjunechoe, 5 months ago

@Elendol I'm actually a bit surprised to hear this! I've personally found pointblank to be pretty good at scaling across data frames. Maybe we're thinking of "scale wide" in different senses, but in my use case I've had good success wrapping multiple, thematically-relevant validation steps into one function, and adding them to agents just like how you would add layers to a ggplot.

I don't know if you can share any details but I'd be interested to see what can be done to improve things there.

jonocarroll, 7 months ago to random

Question for the #rstats hivemind:

My understanding was that R does string interning, but I (randomly) wanted to see it in action and couldn't observe such behaviour in a trivial example. If I run

x <- "uniquestring"
y <- "uniquestring"
print(pryr::address(x))
print(pryr::address(y))

in a non-interactive script I get two different addresses. ChatGPT seems to think this should produce the same address...

yjunechoe, 7 months ago

@jonocarroll Ooo interesting - TIL! No idea what's going on in your function case, but some clues from .Internal(inspect()) for the first puzzle:

The memory address is read off of the outer STRSXP object, which (according to the R manual) is essentially a character vector whose elements point to the actual cached/interned string constants (the memory address of the CHARSXP is the same).

"Everything in R is a vector" strikes again?
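
A rough way to check this from base R is to parse the text that .Internal(inspect()) prints and compare the CHARSXP addresses directly. This sketch assumes the current print format (an `@address` at the start of each line and a `CHARSXP` label on the element line), which is an internal debugging format and may differ across R versions. `charsxp_address()` is a hypothetical helper name.

```r
# Hypothetical helper: pull the address of the first CHARSXP element
# out of the printed inspect() output for a character vector.
charsxp_address <- function(x) {
  out <- capture.output(.Internal(inspect(x)))
  line <- grep("CHARSXP", out, value = TRUE)[1]
  regmatches(line, regexpr("@[0-9a-fA-F]+", line))
}

x <- "uniquestring"
y <- "uniquestring"

# If the string constant is interned, both vectors should point
# to the same CHARSXP, even when the outer vectors are distinct objects.
same_charsxp <- identical(charsxp_address(x), charsxp_address(y))
same_charsxp
```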

yjunechoe, 7 months ago (edited 7 months ago)

@gaborcsardi @jonocarroll Makes sense! Just to clarify, the "vector strikes again" remark is just a personal vent for all the times when I thought I was looking at a scalar but it turned out I was dealing with an R vector instead (like here, where the memory address was being read off of the vector because "string" is unintuitively a length-1 vector in R). 😬

Mehrad, 7 months ago to random

I have a question for the #tidyverse gurus out there: I'm using something like the following and I want to return the sampleID and column name in question if the if condition is met. This is sample code I put together just as an example to convey the message:

my_var |>
  dplyr::group_by(sampleID) |>
  dplyr::summarize_all(.funs = function(x) {
    if (any(is.na(x))) {
      message("...")
    }
  })

#RStats

yjunechoe, 7 months ago

@Mehrad I see @impulse9 beat me to it but here's the one I was writing doing the same thing 😅

my_var <- data.frame(
  sampleID = gl(3, 2, labels = letters[1:3]),
  col1 = c(1, 2, 3, 4, NA, 6),
  col2 = c(T, NA, F, F, T , T)
)
my_var
#>   sampleID col1  col2
#> 1        a    1  TRUE
#> 2        a    2    NA
#> 3        b    3 FALSE
#> 4        b    4 FALSE
#> 5        c   NA  TRUE
#> 6        c    6  TRUE

library(dplyr)
my_var |>
  group_by(sampleID) |>
  summarize(across(everything(), function(x) {
    if(anyNA(x)) {
      message(
        "Messaging from ",
        "sampleID=", cur_group()$sampleID,
        ", column=", cur_column())
    }
    # This last part just to make `summarize()` run
    x[1]
  }))
#> Messaging from sampleID=c, column=col1
#> Messaging from sampleID=a, column=col2
#> # A tibble: 3 × 3
#>   sampleID  col1 col2
#>   <fct>    <dbl> <lgl>
#> 1 a            1 TRUE
#> 2 b            3 FALSE
#> 3 c           NA TRUE

yjunechoe, 7 months ago to random

ggplot2 extension packages:

yjunechoe, 7 months ago

@meghansharris 😂

Also me making memes about ggplot extension packages while having published none of my own outside of github

yjunechoe, 8 months ago to random

Hot take: People say R is special because "everything is vectorized" but really the specialness of R is more so due to the fact that everything is a vector (and vectorization is simply the consequence of that)

It's a subtle distinction but I think that's a better framing for understanding R's design choices.

Practically, it means that I now almost always show these properties when teaching R:

"x" is c("x")
length("x") is 1

Alongside the one more talked about:

c(1, 2, 3) + 1 is c(2, 3, 4)
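
These properties can be checked directly at the console. A minimal base R sketch (the variable names are only for illustration):

```r
# A bare string is already a character vector of length 1
same_as_c <- identical("x", c("x"))
len <- length("x")
is_vec <- is.vector("x")

# ...so it can be indexed like any other vector
first <- "x"[1]

# Vectorized arithmetic follows from the same design:
# the scalar 1 is itself a length-1 vector recycled along c(1, 2, 3)
plus_one <- c(1, 2, 3) + 1
```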

MikeMahoney218, 8 months ago to random

Pre-allocating vectors is for nerds https://mm218.dev/posts/2023-08-29-allocations/index.html

yjunechoe, 8 months ago

@MikeMahoney218 "Always pre-allocate your vectors" is a bit of an outdated dogma. Since v3.4, R already over-allocates memory by a small factor (x1.05) every time a vector needs to grow!

https://github.com/wch/r-source/commit/12aa371f88e5ece5015246e4f4b3e0b2b7f21639

https://github.com/wch/r-source/commit/37388191f5493825923b98246ec02e9b13ab1770

(I learned this from a Hadley tweet a long time ago but can't seem to locate the original discussion)
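
For anyone who wants to try this locally, here is a small sketch comparing a vector grown by assigning one past the end against a pre-allocated one. `grow()` and `prealloc()` are hypothetical names, the size `n` is arbitrary, and timings will vary by machine and R version, so none are shown.

```r
n <- 1e5

# Grow the vector one element at a time by assigning past the end
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x[i] <- i
  x
}

# Allocate the full length up front, then fill
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i
  x
}

# Both approaches produce the same result...
same_result <- identical(grow(n), prealloc(n))

# ...so the only question is cost, which can be compared with system.time()
t_grow <- system.time(grow(n))
t_prealloc <- system.time(prealloc(n))
```

Note that this only exercises growth via `x[i] <- i`; patterns such as `x <- c(x, i)` create a new vector each time and are a separate case.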

yjunechoe, 8 months ago to random

Between getting/recovering from covid and the start-of-semester madness, I'd totally forgotten to share my pre-recorded #JSM2023 talk (which could not be played live due to tech constraints).

So here is "Sublayer Modularity in the Grammar of Graphics"! - https://www.youtube.com/watch?v=613Q0j6Kjm0

Hoping to turn it into a blog post / package vignette soon for the video-averse.

yjunechoe, 9 months ago to random

Feeling a mix of nervous and excited for my first stats conf at #jsm2023 🥺

Please say hi! - I'm presenting my paper "Sub-layer modularity in the Grammar of Graphics" at the Statistical Computing & Graphics student award session

I'll be talking about {ggtrace} (again), this time focusing more on the what/why: "What is sublayer modularity and why should we care (more) about it as users/practitioners of the grammar of graphics?" Of course, still plenty of R and ggplot that I'll nerd out about 😃

yjunechoe, 9 months ago

Welp I'm very bummed to report that I've just had to cancel my JSM trip bc I tested positive for covid 😓 I've managed to avoid it so far but I guess my luck ran out (probably got it on my flight back to the US from Korea - people don't mask anymore 🥲)

I don't want the talk prep to go to waste so working on getting a recording out there (TBD). Hope to join another year!

yjunechoe, 9 months ago

@ijlyttle Thanks! I hope so too. Hope all goes well at your session!
