@yjunechoe@fosstodon.org
yjunechoe

PhD candidate in Linguistics at the University of Pennsylvania studying psycholinguistics, language acquisition, and pragmatics. Sometimes writing R packages (ggtrace, jlmerclusterperm).

Mostly here for #rstats, #JuliaLang, and 📊 #dataviz. Interested in statistical computing & graphics, metaprogramming, and reproducible reports. I'm also active on the https://fosstodon.org/@R4DSCommunity slack.

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Just passed my dissertation proposal defense - now officially ABD!!! 🥳🥳🥳

Time to take a nap and enjoy my short break before I have to finish grading students' finals by tomorrow 🙃

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Mehrad Thanks! 1 more year to go! (fingers crossed)

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Have there been any recent discussions / surveys on research software engineer (RSE) career paths via ? I keep coming back to this page in my searches (a focus group's meeting summary from useR2021) but nothing as comprehensive that's more recent:

https://user2021.r-project.org/blog/2021/09/04/role-of-r-in-research-software-engineering/

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Fun with { in

Getting LISP-y:

`{` <- \(...) sys.call()[-1]  
{sum; 1; 2; 3}  
#> sum(1, 2, 3)  

A "matrix literal":

`{` <- \(...) matrix(c(...), ncol = length(..1), byrow = TRUE)  
{ 1:3;  
 4:6 }  
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

BTW - Anyone know how to overload the { operator? It doesn't have formals unlike + and friends, so IDK how to method dispatch on the first argument...
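If you want to play with these without clobbering `{` for the rest of the session, here is a minimal sketch that scopes both redefinitions with local(). The names lispy_call and mat_literal are made up for illustration, and function(...) is used instead of \(...) only so it also runs on R versions before 4.1:

```r
# Redefine `{` only inside a throwaway environment created by local()
lispy_call <- local({
  `{` <- function(...) sys.call()[-1]
  {sum; 1; 2; 3}
})

# Same idea for the "matrix literal": each statement becomes one row
mat_literal <- local({
  `{` <- function(...) matrix(c(...), ncol = length(..1), byrow = TRUE)
  { 1:3; 4:6 }
})

# lispy_call is an unevaluated call; eval() runs it
lispy_sum <- eval(lispy_call)

# Outside local(), `{` is still the base version
untouched <- {1; 2}
```

The outer brace of each local() block is looked up before the assignment happens, so it still behaves normally; only the braces evaluated after the assignment pick up the new definition.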

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Trashing a base R function whose behavior I don't like, with an entire blog post reducing it to a meme.

Low effort, unhinged, and somewhat rage-fueled. But what better way to spend the first day of spring break?

https://yjunechoe.github.io/posts/2024-03-04-args-args-args-args/

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Just switched to rendering my #rstats pkg readme examples with {asciicast} to do justice to the countless hours I put into making pretty print methods with {cli} 😊😊😊

https://github.com/yjunechoe/jlmerclusterperm

Readme example using asciicast to capture cli list bullets with original formatting

yjunechoe,
@yjunechoe@fosstodon.org avatar

@jonthegeek Oh interesting! Let me re-render the whole site (and not just the home) to see if that goes away 🙃

FWIW it works if you swap out "/man" for "/reference" in the broken URL - https://yjunechoe.github.io/jlmerclusterperm/reference/figures/README-/setup-io-dark.svg

That's where the link should point to for dark mode imgs. Maybe the light/dark mode auto-switch is only designed to work with github out of the box, and needs extra care for it to work in pkgdown... Will dig into it!

yjunechoe,
@yjunechoe@fosstodon.org avatar

@gaborcsardi @jonthegeek Oh wow this is great!! Thanks for the pointers!

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Things I wish base-R had vectorized versions of:

  • is.null()
  • isTRUE() & isFALSE()
  • identical()
  • switch()

Please tell me if I'm overlooking anything. I want to relive my lengths() discovery moment 😂
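For reference, a rough base-R sketch of how each of these can be approximated today with vapply(), mapply(), and a named lookup vector. The inputs are made up for illustration, and these are workarounds rather than real vectorized versions:

```r
# Hypothetical inputs, made up for illustration
items <- list(1, NULL, "a", NULL)
flags <- list(TRUE, 1, NA, c(TRUE, TRUE))

# is.null() applied to each element of a list
null_flags <- vapply(items, is.null, logical(1))

# isTRUE() applied to each element of a list
true_flags <- vapply(flags, isTRUE, logical(1))

# identical() applied pairwise over two lists of the same length
same_flags <- mapply(identical, list(1, "a", 1:2), list(1, "b", 1:2))

# switch() over a character vector, emulated with a named lookup vector
fruit_lookup <- c(a = "apple", b = "banana")
switched <- unname(fruit_lookup[c("a", "b", "a")])
```

The lookup-vector trick only covers the simple "name to value" use of switch(); it has no fall-through or default branch.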

eliocamp, to climate
@eliocamp@mastodon.social avatar

I'm trying to find literature using mixed-effects models to evaluate simulations and coming up empty. Maybe google is failing me, but apparently it's not a thing? I'm surprised because mixed-effects seems like the perfect tool when analysing multiple climate models with multiple ensemble members each.

yjunechoe,
@yjunechoe@fosstodon.org avatar

@eliocamp Just forward search papers that cite lme4!

Oh wait, people rarely cite software in academic papers 🤦‍♂️

yjunechoe,
@yjunechoe@fosstodon.org avatar

@mwfc @eliocamp Good call! I stopped myself too short at the cynical remark - I should've advertised OpenAlex!

I'm actually one of the developers of {openalexR}, an R interface to the OpenAlex API. You can run oa_snowball() on the original Bates et al. paper and prune the output dataframe.

https://ropensci.github.io/openalexR/reference/oa_snowball.html

Elendol, to random
@Elendol@hachyderm.io avatar

I am currently trying the pointblank package to make data validation reports. I both hate it and love it... I can almost do what I want, but impossible to hack my way to bridge the gap, I would need to re-implement too much of it.

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Elendol Do you have a specific pain point in mind? I do agree that there's some unique limitations to pointblank (partly due to its yaml round-tripping constraint), but I've found that it works pretty well in my use case!

(I also ask because I've been contributing to pointblank recently to selfishly "tame it" for my usecases, and I'm interested in hearing about other usability issues worth addressing while I'm at it!)

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Elendol Ah yeah I kinda get what you mean for serially/conjointly - you'd have to get into metaprogramming territory to dynamically generate those steps. I can investigate if you have an example but you'd need something like rlang::inject() to generate steps that need to be specified with the ~ formula.

As for agent reports in quarto, there's some hiccups partly due to some weird interactions with quarto and gt. By "tough to integrate", do you mean that the reports look mangled on render?

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Elendol Oh yeah targets is the gold standard for this kind of stuff. Composing more complex tests is definitely something pointblank could be better on for sure!

And in terms of scaling, do you find it slow to add multiple validations for big data? I actually recently merged a PR that resolved a pretty big performance issue in adding validation steps to a large agent. Perhaps you can give the dev version a try and see if it solves some of the headaches!

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Elendol I'm actually a bit surprised to hear this! I've personally found pointblank to be pretty good at scaling across data frames. Maybe we're thinking of "scale wide" in different senses, but in my use case I've had good success in wrapping multiple, thematically-relevant validation steps into one function, and adding them to agents just like how you would add layers to a ggplot.

I don't know if you can share any details but I'd be interested to see what can be done to improve things there.

jonocarroll, to random
@jonocarroll@fosstodon.org avatar

Question for the hivemind:

My understanding was that R does string interning, but I (randomly) wanted to see it in action and couldn't observe such behaviour in a trivial example. If I run

x <- "uniquestring"
y <- "uniquestring"
print(pryr::address(x))
print(pryr::address(y))

in a non-interactive script I get two different addresses. ChatGPT seems to think this should produce the same address...

yjunechoe,
@yjunechoe@fosstodon.org avatar

@jonocarroll Ooo interesting - TIL! No idea what's going on in your function case, but some clues from .Internal(inspect()) for the first puzzle:

The memory address is read off of the outer STRSXP object, which (according to R manual) are essentially character vectors that point to the actual cached/interned string constant (memory address of CHARSXP is the same).
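A rough sketch of how to see this from base R, assuming the usual two-line layout of the .Internal(inspect()) printout for a length-1 character vector (STRSXP on the first line, its CHARSXP element on the second). The helpers inspect_lines() and address_of() are made up for illustration:

```r
# Capture the text that .Internal(inspect()) prints for an object
inspect_lines <- function(obj) capture.output(.Internal(inspect(obj)))

# Pull out the address token that follows "@" at the start of a line
address_of <- function(line) sub("^\\s*@(\\S+).*$", "\\1", line)

x <- "uniquestring"
y <- "uniquestring"

x_lines <- inspect_lines(x)
y_lines <- inspect_lines(y)

# Line 1 describes the outer character vector (STRSXP)
strsxp_x <- address_of(x_lines[1])
strsxp_y <- address_of(y_lines[1])

# Line 2 describes the string element itself (CHARSXP)
charsxp_x <- address_of(x_lines[2])
charsxp_y <- address_of(y_lines[2])

# The element addresses match even when the outer vectors do not
shared_charsxp <- identical(charsxp_x, charsxp_y)
```

Whether strsxp_x and strsxp_y differ depends on how the code is run, which is the puzzle in the original question; the CHARSXP comparison is the part that reflects the string cache.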

"Everything in R is a vector" strikes again?

yjunechoe, (edited)
@yjunechoe@fosstodon.org avatar

@gaborcsardi @jonocarroll Makes sense! just to clarify the "vector bites again" is just a personal vent for all the times when I thought I was looking at a scalar but turns out that I was just dealing with an R vector instead (like here, where memory address was being read off of the vector because "string" is unintuitively a length-1 vector in R). 😬

Mehrad, to random
@Mehrad@fosstodon.org avatar

I have a question to #tidyverse gurus out there: I'm using something like the following and I want to return the sampleID and column name in question if the if condition was met. This is sample code I put together just as an example to convey the message:

my_var |>
  dplyr::group_by(sampleID) |>
  dplyr::summarize_all(.funs = function(x) {
    if (any(is.na(x))) {
      message("...")
    }
  })

#RStats

yjunechoe,
@yjunechoe@fosstodon.org avatar

@Mehrad I see @impulse9 beat me to it but here's the one I was writing doing the same thing 😅
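One way to get at the question, sketched in base R rather than dplyr so it runs without extra packages. This is not necessarily the approach from either reply; the toy my_var data frame and the names has_na and na_report are made up for illustration:

```r
# Toy data, made up for illustration: s1 has an NA in `a`, s2 has one in `b`
my_var <- data.frame(
  sampleID = c("s1", "s1", "s2", "s2"),
  a = c(1, NA, 3, 4),
  b = c(1, 2, 3, NA)
)

value_cols <- setdiff(names(my_var), "sampleID")

# One row per sampleID, one logical flag per column: does the group contain an NA?
has_na <- aggregate(
  my_var[value_cols],
  by = my_var["sampleID"],
  FUN = function(x) any(is.na(x))
)

# Turn the TRUE cells into (sampleID, column) pairs
hits <- which(as.matrix(has_na[value_cols]), arr.ind = TRUE)
na_report <- data.frame(
  sampleID = has_na$sampleID[hits[, "row"]],
  column = value_cols[hits[, "col"]]
)
```

Returning a small report data frame instead of calling message() inside the summary function keeps the result usable downstream, and you can still loop over na_report to emit messages if that is what you need.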

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

ggplot2 extension packages:

yjunechoe,
@yjunechoe@fosstodon.org avatar

@meghansharris 😂

Also me making memes about ggplot extension packages while having published none of my own outside of github

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Hot take: People say R is special because "everything is vectorized" but really the specialness of R is more so due to the fact that everything is a vector (and vectorization is simply the consequence of that)

It's a subtle distinction but I think that's a better framing for understanding R's design choices.

Practically, it means that I now almost always show these properties when teaching R:

"x" is c("x")
length("x") is 1

Alongside the one more talked about:

c(1, 2, 3) + 1 is c(2, 3, 4)
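As runnable checks, a minimal sketch (the variable names are only for illustration):

```r
# A bare string is already a character vector of length 1
same_as_vector <- identical("x", c("x"))
string_length <- length("x")
string_is_vector <- is.vector("x")

# Arithmetic applies elementwise because both operands are vectors;
# the length-1 vector on the right is recycled
shifted <- c(1, 2, 3) + 1
```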

MikeMahoney218, to random
@MikeMahoney218@fosstodon.org avatar

Pre-allocating vectors is for nerds https://mm218.dev/posts/2023-08-29-allocations/index.html

yjunechoe,
@yjunechoe@fosstodon.org avatar

@MikeMahoney218 "Always pre-allocate your vectors" is a bit of an outdated dogma. Since v3.4, R already over-allocates memory by a small factor (x1.05) every time a vector needs to grow!
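A small sketch of the two patterns in question, just to show they produce the same result. It does not measure speed, and the size n is arbitrary:

```r
n <- 1e4

# Growing: assign one element past the current end on each iteration.
# This is the assignment pattern that the over-allocation applies to.
grown <- integer(0)
for (i in seq_len(n)) {
  grown[i] <- i * 2L
}

# Pre-allocating: reserve the full length up front, then fill in place
prealloc <- integer(n)
for (i in seq_len(n)) {
  prealloc[i] <- i * 2L
}

same_result <- identical(grown, prealloc)
```

Note that this is about assigning beyond the current length; building the vector with grown <- c(grown, value) still creates a new vector on every iteration.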

(I learned this from a Hadley tweet a long time ago but can't seem to locate the original discussion)

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Between getting/recovering from covid and the start-of-semester madness, I'd totally forgotten to share my pre-recorded #JSM2023 talk (which could not be played live due to tech constraints).

So here is "Sublayer Modularity in the Grammar of Graphics"! - https://www.youtube.com/watch?v=613Q0j6Kjm0

Hoping to turn it into a blog post / package vignette soon for the video-averse.

yjunechoe, to random
@yjunechoe@fosstodon.org avatar

Feeling a mix of nervous and excited for my first stats conf at #jsm2023 🥺

Please say hi! - I'm presenting my paper "Sub-layer modularity in the Grammar of Graphics" at the Statistical Computing & Graphics student award session

I'll be talking about {ggtrace} (again), this time focusing more on the what/why: "What is sublayer modularity and why should we care (more) about it as users/practitioners of the grammar of graphics?" Of course, still plenty of R and ggplot that I'll nerd out about 😃

yjunechoe,
@yjunechoe@fosstodon.org avatar

Welp I'm very bummed to report that I've just had to cancel my JSM trip bc I tested positive for covid 😓 I've managed to avoid it so far but I guess my luck ran out (probably got it from my flight back to US from Korea - people don't mask anymore 🥲)

I don't want the talk prep to go to waste so working on getting a recording out there (TBD). Hope to join another year!

yjunechoe,
@yjunechoe@fosstodon.org avatar

@ijlyttle Thanks! I hope so too. Hope all goes well at your session!
