Mehrad,
@Mehrad@fosstodon.org

When I read R code and see that people have very liberally used tons of packages like there is no tomorrow, I cringe. This is especially bad when they are doing it in research. Basically, what they are doing is making their analysis and research addicted to tons of packages. Like any other addiction, when they don't get their fix, it's gonna hurt, and gonna hurt bad. Their research's reproducibility is close to non-existent, as none of those packages will be around forever. Pick dependencies carefully.

geospacedman,
@geospacedman@mastodon.social

@Mehrad It's even worse when they load tons of packages (usually via the one-liner library(tidyverse)) and then the only usage is one call to filter from dplyr...

milesmcbain,
@milesmcbain@fosstodon.org

@Mehrad

Can we agree that the options are:

  1. Use well-designed abstractions built by people with more software engineering skills and more time with the problem at hand.

  2. Use your own abstractions, which will be less general and more brittle owing to lower software engineering skills and less time with the problem. So harder to understand and reuse.

  3. Use no abstractions and commit to many more lines of code, and lose the ability to represent domain knowledge in code. Hardest to understand.

LeafyEricScott,
@LeafyEricScott@fosstodon.org

@Mehrad On the other hand, if you can use someone else's well-tested code (unit tests + large user base) rather than re-inventing the wheel every time, that's a good thing! I more often have the same cringe feeling when looking at other people's R code where they don't use packages liberally.

eliocamp,
@eliocamp@mastodon.social

@LeafyEricScott @Mehrad Yeah. "Do one thing and do it well" implies using lots of packages. Leveraging the awesome community is good. Otherwise, just dismantle CRAN, retire all packages and write code like an elderly recluse in a cave.

Mehrad,
@Mehrad@fosstodon.org

@eliocamp
I didn't say you should not have any dependencies. What I said is: be vigilant and careful about what you are including. If you are writing hobby code or some homework, it is fine, but if you need to publish the code and others should be able to run it on their machines for decades to come, be careful. I personally give a positive edge to packages.

On the same note, using {renv} or {packrat} is essential if reproducibility is important.
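
For anyone who has not used it, here is a minimal sketch of the usual {renv} workflow. It assumes the renv package is installed and that the commands are run from the project root; treat it as an illustration of the idea rather than a recipe for any particular project.

```r
# One-time setup, run from the project root.
# install.packages("renv")   # only needed if renv is not installed yet

renv::init()      # create a project-local library and an renv.lock file
renv::snapshot()  # record the exact package versions currently in use

# Later, on another machine, after copying or cloning the project:
renv::restore()   # reinstall the versions recorded in renv.lock
```

The lockfile pins versions, but it cannot help if a package's sources are no longer obtainable at all, so it complements careful dependency selection rather than replacing it.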
@LeafyEricScott

defuneste,
@defuneste@fosstodon.org

@Mehrad @eliocamp @LeafyEricScott

I think this "dependency issue" (if it is really an issue) is more related to the time researchers, as individuals or teams, dedicate to code review. When I was a researcher my main goal was the result, and it was hard to get a coworker to review my code. A good review would have corrected the trouble you are describing.

To improve on that, we need folks not to be afraid of sharing their code.

willball12,

@defuneste @Mehrad @eliocamp @LeafyEricScott ironically some of the comments on this thread are good examples of why some people will not want to share their code - in case someone comes along to aggressively tell them they are lazy & a wannabe

Mehrad,
@Mehrad@fosstodon.org

@willball12
Good for you. If you want to fool yourself and pretend you don't understand what I meant, go ahead. Some people like @eliocamp are legends at pointing fingers and shouting when they don't have a solid argument about the actual issue being discussed.

If you publish your research code and it should be reproducible, you should comment the code, format it properly, and not be lazy. Really that simple. The rest is semantics.

My code is public; constructive criticism is most welcome.

LeafyEricScott,
@LeafyEricScott@fosstodon.org

@Mehrad I agree, though, that one should choose packages that will be around for a while (widely used, actively developed, etc.), but a big part of reproducibility is whether someone else can read and understand your code. Using R packages rather than writing your own functions for everything usually improves the readability of your code (and therefore one important aspect of reproducibility).

Mehrad,
@Mehrad@fosstodon.org

@LeafyEricScott all that is true, but what I have been observing recently, especially with the growth of wannabees "data scientists" with 5 hours of experience on the iris data, is that they use the packages because they are lazy. They choose a package because it does something trivial or just because it is a FUD. Additionally, many packages have their own assumptions about or expectations of the data. Blindly mashing them together because it is sexy and cool is just bad.

Mehrad,
@Mehrad@fosstodon.org

@LeafyEricScott I have seen many times that they import {glue} and {dplyr} just for creating a ggplot legend and for doing a single super simple subsetting with dplyr::filter. This is being lazy imho. If you want to select only females from a table based on a gender/sex column, just use subset from base; there is no need to load dplyr + magrittr + ... just for a single binary selection.
I use {dplyr} a lot, but loading a dependency should be worth it. Blindly loading a package is a recipe for disaster imho.
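
To make the comparison concrete, here is a small sketch. The `patients` data frame and its column names are made up for illustration. The dplyr call is guarded so that the snippet still runs when dplyr is not installed.

```r
# Hypothetical example data; the table and column names are assumptions.
patients <- data.frame(
  id  = 1:6,
  sex = c("female", "male", "female", "male", "female", NA),
  age = c(34, 51, 29, 42, 60, 38)
)

# Base R, no packages attached: subset() drops rows where the condition is NA.
females_base <- subset(patients, sex == "female")

# Equivalent plain indexing; which() also drops the NA row.
females_index <- patients[which(patients$sex == "female"), ]

# The dplyr version, called via its namespace instead of library(tidyverse).
if (requireNamespace("dplyr", quietly = TRUE)) {
  females_dplyr <- dplyr::filter(patients, sex == "female")
}

print(females_base)
```

All three select the same rows here. The difference is only in what a future reader has to install to rerun the code.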

eliocamp,
@eliocamp@mastodon.social

@Mehrad @LeafyEricScott Your tone ('wannabees "data scientists"', 'lazy') is extremely gatekeepy and toxic. People in the community have different experiences, levels of expertise, and needs. This demeaning language has no place in our community.

LeafyEricScott,
@LeafyEricScott@fosstodon.org

@eliocamp @Mehrad yeah, I’m a professional data scientist and I would totally load dplyr just to do some simple data wrangling. Loading packages is almost always orders of magnitude faster than remembering how to do things in base R for me

Mehrad,
@Mehrad@fosstodon.org

@LeafyEricScott you got my point wrong. I'm not saying dplyr is bad. What I'm saying is that code that should run for years to come on other people's computers needs a careful selection of dependencies. Are you against that?

RichardShaw,
@RichardShaw@mastodon.scot

@Mehrad @LeafyEricScott

Writing perfect code is very rarely important. Given time pressures, most of the time it just has to be good enough to do the job. Being polite and respectful of other people is almost always important.

Mehrad,
@Mehrad@fosstodon.org

@RichardShaw
Different code is for different purposes. As I said before, if you are writing quick, personal, or exploratory code, go ahead and load any package, but if your code should be reproducible, take the time to pick dependencies carefully before publishing it.

Maybe taking things in their context is something that should be considered as being polite 😉
Know me before pointing fingers. Read my other toots, etc.
