Kidding aside, every tool has its own #superpowers and #shortcomings, and I know that Power BI can do certain things that Tableau can't do, but I'm also sure that the reverse is true as well.
Erin and I frequently talk about the #ToolsAndTechniques we use at work during our lunches and evening walks, and I'm genuinely looking forward to #learning more about #TheDarkSide from her, as we continue honing each other's minds like iron sharpening iron. ⚔️
I managed to simulate data and use it to calculate power for an upcoming experiment. However, power is highly variable because there is considerable variation in the real data. In some cases I need only 16 blocks for 90% power, whereas in others not even 20 blocks are enough. How would you proceed?
I used the simr package with a generalized linear model to calculate power.
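One way to see how much the answer depends on the assumed variability is to rerun the simulation across a range of plausible standard deviations. The sketch below is not the simr workflow. It is a simplified stand-in that assumes each block yields one normally distributed treatment difference and tests the mean difference against a normal critical value, which is slightly anti-conservative at 16 to 20 blocks. The function name `simulate_power`, the effect size and the standard deviations are all hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(n_blocks, effect, sd, n_sims=2000, alpha=0.05, seed=1):
    """Estimate power as the share of simulated experiments that reject H0.

    Assumption: one treatment difference per block, drawn from N(effect, sd).
    The test statistic is mean / standard error, compared with a normal
    (not t) critical value, so this is only an approximation for small n.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect, sd) for _ in range(n_blocks)]
        std_err = statistics.stdev(diffs) / n_blocks ** 0.5
        if abs(statistics.fmean(diffs) / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical sweep: same design (16 blocks), different assumed variability.
power_by_sd = {sd: simulate_power(16, effect=1.0, sd=sd) for sd in (0.8, 1.0, 1.3, 1.6)}
for sd, power in power_by_sd.items():
    print(f"sd={sd}: estimated power {power:.2f}")
```

Reporting the whole range, or planning for the most pessimistic plausible value, is one common way to handle this kind of uncertainty.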
Do you use Continuous Integration in your #bioinformatics or #datascience projects or know of projects that do? If so, can you provide a link to an example? If not, why? Do you feel it would take too much time?
(1/3) Meta released Code Llama 🚀 today - an LLM for code generation. It is built on top of Llama 2, and it includes the following functionality:
✅ Code generation based on user prompts
✅ Code completion
✅ Code debugging
✅ Support for languages such as Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash
The Cookbook Polars for R, by Damien Dotta, is a new book that provides an introduction to the R version of Polars with practical examples. In addition, the book provides a side-by-side comparison, when applicable, to other data packages in R, such as base R, dplyr, and data.table.
:gt: @graph_tool is a comprehensive and efficient :python: Python library to work with networks, including structural, dynamical and statistical algorithms, as well as visualization.
It uses :cpp: C++ under the hood for the heavy lifting, making it quite fast.
Request for help from anyone with #rstats package development experience or knowledge of time data, especially if you've worked with .ical files before: checks are failing in the {calendar} package, preventing an update on CRAN, and I'm not sure why 🤷. Thanks to new contributors for reviving this package after a ~5-year dev hiatus! Please spread the word @rOpenSci and anyone in this #foss for #DataScience (or at least dates) space! Details: https://github.com/ATFutures/calendar/issues/50
(1/2) Moirai - Salesforce's Foundation Forecasting Model 🚀
Salesforce recently released Moirai - a new #Python 🐍 library with a foundation model for time series forecasting applications. According to the release blog, the model comes with universal forecasting capabilities and can handle multiple scenarios and different frequencies.
(1/4) Setting A Dockerized Python Environment — The Hard Way
I created a (relatively) short tutorial about setting up a dockerized 🐳 Python 🐍 environment on the command line (CLI). Generally, I don't recommend that anyone set up their Python development workflow via the CLI. There are better tools to work with Python and Docker, such as VS Code with the Dev Containers extension. 🧵👇🏼
This is an excellent tutorial from @debruine on how to create a power simulation in #Rstats and call the simulation n times with parameters in a dataframe, which is so useful for checking how power is affected by different aspects of study design.
But I get messages in R that purrr::pmap_dfr is superseded, so are we supposed to switch to a different set of functions for passing a dataframe of parameters to a function?
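For context, purrr's documentation marks the `_dfr` variants as superseded in favor of calling `pmap()` and then row-binding the results with `list_rbind()`. The underlying pattern (one function call per row of a parameter table, with results stacked back into a table) is language-independent. Here is a minimal Python sketch of that pattern, not of the tutorial's code. The function `sim_power`, the parameter values and the normal-approximation test are all assumptions made for illustration.

```python
import itertools
import random
import statistics
from statistics import NormalDist


def sim_power(n, effect, sd, n_sims=500, alpha=0.05, seed=42):
    """Share of simulated two-group studies in which a z-type test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0.0, sd) for _ in range(n)]
        group_b = [rng.gauss(effect, sd) for _ in range(n)]
        std_err = (statistics.variance(group_a) / n + statistics.variance(group_b) / n) ** 0.5
        if abs(statistics.fmean(group_b) - statistics.fmean(group_a)) / std_err > z_crit:
            rejections += 1
    return rejections / n_sims


# The "dataframe of parameters": one dict per row (hypothetical values).
params = [
    {"n": n, "effect": effect, "sd": sd}
    for n, effect, sd in itertools.product([20, 50], [0.2, 0.5], [1.0])
]

# Call the simulation once per row and keep the parameters beside the result,
# which is the role pmap() followed by a row-bind plays in R.
results = [{**row, "power": sim_power(**row)} for row in params]

for row in results:
    print(row)
```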
I'm really happy to be speaking at the NHS-R conference on Wednesday 11th October 2023! I'll talk about how to build data science projects that are reproducible! To sign up: https://nhsrcommunity.com/events/
The size of a Docker image can grow quickly during the build. I became more mindful of image size when I started deploying on GitHub Actions. The bigger the image, the longer the run time and the higher the runtime cost.
This is when you should consider using a multi-stage build 🚀.
Let's say there's a periodic process that I'm sampling. The period is changing slowly (<1% per cycle). I get a sample on a lot of the cycles, but not necessarily every one.
I'm sure I can bodge together an #algorithm to figure out the "fundamental period" and how it is changing over time, but I also bet something already exists.
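A rough sketch of the bodged-together version, in case it helps frame the search for an existing method. It assumes each sample is a timestamp taken at the same phase of its cycle and that a reasonable starting guess for the period is available. For each gap it rounds to the nearest whole number of cycles, then smooths the per-cycle estimate. The names `track_period`, `initial_period` and `smoothing` are hypothetical, and the rounding step only holds while the accumulated drift across a gap stays well under half a cycle.

```python
def track_period(timestamps, initial_period, smoothing=0.5):
    """Follow a slowly drifting period through samples with missing cycles.

    Returns a list of (timestamp, cycles_in_gap, period_estimate) tuples.
    """
    period = initial_period
    estimates = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        gap = curr - prev
        # How many whole cycles does this gap most plausibly span?
        cycles = max(1, round(gap / period))
        local_period = gap / cycles
        # Exponential smoothing lets the estimate drift with the signal.
        period = (1 - smoothing) * period + smoothing * local_period
        estimates.append((curr, cycles, period))
    return estimates


# Synthetic check: the period starts at 10 and grows 0.5% per cycle,
# and every fourth cycle is not observed.
true_period, t, all_times = 10.0, 0.0, []
for _ in range(200):
    all_times.append(t)
    t += true_period
    true_period *= 1.005
observed = [ts for i, ts in enumerate(all_times) if i % 4 != 3]

estimates = track_period(observed, initial_period=10.0)
print(estimates[-1])
```

Longer runs of missed cycles or noisy timestamps would need something sturdier, such as fitting cycle index against time with a low-order polynomial.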
I think in PowerShell and can manage in Python. I want to learn Rust to the degree I can write in it directly, rather than prototyping in PowerShell and then converting.
A lot of what I do is data manipulation and analysis. (Take several CSV files as input, and output new CSV files that answer business questions based on the inputs.) I'm seriously impressed with Rust's performance here.
If you've made this transition, advice on where to begin?
Question for #datascience and #dataanalytics folks of Mastodon - how do you deal with time-series data in #Python #Pandas and what would you prefer to use instead?
I’m starting to get fed up with how half-baked the implementation is, and it’s feeling like a time drain.
Yesterday, Amazon released a new open-source project, Chronos - a family of pre-trained time series forecasting models based on language model architectures.