A question about what states were most-frequently represented on the HN homepage had me do some quick querying via Hacker News's Algolia search ... which is NOT limited to the front page. Those results were ... surprising (Maine and Iowa outstrip the more probable results of California and, say, New York). Results are further confounded by other factors.
HN provides an interface to historical front-page stories (https://news.ycombinator.com/front), and that can be crawled by supplying a date specification for each day.
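For instance, a quick sketch of generating one URL per day (assuming the archive takes a `day=YYYY-MM-DD` query parameter, which is what the page appears to use):

```python
from datetime import date, timedelta

def front_page_urls(start, end):
    """Yield one historical front-page URL per day, inclusive of both ends."""
    d = start
    while d <= end:
        yield f"https://news.ycombinator.com/front?day={d.isoformat()}"
        d += timedelta(days=1)

# First three days of the archive (HN's front-page history starts 2007-02-20).
urls = list(front_page_urls(date(2007, 2, 20), date(2007, 2, 22)))
```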
So I'm crawling that and compiling a local archive. Rate-limiting and other factors mean that's only about halfway complete, and a full pull will take another day or so.
But I'll be able to look at story titles, sites, submitters, time-based patterns (day of week, day of month, month of year, yearly variations), and other patterns. There's also looking at mean points and comments by various dimensions.
One surprise: as of January 2015, among the most consistently high-voted sites is The Guardian. I'd thought HN leaned consistently less liberal.
The full archive will probably be < 1 GB (raw HTML), currently 123 MB on disk.
Contents are the 30 top-voted stories for each day since 20 February 2007.
If anyone has suggestions for other questions to ask of this, fire away.
NY is highly overrepresented (NY Times, NY Post, NY City), likewise Washington (Post, Times, DC). Adding in "Silicon Valley" and a few other toponyms boosts California's score markedly. I've also got some city-based analytics.
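A minimal sketch of the kind of toponym matching described above (the phrase-to-state mapping and the titles are invented for illustration; a real pass would use a much fuller list):

```python
import re
from collections import Counter

# Hypothetical toponym -> state mapping.
TOPONYMS = {
    "NY Times": "New York",
    "NY Post": "New York",
    "Washington Post": "Washington",
    "Silicon Valley": "California",
    "San Francisco": "California",
}

def state_counts(titles):
    """Count how many titles mention a phrase associated with each state."""
    counts = Counter()
    for title in titles:
        for phrase, state in TOPONYMS.items():
            if re.search(re.escape(phrase), title, re.IGNORECASE):
                counts[state] += 1
    return counts

counts = state_counts([
    "NY Times on startups",
    "Silicon Valley hiring trends",
    "San Francisco rents climb",
])
```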
@jessie is a lover of #languages and helps run #CommonVoice, @mozilla's open #voice #data set, which now supports over 100 languages. She also teaches #WebDev and loves #hiking. She's awesome; you should follow her 🇬🇧
That's all for now! Please do share your own lists so we can create deeper connections and a tightly-connected community here.
I'm reminded here of @maryrobinette's short story - "Red Rockets" - "She built something better than fireworks. She built community."
Question: someone I know is doing a data science project for university and needs to scrape some tabular data from a website to analyze for an assignment.
Is there anything open source or GNOME-related that is publicly listed as tabular data somewhere that could be interesting for them to analyze? Ideally something with at least 100 data points and multiple columns per data point, if that makes sense.
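If they go the scraping route, pandas can parse HTML tables directly via `read_html`. A sketch (the inline HTML stands in for a real page such as a release-history table; note `read_html` needs an HTML parser like lxml or BeautifulSoup installed):

```python
import io

import pandas as pd

# A literal HTML string stands in for a fetched page; pd.read_html parses
# every <table> it finds and returns a list of DataFrames.
html = """
<table>
  <tr><th>release</th><th>year</th></tr>
  <tr><td>GNOME 40</td><td>2021</td></tr>
  <tr><td>GNOME 41</td><td>2021</td></tr>
</table>
"""
tables = pd.read_html(io.StringIO(html))
df = tables[0]
```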
Two #Coding questions, from restarting #Python after doing mostly Matlab for a while.
1. I really liked Tables in Matlab - what’s the best (fastest, simplest) equivalent in Python nowadays? #Pandas?
2. With Matlab you can use ‘webread’ to load the contents of a public Google spreadsheet as a table in one line - very cool! What’s the simplest equivalent in Python?
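For what it's worth, pandas is the usual answer to both questions; a sketch (the Google Sheets URL pattern is shown in a comment with a placeholder ID, since `read_csv` accepts URLs directly):

```python
import io

import pandas as pd

# For a sheet published to the web as CSV, this is a one-liner:
#   df = pd.read_csv("https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv")
# (<SHEET_ID> is a placeholder.) The same call works on any CSV source:
csv_text = "name,score\nAda,91\nGrace,95\n"
df = pd.read_csv(io.StringIO(csv_text))
```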
Morning, work today but that’s all good. Back to my #DataAnalysis! I slept better but had some ker-AAAA-zee dreams. How did you sleep? What plans do you have?
Feeling stuck with Excel for data analysis? You're not alone! Excel is fantastic, but for truly powerful insights and visualizations, it can fall short.
Here's what you'll gain:
* 🧐 Advanced data manipulation & cleaning
* 💻 Powerful statistical analysis & modeling
* 📉 Eye-catching data visualizations
* 🌟 Seamless integration back to Excel
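As a taste of the first two bullets, a minimal pandas sketch (toy data invented for illustration; the Excel round-trip is shown as a comment because it needs openpyxl installed):

```python
import io

import pandas as pd

# Toy sales data standing in for an Excel export.
raw = io.StringIO("region,units,price\nEast,10,2.5\nWest,4,3.0\nEast,6,2.5\n")
df = pd.read_csv(raw)

# Manipulation: derive revenue, then summarise per region.
df["revenue"] = df["units"] * df["price"]
summary = df.groupby("region")["revenue"].sum()

# Round-trip back to Excel is one call (requires openpyxl):
# summary.to_excel("summary.xlsx")
```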
5 Latest Tools You Should Be Using With Python for Data Science.
🗂️ The article provides insightful details on tools like ConnectorX, DuckDB, Optimus, Polars, and Snakemake, which can enhance your data wrangling, querying, manipulation, and workflow automation capabilities.
Looking for a recommendation (website, Substack, any other material...) where I can improve my SQL knowledge. I'm looking for something where I can read (theory) and practice (exercises). I really enjoy learning Python on Substack, but so far I haven't found something similar for SQL.
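Not a resource recommendation, but for the practice half: Python's built-in sqlite3 module gives a zero-setup SQL sandbox. A sketch with made-up data:

```python
import sqlite3

# An in-memory database: nothing to install, nothing to clean up afterwards.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ana", 5.0), ("bo", 7.5)],
)

# Practice query: total spend per customer, biggest spender first.
rows = con.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
```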
🔬📊 Mastering Data Grouping with R's ave() Function 📊🔬
Are you tired of manually calculating statistics for different groups in your data analysis projects? Look no further! R's ave() function is here to revolutionize your data grouping experience. 🚀
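For anyone following along from Python rather than R: the closest pandas analogue to ave() (a per-group statistic broadcast back onto every row) is groupby().transform(). A small sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Like R's ave(value, group): every row receives its own group's mean.
df["group_mean"] = df.groupby("group")["value"].transform("mean")
```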
Morning! I’ve been into town to get fresh bread, had breakfast and I’m ready for work. All four of us are WFH today, it’s going to be hard to keep out of each other’s way. More #DataAnalysis for me, I think I’ll work on category of offence and how that correlates with participants’ scoring of satisfaction with different areas of their life #criminology. Oh wait, Miss Cinnamon has just arrived and says that we must have cuddle first :blobcatreach: Have a great day everyone!
We dive deep into simplifying outlier detection in R using #easystats to follow good practices and make your data analysis more robust and replicable. Check it out! #Rstats #DataAnalysis @rstats
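The linked post covers #easystats; as a language-agnostic illustration only (the classic 1.5 × IQR rule, one common convention, not the easystats method itself), in plain standard-library Python:

```python
import statistics

def iqr_outliers(values):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

# One obvious outlier planted in otherwise tight data.
outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
```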
📊🔬 Exciting news! Learn bootstrap resampling in R with lapply, rep, and sample functions. Estimate uncertainty, analyze data variability, and unlock insights. #DataAnalysis #R #RStats #OpenSource #RProgramming 🎉💻
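The same bootstrap idea translates to a few lines of standard-library Python (sample data invented; 2,000 replicates chosen arbitrarily):

```python
import random
import statistics

random.seed(42)
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.9, 5.1, 4.4]

# Bootstrap: resample with replacement, recompute the statistic each time.
boot_means = [
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(2000)
]

# The spread of the bootstrap means approximates the standard error of the mean.
se_estimate = statistics.stdev(boot_means)
```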
file_path <- "data.csv"

if (file.exists(file_path)) {
  print("The file exists!")
} else {
  print("The file does not exist.")
}
In this example, we check if the file named "data.csv" exists. Depending on the outcome, it will print either "The file exists!" or "The file does not exist."
Learn how to set a data frame column as the index for faster data access and streamlined operations.
In R, use setkey() or setindex() from #datatable (after converting with setDT()), or column_to_rownames() from #tibble, to seamlessly set your desired column as the index. Try it out with your datasets and experience the boost in productivity!
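For readers on the Python side, the pandas equivalent is set_index(), which promotes a column to the index for fast label-based lookups (example data invented):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "value": [10, 20, 30],
})

# Promote the "id" column to the index, then look rows up by label.
df = df.set_index("id")
value = df.loc["a2", "value"]
```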
Imagine you have a bunch of data points and you want to know how many belong to different categories. This is where grouped counting comes in. We've got three fantastic methods for you to explore, each with its own flair: aggregate(), dplyr, and data.table.
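For comparison with those three R routes, here is the same grouped count in pandas (toy data; not one of the methods the post itself covers):

```python
import pandas as pd

df = pd.DataFrame({"category": ["x", "y", "x", "x", "z", "y"]})

# Two equivalent spellings of a grouped count in pandas:
counts = df["category"].value_counts()   # Series ordered by count
counts2 = df.groupby("category").size()  # Series ordered by key
```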
I'll give you a quick rundown on creating horizontal boxplots in R using both base R and ggplot2. We'll work with the "palmerpenguins" dataset to keep things interesting!
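A rough matplotlib analogue for Python folks (made-up flipper-length-style samples stand in for palmerpenguins; the Agg backend keeps it headless):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

# Invented samples, two groups standing in for penguin species.
samples = [
    [181, 186, 190, 193, 195],  # "Adelie"
    [195, 199, 210, 217, 220],  # "Gentoo"
]

fig, ax = plt.subplots()
bp = ax.boxplot(samples, vert=False)  # vert=False makes the boxes horizontal
ax.set_yticklabels(["Adelie", "Gentoo"])
ax.set_xlabel("flipper length (mm)")
fig.savefig("boxplot.png")
```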