dredmorbius, to random

Hacker News front-page analytics

A question about what states were most-frequently represented on the HN homepage had me do some quick querying via Hacker News's Algolia search ... which is NOT limited to the front page. Those results were ... surprising (Maine and Iowa outstrip the more probable results of California and, say, New York). Results are further confounded by other factors.

Thread: https://news.ycombinator.com/item?id=36076870

HN provides an interface to historical front-page stories (https://news.ycombinator.com/front), and that can be crawled by providing a list of corresponding date specifications, e.g.:

https://news.ycombinator.com/front?day=2023-05-25

Easy enough.
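A minimal sketch of generating that list of date specifications, one URL per day. This is not the crawler used for the archive; the function name `front_page_urls` is hypothetical, and the actual fetching (with rate-limiting) is left out.

```python
from datetime import date, timedelta

BASE_URL = "https://news.ycombinator.com/front"


def front_page_urls(start: date, end: date):
    """Yield one front-page URL per day from start to end, inclusive."""
    day = start
    while day <= end:
        # isoformat() gives YYYY-MM-DD, the format used by the ?day= parameter
        yield f"{BASE_URL}?day={day.isoformat()}"
        day += timedelta(days=1)


# Three consecutive days around the example date above
urls = list(front_page_urls(date(2023, 5, 24), date(2023, 5, 26)))
for url in urls:
    print(url)
```

A real crawl would loop over these URLs with a pause between requests and save each response to disk.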

So I'm crawling that and compiling a local archive. Rate-limiting and other factors mean that's only about halfway complete, and a full pull will take another day or so.

But I'll be able to look at story titles, sites, submitters, time-based patterns (day of week, day of month, month of year, yearly variations), and other patterns. There's also looking at mean points and comments by various dimensions.

Among surprises are that as of January 2015, among the highest consistently-voted sites is The Guardian. I'd thought HN leaned consistently less liberal.

The full archive will probably be < 1 GB (raw HTML), currently 123 MB on disk.

Contents are the 30 top-voted stories for each day since 20 February 2007.

If anyone has suggestions for other questions to ask of this, fire away.

And, as of early 2015, top state mentions are:

 1. new york:         150
 2. california:       101
 3. texas:             39
 4. washington:        38
 5. colorado:          15
 6. florida:           10
 7. georgia:           10
 8. kansas:            10
 9. north carolina:     9
10. oregon:             9

NY is highly overrepresented (NY Times, NY Post, NY City), likewise Washington (Post, Times, DC). Adding in "Silicon Valley" and a few other toponyms boosts California's score markedly. I've also got some city-based analytics.
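The confounding described here can be illustrated with a small sketch. It counts titles mentioning each state, optionally stripping publication names first. The state list, the publication list, and the sample titles are all made up for illustration; this is not the analysis that produced the counts above.

```python
import re
from collections import Counter

STATES = ["new york", "california", "texas", "washington"]
# Assumed list of publication names that inflate state counts
PUBLICATIONS = [
    "new york times",
    "new york post",
    "washington post",
    "washington times",
]


def count_states(titles, strip_publications=False):
    """Count how many titles mention each state at least once."""
    counts = Counter()
    for title in titles:
        text = title.lower()
        if strip_publications:
            for pub in PUBLICATIONS:
                text = text.replace(pub, " ")
        for state in STATES:
            # Word boundaries avoid matching inside longer words
            if re.search(rf"\b{re.escape(state)}\b", text):
                counts[state] += 1
    return counts


# Hypothetical titles, not taken from the archive
sample_titles = [
    "New York Times launches a new API",
    "Texas startup raises seed round",
    "Washington Post on net neutrality",
    "Drought worsens in California",
    "New York subway data visualized",
]

raw = count_states(sample_titles)
filtered = count_states(sample_titles, strip_publications=True)
print(raw)
print(filtered)
```

Stripping publication names still leaves other ambiguities (Washington state versus DC, for instance), so the filtered counts are a rough correction at best.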

elduvelle, (edited ) to python
@elduvelle@neuromatch.social avatar

As a (broadly-speaking), which language do you prefer for your data processing and data analysis?

I’m particularly interested in understanding why so many people seem to use R these days - comments welcome!

#R

KathyReid, to TwitterMigration
@KathyReid@aus.social avatar

Good morning everyone! Here's my latest post, where I curate interesting accounts for you to follow from across the :fediverse:

@maryrobinette is a , and I am listening to her incredible series at the moment. If you love (esp hard scifi) you should read it, too! 🇺🇸

@sayashk is a candidate at , who is researching failures in (he's also co-running a workshop on open in about 15 hours, see my previous posts for more info) 🇺🇸

@michcampbell is Dr Micha Campbell and she is a living on country 🇦🇺

@mthv is a who works in at 🇫🇷

@astrolori is Lori and she is into , , and 🇨🇦

@pandas_dev is the official account for , the tool 🐍 📊

@jessie is a lover of and helps run , @mozilla 's open set, which now supports over 100 languages. She also teaches and loves . She's awesome you should follow her 🇬🇧

That's all for now, please do share your own lists so we can create deeper connections, and a tightly-connected community here

I'm reminded here of @maryrobinette's short story - "Red Rockets" - "She built something better than fireworks. She built community."

cassidy, to opensource
@cassidy@blaede.family avatar

Question: someone I know is doing a data science project for university, and needs to scrape some tabular data from a web site to perform analysis on as an assignment.

Is there anything open source or GNOME-related that is publicly listed as tabular data somewhere that could be interesting for them to analyze? Ideally something with at least 100 data points and multiple columns per data point, if that makes sense.

elduvelle, (edited ) to python
@elduvelle@neuromatch.social avatar

Edit: already got an answer! Thank you so much @chrisXrodgers and @emdupre ❤️

Two questions, from restarting after doing mostly Matlab for a while.

  1. I really liked Tables in Matlab - what’s the best (fastest, simplest) equivalent of it in Python nowadays? ?

  2. with Matlab you can use ‘webread’ to one-line load the contents of a public google spreadsheet, as a table - very cool! What’s the simplest equivalent in Python?

🙏
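For the second question, one commonly used approach (a sketch, not necessarily the answer given in the replies) is to point pandas at the sheet's CSV export URL. This assumes the sheet is shared publicly; `YOUR_SHEET_ID` is a placeholder and both helper names are hypothetical.

```python
def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab (gid) of a Google spreadsheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def read_public_sheet(sheet_id: str, gid: int = 0):
    """Load a publicly shared sheet into a pandas DataFrame."""
    # Imported here so the URL helper can be used without pandas installed
    import pandas as pd

    return pd.read_csv(sheet_csv_url(sheet_id, gid))


# Placeholder ID; replace with the ID from the sheet's own URL
url = sheet_csv_url("YOUR_SHEET_ID")
print(url)
```

With a real ID, `read_public_sheet("...")` returns a DataFrame, which is also the usual pandas counterpart to a Matlab table.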

stevensanderson, to python
@stevensanderson@mstdn.social avatar

Feeling stuck with Excel for data analysis?

My new book which was co-authored by David Kun, Extending Excel with Python & R, shows you how to leverage the strengths of BOTH worlds!

Here's what you'll gain:
🧐 * Advanced data manipulation & cleaning
💻 * Powerful statistical analysis & modeling
📉 * Eye-catching data visualizations
🌟 * Seamless integration back to Excel

The release date is April 30th!

#R

Link: https://packt.link/oTyZJ

purplepadma, to random

Morning, work today but that’s all good. Back to my ! I slept better but had some ker-AAAA-zee dreams. How did you sleep? What plans do you have?

stevensanderson, to python
@stevensanderson@mstdn.social avatar

Feeling stuck with Excel for data analysis?

My new book which was co-authored by David Kun, Extending Excel with Python & R, shows you how to leverage the strengths of BOTH worlds!

Here's what you'll gain:
🧐 Advanced data manipulation & cleaning
💻 Powerful statistical analysis & modeling
📉 Eye-catching data visualizations
🌟 Seamless integration back to Excel

Get your copy today! https://packt.link/oTyZJ

#R

stevensanderson, to random
@stevensanderson@mstdn.social avatar

I encourage you to roll up your sleeves and give it a try yourself. 💪🔍

Read the full blog post and start your exploration. Let's dive in and level up your data analysis game! 🚀📊

https://www.spsanderson.com/steveondata/posts/2023-07-17/

stevensanderson, to python
@stevensanderson@mstdn.social avatar

Feeling stuck with Excel for data analysis? You're not alone! Excel is fantastic, but for truly powerful insights and visualizations, it can fall short.

Here's what you'll gain:
🧐 * Advanced data manipulation & cleaning
💻 * Powerful statistical analysis & modeling
📉 * Eye-catching data visualizations
🌟 * Seamless integration back to Excel

Reserve your copy today: https://www.amazon.com/dp/1804610690/ref=tsm_1_fb_lk

#R

LabPlot, to datascience
@LabPlot@floss.social avatar

Using Zipf's Law to detect outliers in median age of European Countries in (2023 est.)

@dataisbeautiful

LabPlot ❤️ Data

➡️ https://en.wikipedia.org/wiki/Zipf%27s_law

victorp, to python

5 Latest Tools You Should Be Using With Python for Data Science.
🗂️ The article provides insightful details on tools like ConnectorX, DuckDB, Optimus, Polars, and Snakemake which could enhance data wrangling, querying, manipulation, and workflow automation capabilities.

  1. 🧰 ConnectorX: Simplifying the Loading of Data
  2. 🧰 DuckDB: Empowering Analytical Query Workloads
  3. 🧰 Optimus: Streamlining Data Manipulation
  4. 🧰 Polars: Accelerating DataFrames
  5. 🧰 Snakemake: Automating Data Science Workflows

https://www.makeuseof.com/latest-python-data-science-tools/

eric_ma, to datascience
@eric_ma@techhub.social avatar

Looking for a recommendation (website, Substack, any other material...) where I can improve my SQL knowledge. I am looking for something that I can read (theory) and practice (exercises). I really enjoy learning Python on Substack, but until now I have not found something similar for SQL.

Any advice or recommendations?

nicolaromano, to python
@nicolaromano@qoto.org avatar

#random thought of the day.

How much #bias is generated in #dataAnalysis because a lot of people tend to use 42 as the random seed in their #python scripts?
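One rough way to probe the question is to rerun the same computation under several seeds and see how much the result moves. The sketch below uses only the standard library; the sample size and the seed list are arbitrary choices for illustration.

```python
import random
import statistics


def sample_mean(seed, n=50):
    """Mean of n standard-normal draws from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(0, 1) for _ in range(n))


# Same computation under a few different seeds, 42 included
means = {seed: sample_mean(seed) for seed in (0, 1, 42, 2024)}
spread = max(means.values()) - min(means.values())

for seed, mean in means.items():
    print(f"seed={seed}: mean={mean:.3f}")
print(f"spread across seeds: {spread:.3f}")
```

If a conclusion only holds for one particular seed, that is a sign the result depends on the draw rather than on the data.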

stevensanderson, to datascience
@stevensanderson@mstdn.social avatar

🚀 Unleash the Power of R Functions: get(), get0(), dynGet(), and mget()!

Post https://www.spsanderson.com/steveondata/posts/2023-08-01/

stevensanderson, to statistics
@stevensanderson@mstdn.social avatar

🔬📊 Mastering Data Grouping with R's ave() Function 📊🔬

Are you tired of manually calculating statistics for different groups in your data analysis projects? Look no further! R's ave() function is here to revolutionize your data grouping experience. 🚀

Post: https://www.spsanderson.com/steveondata/posts/2023-06-27/

#r

purplepadma, to random
@purplepadma@beige.party avatar

Morning! I’ve been into town to get fresh bread, had breakfast and I’m ready for work. All four of us are WFH today, it’s going to be hard to keep out of each other’s way. More for me, I think I’ll work on category of offence and how that correlates with participants’ scoring of satisfaction with different areas of their life . Oh wait, Miss Cinnamon has just arrived and says that we must have cuddle first :blobcatreach: Have a great day everyone!

rempsyc, to rstats
@rempsyc@mastodon.world avatar

New publication in @psychonomic_soc Behavior Research Methods!

We dive deep into simplifying outlier detection in R using to follow good practices and make your data analysis more robust and replicable. Check it out! @rstats

https://doi.org/10.3758/s13428-024-02356-w

stevensanderson, to opensource
@stevensanderson@mstdn.social avatar

📊🔬 Exciting news! Learn bootstrap resampling in R with lapply, rep, and sample functions. Estimate uncertainty, analyze data variability, and unlock insights. #R 🎉💻

Post: https://www.spsanderson.com/steveondata/posts/2023-06-23/
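For readers working in Python rather than R, the same idea can be sketched with the standard library. This is an analogue of the technique, not the code from the linked post; the data values are made up and the percentile interval is a simple approximation.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample `data` with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]


# Made-up observations for illustration
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

means = sorted(bootstrap_means(data))
# Approximate 95% percentile interval for the mean
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"approximate 95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

Here `rng.choices(data, k=len(data))` plays the role that `sample(..., replace = TRUE)` plays in R.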

stevensanderson, to opensource
@stevensanderson@mstdn.social avatar

file_path <- "data.csv"
if (file.exists(file_path)) {
  print("The file exists!")
} else {
  print("The file does not exist.")
}

In this example, we check if the file named "data.csv" exists. Depending on the outcome, it will print either "The file exists!" or "The file does not exist."

Post: https://www.spsanderson.com/steveondata/posts/2023-07-13/

#R

stevensanderson, to datascience
@stevensanderson@mstdn.social avatar

Learn how to set a data frame column as the index for faster data access and streamlined operations.

In R, use the setDT() function from data.table or column_to_rownames() from tibble to set your desired column as the index. Try it out with your datasets and experience the boost in productivity!

#R 🚀📊

Post: https://www.spsanderson.com/steveondata/posts/2024-02-29/

stevensanderson, to random
@stevensanderson@mstdn.social avatar

Imagine you have a bunch of data points and you want to know how many belong to different categories. This is where grouped counting comes in. We've got three fantastic methods for you to explore, each with its own flair: aggregate(), dplyr, and data.table.

Happy counting, fellow data explorer! 🎉🔍 #r

Post: https://www.spsanderson.com/steveondata/posts/2023-08-10/

stevensanderson, to datascience
@stevensanderson@mstdn.social avatar

Discover efficient string splitting in R using strsplit()!

Learn practical examples and unleash the power of regular expressions.

Enhance your data cleaning skills and level up your R programming.

Experiment with strsplit() today!

Post: https://www.spsanderson.com/steveondata/posts/2024-04-26/

#R

stevensanderson, to datascience
@stevensanderson@mstdn.social avatar

I'll give you a quick rundown on creating horizontal boxplots in R using both base R and ggplot2. We'll work with the "palmerpenguins" dataset to keep things interesting!

🚀 Base R Approach (Simple and Quick)

🚀 ggplot2 Approach (More Customization)

Both methods have their advantages.

So, why not give it a try yourself?

#R

Post: https://www.spsanderson.com/steveondata/posts/2023-10-02/
