Rule #12 of #datascience: be your own best competitor. If your ML model is your key differentiator, then once you ship it, start working on a new and improved version. #kcdc
K-Fold Cross Validation: re-split your training vs. test data a bunch of times (usually 5x) to see whether your model is valid or whether you wound up in one of the naturally-occurring clusters in random data. #datascience #kcdc
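A minimal sketch of that re-splitting idea, using only the Python standard library. The toy data, the helper names (`k_fold_splits`, `fit_line`), and the choice of a straight-line model scored by mean squared error are all assumptions made for illustration, not anything from the talk.

```python
import random
import statistics

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical toy data: y is roughly 3x + 2 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

fold_mse = []
for train_idx, test_idx in k_fold_splits(len(xs), k=5):
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    errors = [(a + b * xs[i] - ys[i]) ** 2 for i in test_idx]
    fold_mse.append(statistics.mean(errors))

# One score per fold; a large spread between folds is a warning sign.
print(fold_mse)
print(statistics.mean(fold_mse), statistics.stdev(fold_mse))
```

If one fold scores far worse than the others, the model may have latched onto a quirk of a particular split rather than a real pattern.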
Have a favorite data science model, but always test it in competition with another model.
Linear equations normally work well because you're either dealing with people or things that depend on people. But also check against a nonlinear model, and pick which one models the data better.
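Here is a rough sketch of that linear vs. nonlinear bake-off, in plain Python. The data is made up (deliberately curved), the "nonlinear" candidate is assumed to be a simple quadratic term, and the winner is picked by mean squared error on a held-out slice. All names are hypothetical.

```python
import random

def fit_line(us, ys):
    """Ordinary least squares for y = a + b*u; returns (a, b)."""
    n = len(us)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = suy / suu
    return mean_y - b * mean_u, b

def mse(transform, params, xs, ys):
    """Mean squared error of y = a + b*transform(x) on the given points."""
    a, b = params
    return sum((a + b * transform(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data with a curved relationship: y is roughly 0.5 * x^2.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [0.5 * x * x + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Each candidate is a feature transform applied before the same line fit.
candidates = {"linear": lambda x: x, "quadratic": lambda x: x * x}

scores = {}
for name, transform in candidates.items():
    params = fit_line([transform(x) for x in train_x], train_y)
    scores[name] = mse(transform, params, test_x, test_y)  # held-out error

best = min(scores, key=scores.get)
print(scores, "->", best)
```

On real data the favorite often wins; the point is to let the held-out error decide rather than habit.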
You can't add apples and oranges; match units on your calculations when doing data science. Normalize your units like you're back in grade school. #kcdc #datascience
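A small sketch of the "match your units first" habit. The conversion table, the choice of centimeters as the common unit, and the sample measurements are assumptions for illustration only.

```python
# Hypothetical conversion table: everything is normalized to centimeters.
TO_CENTIMETERS = {"cm": 1, "m": 100, "km": 100_000}

def to_centimeters(value, unit):
    """Convert a length to centimeters, refusing units we do not know."""
    if unit not in TO_CENTIMETERS:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_CENTIMETERS[unit]

# Mixed-unit measurements, as they might arrive from different sources.
measurements = [(2, "km"), (150, "m"), (3000, "cm")]

# Wrong: summing the raw numbers mixes kilometers, meters, and centimeters.
naive_total = sum(value for value, _ in measurements)

# Better: convert to one unit, then add.
total_cm = sum(to_centimeters(value, unit) for value, unit in measurements)

print(naive_total)  # 3152, a number with no meaningful unit
print(total_cm)     # 218000 centimeters
```

Raising an error on an unknown unit is a deliberate choice here: a loud failure is easier to catch than a silently wrong total.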
"on a scale of 1 to 5 how happy are you"...just because you replace "unhappy" with 2 doesn't make it numeric data, any more than replacing it with "potato"
The "six degrees of separation" statement... turns out not to be a myth. Microsoft did a similar study with user accounts/email addresses and the number was 6.2. #kcdc #datascience
For structured data, the schema is important when the data is written. For unstructured data, the schema is important when the data is read. #kcdc #datascience
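One way to picture the difference, sketched with Python's built-in `sqlite3` and `json` modules. The `users` table, its columns, and the sample records are hypothetical; the SQLite table stands in for structured storage and a list of raw JSON lines stands in for unstructured storage.

```python
import json
import sqlite3

# Schema on write: the table definition rejects bad rows at insert time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (1, "a@example.com"))
try:
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (2, None))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True  # the NOT NULL constraint blocked the write

# Schema on read: anything can be stored; structure is imposed when reading.
raw_lines = [
    '{"id": 1, "email": "a@example.com"}',
    '{"id": 2}',   # missing email, stored without complaint
    'not json',    # garbage, also stored without complaint
]

def read_users(lines):
    """Apply the expected shape (int id, str email) while reading."""
    users = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if (isinstance(record, dict)
                and isinstance(record.get("id"), int)
                and isinstance(record.get("email"), str)):
            users.append((record["id"], record["email"]))
    return users

print(write_rejected)         # True
print(read_users(raw_lines))  # [(1, 'a@example.com')]
```

In the first half the bad record never gets in; in the second half it gets in freely and the reader has to decide what counts as valid.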
Kidding aside, every tool has its own #superpowers and #shortcomings, and I know that Power BI can do certain things that Tableau can't do, but I'm also sure that the reverse is true as well.
Erin and I frequently talk about the #ToolsAndTechniques we use at work during our lunches and evening walks, and I'm genuinely looking forward to #learning more about #TheDarkSide from her, as we continue honing each other's minds like iron sharpening iron. ⚔️
Okay, #datascience and #nlp friends. I’m poking around for the “right way” to approach a problem: I want to calculate the overall homogeneity of many short snippets of text (phrases and sentences), and many large spans of text (500-1500 word documents).
Free in London this Saturday afternoon? Want to mosey around the British Academy and hear from a range of excellent speakers on many important subjects of our times?
I'm chairing a session on 'ChatGPT, AI, and the future' in the garden, 2-3pm, with Tim Gordon (Co-founder, Best Practice AI) and Hetan Shah (CEO, British Academy)
'Much consideration has been given to how machine learning is influencing our lives, and what it means for the near future. This panel will consider ChatGPT’s current influence in research, and how it might be a tool for copyright theft, content creation, knowledge-sharing or misinformation.
This panel of experts will critically examine whose voices are being heard in the discourse around AI, what choices we must make about how it is implemented, and which technologies are bringing genuine value to the world of education and research.'
Personal: I am very happy to announce that I have accepted a tenure-track Assistant Professor position at the University of Groningen. Looking forward to further collaborations, and am glad to continue working within the Information Systems Group at the Bernoulli Institute.