dataDataGoose Issue #1
This is dataDataGoose, a weekly newsletter for Data Science students and hobbyists.
I’m Josh Caulfield, an InfoSec professional. I decided to pursue a part-time bachelor's degree in data science after realising its potential to be the key skill that shapes the upcoming century. Join me on this learning journey — each week I’ll post a new update here (and to your inbox) sharing everything new I’ve learned and read that week.
This week in Data Science
OpenAI, Google, Microsoft and Facebook have been dominating the AI arms race, with each org releasing or announcing new tools, toys and APIs in recent weeks and months. One notable competitor in the advanced AI space, with far less financial might, has been MIDJOURNEY.
For the uninitiated, Midjourney is a small collective of immensely talented researchers building image generation AI that has succeeded in producing near-photorealistic images capable of deceiving even the most astute of audiences.
One example of just how difficult these generated images can be to discern is the above depiction of Pope Francis dripped out in high-fashion streetwear and jewellery, which sent Twitter users into a frenzy before it was widely announced that the image wasn’t real. This confusion marked a key point in AI progress, where even sceptical audiences familiar with AI-generated images were duped en masse.
Following this episode, Midjourney chose to remove the ability to generate images for free, requiring all users who want to generate images to be registered and paying members of their community. Read more at FORBES.
Learn more about Midjourney HERE - I especially encourage browsing their showcase section, which highlights some of their recent and highest-rated generations and demonstrates the power of this AI tool.
Data Science 101
Often, when dealing with real-world data, values fail to align perfectly with a normal distribution. When outliers are present in a dataset, the shape of its distribution can distort and skew in the direction of the outliers. Learn about data skewness in this short primer:
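To make the idea concrete alongside the primer, here's a rough Python sketch of my own (not taken from the primer). It assumes the moment-based (Fisher-Pearson) definition of skewness, and the `skewness` helper and the two small datasets are made up purely for illustration.

```python
from statistics import fmean, median


def skewness(values):
    """Population (Fisher-Pearson) moment coefficient of skewness.

    Hypothetical helper for illustration; assumes the values are not all
    identical (otherwise the variance is zero and the division fails).
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up data: a symmetric set, then the same set with one large outlier.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = symmetric + [60]

for name, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(
        f"{name}: mean={fmean(data):.2f} "
        f"median={median(data)} skew={skewness(data):.2f}"
    )
```

The symmetric set should come out with a skewness of zero, while the single high outlier pulls the mean above the median and pushes the skewness positive, i.e. the distribution skews in the direction of the outlier.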
Courses & Projects
Just over two weeks remain on the HUMBLE BUNDLE “Pocket Guides 2023” bundle, boasting sixteen quick-reference guides from O’Reilly Publishing. The books in this bundle are convenient resources to reach for when you need a quick refresher while writing code or queries. Many of the titles included here are applicable to data science.
This week’s free resource is a 5-hour course produced by Real Tough Candy that works well as either an intro to or a refresher on common algorithms in Python. Understanding common algorithms and being able to implement them is a vital skill in any data scientist’s repertoire. Check out the course on the freeCodeCamp YOUTUBE channel.
Need some dummy data as a placeholder or for a personal project? Check out MOCKAROO, a quick and easy way to throw together dummy data in line with your project’s requirements and schema.
This week’s Casual Corner suggestion is superbly informative and engaging entertainment in its own right, and also an A* example of video data visualisation. YouTube channel POLYMATTER produces regular broadcast-quality, mid-length video essays breaking down and explaining novel and often lesser-covered topics, typically covering tech and Asian politics.
The dataDataGoose Community
I’d be foolish to pretend this inaugural issue of dataDataGoose is going out to a vast community, given its non-existent subscriber base so far. But to the few who encounter this post and are intrigued by the format, I ask: what would you like to see in future issues of dataDataGoose?
Whether you’re interested in data science professionally or as a hobby, I’d love to hear some of your thoughts on what makes the field so interesting to you.