Newsletters From:
Data Science Newsletter | Data Elixir
Data Elixir is a data science newsletter with top picks from around the web each week. Covering machine learning, analytics and data visualization.
Data Elixir - Issue 493
07-16-2024
ISSUE 493 · July 16, 2024

Interviews

Mustafa Suleyman on Defining Intelligence
In 2010, Mustafa Suleyman co-founded DeepMind, where he led the application of AI. He later became a venture partner at Greylock and the founder and CEO of Inflection AI in 2022, and today he leads Microsoft's AI efforts. This is a great discussion about the current state of AI, LLMs, and where things are going.

PRESENTED BY Statsig

Statsig’s Experimentation Week: We’re Shipping a New Feature Every Day.
Statsig is the platform trusted by OpenAI, Anthropic, Notion and Brex to ship, test, and analyze every feature they launch. Automate experiment analysis, roll out features, and more. We’re always building, and this week, we’re taking it to another level. It’s experimentation week: every day we’re launching a new feature to make experimentation easier and more reliable. Tuesday is Stratified Sampling. Follow along on LinkedIn to see the feature and what we’ll launch tomorrow.

Posts & Tutorials

Starting an Analytics Org From Scratch
DoorDash VP Jessica Lachs has grown the company’s analytics team from a band of scrappy generalists to a highly specialized 300-person organization. In this post, she spells out how to lay the groundwork for an analytics team and how to pick the right first hires.

ML Code Challenges
These code challenges are a great way to learn about machine learning. For each challenge, you can read through a short tutorial on how to answer it, submit a solution, and have the solution verified.

Use-cases for inverted PCA
A lot of people know PCA for its ability to reduce the dimensionality of a dataset. It can turn a wide dataset into a thin one, with only a limited amount of information. But what about doing it the other way around as well? Can you turn the thin representation into a wide one again? And if so, what might be a use-case for that?

Annotated area charts with plotnine
The plotnine visualization library is based on ggplot2 and brings the Grammar of Graphics to Python. It's a powerful approach that lets you compose plots by mapping variables in a dataframe to the visual objects that make up the plot. This step-by-step tutorial uses a simple, annotated area chart to show how to use plotnine.

⚽ Euro 2024 Data
To help support the development of the next generation of sports analysts, StatsBomb just released its tactical event data from all 51 matches of the recent Euro 2024 tournament. This post describes what the data is, how to access it, and code to make it manageable. If you're interested in sports analytics, this is a goldmine of data.

Tools & Code

Papers

Encoding Spreadsheets for Large Language Models
SpreadsheetLLM is designed to give LLMs the ability to process, analyze, and understand spreadsheets. The research includes an encoding/compression framework to help work within token constraints as well as a "Chain of Spreadsheet" framework for spreadsheet understanding and validation. This line of research unlocks a lot of possibilities.

Last Issue's Top Links
from Data Science Newsletter | Data Elixir on 07-09-2024
Data Elixir - Issue 492
How Amazon wins with data. Boosting vs. semi-supervised learning. Science and LLMs. Sparse arrays. Practical intro to ML.
from Data Science Newsletter | Data Elixir on 06-25-2024
Data Elixir - Issue 491
Working with multi-TB data. Manager blind spots. Mapping data to a sphere. How to model age. Analyzing NBA player performance.
07-16-2024
Data Elixir - Issue 493
ML code challenges. Euro 2024 ⚽ tactical events data. Starting an analytics org from scratch. SpreadsheetLLM. Grammar of Graphics for Python.