Resources
That's Fresh! Newsletter
Read a selection of our past issues.
- 🙌 NumPy 2.0 is almost out!
  And: Our new data preprocessor with Polars | Interview with S2E at Italy Insurance Forum
  June 5, 2024
- 😮 What a month for new LLMs!
  And: DataCamp webinar with Shalini
  May 22, 2024
- ✨ GenAI true value lies beyond operational enhancements
  And: The Future of Data Protection | New updates about AI Act
  April 24, 2024
- 👁 What are 1-bit Large Language Models?
  And: LinkedIn Live about AI Act | Mastercard's Country Manager interviewed our CEO
  March 6, 2024
- LLaMAntino - Effective Text Generation in Italian
  And: Creating train and test datasets | Use case: Detecting money muling with the help of synthetic data
  February 21, 2024
- 🗞️ The NY Times sues OpenAI and Microsoft
  And: Can AI work with little data? | La Stampa: AI means development
  January 10, 2024
- Synthetic Data 101 🚨
  And: Why synthetic data? | New project with Poste Italiane
  November 8, 2023
- How easy is it for LLM to infer sensitive information?
  And: Why is data sharing important? | Our new partnership with S2E
  October 25, 2023
- Have you heard of Pythia?
  And: Data augmentation tutorial | Did you say AI apocalypse?
  August 30, 2023
- Google's answer to ChatGPT
  And: Generating synthetic data within relational databases. Let's meet at WAICF!
  February 8, 2023
- Understanding ChatGPT better
  And: How to deal with imbalanced data. More about our product
  December 14, 2022
- A curated list of failed ML projects
  And: How to build a data strategy. Clearbox AI and Bearing Point partnership.
  November 16, 2022
- Our open source library is now on GitHub
  And: Clearbox AI on Cybernews.
  June 22, 2022
- Discovering Dagster
  And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data.
  June 8, 2022
- Can interaction data be fully anonymized?
  And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution.
  April 6, 2022
- What are GFlow nets?
  And: Improve models with Synthetic Data. Use case: augment financial time series.
  March 16, 2022
- The European Commission selected us for Women TechEU pilot project!
  And: What is Synthetic Data. The new Synthetic Data platform.
  March 09, 2022
- The EDPS on Synthetic Data
  And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets.
  February 23, 2022
- 2022 Gartner’s Technology Trends
  And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset.
  February 09, 2022
FROM THE AI WORLD
This week's recommendation is a review paper published by the van der Schaar Lab. It discusses the machine learning (ML) community's growing interest in synthetic data, which now extends beyond its initial use for privacy preservation.
The paper discusses different applications of synthetic data as a means to tailor datasets to individual needs, offering potential benefits like fairness, data augmentation, and simulation.
The work also discusses the fundamental challenges that must be addressed before synthetic data can be widely adopted, particularly in quantifying the trust in findings derived from it. These include developing better metrics for quality and privacy, understanding the suitability of different generative models, handling outliers and underrepresented groups, comprehending the influence of synthetic data on downstream tasks, establishing verification protocols, and deciding on publishing and access methodologies for synthetic data.
The paper advocates for a paradigm shift in ML towards using customizable, realistic synthetic data. Still, it emphasizes the need for urgent solutions to ensure the trustworthiness and broader application of synthetic data.
Reading this review, we recognized ourselves in it, since it touches on many of the points we have had to deal with as a startup operating in this domain!
Navigating the synthetic data sea
This technology is not only about privacy. Its potential extends to fair data, simulation, data augmentation and more. Want to know more?
CLEARBOX AI
New project with Poste Italiane
We are proud to start this project with Poste Italiane as part of the OPEN ITALY program. The initiative aims to test the new Synthetic Data paradigm.
BLOG POST
Why synthetic data?
When we talk about AI-generated data, we are often asked how this kind of data can be used in practice and how it can improve real data. We answered in this video.