Resources
That's Fresh! Newsletter
Read a selection of our past issues.
- NumPy 2.0 is almost out!
  And: Our new data preprocessor with Polars | Interview with S2E at Italy Insurance Forum
  June 5, 2024
- What a month for new LLMs!
  And: DataCamp webinar with Shalini
  May 22, 2024
- ✨ GenAI true value lies beyond operational enhancements
  And: The Future of Data Protection | New updates about AI Act
  April 24, 2024
- What are 1-bit Large Language Models?
  And: LinkedIn Live about AI Act | Mastercard's Country Manager interviewed our CEO
  March 6, 2024
- LLaMAntino - Effective Text Generation in Italian
  And: Creating train and test datasets | Use case: Detecting money muling with the help of synthetic data
  February 21, 2024
- The NY Times sues OpenAI and Microsoft
  And: Can AI work with little data? | La Stampa: AI means development
  January 10, 2024
- Synthetic Data 101
  And: Why synthetic data? | New project with Poste Italiane
  November 8, 2023
- How easy is it for LLM to infer sensitive information?
  And: Why is data sharing important? | Our new partnership with S2E
  October 25, 2023
- Have you heard of Pythia?
  And: Data augmentation tutorial | Did you say AI apocalypse?
  August 30, 2023
- Google's answer to ChatGPT
  And: Generating synthetic data within relational databases. Let's meet at WAICF!
  February 8, 2023
- Understanding ChatGPT better
  And: How to deal with imbalanced data. More about our product
  December 14, 2022
- A curated list of failed ML projects
  And: How to build a data strategy. Clearbox AI and BearingPoint partnership.
  November 16, 2022
- Our open source library is now on GitHub
  And: Clearbox AI on Cybernews.
  June 22, 2022
- Discovering Dagster
  And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data.
  June 8, 2022
- Can interaction data be fully anonymized?
  And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution.
  April 6, 2022
- What are GFlowNets?
  And: Improve models with Synthetic Data. Use case: augment financial time series.
  March 16, 2022
- The European Commission selected us for Women TechEU pilot project!
  And: What is Synthetic Data. The new Synthetic Data platform.
  March 09, 2022
- The EDPS on Synthetic Data
  And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets.
  February 23, 2022
- 2022 Gartner's Technology Trends
  And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset.
  February 09, 2022
FROM THE AI WORLD
As 2023 wrapped up, The New York Times lit a firework in the AI world with an important lawsuit against OpenAI and Microsoft. They're accusing OpenAI of using their content without permission to train AI models, and they're not just demanding monetary compensation: they want the AI models and data gone for good. This is part of a growing trend where content creators challenge AI companies in court.
The case is massive because The New York Times has a ton of content supposedly used to train GPT-4. They've even shown examples where ChatGPT spits out text almost identical to their articles. In the grand scheme, this lawsuit could really shake things up. It's all pretty uncertain, with laws and court decisions trying to catch up with the fast-moving tech. For AI companies, figuring out what's safe to use for training will be a very delicate matter.
Whatever happens with this case will have significant ripple effects: it could define who gets to train AI and how much it'll cost.
The NY Times vs OpenAI
The conflict is intensifying: the lawsuit seeks the destruction of all of OpenAI's LLMs and training data, as well as a halt to unlicensed training on The Times's articles.
CLEARBOX AI
La Stampa: AI means development
CEO Shalini brings Clearbox's journey to one of the most important Italian newspapers, highlighting tech and AI potential in the country. (Lang: 🇮🇹)
BLOG POST
Can AI work with little data?
In this battle over data for AI, it seems reasonable to ask whether the technology can work with scarce data. Surprisingly, the answer is yes.