
Paweł Jankiewicz

Data Scientist stuck behind relationship schema

In the fast-paced world of data science, time series data is a treasure trove of insights waiting to be unearthed. Traditionally, handling it has meant wrestling with tables and cumbersome transformations. But what if you could escape this paradigm and work in a more dynamic and intuitive way? Enter FeatureExpress.

The Trouble with Tables

Keeping time series data in tables creates a significant barrier to innovation. Whenever we want to calculate features for specific observation dates (the points in time at which the features are computed), we find ourselves tangled in a web of transformations. While this may be manageable with regular time series data, such as stock prices, it becomes a nightmare when dealing with irregular events, like customer transactions.

The Regular vs. Irregular Time Series Battle

If your time series is predictable like a clock, adding window features and other transformations may not be too much of a challenge. However, life isn't always that simple. Many phenomena, such as customer transactions, follow no fixed pattern. Calculating reasonable features from these irregular events becomes a complex task, often resulting in imprecise or even misleading results.
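To make the difficulty concrete, here is a minimal sketch in plain Python of a trailing-window feature computed over irregular events at an arbitrary observation date. It does not use the FeatureExpress API; the `transactions` data and the `window_features` helper are hypothetical names invented for this illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Hypothetical irregular events for one customer: (timestamp, amount),
# sorted by timestamp. Note the uneven gaps between events.
transactions = [
    (datetime(2023, 1, 2, 9, 30), 20.0),
    (datetime(2023, 1, 2, 17, 5), 35.0),
    (datetime(2023, 1, 9, 12, 0), 10.0),
    (datetime(2023, 2, 20, 8, 15), 50.0),
]


def window_features(events, observed_at, window):
    """Count and sum the events in [observed_at - window, observed_at)."""
    times = [t for t, _ in events]
    lo = bisect_left(times, observed_at - window)
    # Events at or after the observation date are excluded, so nothing
    # from the future leaks into the feature.
    hi = bisect_left(times, observed_at)
    amounts = [amount for _, amount in events[lo:hi]]
    return {"count": len(amounts), "total": sum(amounts)}


features = window_features(transactions, datetime(2023, 1, 10), timedelta(days=7))
print(features)  # {'count': 1, 'total': 10.0}
```

Because the window is defined in time rather than in rows, the same call works whether the customer made zero, one, or a hundred transactions in the last seven days. A row-based window over a table cannot express this directly.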

The Power of Events with FeatureExpress

FeatureExpress liberates you from the confines of tables by working with a stream of events instead. These events become the building blocks and source of truth for your features.

Versatility and Simplicity

Unlike table-based approaches that dominate the design of many feature stores, FeatureExpress focuses on events, offering advantages like:

  • Ease of Understanding: Events are intuitive and mirror the real-world flow of information.
  • Flexibility: Events can be added or removed at any point, allowing for dynamic changes and exploration of "what if" scenarios.
  • Efficiency: Built with Rust, FeatureExpress enables high-performance in-memory processing, resulting in fast calculations and low latency.
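The flexibility point can be sketched in a few lines of plain Python: if events are the source of truth, a "what if" scenario is just the same feature computed over a modified event list. This is an illustration under assumptions, not the FeatureExpress API; the event layout and the `total_spend` helper are hypothetical.

```python
from datetime import datetime


def total_spend(events, observed_at):
    """Sum the amounts of purchase events strictly before observed_at."""
    return sum(
        e["amount"]
        for e in events
        if e["type"] == "purchase" and e["at"] < observed_at
    )


# Hypothetical event stream for one customer.
events = [
    {"type": "purchase", "at": datetime(2023, 3, 1), "amount": 40.0},
    {"type": "page_view", "at": datetime(2023, 3, 2), "amount": 0.0},
    {"type": "purchase", "at": datetime(2023, 3, 5), "amount": 15.0},
]
observed_at = datetime(2023, 3, 10)

baseline = total_spend(events, observed_at)

# What if the customer had made one more purchase?
with_extra = events + [
    {"type": "purchase", "at": datetime(2023, 3, 8), "amount": 100.0}
]

# What if the first purchase had never happened?
without_first = events[1:]

print(baseline)                                   # 55.0
print(total_spend(with_extra, observed_at))       # 155.0
print(total_spend(without_first, observed_at))    # 15.0
```

The original event list is never mutated, so the baseline and each scenario can be compared side by side.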

Unleash Creativity with DSL

Express your feature engineering logic using FeatureExpress's DSL, tailor-made for data scientists. From time-based joins to aggregation functions, you have a wide array of tools at your disposal.

Conclusion: Breaking Free from Tables

While it's possible to write similar features with either events or tables, the event-based approach's appeal lies in its simplicity, flexibility, and alignment with real-world dynamics. By adopting FeatureExpress, you can transcend the limitations of table-based feature engineering, taking advantage of a system designed to accommodate complex time-based queries and various value types.

FeatureExpress offers a refreshing perspective on feature engineering, providing an efficient, clear, and robust way to deal with the complexities of time series data. Embrace the future with events, and leave those cumbersome tables behind.

Paweł Jankiewicz

Hello Data Enthusiasts!

My name is Paweł Jankiewicz. I am a Kaggle competition grandmaster (https://www.kaggle.com/paweljankiewicz) and a fervent advocate for the power of event data. My passion for customer analytics and recommendation systems has led me down a path where handling event data became a daily affair. The challenges and complexities of modeling time and events have always intrigued me, and Rust, a language I've grown to love, has been my partner in solving performance-intensive calculations.

Today, I'm thrilled to introduce you to my latest creation: FeatureExpress.

What is FeatureExpress?

FeatureExpress is an in-memory feature engineering library that leverages Rust's efficiency and provides an easy-to-use Python interface. This alpha release of FeatureExpress is not perfect and still has some missing pieces, especially around incremental features. However, the decision to release it now is aimed at gauging interest and finding collaborators and investors to help me take this project to the next level.

The Power of Event Data

Event data is the beating heart of any analytical system that deals with time. From capturing user interactions in real-time to processing and making predictions, handling event data correctly is key to unlocking the full potential of any dataset.

Unique Features of FeatureExpress

  • Clear Separation of Time: Avoid subtle data leaks with distinct handling of past and future.
  • Complex Time-Based Joins: Implement joins in time effortlessly.
  • High Performance: Written in Rust, FeatureExpress materializes features quickly and in parallel.
  • Declarative Syntax: Define what you want, not how to compute it, using our SQL-like DSL.
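For readers unfamiliar with joins in time, the idea can be sketched in plain Python as an "as-of" lookup: each observation is matched with the most recent value known at that moment and never with a later one. This is a hand-rolled illustration, not the FeatureExpress DSL; `price_changes`, `orders`, and `as_of` are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of price-change events, sorted by timestamp.
price_changes = [
    (datetime(2023, 1, 1), 9.99),
    (datetime(2023, 2, 1), 12.49),
    (datetime(2023, 3, 15), 11.00),
]

# Hypothetical order timestamps that need the price valid at that time.
orders = [
    datetime(2022, 12, 31),
    datetime(2023, 1, 20),
    datetime(2023, 2, 1),
    datetime(2023, 4, 2),
]


def as_of(history, when):
    """Return the latest value with timestamp <= when, or None if none exists."""
    times = [t for t, _ in history]
    i = bisect_right(times, when)
    return history[i - 1][1] if i else None


joined = [(order, as_of(price_changes, order)) for order in orders]
for order, price in joined:
    print(order.date(), price)
```

The first order precedes every price change, so it joins to `None` instead of silently picking up a future price. That is the kind of subtle leak a clear separation of past and future is meant to prevent.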

What's Missing?

This being an alpha release, there are certain aspects that are still under development, particularly around some computation modes like incremental features. But fear not! The core functionalities are robust, and I believe it's time to share this innovative tool with the community.

Why FeatureExpress?

If you, like me, find joy in taming the intricacies of event data, especially in customer analytics and recommendation systems, FeatureExpress will resonate with you. It embodies my years of experience, struggle, and learning in dealing with time-centric data.

Collaboration and Investment Opportunities

The journey of FeatureExpress is just beginning, and I am actively seeking collaborators who share this vision. Whether you are an aspiring contributor or an investor looking for the next big thing in data science, FeatureExpress offers a unique opportunity.

Conclusion

FeatureExpress is a love letter to event data, Rust, and the pursuit of effective feature engineering. I invite you to explore this alpha release and share your feedback, criticisms, or even a virtual high-five.

With FeatureExpress, we're taking a significant step towards making feature engineering a more expressive and event-driven endeavor. Join me in this exciting journey!

Happy data wrangling!

Paweł Jankiewicz, Kaggle Grandmaster and Creator of FeatureExpress