
Apache Superset

noduerme
32 replies
1d2h

So it's irritating to me that this is ranking #1 on HN (why is it, btw?) I just pulled the trigger on a large data gathering project using Metabase, and feel a bit hampered by the limitations in terms of charts and plugins... but I considered Superset first, and after a lot of thought I decided that almost everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time. In fact I wasn't even sure if Superset was still an active project or if it just looked like one, in the way e.g. no one bothered to pull the OpenOffice website offline.

So now that I picked Metabase, Superset is topping HN for no apparent reason. Why?

smaudet
8 replies
1d2h

"semi-abandoned disasterware"

Hmm. I suppose all open source looks that way if it doesn't get regular funding/attention.

Apache does house a lot of abandonware. They had some relevance as recently as 6-7 years ago but they've been largely replaced by nginx I think. That being said, I view them like the local soup-kitchen - important to have and maintain, but not where I want to go for a 5-star meal.

malfist
4 replies
1d2h

The Apache foundation is way larger than just the server

smaudet
3 replies
1d2h

Yes, I agree. However a lot of their forward facing projects seem to be effective abandon-ware (few people interested in contributing, competing more popular solutions based on forks, or just no longer relevant).

These projects don't give the apache foundation an appearance of importance or relevance, rather they make it look rather rundown.

jakjak123
0 replies
18h52m

The Apache Foundation also takes on projects that are literally abandoned. It acts as an umbrella that takes over hosting a project for commercial actors that can no longer develop it, but want to at least give existing users an open source (Apache License) version of the software to continue with/depend on.

KptMarchewa
0 replies
1d

I would consider Airflow, Spark and Flink to be their forward facing projects, and they are all very actively developed.

DaiPlusPlus
0 replies
1d

That's how open-source abandonware is supposed to work though: the idea is that whenever a (for-profit) company produces something that it can't afford to run anymore but also can't afford to shut-down and damage their customer relationships, then they'll open-source the project and give it to an open-source foundation for stewardship and repo hosting. Yes, it's where software goes-to-die-a-long-death, but it also gives some people hope, and the possibility of giving it a new life in future. Currently, the Apache Foundation is the go-to place for that, and it benefits everyone considering the alternatives are worse.

Obviously the main "alternative" is for the original company to simply shut down the product/service, which can do irreparable harm to a company when they have high-profile customers who are utterly dependent on a service.

Another alternative is to use an open-source foundation that's directly managed by the original company, which is what Microsoft did with its DotNet Foundation ( https://dotnetfoundation.org/ ) - and while Microsoft's legal team ensures the foundation is "legally" independent, in practice we know all the significant shots are being called from within Microsoft-proper; but it does give us some modest reassurances that .NET won't suddenly return to being closed-source overnight.

Another alternative is to not open-source it and to instead sell it off to another company that can maintain it while still being profitable - this is what Adobe did with Flash: they sold it all off to Samsung because their Harman division wanted to continue using Flash for embedded/automotive UX work. This approach can work, but doesn't benefit the wider ecosystem the way that open-sourcing does - and something something shareholder value and return-on-investment by selling rather than writing-it-off...

What companies won't do is let any of their engs that are passionate about a project split-off from the company to run and maintain it, le sigh.

nekoashide
1 replies
1d1h

Any time I hear "Apache Foundation" my stomach turns as I hesitate to ask my next question. "What we are trying to use from them is built on Java right"

stuff4ben
0 replies
23h46m

That would be anything hosted by the Eclipse Foundation. Either Java-based or abandonware or sometimes both.

jakjak123
0 replies
1d1h

Apache hosts many, many projects, some good, some bad, some abandoned, some fucking great.

lars_francke
6 replies
1d2h

almost everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time

Can you name a few examples?

lars_francke
1 replies
1d1h

I think it's a great feature to have explicit lifecycle for open source projects.

Lots of other projects just die silently and/or you are unsure of the status.

Here you at least have a chance to revive them if you like as there is always an overarching organisation.

bombcar
0 replies
1d

The problem really is that some Apache projects are actually alive (Apache itself, apparently Superset, Groovy, etc) and some appear alive at first glance.

More things should move into the Attic, like OpenOffice.

noduerme
1 replies
1d1h

Well, OpenOffice as I said. Cordova is/was a hot mess (with some nice pioneering features, just really not well maintained imo and felt like quicksand to build even a small app on) Then the sort of long slow death of Flex (now Royale?) Apache seems like where software no one loves anymore goes to die.

rpeden
0 replies
1d1h

I suppose it depends on projects you're using. For many developers their primary exposure to the Apache Foundation is through projects like Maven and Kafka, and those certainly don't feel dead.

smaudet
0 replies
1d1h

Ivy, Netbeans, Open Office, Shiro, Solr all jump at me off this list:

https://projects.apache.org/projects.html?name

These are all projects that once were (more) relevant, however seem to have become rather niche (Gradle, Jetbrains/VSCode, GoogleDocs/Libreoffice e.g. for the first three are the dominant competitors).

Most of these projects (like the massive commons listings) are either used by some Java library somewhere (meaning their success/relevance is tied to the usage of Java), or are obscure enough that they are no longer used widely and so suffer from lack of interest.

There are gems in this list, to be sure, but if you just run into half-maintained projects all the time you're not likely to associate good things with the Apache name?

beastman82
4 replies
1d1h

topping HN for no apparent reason

I think the HN algo is pretty easily manipulated. I worked at a startup that had an effective process to get things to the front page

CoastalCoder
3 replies
1d1h

I think the HN algo is pretty easily manipulated. I worked at a startup that had an effective process to get things to the front page

That sounds (potentially) sleazy. If you think it's a technique that HN could potentially defend against, I encourage you to explain it to hn@ycombinator.com.

noduerme
1 replies
1d1h

Maybe it's a YC startup.

ambigious7777
0 replies
22h10m

AFAIK YC startups don't get any more boost on the front page than normal posts.

ativzzz
0 replies
22h26m

That sounds (potentially) sleazy.

Pretty sure it's as simple as posting in your general slack channel "@here we posted a new article to HN, go upvote and write a comment"

smallmancontrov
3 replies
1d2h

Because we (the FBI Surveillance Van) saw that you picked Metabase, called our shady French-accented overlord, and he told us to dump it.

noduerme
0 replies
1d1h

I knew it!!

CoastalCoder
0 replies
1d1h

I thought your outrageous French accent just meant you're going to taunt him a second time.

tomnipotent
0 replies
1d

I used Metabase at my last gig (CTO @ e-commerce, 30+ users) and it was well-received and dare I say even a bit adored. It was the only self-hosted tool I'd receive after-hours text messages about going down that someone urgently needed back up for some task due tomorrow.

Business users loved the self-serve query builder, and it wasn't uncommon to walk around the office and see Metabase up on someones screen. My CEO absolutely loved it, and used it daily including to put together data for board decks.

None of my users cared about visualizations, and lived in tabular data. This included finance, marketing, merchandising, operations, and executives (CEO/COO/CFO). The only people that lamented the limited visualization were analysts. Power users did all their day-to-day work in Excel or other tools anyway, such as managing marketing spend or inventory allocations.

Metabase was great for dashboards and self-service (ad-hoc). 10/10 would deploy again.

swalsh
0 replies
1d2h

Yeah, I'm in a similar thought process. I've been burned multiple times by Apache, will not touch ever again.

skadamat
0 replies
1d1h

Apache Airflow, Kafka, Spark, ECharts, and many others are still going strong! It really depends on the project to be honest.

rickspencer3
0 replies
1d1h

I think that there is an active company behind Superset called Preset.

https://preset.io/

I don't think it's semi-abandoned. I had a brief interaction with the project in my previous job, and I found the community and the company to be reasonably engaged and responsive.

renewiltord
0 replies
23h53m

Apache Software Foundation is just an umbrella organization to keep things on life support till someone can apply sufficient motive force to resurrect. I think that's really valuable. Lots of projects there have had that effort applied to them and kept going.

jakjak123
0 replies
1d1h

I have the opposite experience. Lots of good stuff is hosted by the Apache Foundation, such as Kafka, Maven, Cassandra, Camel, the Tika project, Superset, Solr, but I will admit they had more relevance 10 years ago. And I don't think there are many organizations that keep open source projects alive longer than the Apache projects.

hasty_pudding
0 replies
1d2h

Everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time.

Amen brother.

marcinzm
18 replies
1d3h

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

I tried Superset a few years back, and maybe it's changed since then, but intuitive is about the last thing I'd use to describe it. Things which I could figure out in a few minutes on any other BI tool literally took me hours of searching. It didn't help that they decided to rename core concepts at some point so half the online documentation made no sense anymore. Others at those companies who tried it at the time said similar things.

fayten
7 replies
1d2h

I also found Superset unintuitive to use and setup as well. I settled on standing up Metabase because it was so simple to get started with trying it since it can be launched as a single jar. The business users loved it and so did I and administration with a Postgres backend instead of the internal h2 database was a breeze.

xtracto
1 replies
17h23m

I had the same experience. Featurewise Superset looked better, but after wasting a couple of hours trying to install it, I just gave up.

Instead I installed Metabase in 5 minutes tops: spin up an EC2 instance, wget, and java -jar. I've never looked back.

The only thing that turns me off is that it's implemented in an obscure language. At one time I wanted to add some custom postprocessing to an API (given an SQL query, get some python/pandas postproc command from a SQL comment and execute it on the returned table), but the language it uses is just not for me (some Lisp dialect).

NoThisIsMe
0 replies
13h26m

Clojure is not particularly obscure

datadrivenangel
1 replies
1d2h

Metabase is great. It truly is a BI tool. Superset is more of a visualization platform, which works great if you have engineers building reports. Less good if you expect more junior analysts to be super productive.

sumoboy
0 replies
1d2h

We ran into the same exact issue with Superset not being intuitive; it's just for a different, more technical audience. We also went with Metabase, which is good and easy to use. It lacks a few chart types, but overall the past year has seen quite a few changes and bug fixes happening consistently.

hodgesrm
0 replies
14h21m

My experience with Superset was the opposite. It's easy to install using containers. You can have it up and running and connected to ClickHouse in a few minutes. I also found the internal design pretty intuitive--the SQL query lab is much easier than Grafana's editor.

I like Grafana too, but there's basically no isolation between your query and the SQL database at least in the Altinity Grafana plugin for ClickHouse which is the main one I use.

c0brac0bra
0 replies
21h12m

Yep, we've really liked Metabase for embedding in our platform.

staticautomatic
3 replies
1d1h

Let’s be honest, intuitive is the last word we’d use to describe most Apache projects.

pseudosavant
2 replies
22h0m

They are doing pretty well in that it's at least clear what the project is about. Good luck figuring that out within 30 seconds of hitting the average Apache project homepage.

slyall
1 replies
20h36m

There seem to be dozens of Apache "Big Data" projects that all look kinda the same unless you are a Big Data person.

bushbaba
1 replies
1d2h

It’s more intuitive than the open source alternatives but is not as intuitive as tableau and others.

marcinzm
0 replies
23h15m

Metabase is more intuitive. Also, being unintuitive isn't great but not the worst thing. A project not even realizing that (and thinking the exact opposite) is much much worse. Unintuitive can be fixed with PRs over time. Delusional project leadership cannot.

mritchie712
0 replies
19h3m

Had a similar experience with Superset. A few others have mentioned Metabase and I agree it's better, but if you're looking for a different approach to data, check out Definite (https://www.definite.app/). It's a "data stack in a box". A few things we're doing differently:

1. Built-in data warehouse - We spin up a duckdb database for you to load data to

2. 500+ connectors - You don't need to buy a separate ETL and you can pull in all your data (e.g. Postgres, Stripe, HubSpot, Zendesk, etc.) automatically

3. Semantic layer - Define dimensions, measures, and joins in one place. We have pre-built models for all the sources we support (e.g. the Stripe model already has measures for MRR, churn, etc.)

4. Simple BI - Build a table with the data you want and generate visuals off that table

I'm mike@definite.app if you have any questions.

codeduck
0 replies
1d2h

I've just been playing with superset. I'd have to agree. Things which are easy in SQL are... disturbingly hard or nonobvious in superset.

And the documentation is sparse at best.

atombender
0 replies
23h35m

Are there better alternatives?

FridgeSeal
0 replies
21h57m

It wasn’t fast either when I used it.

What it was though, was riddled with dozens of Python runtime errors and innumerable glitches.

Metabase is where it’s at.

adeptima
18 replies
1d2h

Had a very good experience with Superset.

Superset allowed us to replace Tableau, and we're not looking back.

Took me a while to figure out how to embed it into my app using the Superset Embedded SDK.

Superset Embedded SDK - "Embedded SDK allows you to embed dashboards from Superset into your own app, using your app's authentication. Embedding is done by inserting an iframe, containing a Superset page, into the host application."

https://github.com/apache/superset/tree/master/superset-embe...

Superset is based on a very high quality and well maintained chart library, ECharts

https://echarts.apache.org/examples/en/#chart-type-linesG

Community Roadmap

https://github.com/apache/superset/projects?query=is%3Aopen

Huge respect to Preset.io and its team for contributing to the project and keeping it in great shape

https://preset.io/blog/

Superset source code is very easy to read and understand, and as a result it's possible to implement some advanced caching techniques to reduce the load on charts.

No BI is perfect.

Watching Superset for years gives me confidence the project will work as supposed down the road, and eventually some of its packages can be reusable for all kind of visualizations and data hacking.

Our main approach to visualisation is to start with ECharts in a simple React wrapper, spin up Superset on a subdomain for power users, and later see which one works better. The shared look gives a very pleasant experience.

adeptima
2 replies
23h5m

Looks great!

Reminds me of Obsidian DataView but with charts https://github.com/blacksmithgu/obsidian-dataview

This whole idea of having data, visualisations and a knowledge base in one private offline place is very appealing

hughess
0 replies
18h42m

We're fans of Obsidian! DataView looks cool - love the ability to define the tables in code inline in the markdown. That's similar to how we inline DuckDB WASM SQL queries in markdown: https://docs.evidence.dev/core-concepts/queries/

archiewood
0 replies
18h25m

I love Obsidian.

The Markdown <-> Markup typing experience is just so good compared to e.g. Slack, Reddit and other markdown-esque tools

meekaaku
1 replies
7h47m

Evidence looks cool, and I evaluated sometime back. The docs says the pages are all pre-rendered for all possible combinations. Is that the case still? If so, if I have a date filter, is it going to pre-render all possible dates?

hughess
0 replies
6h31m

We recently changed our architecture to include interactivity without having to pre-render all combinations. Pages are still pre-rendered with their initial content, but each Evidence app now ships with filter components and an in-browser DuckDB instance so you can build interactive apps. We call this Universal SQL - if you're interested, we wrote up our rationale for doing this here: https://evidence.dev/blog/why-we-built-usql/

Here's an example project with some filter components and custom styling: https://ecommerce.evidence.app/

This is still a static app - the data warehouse was only hit during the app's build process

klaussilveira
3 replies
1d

How do you deal with data visibility and permissions? I mean, most tables have data that should only be seen by a specific user or group ID, and that layer is usually handled by the application. It would be awesome to expose the power of Superset for users, but I imagine creating the security layer would be a pain.

re5i5tor
1 replies
23h46m

I have this question too

spdustin
0 replies
22h7m

You can use row-level security, or specify RBAC with pretty much any SQL query.
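To make the row-level idea concrete, here is a minimal sketch using only Python's standard-library sqlite3 module. It is not Superset's implementation or API: the `orders` table, the `ROLE_FILTERS` mapping and the `query_as_role` helper are all hypothetical names made up for illustration. The assumption is simply that each role maps to a filter clause that gets wrapped around whatever query the user runs.

```python
import sqlite3

# Hypothetical mapping from role to a filter clause plus its bound parameters.
# Illustrative only; this is not Superset's API.
ROLE_FILTERS = {
    "emea_sales": ("region = ?", ("EMEA",)),
    "apac_sales": ("region = ?", ("APAC",)),
    "admin": (None, ()),  # no restriction
}


def query_as_role(conn, base_sql, role):
    """Run base_sql, keeping only the rows the given role may see.

    An unknown role raises KeyError, so the sketch fails closed.
    The base query must expose the column used by the filter (here: region).
    """
    clause, params = ROLE_FILTERS[role]
    if clause is None:
        sql = base_sql
    else:
        # Wrap the user's query so the filter applies to its result set.
        sql = f"SELECT * FROM ({base_sql}) WHERE {clause}"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0), (3, "EMEA", 45.5)],
)

base = "SELECT id, region, amount FROM orders"
emea_rows = query_as_role(conn, base, "emea_sales")  # only EMEA orders
admin_rows = query_as_role(conn, base, "admin")      # every order
```

The point of wrapping rather than editing the query is that the user never has to know the filter exists; the same dashboard query returns different rows depending on who runs it.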

boyka
2 replies
1d1h

I have no experience with Superset. Can you elaborate on a few points where you see it excel beyond Tableau?

adeptima
1 replies
1d1h

I don't want to start a rant against Tableau. It's a powerhouse. It's great, superior software. But when it came to optimizing cost, comparing the total cost of ownership against the opportunity to stop paying for a Tableau Server license, we voted in favor of Superset and a mix of React + ECharts widgets.

https://www.tableau.com/products/server

If you have money, dedicated team of data analytics who are already familiar with Tableau - no need to torture them with other tools.

skadamat
0 replies
1d1h

Honestly it's so hard to compare Tableau and Superset. Tableau has every feature and bell / whistle imagine-able. But it's heavy, desktop oriented, and pricey.

Superset is lightweight and open source, but only has 5% of the features. So it really depends what you need!

Jzush
2 replies
1d1h

I’d like to see these types of apps start offering SVG embedding of things like graphs. Frames are such a pain.

wswope
0 replies
17h45m

Bokeh is an option in the frontend-viz space that puts out pretty solid SVG for statically-rendered charts, while also having the option of more Tableau-like interactive functionality with input fields, dynamic filters, etc. Might be a decent option for you?

Their interactive "embedded-mode" avoids iframes too... but it's built with web components, so you wind up in shadow-DOM hell if you want to do anything dynamic on the view's contents.

rusackas
0 replies
19h39m

That's probably not trivial, but it seems plausible. The beauty of open source is that you can help contribute this if you're fired up about it!

j-a-a-p
0 replies
7h58m

Had good results with echarts. With Superset not so much: complicated to install, lost all dashboards after an update, cryptic error messages, custom queries meh: we decided to use views in Postgres. The project with Superset was finished successfully, but the time spent was a multiple of what something like Power BI would have taken.

All in all, not very innovative, but highly needed open source version of a traditional BI tool. Definitely something to follow and to use in temporary, not too demanding use cases. And hopefully a future replacement of Tableau or Power BI.

cogman10
12 replies
1d2h

This looks like grafana, right? Why would I use this instead of grafana?

skadamat
2 replies
1d1h

I love Grafana but Grafana doesn't really support non-time-series visualization that well.

sgt
1 replies
9h55m

Why is that, though? I'd think that there'd be some plugins/extensions for Grafana that could do this. Grafana could then become the next PowerBI/Tableau/Superset killer eventually.

skadamat
0 replies
3h32m

Different audience / use case. I've noticed that products often lean towards speaking to app builders (full stack swe's) or data builders (data analysts / scientists / data engineers). They require different mental models I feel.

Grafana I sense is culturally focused on observability visualization (aka needs of full stack devs). Culture is very hard to change!

jldugger
2 replies
1d

They're both washboarding apps, and while I'm sure they each have panel types the other doesn't yet support, I don't think that's intrinsic. The differentiation as I see it, is that Superset is designed to craft SQL queries and visualize the results. The query builder is probably where this shows the most.

To make it more concrete -- coworkers tell me Grafana doesn't work so well with Apache Druid, while Superset supports it quite well.

jldugger
1 replies
20h10m

*dashboarding, yikes

totalhack
0 replies
17h17m

I thought this was some jargon I didn't know haha.

prpl
1 replies
1d2h

You can’t trivially plug grafana in front of any SQL database, and grafana is more about graphing/plotting (usually time series).

xyzzy_plugh
0 replies
1d2h

You can actually plug grafana in front of any SQL database, but I'm not sure it's a good idea.

pachico
1 replies
1d2h

The fundamental difference is that Grafana isn't great at cross referencing data in different data sources. (I love Grafana and I pay for the Cloud version.)

peterleiser
0 replies
20h25m

I found that running TrinoDB in a docker container and adding the trino plugin to grafana was very straightforward. TrinoDB feels magical sometimes, except that the SQL syntax they use seemed awkward IIRC. Also, there are inexplicable performance problems with certain queries that require trying subtly different SQL queries until it snaps out of it.

samuell
0 replies
1d2h

Much more focused on interactive slicing and dicing of data, rather than mostly following a few pre-defined time-series, as is the focus of Grafana.

As such, closer to an open source replacement for PowerBI.

bfung
0 replies
1d2h

grafana is built more for operational and timeseries data, but not so optimal for complex analytical queries. Ex: up-to-second data on cpu load on a host.

superset is the flip side of grafana; not good for up-to-second updates, but good for complex queries. Also, non-time series stuff. Ex: Which customer groups bought which products for all time? <— that type of BI stuff.
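For anyone unsure what "that type of BI stuff" looks like as a query, here is a small sketch using Python's standard-library sqlite3 module. The `customers` and `purchases` tables and their contents are invented for illustration; the shape of the query (a join plus a group-by over the whole history, with no time axis) is what matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, made up for this example.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_group TEXT);
CREATE TABLE purchases (customer_id INTEGER, product TEXT, quantity INTEGER);
INSERT INTO customers VALUES (1, 'retail'), (2, 'retail'), (3, 'wholesale');
INSERT INTO purchases VALUES
    (1, 'widget', 2), (2, 'widget', 1), (2, 'gadget', 4), (3, 'widget', 50);
""")

# "Which customer groups bought which products, for all time?"
rows = conn.execute("""
    SELECT c.customer_group, p.product, SUM(p.quantity) AS units
    FROM purchases AS p
    JOIN customers AS c ON c.id = p.customer_id
    GROUP BY c.customer_group, p.product
    ORDER BY c.customer_group, p.product
""").fetchall()

for group, product, units in rows:
    print(group, product, units)
```

A result like this is naturally a pivot table or a grouped bar chart rather than a line over time, which is roughly where the two tools part ways.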

adlpz
10 replies
1d2h

Has anyone tried both this and Metabase? I've used Metabase in a few projects and I find it very nice. This seems more powerful, perhaps?

Is it worth it for BI on small datasets?

CalRobert
6 replies
1d2h

Yes, I am at a company using Metabase, but I have a decent amount of experience with Superset (albeit from many years ago).

The reason we chose Metabase was that it had table joins, while Superset doesn't (unless it has added them since I used it). It also looks a bit sleeker. But I strongly prefer Superset; I found that with Metabase I had to turn a lot of things off to make it usable (Let me see "the_table" not "The Table"!), I was constantly annoyed at the opacity around models vs "questions", etc. and every time I wanted to change a question Metabase insisted on creating a new one instead. The real issue here was when we wanted to swap out the data source for a lot of questions but there was no clean way to do so without MB just creating new questions.

Also, Metabase doesn't have serialization unless you pay them AND you self-host, (if I'm self hosting then what exactly am I paying for?) and that's pretty annoying. https://www.metabase.com/docs/latest/installation-and-operat....

But it does let you join tables. Sometimes that's enough to make MB worth dealing with.

rusackas
2 replies
19h43m

Superset lets you join tables within the same database. If you want to do cross-DB joins, we have a new (beta) in-memory meta-DB that lets you do this, but we generally see and recommend people using things like Trino for this.

Cilvic
0 replies
18h10m

Is that new? Last time I checked this was the major downside from superset

CalRobert
0 replies
12h32m

Nice! When was that added?

noduerme
0 replies
1d1h

The "model" vs "question" thing is really annoying as there's no real difference from the user's perspective, and it's easy to accidentally convert a model back to a question without noticing when you publish something. You notice when you try to drill into the chart. There's a lot of annoying manual labor in metabase, e.g. I want to filter something into 10 different charts and I need to duplicate it 10 times and change a filter on each one. Still yeah joins are nice. A non-bugged aggregate count/sum as a window function would be nicer.

itsoktocry
0 replies
1d1h

I was constantly annoyed at the opacity around models vs "questions"

Yeah, somewhere along the line Metabase decided to get opinionated on "self-serve". I imagine it works well for some teams and companies, but for the tech-oriented, it's annoying.

I prefer my BI tools to be platforms that make for easy charting and cross-filters, while I build and control the models behind the scenes with a tool like dbt.

adlpz
0 replies
1d

Thanks! Very detailed answer.

I've found the weird "make it easy" mindset a bit annoying with Metabase too. The whole questions, nice table names...

I'll give Superset a try in my next project I think.

sokols
0 replies
23h23m

Metabase is great, I use it with a Oracle Database.

skadamat
0 replies
1d1h

Metabase is a bit more user-friendly to be honest than Superset. Superset has a WAY more liberal license, so it's ideal for people who want to customize Superset and build data apps.

code_biologist
0 replies
5h30m

Reposting from a comment of mine about 60 days ago:

I recently ran a little shootout between Superset, Metabase, and Lightdash — all open source with hosted options. All have nontrivial weaknesses but I ended up picking Lightdash. Superset is the best of them at data visualization but I honestly found it almost useless for self-serve BI by business users if you have existing star schema. This issue on how to do joins in Superset (with stalebot making a mess XD) is everything difficult about Superset for BI in a nutshell. https://github.com/apache/superset/issues/8645

Metabase is pretty great and it's definitely the right choice for a startup looking to get low cost BI set up. It still has a very table centric view, but feels built for _BI_ rather than visualization alone.

Lightdash has significant warts (YAML, pivoting being done in the frontend, no symmetric aggregates) but the Looker inspiration is obvious and it makes it easy to present _groups of tables_ to business users ready to rock. I liked Looker before Google acquired it. My business users are comfortable with star and snowflake schemas (not that they know those words) and it was easy to drop Lightdash on top of our existing data warehouse.

Wilduck
6 replies
22h30m

Is Superset a decent tool if you're just a single person doing data analysis? Say I have a handful of sqlite databases, and just want to be able to develop some queries / charts. I was looking into Tableau / Power BI / Superset, and all of them seemed pretty heavyweight for a single user, and none of them seemed super easy to get setup locally.

Any recommendations for a good piece of software for the single user case? Or a more convenient way to run the heavyweight tools?

unixhero
4 replies
22h29m

Tableau is the best, most powerful, most mature of the three, most feature complete and easiest of the three. I think they give you a 30 day trial.

This is a single user application, unless you make it part of your built application.

VenkatPram7
1 replies
15h22m

Superset isn't a single user application?

unixhero
0 replies
12h0m

Ah, sure

javchz
0 replies
2h40m

I'll say PowerBI has the potential to be more powerful, but you need to love the whole M and DAX language ecosystem. And the integration with Python and R is not that bad.

But if your visualizations are within the scope of native Tableau capabilities, then Tableau is friendlier and gets less in the way of you and your work.

bigger_cheese
0 replies
14h12m

If you are doing data analysis I don't think any of the 3 pieces of software you mentioned are going to be that helpful.

I see these products as tools for data visualization and reporting i.e. presenting prepared datasets to users in a visually appealing way. They aren't as well suited for serious analytics.

I can't comment on Superset or Tableau but I am familiar with Power BI (it has been rolled out across my org), and the type of statistics you can do with it is fairly rudimentary. If you need to do anything beyond summarizing (counts, averages, min, max etc.), it is not particularly easy.

For data analysis I use SAS or R. This software allows you do things like multivariate regression, timeseries forecasting, PCA, Cluster analysis etc. There is also plotting capability.

Both these products are kind of old school, I've been using them since early 2000's, the "new school" seems to be Python. Pretty much all the recent data science people in my organization use Python. Particularly Pandas and libraries like Seaborn (https://seaborn.pydata.org/).

The "power" users of Power BI in my organization tend to be finance/HR people for use cases like drill down into cost figures or Interactively presenting KPI's and other headline figures to management things like that.

throwaw12
5 replies
1d2h

Does anyone know how it compares to Looker?

totalhack
2 replies
17h6m

The lack of a semantic layer and join limitations are what made me pass on superset, but that was a couple years ago so maybe those features have been added.

I built my own semantic layer instead. I use this in production in my company but obviously use at your own risk as it's a one-man show.

https://github.com/totalhack/zillion

anentropic
1 replies
5h56m

This looks interesting for me, but I'd really like more detail about the architecture and deployment in the docs.

There is this:

A final SQL query against the combined data from the DataSource Layer

The Combined Layer is just another SQL database (in-memory SQLite by default) that is used to tie the datasource data together and apply a few additional features such as rollups, row filters, row limits, sorting, pivots, and technical computations.

But it leaves me with questions - how/when does this get populated? What other options are there besides in-memory SQLite? (I presume that's just a convenience for development and would use something else in production?)

Or is it just what Superset calls a 'metastore' i.e. data about the data, and the queries are run against the data source layer?
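For readers unfamiliar with the pattern the quoted docs describe, here is a minimal sketch of a "combined layer": rows already fetched from separate sources are loaded into an in-memory SQLite database, and one final SQL query ties them together. This is only an illustration of the general idea, not zillion's or Superset's actual implementation; the table names, columns, and sample rows are all made up.

```python
import sqlite3

# Hypothetical result sets, standing in for rows fetched from two separate databases.
orders_rows = [(1, "alice", 120.0), (2, "bob", 75.5), (3, "alice", 30.0)]
regions_rows = [("alice", "EMEA"), ("bob", "APAC")]


def combine(orders, regions):
    """Load both result sets into in-memory SQLite and run one final query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE regions (customer TEXT, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO regions VALUES (?, ?)", regions)
    # The "final SQL query against the combined data": a join plus a rollup.
    rows = conn.execute(
        """
        SELECT r.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN regions AS r ON r.customer = o.customer
        GROUP BY r.region
        ORDER BY revenue DESC
        """
    ).fetchall()
    conn.close()
    return rows


combined = combine(orders_rows, regions_rows)
print(combined)
```

In this sketch the in-memory database is populated per request from the source results, which is one possible answer to the "how/when does this get populated" question, though the real project may well do it differently.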

anentropic
0 replies
2h20m

Or from a comment elsewhere in this thread about Superset:

Superset lets you join tables within the same database. If you want to do cross-DB joins, we have a new (beta) in-memory meta-DB that lets you do this

...is it this?

Maro
5 replies
1d2h

I love Superset.

I've been running it in production since 2017, at two jobs, the current one a big corporation.

Best general-purpose, database-backed dashboarding system out there. I would never pay for Tableau or PowerBI.

Same for Airflow.

atlas_hugged
4 replies
1d2h

Same for Airflow? I’m not sure I understand what you mean.

luccasiau
3 replies
1d2h

They were both made by Airbnb and then open-sourced, which is the similarity I assume they meant.

jerrygenser
2 replies
23h46m

They were also more specifically authored by the same individual!

rusackas
1 replies
19h41m

Maxime, the original author of Airflow/Superset, is also the CEO of Preset (where I work), so he/we are still working on Superset every day :)

edanm
0 replies
59m

Oh that's awesome! Must be awesome to work on that. We've been using Airflow in production for 6 years at this point with various clients and it's been great, and we're trying to sell people on Superset now as well.

posix_monad
4 replies
1d

Is this capable of performing efficient JOINs across non-homogeneous data-stores?

Lucasoato
1 replies
22h10m

Should it? If you really need that, join the different sources with TrinoDB (or any related managed service like AWS Athena) and connect it to Superset.

ildjarn
0 replies
21h59m

It’s common for business questions to only be answerable with a join over a few different stores.

I think Athena can only query data on S3?

totalhack
0 replies
17h20m

Superset would be on my shortlist if I had to use something else, but the join limitations were part of why I passed.

grzaks
0 replies
11h13m

We use https://cube.dev/ as an intermediate layer between the data warehouse database and Superset (and other "terminal" BI apps like report generators). You define your schema (metrics, dimensions, joins, calculated metrics, etc.) in Cube and then access it from any tool that can connect to a SQL database.

paddy_m
3 replies
1d1h

I wish more projects had guided tour videos that demonstrated the power of the tool in the hands of an expert user. Not "get started" but "why should I care".

Wes McKinney used to have an excellent 5 minute introduction to pandas in this genre.

paddy_m
0 replies
1d

I saw that video on the website. It isn't narrated or captioned as to what the user is trying to accomplish.

rusackas
0 replies
19h50m

You can check this out. This is a Preset Demo, but shows quite a bit of Superset within Preset (which offers multiple instances of Superset as "Workspaces") https://www.youtube.com/watch?v=V0HwGnC1rU8

rglullis
2 replies
1d2h

Has anyone worked with it who could compare it with Redash?

skadamat
1 replies
1d1h

Well, Redash got acquired, so development stopped. That's the biggest difference between Superset and Redash. Preset.io still supports Superset.

rglullis
0 replies
1d1h

Redash development slowed down for sure, but it's not looking abandoned. It's just that I've been using it for some time now, and I'm wondering if there is anything feature-wise that could justify the switch.

mikpanko
2 replies
1d1h

Does anybody know why Superset started trending today? Is there a major release?

rusackas
0 replies
19h56m

There is a major release on the horizon (4.0) and there were just a couple of patch releases for the 3.x variants. I'm surprised to see it trending too, but I'm happy about it. More people need to know that Open Source BI is here, and here to win.

remram
0 replies
20h12m

Is there more than this single HN submission?

fuzztester
2 replies
23h17m

Wow, those Apache guys have so many projects. Of course, they've been at it for years, starting with the Apache web server, then Tomcat, etc., and also, many projects were first developed outside and then handed over to them, for whatever reasons.

andrewshadura
1 replies
20h22m

And sometimes projects are handed to them to die. The way they (mis)handle OpenOffice is unforgivable.

fuzztester
0 replies
18h9m

Interesting, did not know.

In what way, any details?

Not been tracking that or using OpenOffice for a while.

adamgamble
2 replies
19h19m

We use Metabase heavily at work. However, where all these tools seem to fall down is organization around the hundreds of dashboards and questions. I wish it had something like a built-in wiki to build out more navigation. Does anyone know of any good ways to do that?

xtracto
0 replies
17h18m

Mhmm, this gives me an idea... what if I could "group" Metabase SQL queries by "similarity" (either of the results or of the query itself)?

Another option could be to use an LLM to summarize, tag and group queries for better discoverability.
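A rough sketch of the "group by similarity of the query itself" idea, using only Python's standard-library `difflib`. The threshold of 0.8, the whitespace/case normalization, and the greedy "compare against the first member of each group" strategy are arbitrary choices for illustration, and the sample queries are invented.

```python
from difflib import SequenceMatcher


def normalize(sql):
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(sql.lower().split())


def group_similar(queries, threshold=0.8):
    """Greedily put each query into the first group whose representative is similar enough."""
    groups = []
    for query in queries:
        norm = normalize(query)
        for group in groups:
            representative = normalize(group[0])
            if SequenceMatcher(None, representative, norm).ratio() >= threshold:
                group.append(query)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([query])
    return groups


saved = [
    "SELECT count(*) FROM orders WHERE status = 'paid'",
    "select count(*)  from orders where status = 'open'",
    "SELECT name, email FROM users ORDER BY created_at DESC",
]
groups = group_similar(saved)
for group in groups:
    print(len(group), group[0])
```

Character-level similarity is crude (it knows nothing about SQL structure), so for hundreds of saved questions a SQL parser or an embedding-based approach would probably group better; this only shows the shape of the idea.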

_pastel
0 replies
15h58m

100% agree.

One thing that helps is hooking Metabase up to its own database and building queries on your queries, e.g.:

    select *
    from report_card
    where dataset_query ilike '%' || {{query}} || '%'

(You can also join in metadata like the author, when it was last run, etc.)
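To try the same "search your saved queries" idea outside Metabase, here is a small self-contained sketch against an in-memory SQLite stand-in. The `report_card` table and `dataset_query` column names come from the query above; the simplified schema, the sample rows, and the `find_cards` helper are hypothetical. SQLite has no `ILIKE`, but its `LIKE` is case-insensitive for ASCII by default, which is close enough for a demo.

```python
import sqlite3


def find_cards(conn, needle):
    """Return (id, name) for saved questions whose query text contains `needle`."""
    pattern = f"%{needle}%"
    return conn.execute(
        "SELECT id, name FROM report_card WHERE dataset_query LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()


# Simplified stand-in for Metabase's application database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_card (id INTEGER PRIMARY KEY, name TEXT, dataset_query TEXT)"
)
conn.executemany(
    "INSERT INTO report_card VALUES (?, ?, ?)",
    [
        (1, "Paid orders", "SELECT count(*) FROM orders WHERE status = 'paid'"),
        (2, "All users", "SELECT * FROM users"),
        (3, "Orders by region", "select region, sum(amount) from ORDERS group by region"),
    ],
)

matches = find_cards(conn, "orders")
print(matches)
```

Passing the search term as a bound parameter rather than concatenating it into the SQL keeps the lookup safe if the term comes from user input.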

We also try really hard to keep the Collection directory structure clean and consistent. But it's still really hard.

zX41ZdbW
1 replies
1d

Superset is powerful, but I wonder why they don't fix "papercuts", e.g., misaligned pixels on a spinner, or inability to copy a value from a table's cell, or non-monospace font for numbers in a table, etc. There are hundreds of small annoyances in the product.

rusackas
0 replies
19h48m

We try! We also accept PRs and Issues if there are things bugging people, of course. It's always a balancing act between building some new feature that people are clamoring for, or fixing those cosmetic issues that always crop up.

uraura
1 replies
4h42m

From the introduction, I can see a list of backend technologies. But do they have a high-level architecture diagram? I don't know what I really need for a production setup.

anentropic
0 replies
2h24m

I wanted the same info, sadly lacking.

AFAICT needs a db (MySQL/Postgres) and a cache (Redis/Memcached) and one (or more?) web workers.

Then optionally also Celery workers (for "async queries" i.e. slow running)... not sure how optional that is though.
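As a rough picture of how those pieces are wired together, here is a sketch of a `superset_config.py`. The setting names (`SQLALCHEMY_DATABASE_URI`, `CACHE_CONFIG`, `CELERY_CONFIG`) are the ones I understand the Superset configuration docs to use, but treat them as something to verify against the version you deploy; every hostname, credential, and database number below is a placeholder.

```python
# superset_config.py (sketch): all hosts, credentials and DB numbers are placeholders.

# Metadata database: stores dashboards, charts and users (not your analytics data).
SQLALCHEMY_DATABASE_URI = (
    "postgresql+psycopg2://superset:change-me@db.internal:5432/superset"
)

REDIS_URL = "redis://redis.internal:6379"

# Cache used by the web workers.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": f"{REDIS_URL}/0",
}


# Celery workers, used for asynchronous (long-running) queries.
class CeleryConfig:
    broker_url = f"{REDIS_URL}/1"
    result_backend = f"{REDIS_URL}/2"
    imports = ("superset.sql_lab",)


CELERY_CONFIG = CeleryConfig
```

The web workers and the Celery workers would both load this same file, which is why the database, cache, and broker all need to be reachable from every process.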

twic
1 replies
22h10m

How does this compare to Jupyter notebooks and the ecosystem around that? Do the use cases overlap, or are they completely different things?

Lucasoato
0 replies
22h4m

In my experience, people with a business-related background have an easier time learning how to use BI tools (this is true even if Superset may be less user-friendly than commercial products like Tableau). Jupyter is an interactive computing platform based on notebooks and cells, which is more useful for data scientists/engineers whose needs might exceed the capabilities of a SQL interface.

tomrod
1 replies
22h8m

It's been a few years since I evaluated Superset. Did they ever resolve drilldown (filter for one chart on a page, populate to all charts)?

rusackas
0 replies
19h58m

Yep... there's Drill By, which is more flexible than drill-down. Rather than having to specify a strict hierarchy of drilling "levels" you can pick columns, hierarchical or otherwise, to drill into.

rongenre
1 replies
1d2h

We use this at my ginormous employer in order to give devs limited access to production data.

martin82
1 replies
13h3m

Bummer that it can't pull data from JSON APIs, which Redash can do.

ldjkfkdsjnv
1 replies
1d2h

I remember using Superset in 2017 or so; I was forced to by a manager who would not pay for off-the-shelf software. I also made a few open source contributions to fix some bugs. It was a disaster: a huge rat's nest of Python. It might have changed in the last few years; I'm surprised it's still active.

jimvin
0 replies
1d1h

It's definitely come a long way since 2017! It's improved markedly in terms of functionality and performance. It looks much prettier now as well.

kumarvvr
1 replies
13h57m

Tried installing it, locally in a Python Virtual Env.

Apparently installation will not work with Python 3.12, due to the removal of distutils.

Does anyone have any method to install this?
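For anyone hitting the same wall: `distutils` was deprecated by PEP 632 and removed from the standard library in Python 3.12, so packages that still import it fail there unless something (typically a recent `setuptools`) provides a replacement. A small preflight check, as a sketch; the `preflight` helper and its report keys are made up for illustration, and it says nothing about which Python versions a given Superset release supports.

```python
import importlib.util
import sys


def distutils_importable():
    """True if `import distutils` would find a module (stdlib or a setuptools shim)."""
    return importlib.util.find_spec("distutils") is not None


def preflight():
    """Summarize whether this interpreter is likely to hit the distutils problem."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "distutils_removed_from_stdlib": sys.version_info >= (3, 12),
        "distutils_importable": distutils_importable(),
    }


report = preflight()
print(report)
```

If `distutils_removed_from_stdlib` is true and `distutils_importable` is false, creating the virtual environment with an older interpreter is the usual way around it.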

hiepdev
1 replies
15h17m

How does it compare to Kibana + Elasticsearch?

nullify88
0 replies
12h11m

A big thing here is that Superset and most of the other BI tools can connect directly to databases, which are commonly the source of truth or data warehouse in some businesses. Secondly, Elastic have focused on other operational areas such as security, observability, and indexing/search. Kibana can do some dashboarding in those areas and its UI is nice, but Superset and similar tooling are better suited for BI purposes.

emilienaples
1 replies
22h33m

How would you compare Superset with PowerBI for analytics and CSS integration? Trying to develop features and advanced analytics capabilities into an app?

rusackas
0 replies
19h53m

You can style dashboards with CSS as much as you'd like, though there are some limitations (canvas/webGL elements). I wrote a whole blog post on it: https://preset.io/blog/customizing-superset-dashboards-with-...

If you want to style the whole application, you can fork the repo and go bananas. If you're looking for theming, there's more to be done yet on that front, and I wrote an article on that too: https://preset.io/blog/theming-superset-progress-update/

cheema33
1 replies
21h31m

I recently discovered Apache Superset. I would love to use it in our product. Does anybody know if it is possible to integrate it into an existing product? I am mostly curious about hooking up its authentication system to our own authentication system, which is based on auth in ASP.NET Core 8.

Cilvic
0 replies
21h26m

It took me a while to figure out how to embed it into my app using the Superset Embedded SDK.

Superset Embedded SDK - "Embedded SDK allows you to embed dashboards from Superset into your own app, using your app's authentication. Embedding is done by inserting an iframe, containing a Superset page, into the host application."
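To make "using your app's authentication" a bit more concrete: as far as I understand the Embedded SDK README, your backend asks Superset for a short-lived guest token (a POST to `/api/v1/security/guest_token/`) and hands it to the frontend, which passes it to the SDK. The sketch below only builds the request body; the field names reflect my reading of that README and should be checked against your Superset version, and the helper name, dashboard id, username, and RLS clause are invented.

```python
import json


def build_guest_token_request(dashboard_id, username, rls_clauses=()):
    """Build the JSON body for a guest-token request (field names are an assumption)."""
    return {
        # Identity shown to Superset for this embedded session.
        "user": {"username": username, "first_name": "", "last_name": ""},
        # Which embedded dashboard(s) the token grants access to.
        "resources": [{"type": "dashboard", "id": dashboard_id}],
        # Optional row-level security filters applied to the guest's queries.
        "rls": [{"clause": clause} for clause in rls_clauses],
    }


payload = build_guest_token_request("abc123", "report-viewer", ["tenant_id = 42"])
body = json.dumps(payload)
print(body)
```

Your own app decides who gets a token and with which filters, which is where an existing auth system (ASP.NET Core or otherwise) would plug in.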

atbpaca
1 replies
20h34m

This looks really good! How does it compare to Tableau?

rusackas
0 replies
19h45m

Well, it's free! Or significantly cheaper even if you opt to use Preset to run a hosted/managed/compliant version of it, and not have to deal with config/security/upgrades/migrations. This article is a year old, but it might help a bit: https://preset.io/blog/apache-superset-vs-tableau/

HermitX
1 replies
21h4m

Here is a fantastic video made by Soumil Shah, using MinIO+Hudi+StarRocks+Superset. It is amazing to have an interactive query experience on a data lake directly! https://www.youtube.com/watch?v=JkKBzrQTKx0

3abiton
0 replies
18h34m

Thanks for sharing, it's so exciting to see so many OSS BI frameworks

wesleyyue
0 replies
12h16m

Surprised no one has mentioned hex yet. There was a post on the yc internal forum today about data stacks and a lot of founders mentioned they liked hex. I hadn't heard too much about them before but they looked interesting for someone (me) who typically prefers something closer to a jupyter notebook and simple stacks.

vietvu
0 replies
16h26m

Used Superset back in 2016 and 2020; both times we chose Metabase for our clients' BI dashboards and Superset for our internal dashboard. Superset is nice, and easy to modify and extend, but not as user-friendly as Redash or Metabase. But after the author launched Preset, it seems to have improved a lot with the company's effort. It looks to me like the best way for OSS to advance is to have a company dedicated to improving it.

vfclists
0 replies
1d1h

Generally what you get when VentureCapital/PrivateEquity buys out Redash.io, messes up end users in the process and spits it out a few years later, leaving users confused as to where it stands in the BI tools landscape.

spdustin
0 replies
22h11m

For my last employer, I set up Superset for a number of our clients to show all sorts of heavily customized marketing analytics dashboards, web performance graphs, project management burndown reports, you name it. As with another commenter's experience, we also got a client to replace Tableau with it, and not look back. Such a great product.

rietta
0 replies
21h38m

Neat. I have to admit I about had a heart attack reading "Superset" as "Sunset" at first. I've become too jaded about stuff being shut down and announced on HN. Very pleasantly surprised when I read it correctly and clicked through to see it's about data analytics.

orestis
0 replies
3h5m

Can this work to give end-users/customers the ability to create their own reports/charts, respecting data access visibility etc?

I am in need of a "dashboarding" feature in our SaaS, but it seems there's a gap between PowerBI/Tableau/Metabase/Superset and various charting libraries. The former are too much "turn key" and the latter require a ton of work to setup all the chart-building UI and features...

nvrmnd
0 replies
1d

One thing to keep in mind with BI software is that the users are often very different than, well, those individuals that prefer to use mutt as an email client.

Many, or most, users for a BI tool will be operations, product managers, and business management who simply will not find the interface to be intuitive, responsive, or well designed. At least that's my experience.

loufe
0 replies
2h4m

I wonder why so few BI tools support Pi databases. They are pervasive and mission-critical in commodity industries, but there only seem to be proprietary options available.

lf-non
0 replies
10h0m

Full fledged BI tools like Superset and Metabase are amazing for their intended use cases.

But they may be overkill if your primary use case is to infrequently build semi-interactive reports for non-technical end users and your needs are mostly covered by standard graphs and tables, especially if you are familiar with SQL and have access to the underlying data source. Two nifty utilities I have found very useful for the latter kind of use case are SQLPage and Evidence.

They make it very convenient to whip out some SQL and convert that to a neat professional looking web ui that can be forwarded to an end user. In case of Evidence it is a statically generated site, and in case of SQLPage it is a web app that connects to a live database.

SQLPage: https://sql.ophir.dev/

Evidence: https://evidence.dev

lars_francke
0 replies
1d2h

We've built a Kubernetes Operator for Apache Superset at Stackable: https://github.com/stackabletech/superset-operator/

It's part of our Open Source Data Platform. It's one of the few open source BI tools out there, and there are not a lot of alternatives in this space. We generally like it.

indymike
0 replies
1d2h

Can vouch for Superset. I use it in a couple of my companies and love it.

datatrashfire
0 replies
16h44m

love superset, but one thing that I would love to see is to make it easier for dashboards/charts to use a dynamic table that the user can select.

we have multiple tenants + developer instances of our warehouse. to reuse the same dashboard in this setup we need to create at least 3 virtual datasets, plus wrangle a bunch of boilerplate jinja.

dang
0 replies
1d2h

Related. Others?

Open source Business intelligence platform made with Python - https://news.ycombinator.com/item?id=29368664 - Nov 2021 (49 comments)

Apache Superset 1.1 - https://news.ycombinator.com/item?id=27439939 - June 2021 (28 comments)

The Apache Software Foundation Announces Apache Superset as a Top-Level Project - https://news.ycombinator.com/item?id=25905277 - Jan 2021 (1 comment)

Apache Superset is an enterprise-ready business intelligence web application - https://news.ycombinator.com/item?id=21133931 - Oct 2019 (7 comments)

amai
0 replies
9h13m

Does it have horizontal bar charts nowadays?