Sourcegraph CEO here. We made our main internal codebase (for our code search product) private. We did this to focus. It added a lot of extra work and risk to have stuff be open source and public. We gotta stay focused on building a great code search/intelligence product for our customers.
That's what ultimately lets us still do plenty of things for devs and the OSS community:
(1) Our super popular public code search is at https://sourcegraph.com/search, which is the same product customers use internally on their own codebases. We spend millions of dollars annually on this public instance with almost 1M OSS repositories to help out everyone using OSS (and we love when they like it so much they bring it into their company :-).
(2) We also still have a ton of open-source code, like https://sourcegraph.com/github.com/sourcegraph/cody (our code AI tool).
BTW, if any founders out there are wondering whether they should make their own code open-source or public, happy to chat! Email in profile. I think it could make sense for a lot of companies, but more so for infrastructure products or client tools, not so much for full server-side end-user applications.
Been a fan of Sourcegraph since 2016 or so, it's been exciting to watch the pivots along the way. That being said, the loss of transparency here is pretty sad, speaking as a large FOSS repo owner. What were the main factors apart from risk that went into the decision?
Thanks for being a fan. And I understand it's a bummer to not have our code be public and open-source anymore. Sorry.
It's a bunch of reasons that add up. I'll give some more details for anyone curious.
(And I know that despite these reasons, lots of HNers probably wish it was not so. I agree! I too wish for a world where all companies could have their code be public and open source.)
- We have a lot of tech around large-scale code graph, indexing, etc., stuff that is very differentiated and hard to build. We were starting to put some of this in separate private repositories and link them in at build time, but that was complex. It added a lot of code complexity, risked bugs, and slowed us down, and if a lot of the awesome stuff was private anyway, what was the point?
- As we've been building Cody (https://cody.dev), our code AI tool, we've seen a LOT more abuse. That's what happens when you offer any free tier of a product with LLM inference. We had to move a lot more of our internal backend abuse logic to private repositories, and it added code complexity to incorporate that private stuff in at build time.
- It confused devs and customers to have 2 releases: an open-source release with less scaley/enterprisey features, and an enterprise release. It was a pain to migrate from one to the other (GitLab also felt this pain with their product) because the open-source build had a subset of the DB schema and other things. It was confusing to have a free tier on the enterprise release (lots of people got that mixed up with the open-source release), and it made our pricing and packaging complex so that lots of our time was spent helping customers understand what is paid and what isn't.
- There were actually very very few companies that were going to pay but then decided to use the open-source version and not pay us. A lot of people probably assume that's why we made this move, but it's not. I think this is because people like the product and see value in it, including all the large-scale code nav/search features that are in our enterprise version.
- Although very very few companies used our open-source version to avoid paying us, we did see it cause a lot of annoyance for devs who were asked by their management to try cloning our product or to research our codebase to give their procurement team ammunition to negotiate down our price. This honestly was just a waste of everyone's time.
- If we got a ton of contributions (we never really solicited any), then it might've changed the calculus. Sourcegraph is an end-user application that you use at work (and when fun-coding, but the primary revenue model is for us to charge companies). For various reasons, end-user server-side applications just don't get nearly as many contributions. Maybe it's because you'd need to redeploy your build for a bunch of other users at your company, not just yourself. Maybe it's because they necessarily entail UX, frontend, and scaling stuff, in addition to just adding new features.
- We heard from people who left GitHub that people at GitHub were frequently monitoring our repository to get wind of our upcoming features and launches. Someone from GitHub told me his "job is to clone Sourcegraph". Since then, they obviously deprioritized their code search to re-found GitHub on AI, so we're not seeing this threat anymore. But I didn't love giving Microsoft an unfair advantage, especially since GitHub products are not open source either.
- Since we made our code non-open-source, we've been able to pursue a lot more big partnerships (e.g., with cloud providers and other distribution partners and resellers). This is a valuable revenue stream that helps us make a better product overall. Again, because Sourcegraph is an end-user application with a UI that devs constantly use and care about, we never really had the MongoDB/Redis/CockroachDB risk of AWS/GCP/Azure just deploying our stuff and cutting us out. We're not protecting from downside here, but we are enjoying the upside because now those kinds of distribution partnerships are viable for us. To give a specific example, within ~2 months of making our code non-open-source last year, we signed a $1M+ ARR deal through a distribution partner that would not have happened if our code was open source. This is not our biggest annual deal, but it's still really nice!
We are totally focused on building the best code search/intelligence and appreciate all our customers and all the feedback here. Hope this helps explain a bit more where we're coming from!
Trying to spin that it was "for the devs" is really stretching the bounds of credulity. We get it, it's fine, you have investors to answer to, but come on, don't pee on our shoes and tell us it's raining.
Fair, I probably didn’t hear from the devs who weren’t annoyed by that. I heard from plenty of devs who were annoyed by it.
This wasn't a decision made based on dev input, let's be real.
This seems weirdly hostile. He laid out a bunch of points but you’re grabbing on to this one to make it seem like he’s using classic corporate-speak. Do you find it so unrealistic that the CEO of Sourcegraph has heard from devs that their managers asked them to try to clone or investigate the product before buying? That seems pretty likely.
I don't think calling out clear insincerity in the service of maintaining a public image is weirdly hostile. Maybe he did hear from devs saying it was annoying they were asked to clone or make his product work for free. But he _wasn't making these decisions "for those devs"_ as he claims; it was done to increase _sales._
It's both hostile and, worse, boring. I know it sucks to be intrinsically less interesting than someone you disagree with passionately, but it is the case here that the CEO of the company explaining their policy shift is much more interesting than your rebuttals, which seem superficial and rote by comparison.
Someday somebody is going to be intrinsically more interesting about, like, supporting DNSSEC than me (maybe Geoff Huston will sign on and start commenting), and I'm going to want to claw my eyes out. I have empathy for where you're coming from. But can you please stop trying to shout this person down?
people can do things for more than one reason
If we ignore the final sentence of his reason, then you might have a point. But given his reason ends with:
"This honestly was just a waste of everyone's time."
Makes it pretty clear that the benefits to Sourcegraph (i.e., not wasting time negotiating with companies acting in bad faith) were a large part of this rationale.
Besides, if you had ever tried using the OSS version of Sourcegraph, you would realise that OSS Sourcegraph is a shadow of its enterprise version. Trust me, Sourcegraph didn’t lose any sales to people running OSS Sourcegraph, and anyone who’s willing to rip out the licensing system, so they can use the enterprise features without paying, obviously isn’t going to become a paying customer either.
Investigating Sourcegraph's source code as part of procurement is not only plausible, but useful work that a software engineer should be happy to do.
Stating that making such evaluations impossible is a good thing is therefore more bullshit than other reasons to go closed source.
Actually this one I get completely. There’s plenty of places or managers with dev orgs that will check if they can install something complex in house with open source. Nothing wrong with it. But it’s usually a huge waste of time.
Is it? I think at this point my company has probably saved millions of dollars by not paying for subscriptions, but hosting everything in-house. The price point of a lot of these services makes perfect sense when you are small, but paying 1M/year in subscription fees when you can host the same thing for 10k/year is just bonkers. I appreciate that someone has to pay for it for them to continue making the product, but there’s a point where it makes more sense for me to spend a year setting it up (and really only costs two weeks).
My experience was with things like openstack and kubernetes. The org decided to do “cloud” in house first with openstack and then kubernetes - and run critical services on them that had very strict performance SLA.
The amount of time needed to do the whole thing wasn’t worth it. Sure I enjoyed tinkering with the kernel and drivers and k8s. Also diving into how cgroups and namespaces worked, etc. However, from a time to market/stability perspective the solution was nowhere near comparable to what public cloud providers offer.
Yeah - the subscription costs more. My experience has been that when things get big and hiring gets tense in house solutions just add stress on the devs maintaining it. At least with public cloud services - it’s clearer - if the budget doesn’t exist don’t run it.
I will add that I don’t use Sourcegraph nor am I connected with them in any way. So I’m not batting for their go-private strategy. Just commenting on this one point.
That math only works out nearly that cleanly if you avoid pricing out the engineer time for it.
If you’re paying $1M/year in fees, I would be shocked if you don’t have a whole team to support the open source version. Oncall, system upgrades, the usual stream of tickets about things not working right and people wanting to integrate, etc.
I do believe it can be cheaper to self-host, but I really doubt the difference in cost is 2 orders of magnitude. I’d be surprised if it was a single order of magnitude. I would wager it’s less than the seller's profit margins because of economies of scale; I would guess in the range of 10%-20%.
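To make the arithmetic in this subthread concrete, here is a rough sketch. The $1M/year subscription and $10k/year hosting figures come from the comments above; the fully loaded engineer cost and the headcounts are made-up assumptions, not numbers anyone here reported.

```python
def self_host_cost(hosting_per_year, engineers, cost_per_engineer):
    """Yearly cost of self-hosting: infrastructure plus the people who run it."""
    return hosting_per_year + engineers * cost_per_engineer


def savings_ratio(subscription_per_year, self_host_per_year):
    """How many times cheaper self-hosting is than subscribing."""
    return subscription_per_year / self_host_per_year


SUBSCRIPTION = 1_000_000  # $/year, figure quoted in the thread
HOSTING = 10_000          # $/year, figure quoted in the thread
ENGINEER_COST = 200_000   # $/year, hypothetical fully loaded cost per engineer

# Headcounts are hypothetical: 0 ignores staff time entirely, 3 is a small team.
for engineers in (0, 1, 3):
    total = self_host_cost(HOSTING, engineers, ENGINEER_COST)
    ratio = savings_ratio(SUBSCRIPTION, total)
    print(f"{engineers} engineers: ${total:,}/year, {ratio:.1f}x cheaper than subscribing")
```

Under these assumptions the 100x gap only holds with zero engineer time priced in; a single dedicated engineer already brings it below one order of magnitude. Whether self-hosting still wins depends entirely on the headcount you plug in.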
Well that obviously doesn’t apply to Sourcegraph because their self-host offering requires paying a subscription. You can’t use any form of Sourcegraph on private code (at least not without all the important features being nobbled) without paying a subscription. So there’s no saving to be made from self-hosting Sourcegraph.
Why? Getting operational experience with the product that you might then pay a lot for seems very important. Especially if you end up liking the product/service but not the pricing changes that might then happen, so doing some exploratory fact finding for a backup plan doesn't seem to be a waste of time.
For example when we used Jira on-prem and it was snappy and we were happy ... and it was a rather important point of difference compared to the slow shitocumulus version.
Also, when people are using GitHub issues to ask questions the problem is usually a lack of clear documentation. (And if spending time to link FAQ answers to potential customers is a waste of time ... then maybe it's not surprising that Sourcegraph CEO is doing damage control on HN instead of focusing on focusing or whatever.)
Yeah, while I'm sure being asked to just grab the code and make it work wasn't the developers' favorite job, I think the bigger issue is further down: GitHub developers being tasked with reverse-engineering an open source product to create a closed source clone.
I would've respected GH more if they just used Sourcegraph and instead spent those developers on improving the open source product itself. But I suspect that GitHub / Microsoft would then need a locked-down license guaranteeing, e.g., that Sourcegraph would forever remain open source, or that GH gets free licenses if they ever went closed source, or whatever.
They don't want GitHub to clone their product. They weren't doing it for the GitHub devs.
So the reason these deals are now possible is mainly because time was freed up by not having the codebase open source?
No, it's that if all the code is free and open source for anyone, we would not be able to charge for it and there would be no deals. Even if, say, 60% of our product was open-source and 40% was closed source, we might still get a lot of direct customers but would struggle to do distribution partnerships because the distribution partners have outsized incentives and capacity to reimplement the subset of the 40% they think their market needs.
I believe the question came up because the original rationale given was “we did this to focus”, not “we couldn’t sell the code for as much if it was open source”.
Both are factors, as I said in my original post (focus and risk).
“We stopped giving away some of our apples due to risk.”
“Of… liability? Or… uh, what?”
“Oh—risk that we couldn’t sell the apples we gave away, obviously.”
I was thinking business risk. Sorry it wasn’t clear.
When a software business makes decisions in the name of "focus", they're usually implicitly saying the "on the stuff that will make the company more money" part. Focus implies product/market fit.
I appreciate this answer -- it clears a lot of things up!
I think the term the industry needs to embrace is "Early Source": https://breckyunits.com/earlySource.html
Make everything public domain, fully open source, just delayed by N years.
There is a term for this, no? https://opensource.org/dosp
Interesting! I hadn't seen that term. Thanks!
I don't like their implementation though. If one thinks from natural principles, one has to reject the idea of licenses on ideas.
Early source is in harmony with nature.
Also "Early Source" rolls off the tongue better than "Delayed Open Source Publication". ;)
Yeah, but nobody will know what "Early Source" means until you explain it, whereas the latter makes perfect sense on first reading.
There was a time when no one knew what "Open Source" meant.
Will be lovely to have the source N years after AGI terminates humanity.
Lol. You could write a sci-fi novel with a world of cyborgs where the age of all cyborgs is N (when they first got access to the source). And primitives are called "Pre-Ns"
Not too shabby an idea!!
Why?
This thread reminded me to finally try Cody, I've been bouncing on and off Copilot for a few months. I wish I knew how good this was sooner, and I had no idea there was a generous free tier.
If you (or anyone here) are an open source maintainer, please sign up for free Cody Pro credits https://sourcegraph.com/supporting-open-source
My most popular repo is just barely under the cutoff, but I can't advertise it because I'll doxx my shitpost account! Damn! I'll try to apply anyways ;)
Submit it anyway, I'll approve it.
Submission sent! Thanks!
If you're open to trying new AI coding assistants, would love if you can give https://double.bot a try! (note: I'm one of the creators) The main philosophical difference is that we are more expensive and are trying to build the best copilot with the technology possible at any given time. For example, we serve a larger, more accurate, and more modern autocomplete model, but it does cost more to serve. We also do a lot of somewhat novel work in getting the details right, like improving the autocomplete model to never screw up closing brackets, and always auto-close them as if you typed them.
Huh, in what way does publishing a source tarball alongside a release introduce a lot of work, risk and distraction? Your explanation makes literally no sense.
EDIT: I implore the downvoters to think about this for a second. You can, actually, publish source code for a project without also committing to providing support and documentation and testing across a variety of systems. Publishing a tarball takes very little time and effort.
Doing a great job on an open source codebase requires a higher level of polish, testing, design, UX, documentation, architecture, and general forethought than internal tools, just like any internal vs. self-serve product.
Only solving your own problems on your own hardware while being able to rely on your own well-informed team to bridge the gaps sounds much much faster and easier to me.
Sure but you can publish the source code while only solving your own problems on your own hardware, you're not required to provide support and documentation just to publish source code...
There is a significant intrinsic cost to making code "open source", through the simple act of making that source code available at all. This overhead exists without any regard for what you wish or promise. It invites myriad interactions that cost time and money for little or no offsetting benefit.
Publishing source code, if anyone uses it at all, is not "free" in any sense. I know several people that stopped open sourcing their projects (not even businesses) because the cost of making their code available isn't worth it.
Moving the goalposts to doing a great job internally also requires those things. Meanwhile, doing a perfectly fine job of FOSS requires none of them.
I hope code search will one day be offered at a lower price, so small/medium sized companies can use the product. I'll never be able to convince someone to buy it when it's 3 or more times as expensive as source code hosting, and would in many cases be the most expensive SaaS product per developer seat that the company uses. But it's a great product.
$9 to $20 per seat seems pretty average in the grand scheme of SaaS price modelling. I don't work in software development, though; I work in IT.
"SaaS" is not a feature; you can’t compare products just based on the fact that they are "SaaS". GitLab for example brings me far more value than a tool to search my codebase; I wouldn’t put the same amount of money in both.
I feel the same way. It’s really interesting and provides cool insights. But it seems hard to explain to myself to spend more on that than GitHub or IDEs.
I’d like to hear more about the value customers get out of it as I wonder if it’s just groups with unlimited budget.
This is in the cards and thank you for the feedback! (Sourcegraph CTO here)
Correction: Public code on GitHub.
This looks to be restricted to searching GitHub only. Even though it had "context:global" on the querystring, every hit came from GitHub, and none seen from GitLab, Codeberg, Sourcehut and other self-hosted forges (e.g. Forgejo).
I’m sure there are 50 other ways you could categorise all the code that it searches. Nobody said that it exhaustively searches all available open-source code. I’m sure you know that that’s an impossible claim. This isn’t a correction at all. It is, at best, an elaboration. Certainly not worthy of the snark you’re giving. The reality is that GitHub hosts >99% of all open-source source code that anyone really cares about. If you have some philosophical issue with it, that’s fine, but don’t shoot the messenger by attacking individuals.
Yet another person conflating the concepts of publishing code under an open source license and managing a project in public.
It has to be disingenuous, right? These concepts aren't complicated. I wish they would just say "we want to make more money" and stop polluting open-source discourse.
For a business, open source is a business tool. Open source can be a goal, naturally, but for-profit entities have a duty to be profitable (or grow, plowing profits into building). I think there's no shame in saying this. You should not need to be elliptical in your public statements about this move. Everyone knows that this is about protecting your ability to monetize the product, and so it should be, and everyone knows this sort of move comes eventually.
Why not go the SQLite way? Open source but don't accept external contributions. Literally just dump the code.
The open/closed decision is weighing on my mind right now. Our main competition is an open source product, so it feels like it will be a tough sell to not also have the core of the product be free (robotics framework). I might shoot you an email.