Slack AI Training with Customer Data

zmmmmm
50 replies
18h10m

For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data

This feels so full of subtle qualifiers and weasel words that it generates far more distrust than trust.

It only refers to models used "broadly across all" customers - so if it's (a) not used "broadly" or (b) only used for some subset of customers, the whole statement doesn't apply. Which actually sounds really bad because the logical implication is that data CAN leak outside those circumstances.

They need to reword this. Whoever wrote it is a liability.

afc
30 replies
16h5m

Especially when a few paragraphs below they say:

If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

So Customer Data is not used to train models "used broadly across all of our customers [in such a way that ...]", but... it is used to help train global models. Uh.

hackernewds
23 replies
15h36m

Why are these kinda things opt-out? And need to be discovered..

We're literally discussing switching to Teams at my company (1500 employees)

Shadowmist
11 replies
14h9m

You’d be better off just not having chat than switching to Teams.

metadat
9 replies
13h46m

But the business will most likely suffer by being less successful due to less cohesive communication... it's "The Ick" either way.

andy_ppp
6 replies
12h29m

The idea that Slack makes companies work better needs some proof behind it, I’d say the amount of extra distraction is a net negative… but as with a lot of things in software and startups nobody researches anything and everyone writes long essays about how they feel things are.

unkulunkulu
4 replies
10h12m

Distraction is not enforced. Learning to control your attention, and how to help yourself do it, is crucial whatever you do, whenever, and in whatever technological or other context. It is the most valuable long-term resource you have.

I think we start to recognize this at larger scale.

Slack easily saves a ton of time solving complex problems that require the interaction and expertise of a lot of people, often an unpredictable number of them per problem. They can answer with a delay; in a good culture this is totally accepted, and people can still independently move forward or switch tasks if necessary, same as with slower communication tools. You are not forced to answer with any particular lag, but Slack makes it possible, when needed, to reduce it to zero.

Sometimes you are unsure whether you need help or can do something on your own. I certainly know that a lot of times I eventually had no chance whatsoever, because the knowledge required was too specialized, and this is not always clear up front. Reducing barriers to communication in those cases is crucial, and I don't see Slack being in the way here, only helpful.

The goal of organizing Slack is that you pay the right amount of attention to the right parts of communication for you. You can do this if you really spend (hmm) attention trying to figure out what that is and how to tune your tools to achieve it.

andy_ppp
3 replies
9h44m

That’s a lot of words with no proof, isn’t it? It’s just your theory. Until I see a well-designed study on such things I struggle to believe the conjecture you make either way. It could be quite possible that you benefit from Slack and I don’t.

Even receiving a message and not responding can be disruptive, and on top of that I’d say being offline or ignoring messages is impossible in most companies.

whatevaa
1 replies
9h32m

Your idea also comes with no proof, just your personal experience.

andy_ppp
0 replies
4h41m

Which is extremely clear from what I’m saying, it’s completely anecdotal.

unkulunkulu
0 replies
7h15m

It's your choice whether to trust only statements backed by scientific rigour, or to try things out and apply them to your way of life. This is just me talking to you; in that, you are correct.

Regarding “receiving a message”: my devices are allowed only limited use of notifications. Of all the messaging/social apps, only messages from my wife in our messaging app of choice pop up as notifications. Slack certainly is not allowed there.

metadat
0 replies
12h22m

Good point, could be that it reduces friction too far in some instances. However, in general less communication doesn't seem better for the bottom line.

codingdave
1 replies
7h22m

I'm not sure chat apps improve business communications. They are ephemeral, with differing expectations on different teams. Hardly what I'd label as "cohesive"

Async communications are critical to business success, to be sure -- I'm just not convinced that chat apps are the right tool.

skydhash
0 replies
3h13m

From what I’ve seen (not much, actually), most channels can be replaced by a forum-style discussion board. Chat can be great for 1:1 and small team interactions. And for tool interactions.

amne
0 replies
12h25m

we use Teams and it's fine.

Just don't use the "Team" feature of it to chat. Use chat groups and 1-to-1 of course. We use "Team" channels only for bots: CI results, alerts, things like that.

Meetings are also chat groups. We use the daily meeting as the dev-team chat itself so it's all there. Use Loops to track important tasks during the day.

I'm curious what's missing/broken in Teams that you would rather not have chat at all?

M4v3R
5 replies
12h24m

If you switch to Teams only for this reason, I have some bad news for you: there’s no way Microsoft isn’t doing (or won’t start doing) the same. And you’ll get a subpar experience on top of that (which is an understatement).

guappa
1 replies
11h33m

I think a self hosted matrix/irc/jitsi is the way to do it.

IshKebab
0 replies
11h5m

We've been using Mattermost and it works very well. Better than Slack.

The only downside is their mobile app is a bit unreliable, in that it sometimes doesn't load threads properly.

District5524
1 replies
6h55m

The Universal License Terms of Microsoft (applicable to Teams as well) clearly say they don't use customer data (Input) for training: https://www.microsoft.com/licensing/terms/product/ForallOnli... Whether someone believes it or not is another question, but at least they tell you what you want to hear.

bayindirh
0 replies
5h56m

What if they exfiltrate customer data to a data broker and they buy it back?

It's not customer data anymore.

cpach
0 replies
7h24m

I would guess Microsoft has a lot more government customers (and large customers in general) than Slack does. So I would think they have a lot more to lose if they went this route.

trinsic2
0 replies
4h16m

Ugh Teams = Microsoft. They are the worst when it comes to data privacy. I'm not sure how that is even a choice.

rogerthis
0 replies
6h5m

Teams has better voice/video. But chat is far worse, absolutely shit, though Slack seems to be working to get there.

marricks
0 replies
14h54m

Obviously because no one would ever opt in.

jeffdn
0 replies
14h54m

I'd make sure to do an extended trial run first. Painful transition.

bayindirh
0 replies
5h57m

Why are these kinda things opt-out? And need to be discovered..

Monies.

We're literally discussing switching to Teams at my company (1500 employees)

Considering what Microsoft does with its "New and Improved(TM)" Outlook and love for OpenAI, I won't be so eager...

DougBTX
2 replies
12h52m

To me it says that they _do_ train global models with customer data, but they are trying to ensure no data leakage (which will be hard, but maybe not impossible, if they are training with it).

The caveats are for “local” models, where you would want the model to be able to answer questions about discussions in the workspace.

It makes me wonder how they handle “private” chats, can they leak across a workspace?

Presumably they are trying to train a generic language model which has very low recall for facts in the training data, then using RAG across the chats that the logged on user can see to provide local content.
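If that guess is right, the retrieval side could be as simple as the sketch below: keep the generative model generic, and only build context from messages the logged-in user can already read. This is purely illustrative; the names (Message, visible, retrieve) and the word-overlap "similarity" are made up, not anything Slack has described.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Message:
        workspace_id: str
        channel_id: str
        text: str

    def visible(user_workspace, user_channels, messages):
        # Retrieval is restricted to channels this user can already read.
        return [m for m in messages
                if m.workspace_id == user_workspace and m.channel_id in user_channels]

    def retrieve(question, messages, k=5):
        # Toy relevance score: word overlap stands in for embedding similarity.
        q = set(question.lower().split())
        return sorted(messages, key=lambda m: len(q & set(m.text.lower().split())), reverse=True)[:k]

    msgs = [Message("acme", "eng", "the deploy failed because the feature flag was misconfigured"),
            Message("acme", "exec-private", "we plan to acquire BarCorp next quarter"),
            Message("otherco", "eng", "chatter from a different workspace")]

    question = "why did the deploy fail?"
    context = retrieve(question, visible("acme", {"eng"}, msgs))
    prompt = "Context:\n" + "\n".join(m.text for m in context) + "\n\nQuestion: " + question
    # `prompt` would then go to a generic model that was never trained on these messages,
    # so the only workspace data it sees is what this user could read anyway.

In that setup the privacy question shifts from "what did the model memorize?" to the more familiar one of access control on the retrieval index.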

ENGNR
1 replies
10h4m

My intuition is that it's impossible to guarantee there are no leaks in LLMs as they stand today. It would surely require some new computer science to ensure that no part of any output that could ever possibly be produced contains sensitive data from any of the input.

It's one thing if the input is the published internet (even if covered by copyright), it's entirely another to be using private training data from corporate water coolers, where bots and other services routinely send updates and query sensitive internal services.

visarga
0 replies
9h2m

There is a way. Build a preference model from the sensitive dataset. Then use the preference model with RLAIF (like RLHF but with AI instead of humans) to fine-tune the LLM. This way only judgements about the LLM outputs will pass from the sensitive dataset. Copy the sense of what is good, not the data.
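A toy sketch of that flow, using random tensors in place of real text embeddings and a real LLM. Everything here (the RewardModel, the stand-in data) is illustrative only; the point is that the sensitive dataset only ever touches the preference model, and the downstream fine-tuning step consumes nothing but scalar judgements.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # 1. The preference (reward) model is the only component that sees the
    #    sensitive dataset; its inputs here are stand-in response embeddings.
    class RewardModel(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
        def forward(self, x):
            return self.score(x).squeeze(-1)

    reward = RewardModel()
    opt = torch.optim.Adam(reward.parameters(), lr=1e-3)
    preferred, rejected = torch.randn(256, 64), torch.randn(256, 64)  # from the sensitive data

    for _ in range(200):
        # Bradley-Terry style pairwise loss: preferred responses should score higher.
        loss = -F.logsigmoid(reward(preferred) - reward(rejected)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2. Fine-tuning the public LLM then only consumes scalar judgements: sample
    #    candidate completions, score them, and keep the best (best-of-n /
    #    rejection sampling) or feed the scores to an RL objective.
    candidates = torch.randn(8, 64)          # embeddings of 8 sampled completions
    scores = reward(candidates).detach()     # scalar judgements, no sensitive text
    best = candidates[scores.argmax()]       # would be used for supervised fine-tuning

Whether that really prevents leakage is a separate question (a reward model can still encode preferences that reveal something about the data it was trained on), but the amount of information that crosses the boundary is far smaller than raw text.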

j45
0 replies
13h57m

Hope it's not doublespeak, ambiguity leaves it grey, maybe to play.

hackernewds
0 replies
13h10m

So if I don't want Slack to train on _anything_, what do I do? I still suspect everything now.

__loam
0 replies
13h48m

Opt out is such bullshit.

mayank
3 replies
18h3m

They need to reword this. Whoever wrote it is a liability

Sounds like it’s been written specifically to avoid liability.

MingFengLiu
2 replies
17h1m

I'm sure it was lawyers. It's always lawyers.

cqqxo4zV46cp
1 replies
11h20m

Yes, lawyers do tend to have a part to play in writing things that present a legally binding commitment being made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.

ben_w
0 replies
5h38m

How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.

Hm, now you mention it, I don't think I've ever seen this specific example.

Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg

Canned cliches are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern — and that's despite being totally convinced that (one of) the people I was talking to, had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.

chefandy
3 replies
16h14m

Nah. Whoever decided to create the reality their counsel is dancing around with this disclaimer is the actual problem, though it's mostly a problem for us, rather than them.

FuckButtons
2 replies
13h36m

It’s a problem for them if it loses customer trust / customers.

hackernewds
0 replies
13h9m

if they lose enough, they will "sorry we got caught"

if they don't, they will not do anything

chefandy
0 replies
13h2m

If it impacted their business significantly, it would restore some of the faith I've lost in humanity recently. Frankly, I'm not holding my breath.

j45
2 replies
13h57m

I'm imagining a corporate Slack, with information discussed in channels or private chats that exists nowhere else on the internet... getting rolled into a model.

Then, someone asks a very specific question.. conversationally.. about such a very specific scenario..

Seems plausible confidential data would get out, even if it wasn't attributed to the client.

Not that it’s possible to ask an llm how a specific or random company in an industry might design something…

hackernewds
1 replies
13h8m

exactly. a fun game to see why it is so hard to prevent this

https://gandalf.lakera.ai/

j45
0 replies
12m

Sometimes the obvious questions are met with a lot of silence.

I don't think I can be the only one who has had a conversation with GPT about something obscure they might know but there isn't much about online, and it either can't find anything... or finds it, and more.

throwaway4aday
0 replies
5h47m

I think it's as clear as it can be; they go into much more detail and provide examples in their bullet points. Here are some highlights:

Our model learns from previous suggestions and whether or not a user joins the channel we recommend. We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data.

We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy. Simply put, our model can't reconstruct the search query or result. Instead, it learns from team-specific, contextual information like the number of times a message has been clicked in a search or an overlap in the number of words in the query and recommended message.

These suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.

To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
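Taking that emoji example at face value, the mechanism they describe reduces to something like the toy sketch below. The sentiment stub and the counts are invented for illustration; a real system would call a pretrained external classifier not trained on Slack messages. The shared logic only ever touches the sentiment label and the per-workspace counts, never anyone else's message text.

    from collections import Counter, defaultdict

    def classify_sentiment(text):
        # Stand-in for the "external model (not trained on Slack messages)".
        positive = {"great", "thanks", "love", "shipped"}
        return "positive" if positive & set(text.lower().split()) else "neutral"

    # Per-workspace counts of which emoji users attach to messages of each sentiment.
    emoji_counts = defaultdict(lambda: defaultdict(Counter))
    emoji_counts["workspace_a"]["positive"].update({"tada": 40, "thumbsup": 25})
    emoji_counts["workspace_a"]["neutral"].update({"eyes": 10})

    def suggest_emoji(workspace, message):
        # Only the sentiment label and the counts feed the suggestion.
        counts = emoji_counts[workspace][classify_sentiment(message)]
        return counts.most_common(1)[0][0] if counts else None

    print(suggest_emoji("workspace_a", "great work, we shipped it"))  # -> "tada"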

hyping9
0 replies
5h9m

They need to reword this. Whoever wrote it is a liability.

Wow you're so right. This multi-billion dollar company should be so thankful for your comment. I can't believe they did not consult their in-house lawyers before publishing this post! Can you believe those idiots? Luckily you are here to save the day with your superior knowledge and wisdom.

__loam
0 replies
13h47m

If you trained on customer data, your service contains customer data.

Nition
0 replies
17h35m

- Create a Slack account for your 95-year-old grandpa

- Exclude that one account from using the models, he's never going to use Slack anyway

- Now you can learn, memorise, or reproduce all the Customer Data you like

JCM9
0 replies
7h1m

Whatever lawyer wrote that should be fired. This poorly written nonsense makes it look like Slack is trying to look shady and subversive. Even if well intended this is a PR blunder.

IanCal
0 replies
11h26m

The problem is this also covers very reasonable use cases.

Use sampling across messages for spam detection, predicting customer retention, etc - pretty standard.

Then there are cases where you could have models, more like LLMs, that can output data from the training set, but you're running them only for that customer.

koolba
32 replies
18h1m

We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.

Why would anyone not opt-out? (Besides not knowing they have to of course…)

Seems like only a losing situation.

m463
7 replies
17h52m

Why would anyone not opt-out?

This is basically like all privacy on the internet.

Everyone WOULD opt out if it were easy, and it becomes a game of whack-a-mole with opt-outs.

Note how you opt out (a generic "contact us"), and what happens when you do opt out (they still train anyway).

__loam
5 replies
13h27m

Opt out should be the default by law

hackernewds
3 replies
13h6m

So the customer has to approve upgrades and everything else? Slippery slope to over-regulation.

yellow_postit
0 replies
10h45m

Hence the cookie banners

bayindirh
0 replies
1h18m

I’d take over-regulation every day over unabashed user abuse in the name of free markets.

__loam
0 replies
12h21m

The status quo is consumer abuse.

halostatue
0 replies
17h18m

When we send our notice, we are going to be sending a notice that we want none of our data used for any ML training from Slack or anyone else.

IMTDb
5 replies
17h22m

Why would anyone not opt-out?

Because you might actually want to have the best possible global models? Think of "not opting out" as "helping them build a better product". You are already paying for that product; if there is anything you can do, for free and without any additional time investment on your side, that makes their next release better, why not do it?

You gain a better product for the same price, they get a better product to sell. It might look like they get more than you do in the trade, and that's probably true; but just because they gain more does not mean you lose. A "win less / win more" situation is still a win-win. (It's even a win-win-win if you take into account all the other users of the platform).

Of course, if you value the privacy of these data a lot, and if you believe that by allowing them to train on them it is actually going to risk exposing private info, the story changes. But then you have an option to say stop. It's up to you to measure how much you value "getting a better product" vs "estimated risk of exposing some information considered private". Some will err on one side, some on the other.

trinsic2
0 replies
4h41m

Of course, if you value the privacy of these data a lot, and if you believe that by allowing them to train on them it is actually going to risk exposing private info, the story changes. But then you have an option to say stop. It's up to you to measure how much you value "getting a better product" vs "estimated risk of exposing some information considered private". Some will err on one side, some on the other.

The problem with this reasoning, at least as I understand it, is that you don't really know when or where the training on your data crosses the line into information you don't want to share until it's too late. It's also a slippery slope.

krainboltgreene
0 replies
16h54m

Think of "not opting out" as "helping them build a better product"

I feel like someone would only have this opinion if they've never ever dealt with anyone in the tech industry, or any capitalist, in their entire life. So like 8-19 year olds? Except even they seem to understand that profit-absolutist goals undermine everything.

This idea has the same smell as "We're a family" company meetings.

hehdhdjehehegwv
0 replies
15h56m

I for one consider it my duty to bravely sacrifice my privacy on the altar of corporate profit so that the true beauty of an LLM trained on emojis and cat gifs can bring humanity to the next epoch.

ericjmorey
0 replies
16h19m

Do I have free access to and use of those models? If not, I don't care to help them.

cess11
0 replies
7h35m

"Best" and "better" is doing a lot of extremely heavy lifting here.

Are you sure you actually want what's hiding under those weasel words?

tifik
4 replies
11h14m

What's baffling to me is why companies think that when they slap AI on the press release, their customers will suddenly be perfectly fine with them scraping and monetizing all of their data on an industrial scale, without even asking for permission. In a paid service. Where the service is private communication.

cynicalsecurity
2 replies
5h15m

Most people don't care, paid service or not. People are already used to companies stealing and selling their data up and down. Yes, this is absolutely crazy. But was anything substantial done against it before? No, hardly anyone was raising awareness of it. Now we keep reaping what we were sowing. The world keeps sinking deeper and deeper into digital fascism.

pera
1 replies
4h56m

Companies do care: Why would you take additional risk of data leakage for free? In the best case scenario nothing happens but you also don't get anything out of it, in the worst case scenario extremely sensitive data from private chats get exposed and hits your company hard.

ffsm8
0 replies
4h32m

Companies are made up of people. Some people in some enterprises care. I'd wager that in any company beyond a tiny upstart you'll have people all over the hierarchy who don't care. And some of them will be responsible for toggling that setting... or not, because they just can't be arsed to, given how little they care about the chat histories of people they're likely never even going to interact with being used to train some AI.

8338550bff96
0 replies
1h56m

I am not pro-exploiting users' ignorance for their data, but I would counter this with the observation that slapping AI on a product suddenly makes people care about the fact that companies are monetizing their usage data.

Monetizing user activity data through opt-out collection is not new. Pretending that this phenomenon has anything to do with AI seems like a play for attention that exploits people's AI fears.

I'll sandwich my comments with a reminder that I am not pro-exploiting users' ignorance for their data.

schneehertz
2 replies
17h45m

We offer Customers a choice around these practices.

I remembered the joke from The Hitchhiker's Guide to the Galaxy, maybe they will have a small hint in a very inconspicuous place, like inserting this into the user agreement on page 300 or so.

m463
0 replies
16h10m

Never more true than with Apple.

Activating an iPhone, for example, has a screen devoted to how privacy is important!

It will show you literally thousands of pages of how they take privacy seriously!

(and you can't say NO anywhere in the dialog, they just show you)

They are normalizing "you cannot do anything", and then everyone does it.

bigfudge
0 replies
12h1m

“But the plans were on display…” “On display? I eventually had to go down to the cellar to find them.” “That’s the display department.” “With a flashlight.” “Ah, well, the lights had probably gone.” “So had the stairs.” “But look, you found the notice, didn’t you?” “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”

mvkel
2 replies
16h41m

Because it's default opt-in, and most people won't see this announcement.

dheera
1 replies
16h31m

Yep, much like just about every credit card company shares your personal information BY DEFAULT with third parties unless you explicitly opt out (this includes Chase, Amex, Capital One, but likely all others).

hackernewds
0 replies
15h35m

How do you opt out of these? I do share my data with Rocket Money though, since there are no good alternatives :(

clwg
2 replies
16h13m

Because they don't seem to make it easy. It doesn't seem that, as an individual user, I have any say in how my data is used; I have to contact the Workspace Owner. When I do, I'll be asking them to look at alternative platforms instead.

"Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed."

hackernewds
1 replies
13h5m

You can always quit your job right? /s

clwg
0 replies
6h11m

I'm the one who picked Slack over a decade ago for chat, so hopefully my opinion still holds weight on the matter.

One of the primary reasons Slack was chosen was because they were a chat company, not an ad company, and we were paying for the service. Under these parameters, what was appropriate to say and exchange on Slack was both informally and formally solidified in various processes.

With this change, beyond just my personal concerns, there are legitimate concerns at a business level that need to be addressed. At this point, it's hard to imagine anything but self-hosted as being a viable path forward. The fact that chat as a technology has devolved into its current form is absolutely maddening.

p1esk
1 replies
17h0m

I’d be surprised if more than 1% opt out.

tifik
0 replies
11h10m

I'd be surprised if any legal department in any company that has one doesn't freak the f out when they read this. They will likely lose the biggest customers first, so even if it is 1% of customers, it will likely affect their bottom line enough to give it a second thought. I don't see how they might profit from an in-house LLM more than from their enterprise-tier plans.

Their customer support will have a hell of a day today.

zeckalpha
0 replies
16h11m

Opting out in this way may implicitly opt you in to workspace specific models.

sensanaty
0 replies
9h10m

I'm willing to bet that for smaller companies, they just won't care enough to consider this an issue and that's what Slack/Salesforce is hedging on.

I can't see a universe in which large corpos would allow such blatant corporate espionage for a product they pay for no less. But I can already imagine trying to talk my CTO (who is deep into the AI sycophancy) into opting us out is gonna be arduous at best.

nyc_data_geek
30 replies
18h7m

Story time.

I was at a VC conference last year and if I learned nothing else there, I learned how to spell "AI". Every single exhibitor just about had their signage proudly proclaiming their capabilities in this area, but one in particular struck me.

They were touting the API integrations they could offer to train their "Enterprise AI"/LLM, and among those integrations were things like M365, Slack, etc.

It struck me because of the garbage in, garbage out problem. I'd like to think that the amount of shitposting I do on Slack personally will poison that particular well of training data, but this seems to point to a larger problem to me.

LLMs don't have a concept of truth or reality, or awareness of any sort. If the training data they are fed is poorly quality-checked/unsanitized by human intelligence, the outputs will be as useless/noisy as the original data set. It feels to me that in the frothy rush to capture market buzz and VC, this is being forgotten.

Am I missing something, here?

leoh
7 replies
18h5m

Yes, consider an existing LLM being given “shitpost-y” messages and asking it if there is anything interesting in there. It could probably summarize it well and that could then be used for training another LLM.

etc etc

nyc_data_geek
6 replies
18h3m

This assumes everything in the training data set is accurate. Sometimes people are wrong, obtuse, sarcastic, etc. LLMs don't have any way of detecting or accounting for this, do they?

That output, then being used to train other LLMs, just creates an ouroboros of AI-generated dogshit.

sp332
3 replies
17h42m

LLMs are state-of-the-art at detecting sarcasm. It won't help if the data is just wrong though.

Edit: https://arxiv.org/abs/2312.03706 Human performance on this benchmark (detecting sarcasm in Reddit comments) was 0.82, a BERT-based LLM scored 0.79.

https://arxiv.org/abs/2106.05752 LSTM, 98% at detecting sarcasm in a Project Gutenberg-based dataset.

E39M5S62
1 replies
16h55m

I literally can't tell if you're being sarcastic or not.

rrr_oh_man
0 replies
16h49m

Exactly

comboy
0 replies
16h40m

LLMs are state-of-the-art at detecting sarcasm.

This is such a precious gem.

brookst
0 replies
14h2m

And yet human civilization has survived the fact that many humans are wrong, lying, delusional, etc. There is no assumption that everything in our personal training set is accurate. In fact, things work better when we explicitly reject that idea.

LLMs do not rely on 100% factually accurate inputs. Sure, you’d rather have less BS than more, but this is all statistics. Just like most people realize that flat earthers are nutty, LLMs can ingest falsehoods without reducing output quality (again, subject to statistics)

bongodongobob
0 replies
13h15m

The training data doesn't need to be strictly accurate. If it were, you'd just be programming a deterministic robot. The whole point is to feed it actual human language. Giving it shitposts and sarcasm is literally what makes it good. Think of it like 100 people guessing the number of marbles in a jar: average their guesses and it will be very close. The training data is the guesses; the inference is the average.

antipaul
7 replies
16h35m

What do you think chatGPT uses as training data?

The whole world’s “sh*tposting”: Reddit, blogs, and the rest of the internet.

But also books and Wikipedia and what not.

You can “smooth” all the crap out via the training procedure.

But even more, Slack can easily filter training data to, say, only posts in high-use channels.

Further, Slack has other options: e.g., use customer data only for marginal fine-tuning.

Or, they don’t even know their use case yet - but want to wrap their arms around your data pronto.

rozap
2 replies
15h1m

What makes you think I don't shitpost in the #engineering channel?

And heuristics don't even scratch the surface of the bigger problem where it's trained on people who aren't great at their jobs but type a lot of words on slack about circling back on KPIs.

bee_rider
1 replies
14h7m

I think those types of people are actually shockingly well paid. If slack can make bots to replace them, they’ll print money, right?

blackenedgem
0 replies
10h15m

That's all well and good until something goes down and you need someone knowledgeable to diplomatically shout at a vendor.

moneywoes
2 replies
15h6m

how does the training procedure smooth the garbage out?

thomashop
1 replies
14h36m

Through regularization techniques, data augmentation, loss functions, and gradient optimization, ensuring the model focuses on meaningful patterns and reduces overfitting to noise.

bigfudge
0 replies
12h11m

It’s not obvious how any of those would do anything but better approximate the average of a noisy dataset. RLHF might help, but only if it’s not done by idiots.

nyc_data_geek
0 replies
14h41m

ChatGPT isn't known for its accuracy though, is it? They coined the term "hallucination" because it is wrong so much.

chatmasta
3 replies
18h4m

Why shouldn’t AI be able to shitpost too? At the very least, and much more importantly, AI should be able to recognize shitposting.

nyc_data_geek
2 replies
18h2m

This is the crux of it, and where I'm wondering if I'm missing something. Can it, today? My understanding is it cannot discern reality from fiction, thus "hallucinations" (a misnomer because it implies awareness, which these probability models lack).

tomrod
0 replies
16h58m

The poorly named hallucinations are the creation of ideas from provided prompts, ideas which are not grounded in reality. It isn't the mistaken adjudication of the reality of a provided prompt.

fuchse
0 replies
17h9m

That sounds surprisingly human

williamcotton
1 replies
18h3m

The sheer scale of data on the long tail. Sure, the head is already a trash pile and has been for decades now, but there is plenty of non-monetized information all over the internet that is barely linked to or otherwise discoverable.

krainboltgreene
0 replies
16h56m

It does not matter how hard they try, nothing will rival the CommonCrawl treasure trove except maybe Google's index itself.

beeboobaa3
1 replies
18h6m

More ignored than forgotten.

nyc_data_geek
0 replies
18h5m

Wallpapered over?

wongarsu
0 replies
16h4m

Most shitposting is probably more straightforward to understand than business communication or press releases where realizing what wasn't said often carries more insight than the things that were said.

Of course training an AI model on simple, straightforward and honest data provides good results. That's the essence behind "textbooks are all you need", which led to the phi LLMs. Those are great small LLMs. But if you want your model to understand the complexity of human communication, you have to include it in your training data.

If you subscribe to the idea that to be the very best text completion engine possible you would need to have a perfect understanding of reality itself, how different humans perceive reality differently, and how they choose to communicate about this perception and their interaction with reality, themselves and other humans, then it's not unreasonable to expect that back-propagation would eventually find that optimal representation if given enough data, the right architecture and enough processing power. Or at least come somewhat close. In that paradigm there is no "bad data", only insufficient or badly balanced datasets. Just don't try doing that with a 3B parameter LLM.

swalsh
0 replies
15h56m

Along the same lines, phi-3 is kind of a sign of what you can do if you focus only on high-quality data. It seems like while yes, quantity is very important, quality matters almost as much.

mvkel
0 replies
16h42m

The best LLMs were trained on data from the open internet, which is full of garbage. They still do a pretty good job (granted it has been fine tuned and RLHF'd, but you can do that with Slack data too)

jorisboris
0 replies
16h49m

Same for Reddit or Facebook groups. There's a lot of shitposting there, but absolutely a lot of valuable information if LLMs manage to separate the wheat from the chaff.

bongodongobob
0 replies
13h18m

I think what you're missing is the assumption that an LLM thinks whatever it "reads" is a true statement. Shitposting is almost like meta slang. I feel like that's a necessary thing for it to train on to truly understand language. I feel like people underestimate the depth LLMs can pick up on.

IanCal
0 replies
10h18m

The more obvious thing is that it's not training LLMs fully on all channels.

Some quick ideas:

Search and summarize other messages. No new LLM training, and it's mostly about linking to existing answers.

Fine tune on your messages, but only customer support messages in the public channel, not "eng-shitpost"

Natural language requests over your company data.

tifik
24 replies
11h20m

Well I really hope this massively blows up in their face when all of Europe goes to work just about now, and then North America in 5-8 hours. Let's see if we have another Helldivers 2 event that makes them do a hard backpedal after losing thousands of large customers that will not under any circumstances take the chance.

I have a friend with a law firm who just called me yesterday for advice as he's thinking about switching to Slack from Teams. I gave him a glowing recommendation because it is literally night and day, but there is no way in hell he takes any chance any sensitive legal discussions leak out through prompt hacking. He might even be liable himself for knowingly using a tool that spells out "we read and reuse your conversations".

p1esk
15 replies
10h50m

But you can opt out, right? So what’s the problem?

Also, is Teams (and other messengers) any different?

troupo
7 replies
10h40m

But you can opt out, right? So what’s the problem?

This thinking is the problem. "Oh, we just added your entire private/privileged/NDA/corporate information to our training set without your consent. What's the problem?"

Opt-out must be the default.

Edit: By "Opt-out must be the default." I mean: no one's data must be included until they explicitly give consent via an opt-in :)

Liquidor
2 replies
10h26m

Opt-out must be the default.

Don't you mean opt-in must be the default?

Or am I misunderstanding the concept of opt-ins :P

troupo
0 replies
10h25m

Opt-in is "I agree to have my data included"

Opt-out is "I don't agree to have my data included"

ellisnguyen
0 replies
10h21m

Opt-out by default = Opt-in.

Opt-in by default = Opt-out

zelphirkalt
1 replies
10h26m

Especially since once it has been trained, it is in the model, and I am not aware of any way anyone has discovered to later remove single or selected training data points from the model, other than re-training it from scratch. So basically the crime might already be done.

But I also know that so many businesses are too sluggish to make a switch and employees incapable of understanding the risk. So unfortunately not all of Europe will switch away. But I hope a significant number gives them the middle finger.

bayindirh
0 replies
9h35m

There's something called "machine unlearning" being worked on to address these issues.

This doesn't mean that I support Slack or any model trained on data opted in without consent. On the contrary. I don't have any OpenAI/Midjourney/etc. account, and don't plan to have one.

falcor84
0 replies
10h25m

Exactly! Allowing access to your data should only be opt-in

ADeerAppeared
0 replies
9h35m

Worth noting: This is a legal requirement in Europe

The GDPR mandates that consent is given affirmatively, with this kind of "oh we put it in the EULA nobody reads" being explicitly called out as non-compliant.

apignotti
2 replies
10h42m

You can opt-out by manually writing an email to them. The process matters.

whatevaa
1 replies
9h35m

They could make it even better, like requiring signed/certified physical mail /s. Or fax...

lproven
0 replies
6h24m

:-D

torginus
0 replies
10h31m

and how does that even work? Slack is a chat app. Does everyone involved in the chat need to opt out for it to be meaningful? What about bots?

rpastuszak
0 replies
10h21m

Defaults matter!

Just look at how much Apple and Mozilla get from Google by having their browser as a default (ca. $20,000,000,000 and $400,000,000 IIRC per annum).

Or look at how many people rejected the tracking prompt displayed for FB when it was added to iOS (+70%).

notachatbot1234
0 replies
10h7m

Do they discard everything processed so far every time someone opts out?

Arisaka1
0 replies
7h35m

The way to opt out is by contacting support, in an era where opt-ins and opt-outs should be handled by a toggle button.

Either they don't expect many people to wish to opt their slacks out, or they're aware of the asynchronous friction this introduces and they don't care.

KronisLV
6 replies
9h49m

Personally, I rather liked self-hosted versions of these:

Mattermost: https://mattermost.com/

Rocket.Chat: https://www.rocket.chat/

Nextcloud Talk: https://nextcloud.com/talk/

Out of those, Mattermost was the easiest to set up (you just need PostgreSQL and a web server, in addition to the main container), however not being able to permanently delete workspaces (instead of just archiving them) was awkward. Nextcloud Talk was very easy to get going if you already have Nextcloud but felt a bit barebones last I checked, whereas Rocket.Chat was overall the more pleasant option to use, although I wasn't the biggest fan of them using MongoDB for storage.

The user experience is pretty good with all of them, however in the groups that I've been a part of, ultimately nobody cared about self-hosting an instance, since most orgs just prefer Teams/Slack (or even Skype for just chatting/meetings) and most informal groups just default to Discord. Oh well.

cowpig
2 replies
6h23m

Surprised you didn't mention zulip: https://zulip.com/

We use it and wouldn't trade it for any of the alternatives.

zelphirkalt
0 replies
5h38m

Also is easy to set up and has Jitsi Meet integration and feels 10x more snappy than Slaaaaack.

KronisLV
0 replies
5h28m

That's a lovely addition, thanks! I'll have to try it out as well at some point.

bayindirh
2 replies
9h26m

The problem is not technical, but social with these platforms.

i.e. How do you convince 40+ people from 5 countries to add yet another memory resident chat application and fragment their knowledge to another app/mental space?

This gets way harder as the community becomes more dynamic and temporary (i.e. high circulation, like students). I fought the good fight last year with someone, and they didn't flex a nanometer, citing that the ergonomics of Slack are way better than the alternatives, and they didn't care about data mining (already a possibility back then) or older messages being held for ransom.

KronisLV
1 replies
9h13m

i.e. How do you convince 40+ people from 5 countries to add yet another memory resident chat application and fragment their knowledge to another app/mental space?

If it's a company, you can just be like: "Hey, we use this platform for communication, you can log in with your Active Directory credentials."

It also has the added benefit of acting as a directory for every employee in the company, so getting in touch can be more convenient than e-mail (while you can also customize the notification preferences, so it doesn't get too spammy), as opposed to the situation which might develop, where some teams or org units are on Slack, others on Teams and getting in touch can be more messy.

If it's a free-form social group, then you can throw that idea away because of network effects, it'd be an uphill battle, same as how sometimes people complain about people using Discord for various communities, but at the same time the reality is that old school forums and such were also killed off - since most people already have a Discord account and there's less friction to just use that.

Either way, I'm happy that self-hosted software like that exists.

bayindirh
0 replies
8h56m

If it's a company

That's a big if, and the answer is "No" in my case. If it was, that comment wouldn't be there.

It's not a "social group" either, but a group of independent institutions working together. It's like a large gear-train. A lot of connections between small islands of people. So you have to work together, and have to find a way somehow. So, it's complicated.

Either way, I'm happy that self-hosted software like that exists.

Me too. I happen to manage a Nextcloud instance, but nobody is interested in the "Talk" module.

tifik
0 replies
8h25m

Yeah my lawyer friend is worried he might even lose his license over this. It's gonna be very interesting seeing how legal departments react to this.

If you disagree with practices like this, mention this to your legal.

paxys
13 replies
18h53m

Data will not leak across workspaces.

If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

I don't understand how both these statements can be true. If they are using your data to train models used across workspaces then it WILL leak. If they aren't then why do they need an opt out?

Edit: reading through the examples of AI use at the bottom of the page (search results, emoji suggestions, autocomplete), my guess is this policy was put in place a decade ago and doesn't have anything to do with LLMs.

Another edit: From https://slack.com/help/articles/28310650165907-Security-for-...

Customer data is never used to train large language models (LLMs).

So yeah, sounds like a nothingburger.

whimsicalism
8 replies
18h48m

They're saying they won't train generative models that will literally regurgitate your text; my guess is classifiers are fair game in their interpretation.

swatcoder
4 replies
18h43m

You are assuming they're saying that, because it's one charitable interpretation of what they're saying.

But they haven't actually said that. It also happens that people say things based on faulty or disputed beliefs of their own, or people willfully misrepresent things, etc.

Until they actually do say something as explicit as what you suggest, they haven't said anything of the sort.

whimsicalism
3 replies
18h39m

Data will not leak across workspaces. For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of Customer Data.

I feel like that is explicitly what this is saying.

zmmmmm
2 replies
18h16m

The problem is, it's really really hard to guarantee that.

Yes if they only train say, classifiers, then the only thing that can leak is the classification outcome. But these things can be super subtle. Even a classifier could leak things if you can hack the context fed into it. They are really playing with fire here.

whimsicalism
0 replies
17h43m

yes, i certainly agree with you. i think oftentimes these policies are written by non-technical people

i'm not entirely convinced that classifiers and LLMs are disjoint to begin with

BHSPitMonkey
0 replies
11h36m

It is also hard to guarantee that, in a multi-tenant application, users will never see other users' data due to causes like mistakes in AuthZ logic, caching gone awry, or other unpredictable situations that come up in distributed systems—yet even before the AI craze we were all happy to use these SaaS products anyway. Maybe this class of vulnerability is indeed harder to tame than most, but third-party software has never been without risks.

btown
2 replies
17h29m

The OP privacy policy explicitly states that autocompletion algorithms are part of the scope. "Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions."

And this can leak: for instance, typing "a good business partner for foobars is" might not send that text upstream per se, but would be consulting a local model whose training data would have contained conversations that other Slack users are having about brands that provide foobars. How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token? And sure, one could build an adversarial model that attempts to minimize this kind of leakage, but is Slack incentivized to create such a thing vs. just building an optimal autocomplete as quickly as possible?

Even if it were just creating classifiers, similar leakages could occur there, albeit requiring more effort and time from attackers to extract actionable data.

I can't blame Slack for wanting to improve their product, but I'd also encourage any users with proprietary conversations to encourage their admins to opt out as soon as possible.

yorwba
0 replies
9h44m

How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token?

This is explained literally in the next sentence after the one you quoted: "We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm."

If all the global model sees is {similarity: 0.93, past_interactions: 6, recommendation_accepted: true} then there is no way to leak tokens, because not only are the tokens not part of the output, they're not even part of the input. But such a simple model could still be very useful for sorting the best autocomplete result to the top.
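As a concrete illustration, a ranker over those scalar features can be as boring as logistic regression. This toy version (the feature names follow the example above; everything else is made up) never touches a token of message text:

    import math

    def predict_accept(features, weights, bias):
        # Probability the user accepts the suggestion, computed from numbers only.
        z = bias + sum(weights[k] * v for k, v in features.items())
        return 1 / (1 + math.exp(-z))

    # Training data: scalar features -> whether the recommendation was accepted.
    examples = [({"similarity": 0.93, "past_interactions": 6}, 1),
                ({"similarity": 0.40, "past_interactions": 0}, 0),
                ({"similarity": 0.75, "past_interactions": 2}, 1),
                ({"similarity": 0.20, "past_interactions": 1}, 0)]

    weights, bias, lr = {"similarity": 0.0, "past_interactions": 0.0}, 0.0, 0.1
    for _ in range(1000):  # plain gradient descent on the log loss
        for feats, label in examples:
            p = predict_accept(feats, weights, bias)
            for k, v in feats.items():
                weights[k] -= lr * (p - label) * v
            bias -= lr * (p - label)

    print(predict_accept({"similarity": 0.9, "past_interactions": 5}, weights, bias))

The privacy argument then rests entirely on the feature extraction step staying numeric, which is exactly the part the policy asks you to take on trust.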

whimsicalism
0 replies
17h9m

yeah i absolutely agree that even classifiers can leak, and the autocorrect thing sounds like i was wrong about generative (it sounds like an n-gram setup?)... although they also say they don't train LLMs (what is an n-gram? still an LM, not large... i guess?)

next_xibalba
1 replies
18h15m

This reminds me of a company called C3.ai, which claims in its advertising to eliminate hallucinations using any LLM. OpenAI, Mistral, and others at the forefront of this field can't manage this, but a wrapper can?? Hmm...

BChass
0 replies
17h53m

Ah yes, the stock everyone believed in and thought it would reach the moon during 2020.

fallingsquirrel
0 replies
18h48m

There are no data leaks in Ba Sing Se.

berniedurfee
0 replies
5h34m

Right? I take Slack and Salesforce at their word. They’re good companies and look out for the best interests of their customers. They have my complete trust.

kepano
12 replies
17h49m

In summary, you must opt-out if you want to exclude your data from global models.

Incredibly confusing language since they also vaguely state that "data will not leak across workspaces".

Use tools that cannot leak data, not tools that merely "will not".

mvkel
5 replies
16h39m

Most, if not all SaaS software is multi-tenant, so we've been living in the "will not" world for decades now.

kepano
2 replies
16h2m

That's exactly my point. "File over app"[1] is just as relevant for businesses as it is for individuals — if you don't want your data to be used for training, then take sovereignty of it.

[1] https://stephango.com/file-over-app

lxgr
1 replies
3h0m

"File over app" is a good way of putting it!

Something strange is happening on your blog, fwiw: Bookmarking it via command + D flips the color scheme to "night mode" – is that intentional?

kepano
0 replies
2h21m

Ah good catch. Yes the "D" key can be used to switch light/dark mode, but I didn't account for the bookmark shortcut. That should be fixed now. Thanks!

zeckalpha
1 replies
16h10m

In your experience.

mvkel
0 replies
3h7m

The SaaS business model breaks if you go single tenant except in Fortune 500 enterprise

creativeSlumber
3 replies
17h30m

what is the difference between "will not" and "cannot" in legalese?

glennericksen
1 replies
17h15m

"Will not" allows the existence of a bridge but it's not on your route and you say you're not going to go over it. "Cannot" is the absence of a bridge or the ability to cross it.

gketuma
0 replies
14h10m

Wow, well explained.

purplejacket
0 replies
16h16m

My off leash dog will not bite you, he is well behaved. My dog at home cannot bite you, he is too far away.

semitones
1 replies
17h18m

Aren't all tools, essentially, just one API call away from "leaking data"?

kepano
0 replies
16h5m

In this case they mean leak into the global model — so no. You can have sovereignty of your data if you use an open protocol like IRC or Matrix, or a self-hosted tool like Zulip, Mattermost, Rocket Chat, etc

tikkun
11 replies
19h23m

Eugh. Has anyone compiled a list of companies that do this, so I can avoid them? If anyone knows of other companies training on customer data without an easy highly visible toggle opt out, please comment them below.

paxys
3 replies
18h49m

It would be easier to compile a list of companies that don't do this.

The list:

hosteur
1 replies
17h53m

My company does not do this and has no plans to do such a thing.

berniedurfee
0 replies
5h41m

Lol, my favorite corpo-speak.

I’m not eating a steak and have no plans to eat a steak. Ask again tomorrow.

internetter
0 replies
18h5m

Nonsense. There are plenty of companies that don't have shit policies like this. A vast majority, even. Stop normalizing it.

ncr100
0 replies
16h10m

And the penalty is unnoticeable to these companies.

goles
1 replies
18h54m

Synology updated this policy back in March (Happened to be a Friday afternoon).

Services Data Collection Disclosure

"Synology only uses the information we obtain from technical support requests to resolve your issue. After removing your personal information, we may use some of the technical details to generate bug reports if the problem was previously unknown to implement a solution for our products."

"Synology utilizes the information gathered through technical support requests exclusively for issue resolution purposes. Following the removal of personal data, certain technical details may be utilized for generating bug reports, especially for previously unidentified problems, aimed at implementing solutions for our product line. Additionally, Synology may transmit anonymized technical information to Microsoft Azure and leverage its OpenAI services to enhance the overall technical support experience. Synology will ensure that personally identifiable information, such as names, phone numbers, addresses, email addresses, IP addresses and product serial numbers, is excluded from this process."

I used to just delete privacy policy update emails and the like but now I make a habit of going in to diff them to see if these have been slipped in.

bn-l
0 replies
16h36m

Like the other poster said, it would be great to have a name-and-shame site that lists companies training on customer data.

weikju
0 replies
19h22m

*

ceruleanseas
0 replies
19h3m

We can fight back by not posting anything useful or accurate to the internet until there are protections in place and each person gets to decide how their data is used and whether they are compensated for it.

berniedurfee
0 replies
5h44m

*

pyromaker
11 replies
18h2m

Our mission is to build a product that makes work life simpler, more pleasant and more productive.

I know it would be impossible but I wish we go back to the days when we didn't have Slack (or tools alike). Our Slack is a cesspool of people complaining, talking behind other people's backs, echo chamber of negativity etc.

That probably speaks more to the overall culture of the company, but Slack certainly doesn't help. You can also say "the tool is not the problem, people are" - sure, we can always explain things away, but Slack certainly plays a role here.

rjh29
2 replies
17h20m

That probably speaks more to the overall culture of the company

Yep. Fun fact, my last workplace had a fairly nontoxic Slack... but there was a whole second Slack dedicated to bitching and shitposting where the bosses weren't invited. Humans gonna human.

grepfru_it
1 replies
17h11m

Was not limited to just the bosses who were not invited. If you weren’t in the cool club you also did not get an invite.

A very inclusive company on paper that was very exclusionary behind the scenes.

eggdaft
0 replies
12h12m

What happened when someone from the cool club got promoted and became a boss?

matthewmacleod
1 replies
17h59m

No, I don’t think Slack does play a role in this. It is quite literally a communication tool (and I’d argue one that encourages far _more_ open communication than others).

If Slack is a cesspool, that’s because your company culture is a cesspool.

grob-gambit
0 replies
6h5m

I think open communication in a toxic environment can obviously amplify toxicity; or, put the other way around, less open communication can act as a damper on toxicity.

Slack is surely not the generator of toxicity, but it seems obvious it could increase the bandwidth.

You can't have it both ways.

barkbyte
1 replies
16h55m

Your company sucks. I’ve used slack at four workplaces and it’s not been at all like that. A previous company had mailing lists and they were toxic as you describe. The tool was not the issue.

muglug
0 replies
14h3m

Yeah, written communication is harder than in-person communication.

It’s easy to come across poorly in writing, but that issue has no easy resolution unless you’re prepared to ban Slack, email, and any other text-based communication system between employees.

Slack can sometimes be a place for people who don’t feel heard in conventional spaces to vent — but that’s an organisational problem, not a Slack problem.

zemo
0 replies
12h49m

HN isn't really a bastion of media literacy or tech criticism. If you ever ask "does [some technology] affect [something qualitative] about [anything]", the response on hn is always going to be "technology isn't responsible, it's how the technology is used that is responsible!", asserting, over and over again, that technology is always neutral.

The idea that the mechanism of how people communicate affects what people communicate is a pretty foundational concept in media studies (a topic which is generally met with a hostile audience on HN). Slack almost certainly does play a role, but people who work in technology are incentivized to believe that technology does not affect people's behaviors, because that belief allows them to be free of any and all qualitative or moral judgements on any grounds; the assertion that technology does not play a role is something technology workers cling to because it absolves them of all guilt and makes them, above all else, innocent in every situation.

On the specific concept of a medium of communication affecting what is being communicated, McLuhan took these ideas to such an extreme that it's almost ludicrous, but he still had some pretty interesting observations worth thinking on, and his writing on this topic is some of the earlier work. This is generally the place where people first look, because much of the other work assumes you've understood McLuhan's work in advance. https://en.wikipedia.org/wiki/Understanding_Media

vasco
0 replies
17h51m

I disagree slack plays a role. You only mentioned human aspects, nothing to do with technology. There was always going to be instant messaging as software once computers and networks were invented. You'd just say this happens over email and blame email.

userbinator
0 replies
16h19m

Switch to Teams instead.

Only half-kidding, but it's an application which is so repulsive it seems to discourage people from communicating at all.

rcaught
0 replies
16h18m

Keyboard warriors

hex4def6
11 replies
19h18m

I'm confused about this statement: "When developing AI/ML models or otherwise analyzing Customer Data, Slack can’t access the underlying content. We have various technical measures preventing this from occurring"

"Can't" is a strong word. I'm curious how an AI model could access data, but Slack, Inc itself couldn't. I suspect they mean "doesn't" instead of "can't", unless I'm missing something.

EGreg
5 replies
19h8m

Every company that promises "end-to-end encryption" is just pinky-swearing to you too. Like Telegram or WhatsApp.

int_19h
2 replies
15h20m

Telegram client is open source, so you can see what exactly happens there when you enable E2EE.

EGreg
1 replies
13h20m

Reproducible builds, somehow?

LelouBil
1 replies
18h17m

Yeah, at least if the client is open source you could verify.

8372049
0 replies
3h3m

"If you can read assembly, all programs are open source."

Sure, it's easier and less effort if the program is actually open source, but it's absolutely still possible to verify on bytecode, decompiled or disassembled programs, too.

015a
2 replies
18h47m

I also find the word "Slack" in that interesting. I assume they mean "employees of Slack", but the word "Slack" obviously means all the company's assets and agents, systems, computers, servers, AI models, etc.

I would find even a statement from Signal like "we can't access our users content" to be tenuous and overly optimistic. Like, when I hear the word "can't", my brain goes to: there is nothing anyone in the company could do, within the bounds of the law, to do this. Employees at Slack could turn off the technical measures preventing this from occurring. Employees at Signal could push an app update which side-channels all messages through to a different server, unencrypted.

Better phrasing is "Employees of Slack will not access the underlying content".

throwaway22032
0 replies
17h40m

Interestingly I'd probably go the other way.

If it's verifiably E2EE then I consider "we can't access this" to be a fairly powerful statement. Sure, the source could change, but if you have a reasonable distribution mechanism (e.g. all users get the same code, verifiably reproducible) then that's about as good as you can get.

Privacy policies that state "we won't do XYZ" have literally zero value to me to the extent that I don't even look at them. If I give you some data, it's already leaked in my mind, it's just a matter of time.

gbalduzzi
0 replies
7h54m

I would find even a statement from Signal like "we can't access our users content" to be tenuous and overly-optimistic.

I don't really agree with this statement. Signal literally can't read user data right now. The statement is true, so why can't they use it?

If they can't make that claim, nobody can: no service is incapable of publishing an update that reverses any security measure it has in place. Also, doing that would be illegal, because it would render the statement "we can't access our users content" false.

Slack's case is totally different. The data is accessible to Slack's systems, so the statement "we can't access our users content" is already false. What they probably mean is something along the lines of: "The data can be accessed by our systems, but we have measures in place that block access for most of our employees."

spywaregorilla
0 replies
18h58m

From their white paper linked in the same comment

Provisioning: To minimize the risk of data exposure, Slack adheres to the principles of least privilege and role-based permissions when provisioning access—workers are only authorized to access data that they reasonably must handle in order to fulfill their current job responsibilities. All production access is reviewed at least quarterly.

so... seems like they very clearly can.

r_klancer
0 replies
4h45m

As an engineer who has worked on systems that handle sensitive data, it seems straightforwardly to me to be a statement about:

1. ACLs

2. The systems that provision those ACLs

3. The policies that determine the rules those systems follow.

In other words, the model training batch job might run as a system user that has access to data annotated as 'interactions' (at timestamp T1 user U1 joined channel C1, at timestamp T2 user U2 ran a query that got 137 results), but no access to data annotated as 'content', like (certainly) message text or (probably) the text of users' queries. An RPC from the training job attempting to retrieve such content would be denied, just the same as if somebody tried to access someone else's DMs without being logged in as them.
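
For illustration, here is a minimal sketch of what that kind of annotation-gated access check could look like (all names hypothetical; this is not Slack's actual system):

    # Hypothetical sketch: every record carries a data classification, and each
    # service account is provisioned to read only certain classifications.
    from dataclasses import dataclass

    @dataclass
    class Record:
        classification: str  # e.g. "interactions" or "content"
        payload: dict

    # Provisioned centrally by the access-control system, not by the team running the job.
    ALLOWED = {
        "model-training-job": {"interactions"},
        "message-delivery": {"interactions", "content"},
    }

    def read(service_account: str, record: Record) -> dict:
        if record.classification not in ALLOWED.get(service_account, set()):
            raise PermissionError(f"{service_account} may not read {record.classification} data")
        return record.payload

    # The training job can read interaction metadata...
    read("model-training-job", Record("interactions", {"user": "U1", "joined": "C1"}))
    # ...but any attempt to read message text is denied at the access layer:
    # read("model-training-job", Record("content", {"text": "..."}))  # raises PermissionError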

As a general rule in a big company, you the engineer or product manager don't get to decide what the ACLs will look like no matter how much you might feel like it. You request access for your batch job from some kind of system that provisions it. In turn the humans who decide how that system work obey the policies set out by the company.

It's not unlike a bank teller who handles your account number. You generally trust them not to transfer your money to their personal account on the sly while they're tapping away at the terminal--not necessarily because they're law abiding citizens who want to keep their job, but because the bank doesn't make it possible and/or would find out. (A mom and pop bank might not be able to make the same guarantee, but Bank of America does.) [*]

In the same vein, this is a statement that their system doesn't make it possible for some Slack PM to jack their team's OKRs by secretly training on customer data that other teams don't use, just because that particular PM felt like ignoring the policy.

[*] Not a perfect analogy, because a bank teller is like a Slack customer service agent who might, presumably after asking for your consent, be able to access messages on your behalf. But in practice I doubt there's a way for an employee to use their personal, probably very time-limited access to funnel that data to a model training job. And at a certain level of maturity a company (hopefully) also no longer makes it possible for a human employee to train a model in a random notebook using whatever personal data access they have been granted and then deploy that same model to prod. Startups might work that way, though.

xyst
8 replies
16h44m

The gold rush for data is wild. Private companies selling us out.

- Slack

- Discord

- Reddit

- Stackoverflow

Let’s just hope this data gold rush dies out faster than the web3 craze before OpenAI reaches critical mass and gets access to government server farms.

Alphabet boys have server farms of domestic and foreign surveillance and intelligence. Exabytes of data [1]

[1] https://en.m.wikipedia.org/wiki/Utah_Data_Center

ilrwbwrkhv
4 replies
16h39m

I mean slack is sold off. Founders made money. For all intents and purposes it's dead software.

whywhywhywhy
0 replies
4h48m

The data it has is incredibly valuable if they build their own product or sell it off. You essentially have org charts of entire companies and people asking for things and getting responses and working back and forth together long term.

In terms of building agents for doing real work this could be more valuable than things like Reddit.

smileysteve
0 replies
15h35m

It's owned by Salesforce; if they stop growing it, it'll go the way of Heroku - and that breach didn't go well.

eggdaft
0 replies
12h14m

I actually think Slack is great and it has improved over the last 12 months.

Liquix
0 replies
15h44m

proprietary software that no one who cares about privacy or security should use? absolutely. dead? not exactly

BHSPitMonkey
0 replies
11h51m

That thread was a mischaracterization and a misunderstanding. The toggle simply exposed UI entry points to AI integrations that users could then opt to use, with consent.

TechDebtDevin
0 replies
11h50m

Meh, tbh I think these guys live in a bit of a dream world about how much their data is worth. While investors and corporate partners will rush to these companies for their data, I'm not really convinced random internet conversations are going to push anything forward but let them sell shovels. Most of the miners always go broke.

donfotto
5 replies
7h12m

Emoji suggestion: Slack might suggest emoji reactions to messages using the content and sentiment of the message, the historic usage of the emoji and the frequency of use of the emoji in the team in various contexts. For instance, if [PARTY EMOJI] is a common reaction to celebratory messages in a particular channel, we will suggest that users react to new, similarly positive messages with [PARTY EMOJI].

Finally someone has figured out a sensible application for "AI". This is the future. Soon "AI" will have a similar connotation as "NFT".

apwell23
2 replies
7h8m

"leadership" at my company tallies emoji reactions to their shitty slack messages and not reacting with emojies over a period of time is considered a slight against them.

I had to up my slack emoji game after joining my current employer

berniedurfee
0 replies
5h38m

Yikes! That's some pretty heavy insecurity signalling from leadership. Please like us! Sad.

shultays
0 replies
6h18m

Finally. I am all for this AI if it is going to learn and suggest my passive aggressive "here" emoji that I use when someone @here s on a public channel with hundreds of people for no good reason.

latexr
0 replies
7h6m

And it continues:

To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.

This is so stupid and needlessly complicated. And all it does is remove personality from messages, suggesting everyone conforms to the same reactions.
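
For reference, the whole quoted mechanism boils down to roughly this much machinery: an external sentiment classifier (a toy stand-in here) plus a per-workspace table of reaction counts, where only the aggregate counts, never the message text, feed the suggestion. All names are hypothetical.

    from collections import Counter, defaultdict

    def external_sentiment(text):
        # Stand-in for the "external model (not trained on Slack messages)".
        # A real system would call an off-the-shelf sentiment classifier here.
        return "positive" if any(w in text.lower() for w in ("congrats", "shipped", "great")) else "neutral"

    # Per-workspace counts of which emoji reactions follow which sentiment.
    emoji_by_sentiment = defaultdict(Counter)

    def observe(message, reaction):
        emoji_by_sentiment[external_sentiment(message)][reaction] += 1

    def suggest(message):
        counts = emoji_by_sentiment[external_sentiment(message)]
        return counts.most_common(1)[0][0] if counts else None

    observe("We shipped it, congrats team!", ":tada:")
    observe("Great work on the launch", ":tada:")
    print(suggest("Congrats on the release!"))  # -> :tada: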

tylerrobinson
4 replies
19h22m

If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

Well gee whiz, you want to help your old pal Slack, don’t you?

It’s such a slap in the face. And the opt-out is only available by email where more friction could be introduced.

nyc_data_geek
1 replies
18h25m

So slimy that this isn't instead an opt-in!

padolsey
0 replies
18h17m

They know, as well, that opt-in simply wouldn’t give them the scale they’d need for meaningful training data. They’re being very intentionally self-interested and unconcerned with their customers’ best interests.

ttul
0 replies
18h21m

Real change requires legislation.

bionhoward
0 replies
19h11m

“Do not access the Services in order to build a similar or competitive product or service or copy any ideas, features, functions, or graphics of the Services;” https://slack.com/acceptable-use-policy

I wonder if OpenAI wishes they could train ChatGPT on their corporate chat history? How ironic? Or do they not care?

light_hue_1
4 replies
18h31m

This is the final push we needed to move to Discord. Bye slack. We won't miss you.

noman-land
2 replies
18h21m

Discord is not a better option than Slack. They are basically the same thing. Matrix is a better option from a privacy standpoint, just not from a UX one.

FujiApple
1 replies
18h7m

I recently tried Zulip [1] again after a few years and the UX is much improved on web and mobile, worth a look (it is OSS and you can self host).

[1] https://zulip.com/

noman-land
0 replies
18h1m

Thanks! I'll check it out.

Shekelphile
0 replies
15h57m

It is a bold assumption to think that discord hasn't always been using harvested text/voice/video data to train models.

dfcarney
4 replies
18h5m

In case this is helpful to anyone else, I opted out earlier today with an email to feedback@slack.com

Subject: Slack Global Model opt-out request.

Body:

<my workspace>.slack.com

Please opt the above Slack Workspace out of training of Slack Global Models.

noman-land
3 replies
17h58m

Make sure you put a period at the end of the subject line. Their quoted text includes a period at the end.

Please also scold them for behaving unethically and perhaps breaking the law.

jgalt212
0 replies
17h50m

We just opted out. I told them our lawyers have been instructed to watch them like a hawk.

drcongo
0 replies
8h33m

The period is outside the quotes though, are you suggesting we should have the quotes too?

dfcarney
0 replies
17h52m

Updated!

chefandy
4 replies
16h7m

I wonder how many people that are really mad about these guys or SE using their professional output to train models thought commercial artists were just being whiny sore losers when Deviant Art, Adobe, OpenAI, Stability, et al did it to them.

Liquix
3 replies
15h50m

squarely in the former camp. there's something deeply abhorrent about creating a place that encourages people to share and build and collaborate, then turning around and using their creative output to put more money in shareholder pockets.

i deleted my reddit and github accounts when they decided the millions of dollars per month they're receiving from their users wasn't enough. don't have the power to move our shop off slack but rest assured many will as a result of this announcement.

chefandy
2 replies
15h35m

Yeah I haven't put a new codebase on GH in years. It's kind of a PITA hosting my own gitea server for personal projects, but letting MS copy my work to help make my professional skillset less valuable is far less palatable.

Companies doing this would make me much less angry if they used an opt-in model only for future data. I didn't have a crystal ball and I don't have a time machine, so I simply can't stop these companies from using my work for their gain.

8372049
1 replies
2h43m

Why do you think it's a pain to host Gitea?

chefandy
0 replies
2h1m

Compared to hosting other things? Nothing! It's great.

Hosting my own service rather than using a free SaaS solution that is entirely someone else's problem? There's a significant difference there. I've been running Linux servers either professionally or personally for almost 25 years, so it's not like it's a giant problem... but my work has been increasingly non-technical over the past 5 years or so, so even minor hiccups require re-acclimating myself to the requisite constructs and tools (wait, how do cron time patterns work? How do I test a variable in bash for this one-liner? How do iptables rules work again?)

It's not a deal breaker, but given the context, it's definitely not not a pain in the ass, either.

WhiteNoiz3
4 replies
7h28m

To add some nuance to this conversation, what they are using this for is Channel recommendations, Search results, Autocomplete, and Emoji suggestion and the model(s) they train are specific to your workspace (not shared between workspaces). All of which seem like they could be handled fairly privately using some sort of vector (embeddings) search.

I am not defending Slack, and I can think of a number of cases where training on Slack messages could go very badly (e.g. exposing private conversations, data leakage between workspaces, etc.), but I think it helps to understand the context before reacting. Personally, I do think we need better controls over how our data is used, and Slack should be able to do better than "email us to opt out".

wolfwyrd
1 replies
6h31m

The way it's written means this just isn't the case. They _MAY_ use it for what you have mentioned above. They explicitly say "...here are a few examples of improvements..." and "How Slack may use Customer Data" (emph mine). They also... may not? And use it for completely different things that can expose who knows what via prompt hacking.

WhiteNoiz3
0 replies
3h52m

Agreed, and that is my concern as well: if people get too comfortable with it, then companies will keep pushing the bounds of what is acceptable. We will need companies to be transparent about ALL the things they are using our data for.

JackC
1 replies
4h58m

the model(s) they train are specific to your workspace (not shared between workspaces)

That's incorrect -- they're stating that they use your "messages, content, and files" to train "global models" that are used across workspaces.

They're also stating that they ensure no private information can leak from workspace to workspace in this way. It's up to you if you're comfortable with that.

WhiteNoiz3
0 replies
3h53m

From the wording, it sounds like they are conscious of the potential for data leakage and have taken steps to avoid it. It really depends on how they are applying AI/ML. It can be done in a private way if you are thoughtful about how you do it. For example:

Their channel recommendations: "We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data"

Meaning they use a non-slack trained model to generate embeddings for search. Then they apply a recommender system (which is mostly ML not an LLM). This sounds like it can be kept private.

Search results: "We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy" Again, this is probably a combination of non-slack trained embeddings with machine learning algos based on engagement. This sounds like it can be kept private and team specific.

autocomplete: "These suggestions are local and sourced from common public message phrases in the user’s workspace." I would be concerned about private messages being leaked via autocomplete, but if it's based on public messages specific to your team, that should be ok?

Emoji suggestions: "using the content and sentiment of the message, the historic usage of the emoji [in your team]" Again, it sounds like they are using models for sentiment analysis (which they probably didn't train themselves and even if they did, don't really leak any training data) and some ML or other algos to pick common emojis specific to your team.

To me these are all standard applications of NLP / ML that have been around for a long time.
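
As a rough sketch of the kind of pipeline described above, here is one way it could look, using sentence-transformers as a stand-in for the external embedding model (channel names, topics, and the 0.3 cut-off are all made up): an off-the-shelf encoder turns topics into vectors, and the recommender only ever sees the resulting similarity scores plus non-content metadata.

    import numpy as np
    from sentence_transformers import SentenceTransformer  # external model, not trained on Slack data

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    user_channel_topics = ["kubernetes deploys and on-call", "terraform modules"]
    candidate_channels = {
        "#sre": "incident response and observability",
        "#cooking": "recipes and lunch spots",
    }

    user_vec = encoder.encode(user_channel_topics).mean(axis=0)

    # The global recommender only ever sees these numbers, never the underlying text.
    scores = {name: cosine(user_vec, encoder.encode([topic])[0])
              for name, topic in candidate_channels.items()}

    recommended = [name for name, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0.3]
    print(scores, recommended)  # "#sre" should score well above "#cooking"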

Rebuff5007
4 replies
18h9m

How can anyone in their right mind think building AI for emoji selection is a remotely good use of time...

nextworddev
1 replies
16h37m

it's just a justification for collecting tokens

TechDebtDevin
0 replies
11h47m

Tokens (outside of a few trillion) are worthless imo. I think OAI has pushed that limit; let the others chase them with billions into the ocean of useless conversational data and drown.

dlandis
0 replies
17h31m

"These types of thoughtful personalizations and improvements are only possible if we study and understand how our users interact with Slack."

LOL

barkbyte
0 replies
16h51m

I’d use that, at work. It would be a welcome improvement to their product.

theyinwhy
3 replies
12h10m

Good thing we moved to Matrix already. I just hope they start putting more emphasis on Element X, whose message handling has been broken on iOS for weeks now.

Arathorn
2 replies
11h29m

Element X is where all the effort is going, and should be working really well. How is msg handling broken?

theyinwhy
0 replies
6h16m

I need to go back to the overview whenever I receive a new message, as the reply form is broken after each message received.

drcongo
0 replies
8h35m

Not the OP here, but I've tried really hard to use Element X and it crashes constantly.

ramijames
3 replies
18h43m

I bet Discord is next.

reportgunner
1 replies
8h29m

I regularly get ads or content on tiktok based on what I discuss in DMs on discord. It takes about an hour or sometimes even less.

nurple
0 replies
2h15m

Same, but on YouTube.

herpdyderp
0 replies
18h11m

I bet Discord is already doing it.

r_thambapillai
3 replies
17h28m

The incentive for first-party tool providers to do this is going to be huge, whether it's Slack, Google, Microsoft, or really any other SaaS tool. Ultimately, if businesses want to avoid getting commoditized by their vendors, they need to be in control of their data and their AI strategy. And that probably means turning off all of these small-utility-very-expensive-and-might-ruin-your-business features, and actually creating a centralized, access-controlled, well-governed knowledge base into which you can plug any open-source or black-box LLM from any provider.

bn-l
1 replies
16h38m

It’s definitely a moral hazard (/opportunity). As a reminder, by default on Windows 11, Microsoft syncs your files to their servers.

Liquix
0 replies
15h39m

all your files? no way that cozy of a blanket statement can be true. if you kept cycling in drives full of /dev/random you could fill up M$ servers with petabytes of junk? sounds like an appealing weekend project

jonnycomputer
0 replies
16h59m

"commoditized by their vendors" is exactly the phrase I was looking for. It's why I wanted my co to self-host Mattermost instead of using Slack.

icoe
3 replies
19h0m

Not to be glib, but this is why we built Tonic Textual (www.tonic.ai/textual). It’s both very challenging and very important to protect data in training workflows. We designed Textual to make it easy to both redact sensitive data and replace it with contextually relevant synthetic data.

Ephil012
2 replies
18h45m

To add on to this: I think it should be mentioned that Slack says they'll prevent data leakage across workspaces in their model, but don't explain how they do this. They don't seem to go into any detail about their data safeguards and how they're excluding sensitive info from training. Textual is good for this purpose since it redacts PII thus preventing it from being leaked by the trained model.

Disclaimer: I work at Tonic

a2128
1 replies
15h41m

How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "11 spices - mix with 2 cups of white flour ... 2/3 teaspoons of salt, 1/2 teaspoons of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 70 years

icoe
0 replies
2m

Fair question, but you have to consider the realistic alternatives. For most of our customers inaction isn't an option. The combination of NER models + synthesis LLMs actually handles these types of cases fairly well. I put your comment into our web app and this was the output:

How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "17 spices - mix with 2lbs of white flour ... half teaspoon of salt, 1 tablespoon of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 75 years.
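
(A minimal sketch of the generic NER-redaction idea, not Tonic's actual pipeline, using spaCy's off-the-shelf NER; the replacements here are canned placeholders rather than the synthesized values shown above:)

    import spacy

    nlp = spacy.load("en_core_web_sm")  # off-the-shelf NER model

    REPLACEMENTS = {"PERSON": "Alex Doe", "ORG": "Acme Corp", "GPE": "Springfield",
                    "CARDINAL": "42", "DATE": "last week"}

    def redact(text):
        # Replace each detected entity span with a placeholder of the same type.
        doc = nlp(text)
        out, last = [], 0
        for ent in doc.ents:
            out.append(text[last:ent.start_char])
            out.append(REPLACEMENTS.get(ent.label_, "[REDACTED]"))
            last = ent.end_char
        out.append(text[last:])
        return "".join(out)

    print(redact("Sarah from Initech shared the Q3 numbers with Bob on March 3rd."))

Bare NER only catches entity-shaped spans, which is presumably why a synthesis model is layered on top for things like the recipe quantities in the example above.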

blhack
3 replies
18h47m

How could this possibly comply with European "right to be forgotten" legislation? In fact, how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).

whimsicalism
1 replies
18h42m

how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).

I don't believe that is the current interpretation of GDPR, etc. - if the model is trained, it doesn't have to be deleted due to a RTBF request, afaik. There is significant legal uncertainty here.

Recent GDPR court decisions mean that this is probably still non-compliant due to the fact that it is opt-out rather than opt-in. Likely they are just filtering out all data produced in the EEA.

8372049
0 replies
2h56m

Likely they are just filtering out all data produced in the EEA.

Likely they are just hoping to not get caught and/or consider it cost of doing business. GDPR has truly shown us (as if we didn't already know) that compliance must be enforced.

beefnugs
0 replies
17h34m

This "ai" scam going on now is the ultimate convoluted process to hide sooo much tomfuckery: theres no such thing as copyright anymore! this isn't stealing anything, its transforming it! you must opt out before we train our model on the entire internet! (and we still won't spits in our face) this isn't going to reduce any jobs at all! (every company on earth fires 15% of everyone immediately) you must return to office immediately or be fired! (so we get more car data teehee) this one weird trick will turn you into the ultimate productive programmer! (but we will be selling it to individuals not really making profitable products with it ourselves)

and finally the most aggregious and dangerous: censorship at the lowest level of information before it can ever get anywhere near peoples fingertips or eyeballs.

paulv
2 replies
14h56m

It seems like we've entered an era where not only are you paying for software with money, you're also paying for software with your data, privacy implications be damned. I would love to see people picking f/oss instead.

eggdaft
1 replies
12h7m

Problems with f/oss for business applications:

1. Great UX folks almost never work for free. So the UX of nearly all OSS is awful.

2. Great software comes from a close connection to users. An OS kernel can work just fine because programmers are its own users, but how many OSS folks want to spend their free time on Zoom talking to hundreds of businesses and understanding their needs, just so they can give them free software?

See also: year of Linux desktop

AlexandrB
0 replies
7h16m

The good news for FOSS is that the UX of most commercial software is also awful and generally getting worse. The bad news is that FOSS software is copying a lot of the same UX trends.

matt3210
2 replies
17h59m

Consent should be opt-in not opt-out. Yes means yes!

noman-land
1 replies
17h34m

Can we start going a step further by demanding that consent must be opt-in and not opt-out? Requesting isn't good enough.

8372049
0 replies
2h36m

GDPR got that covered, now it just needs to become global, and enforced.

jonnycomputer
2 replies
17h6m

Is this new? As in, when was this policy developed?

hehdhdjehehegwv
2 replies
15h54m

I long ago replaced Slack with Signal chat rooms. You can set an auto delete and it’s all secure to start with. Also, free.

hyping9
1 replies
5h7m

Cool. Now do it for a 3,000 person org with varied tech skills and patience.

hehdhdjehehegwv
0 replies
2h51m

There is no way a chat room with 3,000 people is remotely useful to anybody.

budududuroiu
2 replies
18h20m

We offer customers a choice around these practices

If you’re so customer-first, and make it so easy to opt out, just make it opt-in instead. Oh wait, you’re just a lying pathetic corpo

hu3
1 replies
18h14m

Sadly, even Firefox is using opt-out these days. I feel we're going downhill with regards to privacy.

https://news.ycombinator.com/item?id=40355982

What Firefox’s search data collection means for you

We understand that any new data collection might spark some questions. Simply put, this new method only categorizes the websites that show up in your searches — not the specifics of what you’re personally looking up.

Sensitive topics, like searching for particular health care services, are categorized only under broad terms like health or society. Your search activities are handled with the same level of confidentiality as all other data regardless of any local laws surrounding certain health services.

Remember, you can always opt out of sending any technical or usage data to Firefox. Here’s a step-by-step guide on how to adjust your settings. We also don’t collect category data when you use Private Browsing mode on Firefox.

As far as user experience goes, you won’t see any visible changes in your browsing. Our new approach to data will just enable us to better refine our product features and offerings in ways that matter to you.

budududuroiu
0 replies
18h3m

Yeah these statements are always so benevolent, with the patronising undertone of “this is for your own good, for a better experience”.

SMH

willmadden
1 replies
16h14m

They are breaking a lot of laws by doing this. There are plenty of regulated industries in healthcare and financial services using slack.

bostik
0 replies
1h4m

Indeed. I flagged this internally the first thing this morning, and fully expect to field questions from our clients over the next couple of weeks as the news percolates through to upper echelons.

I can tolerate Slack and/or Salesforce at large building per-customer overlays on top of a generic LLM. Those at least can provide actual business value[ß], and give their AI teams something to experiment on. But feeding gazillion companies' internal (and workspace-joined!) chats to a global model? Hell no.

Unsurprisingly, we opted out a few hours ago.

ß: a smart, context-aware autocomplete for those who need to type a lot on their phones would not be a bad idea. The current generation of autocorrupt is obnoxious.

ptman
1 replies
10h13m

We really need to start using self-hosted solutions. Like matrix / element for team messaging.

It's ok not wanting to run your own hardware at your own premises. But the solution is to run a solution that is end-to-end encrypted so that the hosting service cannot get at the data. cryptpad.fr is another great piece of software.

neop1x
0 replies
9h21m

Zulip (https://zulip.com/) seems to be a great self-hosted python-based alternative to Slack/Teams.

nomad-nigiri
1 replies
19h5m

Isn’t Salesforce’s primary value proposition trust?

tbdfm
0 replies
17h55m

Salesforce's primary value proposition is that software quality doesn't matter.

musha68k
1 replies
10h16m

This is as systemically concerning as the data practices seen on Discord with integrations like statbot.net, though at least Slack is being transparent about it. Regardless, I find all of this highly problematic.

8372049
0 replies
2h37m

Discord's TOS used to say "we may sell all your convos, including your private ones". Then some time later, they suddenly changed it to noooo, we would never sell aaanything, and didn't even update the "last changed" date. I deleted my Discord account and stopped using them immediately after I noticed the TOS, but them sneakily trying to cover it up later completely ruined any lingering trust I might have had in them.

And this is just one of many, many problems associated with the platform.

blackeyeblitzar
1 replies
18h35m

I can’t believe Slack added a bunch of AI features, without having admins opt into enabling them, and then put out a policy that requests that you send an email to have an opt out from your data being used for training. All of this should be opt-in and should respect the administrator’s prerogative. Very irresponsible for Salesforce (parent company) and I’ll be reconsidering if we continue using them, if this is the low trust way in which they will operate. We don’t have time to keep policing these things.

paxys
0 replies
18h27m

I would suggest not getting your knowledge of the world from the titles of upvoted HN posts and instead do your own research.

OG_BeerMe
1 replies
7h32m

You can opt-out

Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.

tirpen
0 replies
6h40m

If a company opts out, do you guarantee that all information from their instance that you have already used for training is somehow completely removed from the "global models" you have used it to train?

If not, it's not really an opt-out, is it? The data remains compromised.

yorwba
0 replies
9h28m

"We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data."

I think this deserves more attention. For many tasks like contextual recommendations, you can get most of the way by using an off-the-shelf model, but then you get a floating-point output and need to translate it into a binary "show this to the user, yes or no?" decision. That could be a simple thresholding model "score > θ", but that single parameter still needs to be trained somehow.

I wonder how many trainable parameters people objecting to Slack's training policy would be willing to accept.
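
A toy illustration of that last point: the only trainable parameter here is a single threshold, and it is fit from engagement outcomes (clicked or not), never from message text. The scores and labels below are made up.

    # Similarity scores come from an external, off-the-shelf model.
    scores = [0.12, 0.35, 0.40, 0.71, 0.80, 0.91]
    clicked = [False, False, True, True, True, True]  # did the user engage with the recommendation?

    def fit_threshold(scores, clicked):
        # Pick the cut-off that best separates engaged from ignored recommendations.
        def accuracy(theta):
            return sum((s > theta) == c for s, c in zip(scores, clicked)) / len(scores)
        return max(sorted(set(scores)), key=accuracy)

    theta = fit_threshold(scores, clicked)
    print(theta, [s > theta for s in scores])  # theta = 0.35 on this toy data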

vertex17
0 replies
7h32m

Products that have search, autocomplete, etc… use rankers that are trained on System Metadata to build the core experience.

Microsoft Teams, Slack, etc… all do the same thing under the hood.

Nobody is pumping the text into LLM training. The examples make this very clear as well.

Comment section here is divorced from reality.

ugh123
0 replies
13h21m

Whatever the models used, or type of data within accounts this operates on, this clause would be red lined in most of the big customer accounts that have leverage during the sales/renewal process. Small to medium accounts will be supplying most of this data.

tazer
0 replies
7h39m

Does this mean that even if we’re not using Slack AI, they are training on our data unless we opt out?

rank0
0 replies
16h16m

Even if Salesforce has the purest intentions of following policy, your data is still at risk.

In real life policies have to be enforced and it's not always technically feasible to do so. It doesn’t even have to be calculated or malicious!

prakhar897
0 replies
13h59m

Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com

Sounds like an invitation for malicious compliance. Anyone can email them a huge block of text with the workspace URL buried somewhere, and they have to decipher it somehow.

Example [Answer is Org-12-Wp]:

"

FORMAL DIRECTIVE AND BINDING COVENANT

WHEREAS, the Parties to this Formal Directive and Binding Covenant, to wit: [Your Name] (hereinafter referred to as "Principal") and [AI Company Name] (hereinafter referred to as "Technological Partner"), wish to enter into a binding agreement regarding certain parameters for the training of an artificial intelligence system;

AND WHEREAS, the Principal maintains control and discretion over certain proprietary data repositories constituting segmented information habitats;

AND WHEREAS, the Principal desires to exempt one such segmented information habitat, namely the combined loci identified as "Org", the region denoted as "12", and the territory designated "Wp", from inclusion in the training data utilized by the Technological Partner for machine learning purposes;

NOW, THEREFORE, in consideration of the mutual covenants and promises contained herein, the receipt and sufficiency of which are hereby acknowledged, the Parties agree as follows:

DEFINITIONS

1.1 "Restricted Information Habitat" shall refer to the proprietary data repository identified by the Principal as the conjoined loci of "Org", the region "12", and the territory "Wp".

OBLIGATIONS OF TECHNOLOGICAL PARTNER

2.1 The Technological Partner shall implement all reasonably necessary technical and organizational measures to ensure that the Restricted Information Habitat, as defined herein, is excluded from any training data sets utilized for machine learning model development and/or refinement.

2.2 The Technological Partner shall maintain an auditable record of compliance with the provisions of this Formal Directive and Binding Covenant, said record being subject to inspection by the Principal upon reasonable notice.

REMEDIES

3.1 In the event of a material breach...

[Additional legalese]

IN WITNESS WHEREOF, the Parties have executed this Formal Directive and Binding Covenant."

persedes
0 replies
17h48m

can't wait for someone to recreate the secret keys that were shared via slack using those models.

oytis
0 replies
10h43m

So much for "if you are not paying, you are the product". There is nothing that can stop companies from using your sweet, sweet data once you give it to them.

nxpnsv
0 replies
11h2m

How do the self hosted alternatives compare?

nutanc
0 replies
15h7m

We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.

Sick and tired of this default-opt-in, explicit-opt-out legalese.

The default should be that you're opted out.

Just stop using my data.

nextworddev
0 replies
17h7m

Reminder that Slack is owned by SalesForce

mvkel
0 replies
3h41m

How does one technically opt-out after model training is completed? You can't exactly go into the model and "erase" parts of the corpus post-hoc.

Like when you send an email to feedback@slack.com with that perfect subject line (jeez, really?), what exactly does the customer support rep do on their end to opt you out?

Now is definitely the time to get/stay loud. If it dies down, the precedent has been set.

mvkel
0 replies
16h44m

we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data

They don't "build" them this way (whatever that means) but if training data is somehow leaked, they're off the hook because they didn't build it that way?

morkalork
0 replies
18h25m

I pity the users who have to put up with an AI trained on my slack conversations ¯\_(ツ)_/¯

luckyshot
0 replies
11h8m

So if you want to opt out, there's no setting to switch, you need to send an email with a specific subject:

Contact us to opt out. [...] To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” [...]

lars_francke
0 replies
3h38m

I contacted support to opt out.. Here is the answer.

"Hi there,

Thank you for reaching out to Slack support. Your opt-out request has been completed.

For clarity, Slack has platform-level machine learning models for things like channel and emoji recommendations and search results. We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of customer data. Our published policies cover those here (https://slack.com/trust/data-management/privacy-principles), and as shared above your opt out request has been processed.

Slack AI is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data. Slack AI uses LLMs hosted directly within Slack’s AWS infrastructure, so that customer data remains in-house and is not shared with any LLM provider. This ensures that Customer Data stays in that organization’s control and exclusively for that organization’s use. You can read more about how we’ve built Slack AI to be secure and private here: https://slack.engineering/how-we-built-slack-ai-to-be-secure....

Kind regards, Best regards,"

lagniappe
0 replies
16h43m

Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.

This is not ok. We didn't have to reach out by email to sign up, this should be a toggle in the UI. This is deliberately high friction.

kyleee
0 replies
17h19m

This should be opt in

jonnycomputer
0 replies
17h1m

This is, once again, why I wanted us to go to self-hosted Mattermost instead of Slack. I recognize Slack is probably the better product (or mostly better), but you have to own your data.

jonnycomputer
0 replies
17h0m

It should be opt-in. Not opt-out.

jjgreen
0 replies
20h5m

“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

jgalt212
0 replies
7h3m

I honestly don't know what special kind of idiot OK'd this project. In any finance org, or the finance dept of a public co, these chats will include material non-public information. If such information leaks across internal Chinese walls or externally, Slack opens itself up to customer litigation and SEC enforcement actions.

hyping9
0 replies
5h18m

Imagine thinking content you post online is NOT used to train AI data, in 2024. Seriously, just imagine for a second being that befuddled and out of touch.

gtirloni
0 replies
16h24m

> To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information)

> We have technical controls in place to prevent access. When developing AI/ML models or otherwise analyzing Customer Data, Slack can’t access the underlying content

> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

Yeah...

frexs
0 replies
10h42m

To develop AI/ML models, our systems analyse Customer Data (e.g. messages, content and files) submitted to Slack

This Is Fine.

ellisnguyen
0 replies
9h57m

Can we organize some sort of boycott somehow somewhere? Something in court possible?

This is not just some analytics data or email names, this is potential leakage of secrets and private conversations for thousands of companies and millions of individuals.

Ridiculous.

ds
0 replies
16h59m

Just another reason to mass-delete your Slack DMs before you quit your job/move to another job.

https://redact.dev (my startup) makes this easy.

curious_cat_163
0 replies
16h9m

The nerve.

cleansy
0 replies
6h54m

If you send the opt-out message to Slack, take a second and include ceo@salesforce.com. It helps to get it done faster in most cases.

cess11
0 replies
7h34m

I don't see what they're paying their customers for this data. Is it mentioned elsewhere?

barrenko
0 replies
11h47m

"Interacting" with HR is just going to get weirder, isn't it?

awinter-py
0 replies
17h5m

if shitty TOS turns out to be the thing that bends the arc of history from AI to privacy, it would make me so happy

austinkhale
0 replies
16h5m

Wow. I understand business models that are freemium but for a premium priced B2B product? This feels like an incredible rug pull. This changes things for me.

arshakarap
0 replies
4h14m

That's risky. Zoom wouldn't approve.

Stem0037
0 replies
9h27m

While Slack emphasizes that customers own their data, the default of Customer Data being used to train AI/ML models (even if aggregated and disassociated) may not align with all customers' expectations of data ownership and control.

IceHegel
0 replies
15h57m

Canceling my company’s slack as we speak. Not cool.

Ekaros
0 replies
8h38m

I would expect this from a free service, but from a paid service with non-trivial cost... It seems insane... Maybe the whole model of doing business is broken...

ENGNR
0 replies
6h22m

We’ll only use it for…

choosing an emoji, and…

a fun little internal only stock picker tool, that suggests to us some fun stocks to buy based on the anonymised real time inner monologue of hundreds of thousands of tech companies

Dwedit
0 replies
4h43m

Unlike public web data, slack data is not public. This is a problem.

CatWChainsaw
0 replies
18h11m

Opt-out and arbitration should both be illegal.

BillFranklin
0 replies
11h2m

Slack has been using customer data for ML for years. Look at their search feature - it uses learning to rank, a machine learning approach that tracks content, clicks etc.

It sounds like the worry is that this overfit generative AI will spew out some private input verbatim… which I can see happening, honestly. Look at GitHub Copilot; it’s almost a copy-paste machine.
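
For readers unfamiliar with the term, "learning to rank" in this setting typically means fitting a model over engagement features (clicks, recency, and so on) rather than over message text. A toy pointwise sketch with made-up features:

    # Pointwise learning-to-rank sketch: the model sees per-result features, not message text.
    from sklearn.linear_model import LogisticRegression

    # features per search result: [days_since_message, author_is_teammate, past_click_rate]
    X = [[1, 1, 0.40], [30, 0, 0.05], [3, 1, 0.30], [90, 0, 0.01], [2, 0, 0.20]]
    y = [1, 0, 1, 0, 1]  # did the searcher click this result?

    ranker = LogisticRegression().fit(X, y)

    candidates = {"result A": [5, 1, 0.25], "result B": [60, 0, 0.02]}
    ranked = sorted(candidates, key=lambda r: ranker.predict_proba([candidates[r]])[0][1], reverse=True)
    print(ranked)  # result A should rank above result B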