For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data
This feels so full of subtle qualifiers and weasel words that it generates far more distrust than trust.
It only refers to models used "broadly across all" customers - so if it's (a) not used "broadly" or (b) only used for some subset of customers, the whole statement doesn't apply. Which actually sounds really bad because the logical implication is that data CAN leak outside those circumstances.
They need to reword this. Whoever wrote it is a liability.
Especially when a few paragraphs below they say:
So Customer Data is not used to train models "used broadly across all of our customers [in such a way that ...]", but... it is used to help train global models. Uh.
Why are these kinds of things opt-out? And why do they need to be discovered?
We're literally discussing switching to Teams at my company (1500 employees)
You’d be better off just not having chat than switching to Teams.
But the business will suffer, most likely being less successful due to less cohesive communication. It's "The Ick" either way.
The idea that Slack makes companies work better needs some proof behind it, I’d say the amount of extra distraction is a net negative… but as with a lot of things in software and startups nobody researches anything and everyone writes long essays about how they feel things are.
Distraction is not enforced. Learning to control your attention and how to help yourself do it is crucial whatever you do in whatever time and in whatever technological context or otherwise. It is the most long term valuable resource you have.
I think we're starting to recognize this at larger scale.
Slack easily saves a ton of time solving complex problems that require the interaction and expertise of a lot of people, often an unpredictable number of them for each problem. They can answer with delay; in a good culture this is totally accepted, and people can still independently move forward or switch tasks if necessary, same as with slower communication tools. You are not forced to answer with any particular lag; however, Slack makes it possible, when needed, to reduce it to zero.
Sometimes you are unsure whether you need help or can do something on your own. I certainly know that a lot of times I eventually had no chance whatsoever, because the knowledge required was too specialized, and this is not always clear up front. Reducing barriers to communication in those cases is crucial, and I don't see Slack being in the way here, only helpful.
The goal of organizing Slack is to pay the right amount of attention to the right parts of communication for you. You can do this if you really spend (hmm) attention trying to figure out what that is and how to tune your tools to achieve it.
That’s a lot of words with no proof, isn’t it? It’s just your theory. Until I see a well-designed study on such things I struggle to believe the conjecture you make either way. It could well be that you benefit from Slack and I don’t.
Even receiving a message and not responding can be disruptive, and on top of that I’d say being offline or ignoring messages is impossible in most companies.
Your idea also comes with no proof, just your personal experience.
Which is extremely clear from what I’m saying, it’s completely anecdotal.
This is your choice to trust only statements backed by scientific rigour or trying things out and applying to your way of life. This is just me talking to you, in that you are correct.
Regarding “receiving a message”: my devices are allowed only limited use of notifications. Of all the messaging/social apps, only messages from my wife in our messaging app of choice pop up as notifications. Slack certainly is not allowed there.
Good point, could be that it reduces friction too far in some instances. However, in general less communication doesn't seem better for the bottom line.
I'm not sure chat apps improve business communications. They are ephemeral, with differing expectations on different teams. Hardly what I'd label "cohesive".
Async communications are critical to business success, to be sure -- I'm just not convinced that chat apps are the right tool.
From what I’ve seen (not much, actually), most channels can be replaced by a forum-style discussion board. Chat can be great for 1:1 and small-team interactions, and for tool interactions.
we use Teams and it's fine.
Just don't use the "Team" feature of it to chat. Use chat groups and 1-to-1 of course. We use "Team" channels only for bots: CI results, alerts, things like that.
Meetings are also chat groups. We use the daily meeting as the dev-team chat itself so it's all there. Use Loops to track important tasks during the day.
I'm curious what's missing/broken in Teams that you would rather not have chat at all?
If you switch to Teams only for this reason I have some bad news for you - there’s no way Microsoft is not (or will not start in future) doing the same. And you’ll get a subpar experience with that (which is an understatement).
I think a self hosted matrix/irc/jitsi is the way to do it.
We've been using Mattermost and it works very well. Better than Slack.
The only downside is their mobile app is a bit unreliable, in that it sometimes doesn't load threads properly.
The Universal License Terms of Microsoft (applicable to Teams as well) clearly say they don't use customer data (Input) for training: https://www.microsoft.com/licensing/terms/product/ForallOnli... Whether someone believes it or not, is another question, but at least they tell you what you want to hear.
What if they exfiltrate customer data to a data broker and they buy it back?
It's not customer data anymore.
I would guess Microsoft has a lot more government customers (and large customers in general) than Slack does. So I would think they have a lot more to lose if they went this route.
Ugh Teams = Microsoft. They are the worst when it comes to data privacy. I'm not sure how that is even a choice.
Teams has better voice/video. But its chat is far worse, absolutely shit, though Slack seems to be working to get there.
Obviously because no one would ever opt in.
I'd make sure to do an extended trial run first. Painful transition.
Monies.
Considering what Microsoft does with its "New and Improved(TM)" Outlook and love for OpenAI, I won't be so eager...
To me it says that they _do_ train global models with customer data, but they are trying to ensure no data leakage (which will be hard, but maybe not impossible, if they are training with it).
The caveats are for “local” models, where you would want the model to be able to answer questions about discussions in the workspace.
It makes me wonder how they handle “private” chats, can they leak across a workspace?
Presumably they are trying to train a generic language model which has very low recall for facts in the training data, then using RAG across the chats that the logged-in user can see to provide local context.
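A minimal sketch of what that could look like. All names here are made up for illustration, and this is obviously not Slack's actual pipeline: the point is just that the model stays generic and the access check happens at retrieval time.

```python
# Hypothetical RAG setup: the language model is never trained on
# workspace messages; instead, at query time we retrieve only messages
# the logged-in user is allowed to see and put them in the prompt.

def visible_messages(user, messages):
    """Filter the workspace corpus down to channels this user can read."""
    return [m for m in messages if m["channel"] in user["channels"]]

def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank candidates by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda m: len(q & set(m["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, user, messages):
    """Assemble a prompt from user-visible context only."""
    ctx = retrieve(query, visible_messages(user, messages))
    context_block = "\n".join(m["text"] for m in ctx)
    return f"Context:\n{context_block}\n\nQuestion: {query}"
```

If the access filter is correct, a question about a private channel simply retrieves nothing from it, so the generic model has nothing to leak from that channel.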
My intuition is that it's impossible to guarantee there are no leaks in LLMs as they stand today. It would surely require some new computer science to ensure that no part of any output that could ever possibly be produced is sensitive data from the input.
It's one thing if the input is the published internet (even if covered by copyright), it's entirely another to be using private training data from corporate water coolers, where bots and other services routinely send updates and query sensitive internal services.
There is a way. Build a preference model from the sensitive dataset. Then use the preference model with RLAIF (like RLHF but with AI instead of humans) to fine-tune the LLM. This way only judgements about the LLM outputs will pass from the sensitive dataset. Copy the sense of what is good, not the data.
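A toy illustration of the information flow being proposed, with the learning reduced to best-of-n selection rather than actual RL fine-tuning, and all names invented: the sensitive data trains only the judge, and the generating model sees scalar scores, never the sensitive text.

```python
# Sketch of the RLAIF-style boundary: a preference model ("judge") is
# built from the sensitive dataset, and only its numerical judgements
# about candidate outputs cross over to guide the LLM.

def train_preference_model(sensitive_examples):
    """Stand-in for training a judge on sensitive (text, label) pairs.
    Here it just remembers which words appear in 'good' examples."""
    good_words = set()
    for text, label in sensitive_examples:
        if label == "good":
            good_words |= set(text.split())
    # The judge returns only a scalar score for a candidate output.
    return lambda candidate: len(good_words & set(candidate.split()))

def pick_best(candidates, judge):
    """RLAIF step reduced to best-of-n selection: the generator's side
    receives scores, not the data the judge was trained on."""
    return max(candidates, key=judge)
```

The real version would use those scores as a reward signal during fine-tuning, but the privacy argument is the same: only judgements pass the boundary, not the data.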
Hope it's not doublespeak, ambiguity leaves it grey, maybe to play.
So if I don't want Slack to train on _anything_, what do I do? I still suspect everything now.
Opt out is such bullshit.
Sounds like it’s been written specifically to avoid liability.
I'm sure it was lawyers. It's always lawyers.
Yes, lawyers do tend to have a part to play in writing things that present a legally binding commitment being made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.
Hm, now you mention it, I don't think I've ever seen this specific example.
Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg
Canned cliches are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern — and that's despite being totally convinced that (one of) the people I was talking to, had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.
Nah. Whoever decided to create the reality their counsel is dancing around with this disclaimer is the actual problem, though it's mostly a problem for us, rather than them.
It’s a problem for them if it loses them customer trust / customers.
If they lose enough, we'll get a "sorry we got caught".
If they don't, they will not do anything.
If it impacted their business significantly, it would restore some of the faith I've lost in humanity recently. Frankly, I'm not holding my breath.
I'm imagining a corporate slack, with information discussed in channels or private chats that exists nowhere else on the internet.. gets rolled into a model.
Then, someone asks a very specific question.. conversationally.. about such a very specific scenario..
Seems plausible confidential data would get out, even if it wasn't attributed to the client.
Not that it’s possible to ask an LLM how a specific or random company in an industry might design something…
Exactly. A fun game to see why it is so hard to prevent this:
https://gandalf.lakera.ai/
Sometimes the obvious questions are met with a lot of silence.
I don't think I can be the only one who has had a conversation with GPT about something obscure they might know but there isn't much about online, and it either can't find anything... or finds it, and more.
Seems like time to start some slack workspaces and fill them with garbage. Maybe from Uncyclopedia (https://en.uncyclopedia.co/wiki/Main_Page)
The Riders of the Lost Kek dataset is an excellent candidate https://arxiv.org/abs/2001.07487
I think it's as clear as it can be, they go into much more detail and provide examples in their bullet points, here are some highlights:
Our model learns from previous suggestions and whether or not a user joins the channel we recommend. We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data.
We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy. Simply put, our model can't reconstruct the search query or result. Instead, it learns from team-specific, contextual information like the number of times a message has been clicked in a search or an overlap in the number of words in the query and recommended message.
These suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.
To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
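The search-ranking bullet above can be sketched concretely. Feature names here are my own invention, not Slack's, but they show the claimed property: the global model is trained only on derived numbers like click counts and word-overlap scores, so the raw query and message text never reach it.

```python
# Toy version of the "numerical scores only" scheme: a (query, message)
# pair is reduced to features before anything crosses into the global
# ranking model; the underlying text stays inside the workspace.

def extract_features(query, message, click_count):
    """Derive numeric features from a query/message pair."""
    q = set(query.lower().split())
    m = set(message.lower().split())
    return {"overlap": len(q & m), "clicks": click_count}

def rank_score(features, weights={"overlap": 1.0, "clicks": 0.5}):
    """A model trained only on such features cannot reconstruct the
    query or the message; it just weighs counts and overlaps."""
    return sum(weights[k] * v for k, v in features.items())
```

Whether the real feature set is truly non-invertible is exactly the question the rest of this thread is arguing about, but this is the shape of the claim.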
Wow you're so right. This multi-billion dollar company should be so thankful for your comment. I can't believe they did not consult their in-house lawyers before publishing this post! Can you believe those idiots? Luckily you are here to save the day with your superior knowledge and wisdom.
If you trained on customer data, your service contains customer data.
- Create a Slack account for your 95-year-old grandpa
- Exclude that one account from using the models, he's never going to use Slack anyway
- Now you can learn, memorise, or reproduce all the Customer Data you like
Whatever lawyer wrote that should be fired. This poorly written nonsense makes it look like Slack is trying to look shady and subversive. Even if well intended this is a PR blunder.
The problem is this also covers very reasonable use cases.
Using sampling across messages for spam detection, predicting customer retention, etc. is pretty standard.
Then there are cases where you could have models more like LLMs that can output data from the training set, but you're running them only for that customer.