
Protecting your email address via SVG instead of JavaScript

yreg
27 replies
1d8h

Email addresses published on webpages usually need to be protected from email-harvesting spambots.

Do they though?

I have had my email address published on my website in a <a href="mailto:… for like 20 years and I don't get spam that would get through the spam filter.

I use both Gmail and (for some other addresses) a webmail hosted by a local company which uses some other filter. Both work well, so it's not something only Google can do.

digging
5 replies
1d3h

My preference is to not have my email harvested at all when possible, even if I don't personally see the spam emails. (I'm not saying it's a critical privacy/security issue, but a preference.)

jakubmazanec
3 replies
1d2h

So then you never use your email, right?

digging
1 replies
1d1h

What?

jakubmazanec
0 replies
9h38m

Your email address cannot be (and isn't) secret, if you give it to other people (regular people, i.e. friends, colleagues, etc.) so they can send you emails. If you don't want your email harvested, you can never use it (at least to receive emails).

r-w
0 replies
1d

I think they’re obliquely referring to the scanning practices of major providers like Gmail, which most people use to filter their spam.

doug-moen
0 replies
15h46m

My experience is that if you send email to someone whose OpSec is not as good as your own, your email will be harvested when that person's address book is harvested. I don't know all the details of how these harvests occur, but using a shady mobile app with Contacts permission would be enough.

a_random_canuck
5 replies
1d2h

They do. My wife lost her 10-year-old Instagram account to a well crafted phishing attack against an email she had published…

Instagram/Meta’s customer support is absolutely atrocious and disgraceful on this front. They basically treat my wife like she’s also a spammer and there’s no way to recover the account or undo any of the changes the spammers made.

It’s hilarious how they ask you to “appeal” a ban by clicking a single button without giving any chance to rectify what the spammers did to her account. Of course their automated bots just reject your appeal almost instantly. Shameful.

qingcharles
0 replies
1d1h

Clicking the appeal button is like a trap to permanently ban your account.

You can get it back by paying off a Meta employee through a site like Swapd. It's either that or get your comment to the front page of HN. Those are the only two customer support channels for Meta or Google.

hoherd
0 replies
1d2h

This gave me "Press F to appeal ban" images.

dgb23
0 replies
1d2h

This could happen to anyone. You’re tired or thinking of something else, the attack weirdly aligns and you don’t notice it until it’s too late.

crtasm
0 replies
1d2h

Does her email show up on any leaks on https://haveibeenpwned.com/ ? I'm wondering if not publishing it would have made any difference to receiving phishing messages.

chefandy
0 replies
1d2h

Would such an attacker be stymied by this? It seems like automated email harvesting wouldn't be a big time saver for any attack that required a well-crafted anything. I don't know anything about that particular attack, though.

crazygringo
2 replies
1d1h

Exactly.

I definitely recall in the early 2000's it absolutely did lead to spam, and e-mail obfuscation techniques were a real thing that genuinely helped.

But by 2015 or so it didn't matter at all anymore, in my personal experience. It didn't even lead to spam that needed to be filtered. Spammers just stopped looking for e-mails that way.

Which makes perfect sense -- most people don't have their e-mail address listed anywhere online in the first place, but you can purchase gigantic lists of e-mail addresses. These originate either from companies that sell their own user lists or from people who hacked the companies' servers.

These days if you want to send spam, trawling the web for e-mails makes zero sense. It's practically the least efficient thing you could do.

treflop
0 replies
1d

I’ve been having all my email addresses posted plain text since like 2005 and I’ve signed up on like every website imaginable (my password manager has over 2,000 entries) and I’ve never had a spam problem, at least on Gmail.

r-w
0 replies
1d1h

Unless you’re the one trying to sell them, in which case that’s part of doing business :)

xyst
1 replies
1d7h

this used to be a problem in the early 00s. I don’t think spam filtering was as good back then so protecting your public email from spam was necessary.

Also, this was a time when mailboxes were often allocated 10–25 megabytes, so spam bots could easily flood your inbox.

WirelessGigabit
0 replies
1d

When I signed up for Hotmail it was 2MB.

Then on April 1st, 2004, Google launched something that wasn't an April Fools' joke... Gmail with 1GB! I remember getting a beta invite and inviting others.

qingcharles
1 replies
1d1h

I designed websites for two people in the last year and put both their email addresses in the footer, and neither account has received a single spam message in all that time (not even something dropped into the Spam folder). Both sites are popular, have thousands of visitors, and get scraped by every search engine and AI bot you can think of.

r-w
0 replies
1d1h

Interesting. Maybe footer emails tend to be support contact addresses rather than personal inboxes. Otherwise I’d find that discrepancy very surprising.

paradox460
1 replies
1d1h

The practice of email address "obfuscation" feels like a relic of a bygone era, one that was never actually sound in its methodology but spread anyway. A form of cargo-cultism has kept it alive.

SoftTalker
0 replies
1d1h

Yeah just looking at this, it appears to add about 1K of overhead and at least one additional http request for something that ultimately boils down to a mailto: link, so it can still be scraped, and just adds bloat to your web page.

zufallsheld
0 replies
1d3h

I host my own mail server, and all addresses that are publicly visible get spam, e.g. the one on my blog or the one that was visible on GitHub.

nozzlegear
0 replies
1d3h

Same here, I've had my email plainly visible on my website in mailto links and on Github, and I don't get any spam that breaks through Fastmail's spam filters.

dhosek
0 replies
23h13m

My thoughts exactly. On the other hand, an email address I used with Usenet ca 1999–2001 has had a consistent flood of spam. I think most spammers are using the same 20+-year-old list of emails.

The email address on my website doesn’t even get stuff that goes to the spam filter. Nothing, nada, zilch.

I do think that there are some mailing lists that get generated by trying to guess emails, brute-forcing gmail addresses by trying dictionary attacks of the FIRSTNAME.LASTNAME variety or 1–10 letters. I get a tiny amount of spam sent to a domain@domain.com address I have, but that’s typically on the order of one message a year.

And all else aside, the overall volume of spam email has declined dramatically, even ignoring the effect of the gmail spam filter. I’m guessing that email as a spam vector just doesn’t make sense anymore and most of what goes out is a mix of 419 scammers trying to make their quotas and would-be scammers who’ve been scammed into buying that 20-year-old list of emails.

adrianpike
0 replies
1d2h

I've also had my email posted in mailto's in a half dozen places for... a long time. I remember in the early 00's when I'd cargo cult the old "type the whole email out as adrian at adrianpike dot com" thing on forums thinking it would work as some mystical talisman, and it turns out considering emails to be secret isn't worth the time.

Animats
0 replies
19h17m

Agreed. I have several web sites with publicly visible email addresses and they don't get much spam.

The spam I get is rather mis-targeted. For a while I was getting spam for equipment which would be useful were I a bulk producer of olive oil. "We have 15 years of experience in the research, development and production of automatic edible oil filling equipment...." There are the usual fake financing deals: "We’ve pre-approved your business for financing..." Whatever sends that crap doesn't look at the web site at all.

When I get spam from Gmail or Outlook accounts, I report it, so they will get a strike against their account. I don't hear from those people again.

All other spam is so obviously bogus that simple filters are dumping it into a junk folder. Most of it seems to be phishing emails. "You have won a (some tool)..." seems to be popular this week.

4u00u
0 replies
1d1h

very recently, within a day of publishing an email in the footer of a page, i got a phishing email that wasn't caught by the spam filter and looked very genuine

throwaway11460
10 replies
1d10h

Don't have time to test myself right now - what about accessibility, can a screen reader read it?

gostsamo
7 replies
1d10h

I tested it, and it seems accessible on the live demo. Not sure it is as protected as the author claims, though, but it might throw some bots for a spin.

rrr_oh_man
4 replies
1d9h

Man, I’ve always wondered how to test apps with a (simulated) screen reader, but never got too far

rrr_oh_man
0 replies
18h24m

Added this. Thank you a lot!

gostsamo
1 replies
1d9h

My secret is that I'm not simulating. Being blind forces you into it. :D

For testing purposes, the NVDA screen reader is free and open source. I'm not sure whether there is a driver that gives API access to what it would output, but it might be a fun project to try for a11y testing purposes.

rrr_oh_man
0 replies
18h23m

Thank you! And sorry for all the shitty code I produced over the years.

dylan604
1 replies
1d4h

but it might throw some bots for a spin.

Until some bot dev sees this, accepts the challenge, and then solves it as a function within their package that never needs updating again because it is now done. So, live it up while it is not solved. After that, just shrug your shoulders at yet another idea no longer being useful

gostsamo
0 replies
1d3h

The key in this case is that this is not a problem for me even if someone implements such a protection.

The rest is mice and traps.

Operyl
1 replies
1d10h

Given the entire bottom section, it seems like accessibility was taken into account here.

throwaway11460
0 replies
1d10h

Unfortunately then I think it won't help at all - going through the accessibility tree is a standard web crawling play.

okasaki
10 replies
1d10h

Is it still necessary to obfuscate email addresses? Mine isn't and I get around 50 generic spam emails per month to gmail.

ale42
5 replies
1d10h

I think that nowadays most spam lists come from data breaches and address-collecting malware. It's cheaper than running a bot to scan the web for addresses. We get spam on addresses that were never published online.

RaoulP
4 replies
1d10h

I think so too. And I think the majority of data breaches that have led to spam for me are from ages ago, from random services I signed up for as a teenager.

For a few years after that I did the "+" Gmail alias thing, to try to filter and catch companies. But I realised that's easy and obvious to strip, so it wasn't worth the effort (although I have caught PayPal leaking my email somehow).

ale42
3 replies
1d10h

If you self-host your email, you can use "." as a delimiter instead of the "+". People would already need to know they can strip that part...
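A minimal sketch of why the delimiter matters, assuming a harvester that "cleans" tagged addresses by cutting the local part at the first `+`: a `.`-delimited tag passes through untouched, and the receiving side can still read the tag back as long as it knows the base mailbox name. `stripPlusTag`, `extractTag`, and the sample addresses are hypothetical; which delimiter a self-hosted server actually accepts depends on its configuration.

```javascript
// What a harvester might do to "clean" a tagged address: drop a "+tag" suffix.
function stripPlusTag(address) {
  const at = address.lastIndexOf("@");
  if (at === -1) return address; // not an address; leave it alone
  const local = address.slice(0, at);
  const domain = address.slice(at); // keeps the "@"
  const plus = local.indexOf("+");
  return (plus === -1 ? local : local.slice(0, plus)) + domain;
}

// Receiving side (hypothetical helper): recover the tag, given the known base
// mailbox name and the delimiter the server is configured to accept.
function extractTag(address, baseLocal, delimiter) {
  const at = address.lastIndexOf("@");
  if (at === -1) return null;
  const local = address.slice(0, at);
  const prefix = baseLocal + delimiter;
  return local.startsWith(prefix) ? local.slice(prefix.length) : null;
}

console.log(stripPlusTag("raoul+paypal@example.com")); // → raoul@example.com
console.log(stripPlusTag("raoul.paypal@example.com")); // → raoul.paypal@example.com
console.log(extractTag("raoul.paypal@example.com", "raoul", ".")); // → paypal
```

The tag survives only because the cleaning step assumes `+`; a harvester that also knew about `.` could strip it just as easily, which is the point about people needing to know they can strip that part.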

RaoulP
2 replies
1d9h

Sounds good! I might go even further and just use a custom address for each service, i.e. paypal@example.com or something.

But self-hosting email is an adventure I'm nervous to embark on.

samatman
0 replies
1d2h

Anecdotally, sending mail to example.com from example@mydomain.com can cause a whole host of human-factors problems which can be eliminated with something like RaoulPtoExample@mydomain.com.

nobody9999
0 replies
1d9h

Sounds good! I might go even further and just use a custom address for each service, i.e. paypal@example.com or something.

Which is exactly what I do. As soon as I see spam sent to any particular email address, I know who it is that leaked the address and I can block it without issue.

But self-hosting email is an adventure I'm nervous to embark on.

Why are you nervous about it? I've been doing so for decades and haven't had many issues at all. There are a bunch of all-in-one solutions like mailinabox[0] (I roll my own, but as I said, I've been doing this for decades) and others which would likely make things simpler for you. Go for it! You won't be disappointed.

[0] https://en.wikipedia.org/wiki/Mail-in-a-Box

martyvis
1 replies
1d10h

Is that all? I get around 70 genuine spam emails to my Gmail account every day now (all detected correctly by Gmail).

tempestn
0 replies
1d10h

I get a similar volume, and gmail likely detects almost all of them. Problem is, it also falsely detects the occasional non-spam message, so I do need to periodically scan through the spam box, which is a bit of a pain when it contains hundreds or thousands of emails.

sitzkrieg
0 replies
1d8h

it isn't, but people like to make a problem of it with elaborate what-ifs

RaoulP
0 replies
1d10h

I think this is a valid question. I see lots of effort at obfuscation but don't know if there's still a need.

I barely get spam and have a bigger issue with false positives in my spam folder. On the other hand I don't think there are many pages on the web that display my email address, so I'm curious about others' experience.

cwillu
10 replies
1d10h

Email is still plain text within an XML document referenced in the page source.

shanehoban
4 replies
1d9h

Try to query it though via document.querySelectorAll('a') for example. It's a good first line of defense as a lot of scraping techniques do this approach.

However, if you have a headless browser setup for scraping, and simply fetch the current URL while on the page[0], you can get the plain text, and do a regex search for email addresses which will get you the email address - albeit this is a strange approach to take I admit.

[0]: fetch('./').then((res) => res.text()).then((text) => console.log(text))
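The "regex search" step could look roughly like this in JavaScript. The pattern is an assumption: a deliberately loose approximation of an address (nowhere near RFC 5322), the cheap, greedy kind a harvester is likely to use. In the headless-browser scenario, `text` would be whatever the footnote's `fetch('./')` call returns; `extractEmails` and the sample markup are hypothetical.

```javascript
// Loose email pattern: local part, "@", dotted domain ending in a 2+ letter TLD.
// Deliberately not RFC 5322-complete; harvesters favour cheap over correct.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return each distinct address found in raw text, in first-seen order.
function extractEmails(text) {
  return [...new Set(text.match(EMAIL_RE) ?? [])];
}

// Usage against raw page or SVG source (hypothetical sample markup).
const sample = '<a href="mailto:me@example.com">me@example.com</a>';
console.log(extractEmails(sample)); // → [ 'me@example.com' ]
```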

nolok
2 replies
1d8h

It's a good first line of defense as a lot of scraping techniques do this approach.

Most basic scrapers (the ones that are not for testing, devtools, automation, and so on) actually work on the raw text, without any interpretation. They grep the source code; they don't run a DOM and JavaScript engine, because that makes a major difference in computing needs and speed.

I am not saying there are no evil scrapers doing DOM evaluation; there are tons. I am reacting to your "FIRST line of defense": the first line of defense is scrambling the raw text, which is how we got here.

What the parent is saying is that this tries to upgrade the defense we built as the threat evolved, but it forgot why we got there and thus makes itself vulnerable to the original threat.

cqqxo4zV46cp
0 replies
1d7h

If they’re saying it, I think that they’re wrong. One of those naively written scrapers won’t pick up an email address ‘protected’ in this way. It’s simply continuing the game of cat and mouse.

animuchan
0 replies
1d7h

Absolutely. The basic tools just fetch sites recursively and use regular expressions. The advanced tools are Chromium-based, so will render SVGs just fine (and then potentially run OCR / AI to extract text even from JPEGs).

This technique protects against a "neither here nor there" subset of programs; I wonder how large that set is in practice.

nkozyra
0 replies
1d6h

You can just query for all the image elements and then read any svg using the document model.

This is trivial to overcome for most basic scrapers and not much harder even if you try to obfuscate with paths for more sophisticated ones.

_joel
3 replies
1d10h

The idea being that spam bots don't parse SVGs looking for email addresses, just the page HTML. I'm not sure how effective this really is with modern spam protection, however.

turboturbo
2 replies
1d9h

The idea also seems to be that spam bots don’t look for `href="mailto:something"` in the DOM

rrr_oh_man
0 replies
1d9h

That seems surprising, tbh

edave64
0 replies
1d8h

The mailto is inside the SVG, not the HTML document. So that's not "also" it's the same idea of bots not looking at the svg at all

majestic5762
0 replies
1d10h

yeah, useless stuff portrayed as smart

fp64
6 replies
1d10h

I don't get it, I can just curl the svg and grep for mailto?

rany_
5 replies
1d10h

Yes, but these scraper bots aren't that sophisticated.

winternewt
1 replies
1d10h

But they will be as soon as this sees widespread use.

_joel
0 replies
1d10h

it won't be widespread imho, not when you share your email address with other parties that then lose/sell your details. fastmail-like 'temporary' (masked) email addresses could help, however.

fp64
1 replies
1d8h

Crawl every link, now including SVG, and grep all 'mailto:' does not sound super sophisticated?

    wget --recursive --quiet "$BASE_URL" && grep -roh 'mailto:\([^"]*\)'

works on the example and prints the mailto: link, email address included
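The same two steps (find the SVG the page embeds, then grep it for `mailto:`) are just as short without a shell. A sketch, assuming the address sits in an SVG loaded via `<object data="...">` as in the article; the page and SVG strings are hypothetical stand-ins, and a real crawler would fetch each `data` URL over HTTP instead of looking it up in a map.

```javascript
// Step 1: collect the URLs an HTML page embeds via <object data="...">.
function objectDataUrls(html) {
  return [...html.matchAll(/<object\b[^>]*\bdata="([^"]+)"/gi)].map((m) => m[1]);
}

// Step 2: pull mailto: targets out of any text, including SVG source.
// Like `grep -o 'mailto:[^"]*'`, but returns only the address part.
function mailtoTargets(text) {
  return [...text.matchAll(/mailto:([^"'\s<>]+)/gi)].map((m) => m[1]);
}

// Hypothetical page and SVG; a crawler would download the SVG instead.
const page = '<p>Contact: <object data="/email.svg" type="image/svg+xml"></object></p>';
const files = {
  "/email.svg":
    '<svg xmlns="http://www.w3.org/2000/svg"><a href="mailto:me@example.com"><text>me@example.com</text></a></svg>',
};

for (const url of objectDataUrls(page)) {
  console.log(url, mailtoTargets(files[url] ?? ""));
}
// → /email.svg [ 'me@example.com' ]
```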

planede
0 replies
1d8h

I think the idea is that email scraper bots typically don't bother downloading images referenced by <img> tags.

amsterdorn
0 replies
1d8h

Querying DOM nodes is inherently more complicated than a regex on unparsed HTML.

throwaway598
5 replies
1d6h

My domain: 24 years registered to me. A .com.

My email address: Listed at the top of the front page. In a H3 tag.

This email address's spam problem: Not a problem. 15ish per day get to me including Junk folder. Thanks Purelymail.

What is a problem: Transactional email unrelated to transactions, Promotional email which is newsletter junk spam, Social networks complaining of not being used.

zufallsheld
1 replies
1d3h

15 spam mails a day seems like a lot to me. I've blacklisted addresses for less.

anamexis
0 replies
16h15m

If they're getting filtered, who cares?

SoftTalker
1 replies
1d1h

Social networks complaining of not being used

This is my biggest one. I get more spam from Facebook begging me to log in than I do from almost anything else. I haven't used the account in about 7 years, you'd think they'd figure it out.

kevincox
0 replies
1d

you'd think they'd figure it out.

Cost of sending spam: Effectively zero.

Cost of pissing off inactive user: Essentially zero.

Cost of convincing inactive user to come back: Positive.

Add in a bunch of other factors like some product manager twisting stats to make it look like they are getting users back even if they really aren't and you see why it happens.

speckx
0 replies
4h45m

I have a few old domains I registered in the late 90s, and some of them still have a mailto with my email. On some I rarely get any spam; on others it's dozens a day. SpamAssassin does a great job of catching the spam.

dannyobrien
5 replies
1d10h

I would like to push back on the idea that you should obfuscate your email address at all.

My email address is danny@spesh.com. I get a lot of spam -- possibly, since I have been distributing that address deliberately on the web and inadvertently in hacked datadumps, a near maximum amount of spam.

But the benefits of having people easily find a way to contact me directly has for me far outweighed the (largely solved) challenge of discarding automated spam.

Publish your email address! It's okay! Very little bad will happen, and people will be able to contact you without going through some strange social media intermediary!

parasti
3 replies
1d10h

This is appropriate advice for the average HN reader. For everyone else, probably not. I've seen first hand otherwise intelligent people being unable to discern an obvious (to me) online scam from a legitimate business. These are the people spammers are targeting. These are the people that need to obfuscate their email address.

_joel
1 replies
1d10h

So you're saying the same people who are unable to recognize a spam email know how to embed a mailto: link in an XML document and write web pages. Ok.

parasti
0 replies
1d9h

Never said that. I'm a web developer. People ask me to add their emails to web pages. Comment quality on here seems to have taken a dive.

richrichardsson
0 replies
1d9h

Even sophisticated users can slip up in the right circumstances.

Personal anecdote: one morning, while still quite sleepy, I received a very well-crafted Namecheap phishing email. I half knew the product they claimed had lapsed was actually fine, but I had just renewed, so I thought perhaps there had been a problem I'd missed, and it was convincing enough that I clicked the link before doing the normal sanity checks. Thankfully the address it went to didn't resolve. Hopefully I would have noticed the obviously incorrect URL before I entered any details, and I have 2FA enabled, but still: I should and do know better; it was just perfect timing for a well-crafted attack...

SushiHippie
0 replies
1d9h

in hacked datadumps

https://haveibeenpwned.com/

45 data breaches and 7 pastes

Wow, I don't know if I've ever seen a real address in so many breaches haha

muzster
4 replies
1d7h

A heavily guarded fortress suggests something of value inside, and the big crooks may spend a little more effort. In the age of AI, this becomes even easier.

    {
      "model": "gpt-4-turbo",
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "return a json array of all valid emails found in the image."
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "image_url",
              "image_url": {
                "url": "data:image/png;base64,{{ INSERT_BASE64_PNG_DATA }}"
              }
            }
          ]
        }
      ],
      "temperature": 0.5,
      "max_tokens": 2048,
      "top_p": 1.0,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0
    }

Edit: Converting a web page to an image is trivial.

zipping1549
1 replies
1d7h

It won't make sense cost wise though

omneity
0 replies
1d5h

Except the cost is only going down over time

internetter
1 replies
1d4h

We've had OCR for decades before GPT. I suspect GPT might perform worse than OCR. What a waste.

muzster
0 replies
1d

Agreed, it's a waste. GPT is not too bad at reading text from images, with the added bonus that you can reason with it.

miki123211
4 replies
1d9h

While there's nothing stopping this technique from being accessible in principle, the example given in the article is a really bad one.

The article uses "Email us!" as the label on the <svg> and <a> elements, which effectively hides the actual email address from screen readers. Using ARIA labels this way is really bad practice: a screen reader user should have the same experience as anybody else unless there's a very good reason to do otherwise, and if you think your reason is a good one, you're probably wrong.

The proper way to do this would be to put the actual email address in the labels.

47282847
2 replies
1d9h

Isn’t the whole point of the exercise to not have the document contain the email address in a (machine-)readable format?

janosdebugs
0 replies
1d6h

The NVDA screen reader reads this text as: "This is my email frame link email us." That is by no means equivalent to actually seeing the email address. I found that HTML entity encoding every single character of the link takes care of any spam problem already and is much more accessible.

Doe-_
0 replies
1d8h

The email address wouldn't be in the document directly, only in the SVG. Whether the title of the SVG contains "Email us" or the email address wouldn't affect how it works.

If the scraper is searching the DOM rather than simply downloading the web pages, then the email will be found regardless.

matteason
0 replies
1d8h

This can also affect voice dictation software like Dragon - if a user says 'Click myemail@mydomain.tld' it won't activate the link as Dragon is expecting 'Click email us', as that's now what the browser exposes as the link text.

That point might be academic anyway as I'm not sure Dragon would activate a link inside an SVG

magnat
4 replies
1d10h

even when a human visitor has their JavaScript turned off, the email address displayed on the page remains usable

NoScript on Firefox with default settings doesn't render <object> tags (it replaces them with placeholders), so this technique doesn't work here.

https://imgur.com/2tCAgAf

Laaas
1 replies
1d9h

uBlock Origin can block JS too FWIW. There’s a convenient button for it in the extended menu.

brettermeier
0 replies
1d7h

Thank you, didn't know that!

yau8edq12i
0 replies
1d1h

That's a different thing, though. Not sure why you'd make this point.

jaeh
0 replies
1d8h

it's the same in chromium.

donatj
4 replies
1d3h

A friend of mine is an absolute wizard and has been building essentially “responsive images” as SVGs with JS inside. They adapt to their size programmatically. It’s… interesting.

The fact that SVGs can even have JS embedded feels both untapped and kind of dangerous.

soperj
1 replies
1d3h

SVGs are responsive out of the box? I'm confused about what the Javascript would be doing to help that situation within the svg.

asynchronous
0 replies
1d3h

I think they’re talking about dynamically actually changing the image itself, not just resizing

johnny99k
0 replies
1d2h

This has been known in the security community for quite some time.

alemanek
0 replies
1d2h

That sounds super interesting. Does your friend have a GitHub or site that shows what they’re doing on that front? If so, could you post a link?

This is super far out of my wheelhouse technically as a backend engineer but it sounds really cool.

mediumsmart
3 replies
1d7h

this works if you write it into the html on full-moon tuesdays:

<a href="&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;&#x73;&#111;&#x6d;&#101;&#x2e;&#100;&#x75;&#100;&#x65;&#64;&#x74;&#104;&#x65;&#46;&#x6f;&#116;&#x68;&#101;&#x72;&#100;&#x75;&#100;&#x65;&#115;&#x2e;&#115;&#x69;&#116;&#x65;">&#115;&#x6f;&#109;&#x65;&#46;&#x64;&#117;&#x64;&#101;&#x40;&#116;&#x68;&#101;&#x2e;&#111;&#x74;&#104;&#x65;&#114;&#x64;&#117;&#x64;&#101;&#x73;&#46;&#x73;&#105;&#x74;&#101;</a>
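What the snippet does, as a sketch: every character of both the `href` and the link text is written as a numeric character reference, alternating decimal (`&#109;`) and hexadecimal (`&#x61;`) forms. Browsers decode these transparently, so humans see and click a normal link, while a scraper grepping the raw source for `@` or `mailto:` finds nothing. `encodeEntities` is a hypothetical helper; with the address from the comment it should reproduce the markup above.

```javascript
// Encode every character as a numeric character reference, alternating
// decimal (&#109;) and lowercase hexadecimal (&#x61;) forms.
function encodeEntities(text) {
  let out = "";
  let i = 0;
  for (const ch of text) { // for...of walks code points, not UTF-16 units
    const code = ch.codePointAt(0);
    out += i % 2 === 0 ? `&#${code};` : `&#x${code.toString(16)};`;
    i += 1;
  }
  return out;
}

const address = "some.dude@the.otherdudes.site";
const href = encodeEntities("mailto:" + address);
const html = `<a href="${href}">${encodeEntities(address)}</a>`;
console.log(html);
```

The alternation adds nothing cryptographically; it only defeats a scraper that searches for one fixed entity form.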

kevin_thibedeau
1 replies
1d4h

That works for humans. There's no reason to believe bots aren't handling entity parsing.
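To illustrate: undoing numeric references takes one regular expression, and any real HTML parser does it before a scraper ever looks at the text. A sketch that handles only decimal and hex numeric references (named ones like `&amp;` are left alone); `decodeNumericEntities` and the sample string are hypothetical.

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) numeric character references.
// Out-of-range code points are left as-is instead of throwing.
function decodeNumericEntities(text) {
  return text.replace(/&#[xX]([0-9a-fA-F]+);|&#([0-9]+);/g, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

// "some@example.com", encoded in the alternating style from the comment above.
const obfuscated =
  "&#115;&#x6f;&#109;&#x65;&#64;&#x65;&#120;&#x61;&#109;&#x70;&#108;&#x65;&#46;&#x63;&#111;&#x6d;";
console.log(decodeNumericEntities(obfuscated)); // → some@example.com
```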

robszumski
0 replies
1d4h

In my experience they haven't been in the past, but LLMs change the game by doing it by default.

rishikeshs
0 replies
23h20m

how does this work?

kees99
2 replies
1d5h

Not only is "protecting your email" pointless, as others have already pointed out; it's actively harmful.

There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.

digging
1 replies
1d3h

There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.

Do you have one on hand? That sounds absurd and I've never seen it

tentacleuno
0 replies
1d2h

Mastodon instances fronted by Cloudflare (with Email Protection on) are good examples.
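For context on where the literal placeholder text comes from: Cloudflare's Email Address Obfuscation, as it is commonly described, swaps the address for a placeholder plus a hex-encoded `data-cfemail` attribute that a script decodes in the browser, so without JS the placeholder is all a reader sees, even for text like "1920x1080@60Hz" that was never an email. The sketch assumes the widely circulated description of that encoding (the first byte is an XOR key, each later byte is one character XORed with it). It is not an official Cloudflare API, handles ASCII only, and `cfEncode` exists just to build test input.

```javascript
// Assumed scheme: hex string whose first byte is a key; every later byte is
// one character code XORed with that key. ASCII-only in this sketch.
function cfDecode(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let out = "";
  for (let i = 2; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

// Inverse of cfDecode, used only to produce sample input.
function cfEncode(text, key) {
  const byte = (n) => n.toString(16).padStart(2, "0");
  let hex = byte(key);
  for (const ch of text) hex += byte(ch.charCodeAt(0) ^ key);
  return hex;
}

const attr = cfEncode("1920x1080@60Hz", 0x5a);
console.log(attr, "decodes to", cfDecode(attr));
```

The decode is a few lines, which is why this kind of obfuscation mostly inconveniences no-JS readers rather than scrapers.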

dartos
2 replies
1d4h

Idk, LLM-powered scraping can pull the email out of this without any issue.

stkdump
0 replies
1d4h

It even uses the exact same syntax as in html, so as long as svg content isn't specifically excluded, normal web scraping would just work without modification.

judge2020
0 replies
1d4h

Perhaps, but I think OCR is more likely.

butz
2 replies
1d3h

I assume that nowadays emails are pulled directly from hacked mailbox contacts list. Nobody has the time to go through each individual website and collect emails one by one.

Tagbert
0 replies
1d3h

No body. Web crawler bots.

Closi
0 replies
1d3h

I assume that emails are pulled from every method available.

zaxomi
1 replies
1d6h

Cool.

1 hour later.

Spam-scraper updated to support this.

mrbluecoat
0 replies
1d2h

Exactly

seanvelasco
1 replies
1d6h

i bought a premium .app domain a few months ago. it's not published on any websites yet and has no history of previous owners; the only notable fact is that it's listed as a premium domain on registrars.

first emails I received after the gmail welcome email were b2b sales from construction companies (i'm not in this field), shopify optimizations (i don't run one), agencies suggesting how i improve the ui/ux of my site (no website yet).

thankfully, they're all in the spam folder. i'm using google workspace.

i believe these spammers get their leads on newly-registered domains. so, how do we protect ourselves from that?

hu3
0 replies
1d5h

I believe the only effective protection against these fresh-domain spammers is what you already have: a pretty good anti-spam mechanism such as Gmail's.

robbyiq999
1 replies
1d4h

How about posting two email addresses, a hidden one and the actual one, and using the hidden one to filter mail to the actual one?

JohnFen
0 replies
1d4h

This has been my approach since the mid '90s. It works very well.
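A minimal sketch of that idea, assuming a hypothetical trap address published only in markup humans don't see: any sender who writes to it gets remembered, and later mail from that sender to the real address is flagged. The addresses, `classify`, and the in-memory `Set` are illustrative; a real setup would hook into the mail server's filtering and persist the list.

```javascript
const TRAP = "trap@example.com"; // hypothetical hidden address (bots only)
const REAL = "hello@example.com"; // hypothetical published address
const trappedSenders = new Set();

// Classify one incoming message as "spam" or "inbox".
function classify(msg) {
  const from = msg.from.toLowerCase();
  const to = msg.to.map((a) => a.toLowerCase());
  if (to.includes(TRAP)) {
    trappedSenders.add(from); // only a harvester should know this address
    return "spam";
  }
  if (to.includes(REAL) && trappedSenders.has(from)) return "spam";
  return "inbox";
}

console.log(classify({ from: "bot@spam.test", to: [TRAP] })); // → spam
console.log(classify({ from: "bot@spam.test", to: [REAL] })); // → spam
console.log(classify({ from: "friend@mail.test", to: [REAL] })); // → inbox
```

Keying on the From address keeps the sketch short but is weak, since senders forge it freely; recording the sending IP or domain instead would be sturdier.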

nojs
1 replies
1d2h

This is a cool trick. The email is in cleartext in the source, meaning mailto works and copy-paste works. But most scrapers probably skip the .svg file.

pdonis
0 replies
1d2h

> most scrapers probably skip the .svg file

But they won't as soon as they realize it's just easy-to-parse text containing the data they're looking for.

janmo
1 replies
1d8h

Here is what I do:

<span class="contact-email">rea<span class="hidden">nospam</span>l@mai<span class="hidden">sjs</span>l.com</span>

I still receive "spam" tho, but it seems they manually collected the email because what I receive are B2B proposals clearly targeted at the topic of my website.

jszymborski
0 replies
1d3h

If the scraper uses a headless browser, I think that it might defeat your method. That said, using a headless browser to crawl for emails is relatively expensive so perhaps the spam is not from your site.
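To make that concrete: plain tag-stripping yields the decoy address, but a scraper that renders the page, or simply special-cases the `hidden` class, gets the real one. This assumes `.hidden` maps to `display: none` in the site's CSS, which the comment implies but doesn't show, and the regexes handle only this exact markup shape, not HTML in general. In a headless browser, reading the span's `innerText` should give the same result as `rendered`, since `innerText` leaves out content hidden with `display: none`.

```javascript
const markup =
  '<span class="contact-email">rea<span class="hidden">nospam</span>l@mai<span class="hidden">sjs</span>l.com</span>';

// Naive text extraction: strip every tag, keep every text node.
const naive = markup.replace(/<[^>]+>/g, "");

// Rendering-aware extraction: drop the hidden spans first, then strip tags.
const rendered = markup
  .replace(/<span class="hidden">[^<]*<\/span>/g, "")
  .replace(/<[^>]+>/g, "");

console.log(naive); // → reanospaml@maisjsl.com
console.log(rendered); // → real@mail.com
```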

geuis
1 replies
1d9h

Why? What's the point?

All you're doing is making it slightly more difficult for the people who want to contact you to do so.

OCR has been a thing for years.

Just put your email out there. That's what spam filters are for.

charles@geuis.com. There. Scrape it. Spam it. I don't care.

Edit:

Yes, thank you for signing me up for the DNC (already a member), some random Trump org, something about Scientology, and another random christian-based website. Honestly, I'm kind of sad at the lack of originality given the otherwise extremely ingenious community we have here.

Maxatar
0 replies
22h47m

But you just proved the point. You might not care to be signed up for some random Trump org, Scientology, or whatever, but other people do care and if you want to author a website that responsibly uses people's emails without subjecting them to unnecessary spam, then it's worth taking these techniques (not necessarily this specific one) into consideration.

While OCR does exist it's incredibly expensive compared to text scraping. The main way to combat spam is to make the cost of spamming more expensive than the benefit.

zigzag312
0 replies
1d8h

Sadly, Stack Overflow closed the discussion, even though it is both interesting and valuable.

ceving
1 replies
1d9h

It does not work if you change the font-size.

FrostKiwi
0 replies
16h25m

I presume you're not referring to page zoom or DPI, which works.

I added `style="height: 2em"` to the `<object>` to fit it within my use-case. Should be able to adapt to current font size that way I think.

brap
1 replies
1d9h

I’ve been using the same gmail address for like 20 years.

I don’t think I got a single spam email in the last 5-10 years.

SMS, on the other hand…

rvnx
0 replies
1d9h

A couple of modern spammers send you spam from Gmail and say “I included my colleague in CC please hit ‘reply all’ if you are interested”

SahAssar
1 replies
1d10h

If concealing it in an object tag works, then you could just have the object tag show it as plain text or HTML, right? Not sure why it's an SVG.

juped
0 replies
1d10h

Probably because the scraper has "that's an image, skip it" logic.

CM30
1 replies
1d3h

I think the main thing people forget with stuff like this is that yes, all these setups are possible (or even trivial) to bypass, but you're not really dealing with a dedicated adversary that's targeting you in particular.

Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses, so the main trick to dealing with them would be to do something other sites/services don't. If you or your company become successful enough that people are actually targeting you in particular, then congrats, you're probably in a good place anyway.

cmiller1
0 replies
1d1h

Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses

But this is also sort of a security through obscurity approach, if enough people adopt one of these methods of obfuscation then the spammers absolutely will change their tools.

xyst
0 replies
1d7h

Kind of neat, but I would rather just use a "throwaway" email if I were sharing it globally.

In my case, I set up an email alias with a Sieve rule (if an email is sent to the alias, move it to a "public inquiry" folder). Before that rule runs, SpamAssassin takes care of the non-technical folks who couldn't be bothered to run their spam campaign through SpamAssassin testers, as well as the ones who wouldn't know how to set up their domain for sending email (SPF, DKIM, DMARC, …).

sircastor
0 replies
1d3h

I think I get more unsolicited email from related businesses trying to get a foot in the door with my company - I assume they're connecting dots either from LinkedIn or Github (probably both). This is an interesting solution to the problem, but I don't genuinely think that anyone is scraping websites for email addresses anymore. I don't think it's cost effective for the modern spammer.

saint-loup
0 replies
1d1h

At that point, isn't a good old contact form a simpler solution? You can link it to your email address or other channels. It can even work with static websites; I hooked mine up with Nextcloud Forms.

I appreciate the hacker creativity on display here, but as others have said, obfuscating an email address raises accessibility issues. Hiding content from some programs and not others (spam bots vs. assistive technologies) seems inherently a losing game, for you or for your users.

replete
0 replies
1d3h

    <a href="{rewritten by js}">domain.com</a>

    a::before { content: "username@" }
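The JavaScript half of that trick can be sketched as below. This is a minimal sketch under assumptions: the `a.email` selector and the `buildMailtoHref` helper are hypothetical, and the user and domain parts are kept as separate strings so the full address never appears verbatim in the HTML source.

```javascript
// Hypothetical helper: join the two halves of the address only at runtime.
function buildMailtoHref(user, domain) {
  return "mailto:" + user + "@" + domain;
}

// Only touch the DOM when running in a browser; "a.email" is an assumed selector.
if (typeof document !== "undefined") {
  const link = document.querySelector("a.email");
  if (link) {
    link.href = buildMailtoHref("username", "domain.com");
  }
}
```

The visible "username@" still comes from the `a::before` rule, so a scraper that neither runs scripts nor applies CSS sees only `domain.com` in the markup and the fragment in the stylesheet.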

readmemyrights
0 replies
1d2h

Funny I'm seeing this now: I've finally taken the first tentative steps toward making a website, noticed that pandoc has an --email-obfuscation option, and the whole topic was on my mind. I don't remember the last time I received an actual spam email (not counting desperate marketers trying to remind me of that one website I tried ages ago). Funnily enough, the new frontier seems to be WhatsApp and SMS, of all things. A month or two back I got a job offer from an Indonesian phone number on WhatsApp, and then something similar directly via SMS. I didn't publish my phone number anywhere online; the closest I came to making it public was joining my college's WhatsApp group and giving it to a bank for a student credit card, and honestly I wouldn't put leaking it to some spam agency beyond either of them.

I'm using VoiceOver on macOS with Chromium and I have the same experience as the NVDA user, although if I interact with the "link" I'll eventually find the email. If I weren't aware of the obfuscation, however, I would probably just think the webpage was weird, saying "this is an email" but actually giving a mailto: link. In general, if you're doing something special to improve accessibility, odds are you're doing it wrong, and if it's anything web-related the odds are at least 90%. Most accessibility issues on the internet come from developers trying to be smart with ARIA labels and the like, which usually just make things worse. The example I deal with most often is the manpages on man.openbsd.org: all of their cross-references to other manpages say something like "openssl, section 1" instead of "openssl(1)", which is what's displayed on the screen and what the browser's find command sees while searching.

For completeness, I also tried the page with various terminal browsers, specifically lynx, felinks, w3m, and edbrowse. None, and I mean NONE, of them could display the SVG properly; they couldn't even recognise it as an image.

portaouflop
0 replies
1d4h

Maybe I’m too stupid but I don’t get why you would want to do this at all. Had my email in plaintext on the website for ages and never had an issue with spam…

perilunar
0 replies
10h50m

If you have an email address in the HTML of a page served by Cloudflare, they will obfuscate it and add their own decoder script.
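The general shape of an encode-in-HTML, decode-in-script scheme can be sketched like this. It is not Cloudflare's code: the format below (a leading key byte, then each character XOR-ed with that key, all written as two-digit hex) is an assumption for illustration, `xorHexEncode`/`xorHexDecode` are hypothetical names, and only ASCII addresses are handled.

```javascript
// Two-digit lowercase hex for a byte value (0-255).
function toHexByte(n) {
  return n.toString(16).padStart(2, "0");
}

// Hypothetical encoder: key byte first, then each ASCII char code XOR key.
function xorHexEncode(email, key) {
  return toHexByte(key) + Array.from(email, (ch) => toHexByte(ch.charCodeAt(0) ^ key)).join("");
}

// Matching decoder, the part that would run in the browser.
function xorHexDecode(encoded) {
  const key = parseInt(encoded.slice(0, 2), 16);
  let out = "";
  for (let i = 2; i < encoded.length; i += 2) {
    out += String.fromCharCode(parseInt(encoded.slice(i, i + 2), 16) ^ key);
  }
  return out;
}
```

Because the page source contains only hex digits, a regex looking for "@" finds nothing; a harvester that runs the page's scripts, or reimplements the decoder, still recovers the address.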

nloomans
0 replies
1d8h

I tested the example using the TalkBack screenreader on Android. With Firefox I was able to select and click on the link, but it did not announce the email address. With Chromium it completely ignored the existence of the SVG email. I was unable to select it and it was like the email wasn't there at all.

So yeah, I wouldn't call this accessible.

niutech
0 replies
21h9m

This requires loading an external SVG file, better use an inline version:

    <object data="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%20200%2024%22%3E%3Ca%20href%3D%22mailto%3Amyemail%40mydomain.tld%22%3E%3Ctext%20x%3D%2250%25%22%20y%3D%2250%25%22%20dominant-baseline%3D%22middle%22%20text-anchor%3D%22middle%22%3Emyemail%40mydomain.tld%3C%2Ftext%3E%3C%2Fa%3E%3C%2Fsvg%3E" type="image/svg+xml"></object>
Also have a look at this: https://spencermortensen.com/articles/email-obfuscation/
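Hand-encoding that data URI is error-prone, so a small build-time helper can generate it. A minimal sketch, assuming the same markup as niutech's example; `svgEmailDataUri` and `svgEmailObjectTag` are hypothetical names, and the address is assumed to contain no characters that need XML escaping.

```javascript
// Build the same SVG as the inline example and percent-encode it into a data URI.
function svgEmailDataUri(email) {
  const svg =
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">' +
    '<a href="mailto:' + email + '">' +
    '<text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle">' +
    email +
    "</text></a></svg>";
  // encodeURIComponent turns "@" into "%40" and quotes into "%22".
  return "data:image/svg+xml," + encodeURIComponent(svg);
}

// Wrap the data URI in an <object> element; encoded quotes keep the attribute intact.
function svgEmailObjectTag(email) {
  return '<object data="' + svgEmailDataUri(email) + '" type="image/svg+xml"></object>';
}
```

For `myemail@mydomain.tld` this should reproduce the data URI above. Percent-encoding only defeats scrapers that match the raw address; anything that URL-decodes the attribute gets it back.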

kindawinda
0 replies
1d1h

Google might start indexing your email.

kelnos
0 replies
1d1h

I gave up on this sort of thing. Spam filters are good enough nowadays that I don't think I see an increase in spam by having my email address publicly available without obfuscation. (That is, an increase beyond other spam sources, like crappy companies who have my email address for a legitimate purpose, but sell it to third parties.) In general I see less than 1 spam email hit my inbox per day, and that's fine.

Granted, this may depend on email provider and spam filter, so YMMV, but it hasn't been an issue for me.

karol
0 replies
1d10h

Spam filters work in 2024.

Is someone independently rediscovering Gauss's method for summing the numbers 1...100 today really worth sharing?

My point is that this is a primitive, easily broken workaround, and better methods exist.

iforgotmysocks
0 replies
1d5h

I just have a simple contact page that sends the message to a Discord webhook.
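A contact form wired to a Discord webhook only needs to POST a small JSON body. A minimal sketch, assuming a server-side handler (so the webhook URL stays secret) running on a runtime with a global `fetch`; `buildWebhookRequest`, `sendContactMessage`, and the `DISCORD_WEBHOOK_URL` environment variable are hypothetical. Discord's execute-webhook endpoint takes a `content` field of up to 2000 characters, and `allowed_mentions: { parse: [] }` stops a submitted message from pinging anyone.

```javascript
// Discord rejects message content longer than 2000 characters.
const DISCORD_CONTENT_LIMIT = 2000;

// Hypothetical helper: turn a contact-form submission into fetch() options.
function buildWebhookRequest(fromEmail, message) {
  const content = ("Contact form from " + fromEmail + ":\n" + message).slice(0, DISCORD_CONTENT_LIMIT);
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Disable @everyone/role/user pings coming from visitor-supplied text.
    body: JSON.stringify({ content: content, allowed_mentions: { parse: [] } }),
  };
}

// Hypothetical server-side sender; never ship the webhook URL to the browser.
async function sendContactMessage(fromEmail, message) {
  const url = process.env.DISCORD_WEBHOOK_URL;
  const response = await fetch(url, buildWebhookRequest(fromEmail, message));
  if (!response.ok) {
    throw new Error("Webhook request failed with status " + response.status);
  }
}
```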

hhsectech
0 replies
1d7h

Interesting idea... but couldn't a crawler just incorporate some AI like LLava2, or convert the SVG to a JPG and use OCR, to get the email addresses out?

It just seems like this adds a couple of steps to existing crawler scripts.

helsinkiandrew
0 replies
1d4h

Don't modern spam filters filter out most mails received this way and most spammers purchase lists for a specific targeted domains - house owners, porn users, dentists etc. rather than blindly scraping the web?

franky47
0 replies
1d6h

Ironically, the only spam I receive these days comes from the address I used here for the "Who wants to be hired" threads.

emayljames
0 replies
1d6h

A much easier way is to convert the email address into HTML entities. It still displays and can be copied, but the actual source code doesn't contain the email address.
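The entity approach boils down to replacing each character with a numeric character reference. A minimal sketch; `toHtmlEntities` and `entityEncodedMailto` are hypothetical names, and decimal references are used throughout.

```javascript
// Replace every character with its decimal numeric character reference, e.g. "@" -> "&#64;".
function toHtmlEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Build a mailto link whose href and visible text are both entity-encoded.
function entityEncodedMailto(email) {
  const encoded = toHtmlEntities(email);
  return '<a href="' + toHtmlEntities("mailto:") + encoded + '">' + encoded + "</a>";
}
```

Browsers decode character references in both text and attribute values, which is why the link still works. The same is true of any scraper that parses the HTML, so this only stops naive regex matching on the raw bytes.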

dxs
0 replies
1d5h

This is fun [2008]: https://web.archive.org/web/20180908103745/http://techblog.t...

"Nine ways to obfuscate e-mail addresses compared

"When displaying an e-mail address on a website you obviously want to obfuscate it to avoid it getting harvested by spammers. But which obfuscation method is the best one? I drove a test to find out."

dns_snek
0 replies
1d8h

Is there really a point to any of this? It's a fun exercise, but also a complete waste of time if you're actually trying to hide from spammers. You're making a piece of information public by sharing it with the entire world, yet somehow expecting it to only stay accessible to the "good guys".

Unless you change your email address at least monthly, all it takes is for one person or company to share your contact with someone else or enter it into a database/CRM, or one service to get breached, then your email address is on a list that eventually gets propagated to every spammer worldwide. If you use that email with any regularity, the chance of those things happening can be rounded up to 100%.

If hiding your email address from scrapers actually worked, spam wouldn't exist. I never published my personal contact anywhere, yet I get dozens of spam emails per week. They all get filtered as spam, it's not a big deal.

cantSpellSober
0 replies
1d5h

Can't be copied and pasted.

It's your domain, why not just have "contact@example.com" for incoming mail instead?

(Novel approach, thanks for sharing!)

_blk
0 replies
1d2h

Seems like a great solution, but I'd like to embed the data directly rather than linking an external file. One issue I see then is that dumb scrapers just look for the email address anywhere in the page, which includes an embedded SVG (whereas they might not fetch external <object> or <img> files). So for direct embeds, if the string isn't otherwise encoded, the email address could leak.

While this obviously (re)introduces JS into the mix, how would a simple compressed string fare against base64 svg embedding?

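```
// Base64 of the compressed SVG markup (gzip is an assumed format).
const compressedBase64Svg = '...';

// `decompress` isn't a built-in; this assumes gzip and uses DecompressionStream.
async function decompressAndInsertSVG(encodedData) {
  const bytes = Uint8Array.from(atob(encodedData), (c) => c.charCodeAt(0));
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream('gzip'));
  const decompressedSvg = await new Response(stream).text();
  document.getElementById('svgContainer').innerHTML = decompressedSvg;
}

decompressAndInsertSVG(compressedBase64Svg);
```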

Etheryte
0 replies
1d9h

While the specific claim about copying is true (you can right-click and select "copy email address"), simply selecting the text and copying does not work. The same goes for select-all followed by copy, so all in all, I wouldn't expect a regular user to be able to copy this successfully.

CodeWriter23
0 replies
1d1h

Seems kind of easy to defeat: just read the SVG and extract the email address from the mailto: link it contains. Bonus: the harvesting bots will now download all SVG files going forward.
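That point is easy to demonstrate: once a harvester has fetched the SVG, pulling the address out of its `mailto:` link takes a single regular expression. A minimal sketch; `extractMailtoAddresses` is a hypothetical name, and a real harvester would first have to follow the `<object>` or `<img>` URLs.

```javascript
// Match href="mailto:..." (also xlink:href) and capture the address up to the closing quote or "?".
const MAILTO_PATTERN = /href\s*=\s*["']mailto:([^"'?]+)/gi;

function extractMailtoAddresses(svgText) {
  // Percent-decode so "me%40example.com" comes back as "me@example.com".
  return Array.from(svgText.matchAll(MAILTO_PATTERN), (match) => decodeURIComponent(match[1]));
}
```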

ChrisMarshallNY
0 replies
1d5h

That's a pretty cool trick.

I was not aware that we could embed CSS in SVG.