
Show HN: I made tool that let's you see everything about any website

Syntaf
3 replies
21h51m

I’m a bit confused on the “Threats” section, entering my member management startup https://embolt.app shows malware detected with a timestamp dating back to 2018 (we launched this year).

I checked out another startup I know of (https://highlight.io) and it listed the same results.

Maybe I’m misinterpreting what this section means?

nativeit
1 replies
16h58m

Off-topic: Embolt looks interesting. I am an I.T. consultant and sysadmin for a chiropractic trade association, and have been looking for something to help manage and grow their memberships. Do you offer any evaluation plans or free tiers (or are the transaction fees the only monetization involved)?

FYI, your account creation page is pushing a special offer that expired 6/2/2024.

Syntaf
0 replies
14h26m

Oops looks like we didn’t expire our offer banner correctly — thanks for the heads up.

We definitely offer evaluation plans & free tiers, if you want to give me a shout over at grant@embolt.app; I can help set you up with an account to try us and we’ll see if we can help!

ramon156
0 replies
20h30m

Should display "City: Home (:"

jenscow
0 replies
22h22m

that's my ip address!

SahAssar
0 replies
5h34m

The request is coming from inside the house!

quyleanh
2 replies
1d

Let’s add a function that lists all subdomains.

leobg
0 replies
23h55m

+1. And really well done. I like how you can scroll through to get a good overview without any one section being long enough to break that high-level flow. Excellently executed.

SahAssar
0 replies
5h23m

That's usually not possible. If they're not listed in the cert SAN (often you'd just have a wildcard for subdomains there), you'd need to enumerate them all, which isn't feasible.
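
To illustrate (a sketch, not from the thread; it assumes the comma-separated `subjectaltname` string format Node's tls module exposes, and the example SAN is hypothetical):

```javascript
// Extract the DNS names from a certificate's subjectAltName string.
// A wildcard entry matches every first-level subdomain without naming
// any of them, which is why the cert alone can't enumerate subdomains.
function sanHostnames(subjectAltName) {
  return subjectAltName
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.startsWith('DNS:'))
    .map((s) => s.slice('DNS:'.length));
}

// Hypothetical SAN, shaped like the `subjectaltname` field returned by
// tls.TLSSocket#getPeerCertificate():
const san = 'DNS:example.com, DNS:*.example.com';
const hosts = sanHostnames(san);
// hosts: ['example.com', '*.example.com'] — the wildcard tells you
// subdomains exist, but not which ones.
```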

8organicbits
2 replies
22h44m

Beautiful! Thanks for open sourcing this!

I've been working on a project [1] that probably wants to become a live crawler like this, but it's currently batch based. I'm focused on RSS feeds and microformats [2]. Can you share any details on what kind of performance / operational costs you're seeing while you're on the HN front page? The fly.toml looks like $5/month could suffice?

[1] https://alexsci.com/rss-blogroll-network/

[2] https://microformats.org/wiki/Main_Page

spaceship__sun
1 replies
19h33m

I'm not OP, but I received ~100 thousand requests while on the front page once. It was an AI app and I quickly got rate limited by GCP Vertex AI lol.

SahAssar
0 replies
5h30m

100 thousand requests

Over what time period? Even if it's just over an hour, that's just under 30 rps; spread over a day, it's a little over 1 rps.
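
Spelled out (back-of-envelope only):

```javascript
// Total requests spread over a time window, in requests per second.
function rps(totalRequests, windowSeconds) {
  return totalRequests / windowSeconds;
}

const overAnHour = rps(100_000, 3_600); // ~27.8 rps
const overADay = rps(100_000, 86_400); // ~1.16 rps
```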

zerkten
1 replies
14h6m

Would be nice to be able to compare results between dates.

amadeusw
0 replies
13h59m

Agreed. Also, that could be a paid feature.

valleyer
1 replies
1d

"Energy Usage for Load" is specified in "KWg". What does that mean? Is it a typo for "kWh"?

ffhhj
0 replies
23h17m

it's nuclear powered, g is for grams of uranium

butz
1 replies
23h9m

Neat, bonus points for colorful log messages in console. One thing though: any ideas what is causing horizontal scrollbar to appear in Firefox? I observe this issue on several websites, but never figured out the issue.

djbusby
0 replies
18h17m

Gotta use the inspector and find the too-wide element. That or a CSS rule for overflow is causing it.
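
A sketch of that approach (the helper name and mock-element shape are mine, not from any tool): given anything with a getBoundingClientRect and the viewport width, return the elements sticking out past either edge.

```javascript
// Find elements that extend past the viewport and would trigger a
// horizontal scrollbar. `elements` can be any iterable of objects
// exposing getBoundingClientRect(), so it's testable outside a browser.
function findTooWide(elements, viewportWidth) {
  return [...elements].filter((el) => {
    const rect = el.getBoundingClientRect();
    return rect.right > viewportWidth || rect.left < 0;
  });
}

// In a browser console you might run:
//   findTooWide(document.querySelectorAll('*'),
//               document.documentElement.clientWidth);
```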

KomoD
1 replies
1d6h

Maybe I'm misunderstanding but I think there's been a mistake with the "Bad URLs Count", it shows a date instead of what I'd expect (a number)

lissy93
0 replies
1d5h

oooh let me look into that. You're right; it should be a number.

Aachen
1 replies
17h50m

Every section has little (i) icons and all of them are useless.

For my site it shows under "Site Features" a "root authority". Okay that's new to me, let's see what that means. The full explanation is: "Checks which core features are present on a site." That's like answering "water" when someone asks "what's water?"

The use cases section of the info is similarly useless, and additionally hyperbolic in most instances, such as: "DNSSEC information provides insight into an organization's level of cybersecurity maturity and potential vulnerabilities". If DNSSEC for one domain can tell me about the overall security maturity of an organisation as well as reveal potential vulnerabilities, please enlighten me, because that'd be very useful for red-teaming assignments.

The thing detects January 1st 2008 as the page's content type, which makes no sense (I checked with curl; that's indeed incorrect).

Server location is undefined at the top of the page (first impression; the section with the map), but later, in the server info section, it guesses a random city in the right country.

It reports page energy consumption in KWg. Kelvin × watt × grams? Is this a typo for kWh? One kWh is about as much energy as 50 smartphone batteries can hold, as if a page (as measured by its size in bytes) would ever use that much energy. You can download many 4K movies on one smartphone charge (even counting the power consumption of routers), so surely that's not the unit being used to judge HTML weight?

The raw JSON results, where I was hoping the fields might have clearer (technical) labels than the page, remain blank when I try to open them.

Overall, I'm not sure what the intended use of this site is. It presents random pieces of information with misleading contextualisation and no technical explanation; some values are incorrect and many sections don't work (failing to load, or showing error values like undefined). Maybe tackle it in sections: rethink what the actual goal is, and once you've identified one, write that goal into the "use cases" section and implement it; then write in the "what is this" section what it is the site is actually checking for; then repeat for the next useful piece of information you can come up with, etc.

nativeit
0 replies
16h0m

Well this was an entertaining 15-minute rabbit hole.

The energy consumption metric (KWg) should be more clearly defined with some context info, as it's not even remotely standardized, or even commonly used--it took some effort to track down what it's actually measuring. According to another site[1] dedicated to sustainability, "KWg" is "kilowatts consumed per gigabyte" (presumably per gigabyte transferred), so should probably be marked as "kWGB", if it's going to exist at all.

The data seems to be drawn from the Website Carbon Calculator API, which states that "If your hosting provider has not registered the IP address of your server with The Green Web Foundation, we will not be able to detect it."[2]

I visited the Green Web Foundation's website[3], which appears to provide the exact same services and data as the Website Carbon Calculator, which is an ironically wasteful endeavor: I'm making requests to three separate endpoints just to get an apparently arbitrary number back? I ran the test on my website, and it correctly identified my host, but strangely did not offer any quantitative values, instead just giving a binary "Green" or "Not Green" determination and badge.

It did at least provide some additional context, in the form of OVHCloud's Universal Registration Document[4] from FY2023, which includes a chapter on sustainability efforts. While that was far more helpful than anything else this exercise had revealed up to that point, it notably did not provide any "kWGB" measurements, nor any other site-specific energy consumption data I could find that would support calculating any sort of per-unit energy figure, let alone one attributable to a single website served from a virtual machine running on a dedicated baremetal server in one of their global data centers.

Tldr; I'm fairly certain this is just meaningless filler data from a service that's probably just a corporate green-washing badge backed by little more than the faint whiff of due diligence.

EDIT: Formatting

---

1. https://s2group.cs.vu.nl/2022-08-04-web-emissions/

2. https://www.websitecarbon.com/faq/

3. https://www.thegreenwebfoundation.org/green-web-check/

4. https://corporate.ovhcloud.com/sites/default/files/2023-11/o...

whydoineedthis
0 replies
15h37m

It doesn't work very well. I put in my own web address, which is definitely behind CloudFront, and it said it's unprotected, and flagged a bunch of other vulns the site doesn't have.

thwarted
0 replies
20h56m

Checking two websites/domains I'm responsible for, this information is really confusing or just plain wrong. The "DNS Records" card for MX is not the IP addresses of the actual MX records (nor am I sure why it would be -- why wouldn't the MX records be shown here?). "DNS Server" is the addresses of the webservers, not the DNS servers for the domain from whois or from the SOA record. It can show certificate information, but not the cipher suites? Traceroute fails because traceroute isn't available/isn't in path (the error shown is "/bin/sh: line 1: traceroute: command not found"). Firewall seems to be looking specifically for a web-application-firewall, but "firewall" is a somewhat generic term that includes a number of different technologies. Email configuration is wrong, probably because a website is not the same as a domain -- I don't have SPF or DKIM records for the www subdomain, because that's not where we send email from. The "Redirects" card says it followed one redirect, but there is no redirect on the address I provided.

Does this come down to trying to stuff a bunch of stuff for domains into a presentation and information gathering method for websites?

For cases where it can not be determined, it would be best to say "can not be determined" rather than "No", because the last thing anyone needs is some PHB giving people grief because, for example, the WAF in use doesn't expose itself to this detector.
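
A hedged sketch of that suggestion (the names are invented for illustration, not from web-check): a detector result that keeps "could not be determined" distinct from a confirmed negative.

```javascript
// Tri-state verdict for a WAF probe. Collapsing a failed or
// inconclusive check into "No" is exactly what invites PHB grief.
function wafVerdict(probeCompleted, signatureMatched) {
  if (!probeCompleted) return 'could not be determined';
  // Absence of a known signature is only "not detected", not proof of
  // "No": many WAFs deliberately avoid identifying themselves.
  return signatureMatched ? 'detected' : 'not detected';
}
```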

thepra
0 replies
11h5m

In the tech-stack it gives me "Chromium not found"...

swiftcoder
0 replies
10h55m

Seems like the hostname section detected a different site entirely to the one I input (some site that shared the same IP long ago?), and the mail section failed to detect my (valid, according to gmail) DKIM records entirely...

simple10
0 replies
20h24m

The docker version[1] worked better for me to test out. The free website version does not have all the features (like Chromium) enabled which is why some of the report data is missing or incorrect.

Looks like a super promising project! Thanks for building and sharing.

[1] https://hub.docker.com/r/lissy93/web-check

scubbo
0 replies
22h58m

Very cool tool!

mutant
0 replies
15h15m

This service is scraping data from somewhere else; it reports us as hosted on Amazon, and we migrated to GCP a year ago.

mike-cardwell
0 replies
7h43m

It says my domain "grepular.com" doesn't have dnssec. It does. It also says I don't use DKIM or DMARC. I do.

johng
0 replies
23h10m

This is really neat, kudos!

jacobprall
0 replies
22h47m

I enjoyed the UI, cool aesthetic.

j1elo
0 replies
19h26m

For some reason the Quality check was always failing with an error 403, even though I had followed the link to create a Google API key and passed it as an env var to the Docker container.

Ended up cloning the project to see by myself what URL it uses... turns out that the Google API was returning a JSON document with instructions to enable the PageSpeed Insights API! I'd never used Google Cloud before, so I had been a bit clueless until that point :-)

My suggestion is that the "Show Error" button show the actual output of the API calls, because otherwise this very useful JSON from Google gets lost in translation.

Now that I've checked the code, it's clear that there are actually two things to enable that are accessed with the API key:

* PageSpeed Insights API: https://console.cloud.google.com/apis/library/pagespeedonlin...

* Safe Browsing API: https://console.cloud.google.com/apis/api/safebrowsing.googl...

So I'd suggest adding this info to either or both of the README and the app itself.

Otherwise, a very very cool project! I've been checking several of my sites for the last hour.

iso8859-1
0 replies
19h3m

How do I see how a site is handled in DNS?

For example https://www.whatsmydns.net/#A/www.bispebjerghospital.dk shows that the address is only resolvable from some locations.

I contacted the hostmaster and they admitted they have blocking in the DNS server.

Would be nice to see this also on this site.

g4zj
0 replies
23h26m

The AAAA record listing seems to only display the A record value(s).

fguerraz
0 replies
11h44m

So broken that it’s probably just a tool to collect URLs

efilife
0 replies
21h2m

A correction to the post's title: https://youryoure.com/?apostrophe Should have been lets. Those are two different words with different meanings!

Great site btw

daflip
0 replies
10h9m

If the scheme is not lowercase, it seems to erroneously detect malware and provides a zip file URL for some malware which does not exist on the page. Seems like a bug!

Example URL "with" malware: Https://cnn.com

Example URL without malware: https://cnn.com
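
For what it's worth, RFC 3986 makes the scheme case-insensitive, and the WHATWG URL parser normalizes it to lowercase, so a scanner could dedupe these before any threat lookup. A small illustration (not the tool's actual code):

```javascript
// The WHATWG URL parser lowercases the scheme, so both spellings
// normalize to the same URL.
const a = new URL('Https://cnn.com');
const b = new URL('https://cnn.com');
// a.protocol and b.protocol are both 'https:'; a.href equals b.href.
```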

compootr
0 replies
11h21m

everything about any website

you're missing subdomains & certs, a very crucial part of investigations imo

brightmood
0 replies
23h40m

I have an issue with the website background: on a high-refresh-rate display (240 Hz), the background animation is incredibly fast and it's super distracting.

breck
0 replies
1d2h

Hey that was a pleasantly great experience.

I don't have anything to add. Nicely done.

Thanks!

banku_brougham
0 replies
23h37m

Amazing! Reminds me that I need to learn a bunch of stuff I know nothing about.

andrew_shay
0 replies
12h53m

Very cool

SahAssar
0 replies
5h11m

Looks nice, some feedback though:

It shows my DNSSEC as not present, even though https://dnssec-analyzer.verisignlabs.com/ (which it links to) shows all green for my test site.

The DNS records panel seems a bit broken, it shows my SPF record as the NS ("NS v=spf1 mx -all").

The Server Records panel has a "ports" entry, but that only shows the first open port (for me 22).

When showing Response Time, it's pretty critical to show where you requested it from. Since you're showing the "location" of the server, you could even subtract/show what part of the response time is due to distance latency (or ping the server and use the RTT).

It'd be useful to show things like what protocol is used (http, h2, h3), what cipher was used, etc.

Global Ranking chart should perhaps be inverted? Currently it goes down the more popular the site becomes.

TLS Security Issues & TLS Cipher Suites just send undefined to the tls-observatory site (https://tls-observatory.services.mozilla.com/api/v1/results?...).

HSTS without subdomains shows as "No"; there should probably be different levels for "none", "without subdomains", "without preload", "with preload", and "in the preload list".
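
Most of those levels can be derived from the header value itself; a sketch (function name is mine), leaving preload-list membership aside since that needs an external lookup against the Chrome preload list:

```javascript
// Classify a Strict-Transport-Security header value into the levels
// suggested above. Directive names are case-insensitive per RFC 6797.
function hstsLevel(headerValue) {
  if (!headerValue) return 'none';
  const directives = headerValue
    .toLowerCase()
    .split(';')
    .map((d) => d.trim());
  if (!directives.includes('includesubdomains')) return 'without subdomains';
  if (!directives.includes('preload')) return 'without preload';
  return 'with preload';
}
```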

PaulDavisThe1st
0 replies
21h24m

Seems like there may be some issue with the crawl rules. What is it looking for that would lead to the error "t.robots is not defined"?

Ahmd72
0 replies
20h58m

I have been using this, and I've got to say it's one of the best open source projects, at least for me: I need to look up URL reputation, and the way everything is organized as cards is highly helpful. One screen gets you all the information you need. I'm looking forward to the API version, and to seeing whether I could use this as a replacement for VT. I did notice one thing: sometimes when you look up a URL you don't get any response back, and when you check the network activity tab in the browser you see the requests are getting rejected.

6510
0 replies
13h46m

Typing example.com should be fine. I tried www.example.com, which also didn't work; it had to be https://www.example.com (I didn't try https://example.com).