HN comments for: Mapping almost every law, regulation and case in Australia

19 replies

14h53m

2024-03-22 08:30:52 UTC

This is great. This sentence struck a chord with me in particular:

    Imagine applying these techniques on the Common Crawl
    You would be able to produce a ... map of the internet.

Making maps of things not usually on maps has been my passion for years. And I made many of them. One of the more popular ones that some of you might know is the Music-Map:

https://www.music-map.com

I have had the urge to make a map of the web for quite a while. Already registered the web-map.com domain for it. I did some experiments, built a custom crawler and an algorithm which finds related websites fast. It showed that the project would be feasible.

But I hold back on doing it, because I already run multiple experimental maps and have yet to come up with a business model for "making maps of everything".

rmnclmnt

4 replies

13h3m

2024-03-22 10:21:14 UTC

So cool, thanks for sharing! I see you've also done it for movies, which is pretty cool and useful.

I could not find any technical details on the input data / feature extraction / clustering method used in these tools. Do you mind sharing what you have used so far?

3 replies

12h45m

2024-03-22 10:38:49 UTC

The Music-Map and the Movie-Map are based on user preferences. The Music-Map is based on https://www.gnoosic.com and the Movie-Map on https://www.gnovies.com, two AI projects I started before the maps.

The AI and the mapping algorithm are my own developments. I was mostly inspired by thinkers like Douglas Hofstadter and John R. Koza.

tomthe

2 replies

10h19m

2024-03-22 13:04:41 UTC

It is really cool and useful. Interesting that you were able to gather enough data from users to make it work. I guess it was much less useful in the beginning?

I thought of making something similar with data from https://musicbrainz.org/

1 replies

9h23m

2024-03-22 14:01:20 UTC

Yes, in the beginning pretty much everybody hated it and thought the project was nuts. I got pretty much no positive feedback but lots of negative. I was like "But it's learning! It's learning!" :) Strangely, that convinced almost nobody, even among my friends.

Now that many millions of people have used it, I get a lot of great, often enthusiastic feedback on how Gnod makes the best recommendations.

That taught me that you can't convince people with just an idea. For most people, you have to deliver something which is already useful to them.

Citizen_Lame

0 replies

7h35m

2024-03-22 15:48:48 UTC

Your effort is appreciated, but the recommendations miss the mark by a considerable margin, to say the least.

snats

2 replies

10h12m

2024-03-22 13:12:20 UTC

I built a map of all the PDF urls on the internet recently.

I used a tiny embeddings model and PCA for dimensionality reduction.

https://weblog.snats.xyz/posts/2024/03/20/
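A minimal sketch of the PCA step described above, assuming the embeddings are already available as rows of a NumPy array. The random array stands in for real embeddings, and `pca_project` is a hypothetical helper name, not code from the linked post:

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row vectors onto their top principal components."""
    # PCA operates on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Stand-in data: 100 documents with 16-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 16))
coords = pca_project(fake_embeddings)  # one (x, y) point per document
```

Each row of `coords` can then be plotted as a point on the map. PCA is linear, so it preserves global variance but may flatten local neighbourhood structure.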

ubutler

1 replies

9h37m

2024-03-22 13:46:32 UTC

Interesting, did you try also using PaCMAP or UMAP for dimensionality reduction? It might result in a more meaningful representation of their underlying semantic structure: see the 'mammoth' example in my article.

snats

0 replies

7h4m

2024-03-22 16:19:53 UTC

No! I only tried PCA, but I still have the embeddings.

I'll try later and post results.

vsnf

1 replies

14h4m

2024-03-22 09:19:34 UTC

I had something similar once - it was a graph of connections between all the artists in my Spotify library to see who had collab'd with who. It was a lot of fun to see just how distantly connected two artists were through a long chain of collabs of collabs. Of course, like most human connection maps, it mostly came down to a handful of super-connectors who collaborate with hundreds of people, who in turn collaborate with their own niche groups. But there were some interesting groups revealed by it.
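For illustration, the "how distantly connected" question can be answered with a breadth-first search over the collaboration graph. This is a sketch under assumptions: the artist names and the `collabs` adjacency dict are made up, and it is not the code behind the graph described above.

```python
from collections import deque


def degrees_of_separation(collabs, start, goal):
    """Return the number of collaboration hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        artist, hops = queue.popleft()
        for other in collabs.get(artist, ()):
            if other == goal:
                return hops + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return None  # no chain of collaborations connects the two artists


# Hypothetical library: each artist maps to the set of direct collaborators.
collabs = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
```

Here `degrees_of_separation(collabs, "A", "D")` walks A, B, C, D, while "E" is unreachable from everyone. Super-connectors show up as nodes with very large neighbour sets.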

dylan604

0 replies

3h20m

2024-03-22 20:04:29 UTC

I was half expecting a Six Degrees of Kevin Bacon reference here. Disregarding the actual Bacon, I was almost hoping for a similar effect, where any 2 artists can be connected in 1 Bacon or less.

itshossein

1 replies

7h46m

2024-03-22 15:37:30 UTC

Great job! There is a form to report typos. Is there anywhere to report duplicates and more complicated errors?

0 replies

7h34m

2024-03-22 15:49:53 UTC

What is the difference between a typo and a duplicate? If you mean that two ways of writing the same name are both legit, then you have to decide on one being the more "correct" one. After a while Gnod will figure out which one is the more common name.

And "complicated errors"?

alwyn

1 replies

12h55m

2024-03-22 10:28:46 UTC

You are the creator? Thank you for what you do! I've used it with pleasure for many years.

0 replies

12h49m

2024-03-22 10:34:39 UTC

Yes. Happy you like it!

Groxx

1 replies

10h5m

2024-03-22 13:19:26 UTC

You might also like: https://everynoise.com/

quenix

0 replies

4h23m

2024-03-22 19:00:39 UTC

This one is mesmerizing. Highly recommend checking it out.

ubutler

0 replies

13h40m

2024-03-22 09:44:18 UTC

Thanks for sharing that map, I’m going to start using it to discover new artists :)

I’d love to see a semantic map of the internet, I’m considering having a crack at it as well, but it’d be a monumental task. There is this cool map but it’s quite dated: http://internet-map.net/

mbo

0 replies

8h59m

2024-03-22 14:24:44 UTC

I did something similar for fragrances a little while ago: https://observablehq.com/@55th/every-fragrance-at-once

jcul

0 replies

12h28m

2024-03-22 10:56:13 UTC

Very cool. I've immediately found some music I really like that I've never heard before.

epgui

4 replies

8h30m

2024-03-22 14:54:17 UTC

I think visualizing it like this is very strange. I am not a legal expert but I have read a lot of law textbooks.

Normally, I’d expect blackletter law to form a somewhat sparse, tentacle-like structure.

Case law (or “cases” or “jurisprudence”) is by its nature largely interstitial: it consists of judges “filling in the holes” that are left by any unclear meaning (requiring interpretation) of blackletter law, or in some cases by the absence of such.

Having case law and blackletter law form two distinct clusters makes no sense to me: I really think it’s a domain modelling error. It’s what I would expect to see if one applied a text similarity measure naively to some data set, without regard for the domain models.

ubutler

3 replies

8h10m

2024-03-22 15:14:28 UTC

As I note in my article, the language and style employed in Australian judgments is different from that employed in statute. Furthermore, in common law countries like Australia, you have many legal concepts that have developed independently of statute and either remain independent or have been formalised into statute (see, eg, torts: https://www.alrc.gov.au/publication/traditional-rights-and-f...).

epgui

2 replies

7h34m

2024-03-22 15:50:21 UTC

I understand that, but there is a difference between text similarity and semantic similarity. You claim to have performed semantic clustering, but what I am seeing, and what you are saying in your response to my comment, has less to do with semantics and more to do with superficial textual encodings.

Case law and blackletter law will obviously look very different in terms of their textual representation, style, formatting, etc... And this will be true even when they pertain to the same ideas and the same concepts.

To state the obvious, semantics is about the meaning of things, not about style and not about specific word choices or specific syntactical forms (although sometimes these carry meaning as well).

ubutler

1 replies

7h28m

2024-03-22 15:56:23 UTC

Furthermore, in common law countries like Australia, you have many legal concepts that have developed independently of statute and either remain independent or have been formalised into statute.

This is the bigger point. In my own university studies, there was a clear segmentation between the common law and statute, although they are certainly interrelated.

It’s also worth noting that the boundaries between cases and legislation were not absolute, there were areas of the cases ‘mainland’ that contained legislation.

My point on the style was that in addition to differences in purposes, they are also textually different, which can indeed bleed into semantics.

epgui

0 replies

7h24m

2024-03-22 16:00:16 UTC

The point is not lost on me. Certainly tort law, contract law, administrative law, and many other areas of law aren't usually sourced from blackletter law as much as from jurisprudence or other sources of law.

I think this very point you're trying to make would be more persuasive if the analysis had modelled the relationships that do exist between blackletter law and case law. As we have already discussed, text similarity may not suffice to reveal these relationships. And while these relationships don't always exist, when they do exist they are very strong.

Simon_ORourke

3 replies

11h39m

2024-03-22 11:44:48 UTC

Does this make it any way easier to replace lawyers with an LLM or expert system?

ivyirwin

1 replies

11h22m

2024-03-22 12:02:04 UTC

Not OP but working on a project in similar domain (ndaok.com). The technology is definitely making it easier to replace lawyers. The biggest barrier right now is lawyers themselves. In fact our project stopped trying to sell to lawyers because it's almost like they purposefully refuse to adapt new technology. Instead we've had success with customers trying to find a way not to use lawyers when they are not needed.

Simon_ORourke

0 replies

8h26m

2024-03-22 14:58:25 UTC

trying to find a way not to use lawyers when they are not needed.

Kudos to you guys, the elimination of the need for lawyers is up there with any societal issue you care to name. It may do more for social justice than funding anything else

lmeyerov

0 replies

8h42m

2024-03-22 14:41:50 UTC

This is the heart of most real generative AI systems for reasoning about text: index data using this basic technique (chunked document embeddings), and when talking to the AI, the AI looks up documents from these clusters and loads them in as context for making the answer. Many ways to improve over this, but it's the heart.

In our case (louie.ai), users will have vector indexed their documents into a scalable database like OpenSearch/elasticsearch, or we help them do it, and they can talk to the data, visualize it, run analytics, etc. For example, "get everything on koala adoption from the last decade and draw as a clustered map" would generate a hybrid query to find "semantically similar" documents based on vectors and also symbolically on the time stamps, run it, and then decide to do the followup step of visualizing it using the same family of viz technique in the article. We haven't tried law yet, but already do this for areas like disaster, crime, & misinfo intelligence from social media & news. (Imagine: "Alert me when ..." or "summarize what...").

We find this approach fast and easy, but for very important questions, lower quality than we would like. Imagine a scenario like case law around koalas changing precedent over time. RAG using LangChain/LlamaIndex + OpenAI over a vector index doesn't solve that kind of thing out of the box. But these problems are solvable, and it's pretty fun to work through these kinds of issues :)
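A rough sketch of the hybrid lookup described above: filter symbolically on the timestamp, then rank the survivors by vector similarity. Everything here is an assumption for illustration (the tiny in-memory `index`, the 3-dimensional vectors, the `hybrid_search` name); a real system would query a vector database and use embeddings from a model.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(index, query_vec, since, top_k=2):
    """Filter chunks by date (symbolic), then rank by similarity (vector)."""
    candidates = [c for c in index if c["published"] >= since]
    ranked = sorted(
        candidates, key=lambda c: cosine(c["vector"], query_vec), reverse=True
    )
    return [c["text"] for c in ranked[:top_k]]


# Hypothetical chunk index with made-up vectors and dates.
index = [
    {"text": "Koala adoption program expands",
     "published": date(2021, 5, 1), "vector": [0.9, 0.1, 0.0]},
    {"text": "Koala habitat ruling",
     "published": date(2009, 3, 2), "vector": [0.8, 0.2, 0.0]},
    {"text": "Tax code amendment",
     "published": date(2022, 1, 10), "vector": [0.0, 0.1, 0.9]},
]
results = hybrid_search(index, [1.0, 0.0, 0.0], since=date(2014, 1, 1), top_k=1)
```

The retrieved texts would then be loaded into the model's context. Note that the date filter alone does nothing to tell the model that an older precedent has been superseded, which is the kind of gap mentioned above.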

jasonjei

2 replies

5h54m

2024-03-22 17:30:15 UTC

I’ve noticed in many commonwealth countries there is no official codification of case law, administrative law, and statutory law passed by the legislative body and receiving assent from the executive branch.

The US being a hard fork of the commonwealth has the official US code and state codes—attempts to organize impacts of case law, admin law, passed law, etc—but Canada has pockets of codification (the Criminal Code), but not all acts of Parliament are organized in a single code. The UK as far as I can tell has no such thing in England or Wales. Hong Kong has some semblance of codification with the Basic Law and ordinances. Does Australia have codification at a federal or state level?

dragonwriter

1 replies

5h48m

2024-03-22 17:36:22 UTC

The US has the US code and state codes—it attempts to organize impacts of case law, admin law, passed law, etc

Um, yes and no.

“US Code” is statute law. The “Code of Federal Regulations” is admin law. There is no codification of case law; there are reporters, but they are just a flow of case results, similar to the sequential publication of statutes in places that don't codify statute law (and in those that do, too, but where they do, the codification is more generally useful for most purposes).

The states are generally similar: there is codification of statute and admin law, but not of case law.

jasonjei

0 replies

5h42m

2024-03-22 17:42:00 UTC

Got it. I was just wondering if other commonwealth countries had an equivalent of a US code or Code of Federal Regulations that documented law in a centralized store. IANAL, but law seems to have so many distinct sources—and more curiously, does codification help a lawyer with the job?

isoprophlex

2 replies

14h3m

2024-03-22 09:20:42 UTC

Thank you so much for replacing the interactive visuals with screenshots on mobile! Makes for a much better experience reading this on my phone.

ubutler

1 replies

13h3m

2024-03-22 10:21:06 UTC

I'm glad you appreciated that touch :) Seeing as 59% of my readers are on mobile, I thought it'd be better to have a static image rather than an interactive map which would be pretty unusable on a phone.

amand33p

0 replies

11h46m

2024-03-22 11:38:24 UTC

I second it. But there's a bug. If we reduce the browser window width, and re-increase it, the charts stay in a non-interactive state.

dleeftink

2 replies

13h33m

2024-03-22 09:50:40 UTC

"we can also see that Australian case law is a continuum of sorts"

It definitely provides a pretty picture, but just wanted to emphasise the map !== territory adage. The continuum may rather be a function of the projection, chosen similarity metric and so on.

That does not mean we cannot learn from the map, but that the actual 'knowledge structure' of the sum of documents may not be a convenient continuum at all.

In any case, the way you've documented this project is remarkable, and it does provide a novel view of the Australian legal sphere. Thanks for sharing!

ubutler

1 replies

13h6m

2024-03-22 10:17:57 UTC

It definitely provides a pretty picture, but just wanted to emphasise the map !== territory adage.

You're right — my map does not necessarily represent the underlying semantic structure of Australian law, it is an approximation, one that is biased by the data I used (which as I mentioned, is missing laws and cases from a number of jurisdictions), the embedding model I selected and the dimensionality reduction model I used to project my embeddings into a two-dimensional space, to name a few.

Because I was writing for both legal and data science audiences, I tried to avoid sounding like my inferences are anything more than just inferences but without getting too technical and explaining the inherent limitations of any attempt to semantically map knowledge with today's technology.

I will just say though that, having studied law myself, Australian case law is indeed somewhat of a continuum. A single case may touch on many areas of law and there are no restrictions in terms of subject matter on what precedents a judge may draw upon in reaching a decision, apart from that they are both relevant and binding (or, if they are not binding, are not treated as such).

It was also interesting to observe how the final clusters that developed were uncannily similar to the way in which I was taught law at university. It goes to show, there's a lot of thought put into the design of our legal courses here in Australia. In fact, there are 11 subjects that are mandatory, known as the Priestley 11: https://en.wikipedia.org/wiki/Priestley_11. All of those are reflected on the map, although some have been rolled up into larger categories or divided by other means.

mistermann

0 replies

10h3m

2024-03-22 13:21:21 UTC

I think it can sometimes be useful to take this map != territory concept further - all instances of map != territory are not equal, some have the potential for higher utility than others. And, I would estimate that concepts/methodologies like this (anything that provides humans new ways to examine and conceptualize important matters) almost certainly have higher potential than standard, run of the mill instances of map != territory (the likelihood of us being able to find and harvest that utility is another layer of complexity, but then so is the notion that utility is often found not only in the destination, but also in the journey). (Unfortunately, modal logic notation seems to currently have no support for describing these sorts of concepts, at least according to ChatGPT).

The "so what?" of it is that if people (particularly smart ones) exclude these additional concepts from their logical consideration, it is possible that the idea could be dismissed, or have its potential importance estimated to be lower than it actually/potentially is, potentially leading to an outcome whereby this map or the underlying methodology (applied to other domains) is not maximally exploited to achieve positive outcomes.

bbor

2 replies

12h47m

2024-03-22 10:36:51 UTC

Amazing work. As someone doing self-funded web dev, how do you find the time to work on this? Is this a resume booster, a product/prototype, or just a labor of love? To say the least, this is groundbreaking.

I love your technical explanations, even tho I started skimming there. It appears this is all built on modern embedding algorithms, plus traditional ML clustering magic. Now that you have the basic data, have you thought about using full generative models for semantic analysis? Ie “write summaries of this subset of cases and tag them with specific situations or intricacies”, and then do clustering on that? I feel like that’s the natural next computational step, and surely (hopefully?) what the many millions/billions of dollars worth of SWEs that were put to work applying LLMs to case law over the past year in America are up to.

The very best projects on here are ones where I’m tempted to ask to collaborate, even though I know I’m already booked up with work through the horizon! I’ll have to console myself with a comment and a very prestigious place in my “inspirations” bookmark folder :)

defrost

1 replies

12h41m

2024-03-22 10:42:34 UTC

The blog's about page might interest you:

https://umarbutler.com/about/

    I’m Umar Butler, an Australian data scientist, legal technologist and AI researcher. This is my blog where I write about law, technology, AI and everything in between.

    As part of my research into legal technology and AI, I have published, inter alia, the first dataset for training LLMs on Australian law, the largest open database of Australian law and the first open LLM for Australian law.

    I currently serve as the Assistant Director of Data Science at the Attorney-General’s Department. My work centres around the responsible use of AI to enable, accelerate and enhance public decision making and legal and policy analysis, in addition to consulting on the development of key AI policy.

bbor

0 replies

1h31m

2024-03-22 21:52:42 UTC

WOW ok, thanks so much for doing my homework for me! I guess I just have to look into a high level government position that encourages me to follow my own interests, easy peasy…

infostud

1 replies

14h43m

2024-03-22 08:40:47 UTC

Thank you for this effort. Did you access data from http://austlii.edu.au ?

ubutler

0 replies

13h0m

2024-03-22 10:23:31 UTC

Nope, the map is built atop the Open Australian Legal Corpus, which is the first open database of Australian law (you can read about how I built it here: https://umarbutler.com/how-i-built-the-largest-open-database...). Unfortunately, AustLII is free but not open-source (as in licensed under an open-source licence: https://austlii.edu.au/copyright.html).

defrost

1 replies

14h27m

2024-03-22 08:57:10 UTC

Really nice writeup, I appreciate the work you've put into that in both the descriptive analysis of the data and the technical breakdown of the process.

ubutler

0 replies

12h20m

2024-03-22 11:04:19 UTC

Thank you :)

chottocharaii

1 replies

8h31m

2024-03-22 14:52:33 UTC

"My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning."

I think Jade.io has had a go at this, IIRC. This isn't to detract from your amazing work though, great stuff.

ubutler

0 replies

8h8m

2024-03-22 15:16:19 UTC

Thank you :). Would you mind sharing what you have in mind? I haven't come across a visualised semantic map of state and federal Australian laws, cases and regulations before.

boffinAudio

1 replies

14h31m

2024-03-22 08:53:23 UTC

This is really awesome, thanks for the work and thanks for sharing.

This is a really interesting form of mapping - would you consider doing it for the original occupants' languages, as well?

Australian law itself is fascinating - those outliers on the edges of some of the trails are very curious - is this indicating that some of this material is authored, possibly by the same people/groups whose ontology is transferred with each new document?

I'd love to see this semantic map for the original occupants' languages.

It would also be interesting to see Australia's human rights proclamations and related legislation, as well as its military orders and authorizations for involvement in the 5-eyes catastrophe somehow, semantically, in this context.

defrost

0 replies

12h36m

2024-03-22 10:47:52 UTC

would you consider doing it for the original occupants' languages, as well?

Bit of a challenge as, of the many languages, few are still actively spoken and, as oral unwritten languages, there's an issue with inconsistent European spelling creating text that truly native speakers still have to learn to read.

For your interest; Aboriginal Language Groups: https://mgnsw.org.au/wp-content/uploads/2019/01/map_col_high...

MisterDizzy

1 replies

6h52m

2024-03-22 16:32:18 UTC

Seems like quite a project. And very useful.

Australia is the perfect example of when too many well-meaning people who think they can solve everything with more government power are given too much capability to see their vision through to its logical conclusion. It ends up making most of the problems it tries to solve far worse, and nobody has the guts to pull the plug on the programs that aren't functioning.

techbrovanguard

0 replies

6h3m

2024-03-22 17:20:52 UTC

Clearly, the solution is more neoliberalism.

IIAOPSW

1 replies

7h28m

2024-03-22 15:56:28 UTC

I've been dealing with some matters in the Australian legal system, for a long while self represented and self taught but recently with a solicitor. I've read a number of acts for myself, procedural civil and criminal, and have even run into the invisible wall between legislation and case law.

This has been shockingly pertinent to my interests and I thank you for compiling it. My only gripe is that you didn't post it several months prior when it would have been most helpful to me ;)

ubutler

0 replies

7h22m

2024-03-22 16:02:06 UTC

I've read a number of acts for myself, procedural civil and criminal, and have even run into the invisible wall between legislation and case law.

Glad to hear it corresponded with your lived experience, it really was surprising to see how the map correlated with my own understandings of the law developed through my degree!

throwup238

0 replies

7h53m

2024-03-22 15:30:38 UTC

You need to get in contact with Rob Sitch. They can probably make a whole season of Utopia based around this!

sevenseventen

0 replies

7h32m

2024-03-22 15:51:59 UTC

Mapping the internet as a whole has been a thing for quite a while, going back to Kumar et al in 2000. https://scholar.google.com/citations?view_op=view_citation&h...

I recall at least one of those papers characterizing the shape as resembling a bow-tie.

This and other early contributions were looking at the link structure of the internet, not textual similarity, though.

sema4hacker

0 replies

6h21m

2024-03-22 17:03:28 UTC

Most of your work seems over my head, but doesn't the "mammoth" example indicate that by tweaking numbers you can end up getting just about any visual blob you want?

mmsc

0 replies

10h22m

2024-03-22 13:02:07 UTC

Cool stuff, reminds me of “a Canadian payroll dependency chart” https://news.ycombinator.com/item?id=38843388

jordanpg

0 replies

9h43m

2024-03-22 13:41:16 UTC

This is very cool, congratulations.

When I was in law school, I sometimes visualized the "common law" as a web of interdependencies. This is a similar visualization, although it doesn't quite capture the dependencies, at least as I have always imagined it.

For context, the common law refers to law made by (mostly) appellate judges. Sometimes it's built on top of statutory law (e.g., providing meaning, interpretation, or definition to statutory laws) and sometimes it's completely made up, when there is no law "on point." It's made up in the sense that it's constructed on top of a long trail of historical precedent, sometimes going all the way back to Victorian-era England or even older. Really.

(Aside: This is why certain individuals sound so silly when they rail against "judge-made law" in the US. Virtually all law in the US is "judge-made law.")

Anyway, the common law has always seemed to me to be amenable to representation as a graph-like structure where nodes are cases or precedents and the edges somehow encode the strength of the support for the precedent. I think judges might think twice about breaking from precedent (which can be virtuous or not, depending on your viewpoint) if they could see a visualization of how strong the precedent is.

This representation is a step in that direction and I hope your tech can be extended to other common law countries!
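One way the graph idea above might be sketched: treat each citation as a weighted edge and sum the incoming weights per case. The treatment labels, their weights, and the case names are all invented for illustration; how to actually score the strength of a precedent is an open modelling choice.

```python
from collections import defaultdict

# Hypothetical weights for how a later case treats an earlier one.
TREATMENT_WEIGHT = {
    "followed": 1.0,
    "considered": 0.5,
    "distinguished": 0.0,
    "overruled": -1.0,
}


def precedent_strength(citations):
    """Sum incoming edge weights for every cited case."""
    scores = defaultdict(float)
    for _citing, cited, treatment in citations:
        scores[cited] += TREATMENT_WEIGHT[treatment]
    return dict(scores)


# Edges are (citing case, cited case, treatment); all names are made up.
citations = [
    ("Case B", "Case A", "followed"),
    ("Case C", "Case A", "followed"),
    ("Case D", "Case A", "distinguished"),
    ("Case D", "Case B", "overruled"),
]
strength = precedent_strength(citations)
```

In this toy data "Case A" ends up strongly supported and "Case B" negative. A fuller version might weight edges by court level or propagate scores along chains of citations.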

ivanoconnor

0 replies

13h34m

2024-03-22 09:49:39 UTC

Last year, I had a similar idea to "map out" case law and legislation in the UK — as usual, though, life got in the way and it's ended up joining my vast collection of half-finished projects. Having read your excellent writeup, I'm now feeling rather inspired to give it another try! :)

guidedlight

0 replies

12h4m

2024-03-22 11:20:08 UTC

*Except Victoria by the looks of it. :-(

green-eclipse

0 replies

8h7m

2024-03-22 15:16:39 UTC

Would be great to see some of Fisk's cases in here /s

feliixh

0 replies

4h50m

2024-03-22 18:33:33 UTC

Great job, I intend to reproduce this on a similar dataset I've been collecting!

I will say, it would be great to see the color labeling done on domain url alone, to see how much of the topography of the map is driven simply by the different formatting characteristics of the websites you're gathering data from.
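A small sketch of that check, assuming each document carries its source URL: derive one integer colour label per domain and pass the labels to the plotting code in place of the cluster labels. The URLs and the `label_by_domain` helper are hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host part of a URL."""
    return urlparse(url).netloc.lower()


def label_by_domain(urls):
    """Assign one integer label per distinct domain, in first-seen order."""
    labels = {}
    result = []
    for url in urls:
        domain = domain_of(url)
        labels.setdefault(domain, len(labels))
        result.append(labels[domain])
    return result


# Made-up source URLs for three documents.
urls = [
    "https://legislation.example.org/act/1",
    "https://courts.example.org/case/7",
    "https://legislation.example.org/act/2",
]
domain_labels = label_by_domain(urls)
```

If the map coloured this way looks much like the map coloured by cluster, that would suggest source formatting is driving a good part of the layout.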

contingencies

0 replies

12h6m

2024-03-22 11:17:43 UTC

The problem with Australian law, and I suspect most law, is that the practical problems of the actual system appear to be less about the theory and more about the absence of enforcement, oversight and due process.

amb1337

0 replies

14h33m

2024-03-22 08:50:43 UTC

This approach could be used to build a global map of AI and/or data privacy legislation and cases that would be potentially very valuable and useful, particularly for startups.

adammarples

0 replies

5h35m

2024-03-22 17:49:05 UTC

Would it be correct to say that a semantic map, clustered by meaning, might be pushing it? If the data are word embeddings, then you'd hope that they have distilled the semantics in the raw text, but as you said yourself, they are also heavily influenced by style and who knows what else, to the point that semantically identical but syntactically different texts might land in different clusters. Think: if half of the texts were in French, would you keep the same semantic map or would you have a French continent and an English continent?

TheCaptain4815

0 replies

9h41m

2024-03-22 13:43:21 UTC

This is such an interesting use of semantic representation. I wonder if it could be used to map out cases vs outcomes, and determine sentencing outliers.

Hammershaft

0 replies

1h59m

2024-03-22 21:24:50 UTC

The central shape created by this dataviz could make for an interesting island shape.

6510

0 replies

4h50m

2024-03-22 18:34:15 UTC

This is great, with so many laws it is hard to get any kind of overview.