Mapping almost every law, regulation and case in Australia

mg
19 replies
14h53m

This is great. This sentence struck a chord with me in particular:

    Imagine applying these techniques on the Common Crawl
    You would be able to produce a ... map of the internet.
Making maps of things not usually on maps has been my passion for years. And I made many of them. One of the more popular ones that some of you might know is the Music-Map:

https://www.music-map.com

I have had the urge to make a map of the web for quite a while. Already registered the web-map.com domain for it. I did some experiments, built a custom crawler and an algorithm which finds related websites fast. It showed that the project would be feasible.

But I am holding back on doing it, because I already run multiple experimental maps and have yet to come up with a business model for "making maps of everything".

rmnclmnt
4 replies
13h3m

So cool, thanks for sharing! I see you've also done it for movies, which is pretty cool and useful.

I could not find any technical details on the input data / feature extraction / clustering method used in these tools. Do you mind sharing what you have used so far?

mg
3 replies
12h45m

The Music-Map and the Movie-Map are based on user preferences. The Music-Map is based on https://www.gnoosic.com and the Movie-Map on https://www.gnovies.com, two AI projects I started before the maps.

The AI and the mapping algorithm are my own developments. I was mostly inspired by thinkers like Douglas Hofstadter and John R. Koza.

tomthe
2 replies
10h19m

It is really cool and useful. Interesting that you were able to gather enough data from users to make it work. I guess it was much less useful in the beginning?

I thought of making something similar with data from https://musicbrainz.org/

mg
1 replies
9h23m

Yes, in the beginning pretty much everybody hated it and thought the project was nuts. I got pretty much no positive feedback but lots of negative. I was like "But it's learning! It's learning!" :) Strangely, that convinced almost nobody, even among my friends.

Now that many millions of people have used it, I get a lot of great, often enthusiastic feedback on how Gnod makes the best recommendations.

That taught me that you can't convince people with just an idea. For most people, you have to deliver something which is already useful to them.

Citizen_Lame
0 replies
7h35m

Your effort is appreciated, but the recommendations miss the mark by a considerable margin, to say the least.

snats
2 replies
10h12m

I built a map of all the PDF URLs on the internet recently.

I used a tiny embeddings model and PCA for dimensionality reduction.

https://weblog.snats.xyz/posts/2024/03/20/
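
For readers unfamiliar with the "embeddings, then PCA" step, here is a minimal sketch of what projecting vectors onto two principal components involves. It is not snats's code: the `embeddings` values are made up, `pca_project` is a hypothetical helper, and the power-iteration approach is chosen only to keep the example dependency-free. A real project would more likely call a library such as scikit-learn.

```python
import math
import random


def pca_project(vectors, n_components=2, iterations=200, seed=0):
    """Project row vectors onto their top principal components (pure Python)."""
    n, dim = len(vectors), len(vectors[0])
    # Center each feature so the components describe variance, not the mean.
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - means[j] for j in range(dim)] for v in vectors]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        w = [rng.uniform(-1, 1) for _ in range(dim)]
        for _ in range(iterations):
            # Power iteration: repeatedly apply the covariance matrix.
            w = [sum(cov[a][b] * w[b] for b in range(dim)) for a in range(dim)]
            # Stay orthogonal to the components already found.
            for c in components:
                dot = sum(x * y for x, y in zip(w, c))
                w = [x - dot * y for x, y in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in any remaining direction
            w = [x / norm for x in w]
        components.append(w)
    # Coordinates of every centered vector along each component.
    return [
        [sum(x * y for x, y in zip(row, c)) for c in components]
        for row in centered
    ]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
embeddings = [
    [2.0, 1.9, 0.1],
    [-2.0, -2.1, 0.0],
    [1.0, 1.1, -0.1],
    [-1.0, -0.9, 0.0],
    [0.1, -0.1, 0.05],
]

for point in pca_project(embeddings):
    print(point)
```

PCA only keeps the directions of largest linear variance, which is part of why the non-linear methods mentioned in the reply below can give a different picture of the same embeddings.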

ubutler
1 replies
9h37m

Interesting, did you also try using PaCMAP or UMAP for dimensionality reduction? It might result in a more meaningful representation of their underlying semantic structure: see the 'mammoth' example in my article.

snats
0 replies
7h4m

No! I only tried PCA, but I still have the embeddings.

I'll try later and post results.

vsnf
1 replies
14h4m

I had something similar once - it was a graph of connections between all the artists in my Spotify library to see who had collab'd with who. It was a lot of fun to see just how distantly connected two artists were through a long chain of collabs of collabs. Of course, like most human connection maps, it mostly came down to a handful of super-connectors who collaborate with hundreds of people, who in turn collaborate with their own niche groups. But there were some interesting groups revealed by it.
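
A rough sketch of how "how distantly connected are two artists" can be computed on a graph like this, using a breadth-first search. The artist names and the `collabs` list are invented for illustration, and `build_graph` and `degrees_of_separation` are hypothetical helpers, not anything from the Spotify API.

```python
from collections import deque


def build_graph(pairs):
    """Turn (artist, artist) collaboration pairs into an undirected adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Length of the shortest collaboration chain, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        artist, dist = queue.popleft()
        for other in graph.get(artist, ()):
            if other == goal:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Invented collaboration data.
collabs = [
    ("Ana", "Bo"),
    ("Bo", "Cy"),
    ("Cy", "Dee"),
    ("Bo", "Eli"),
    ("Fay", "Gus"),
]

graph = build_graph(collabs)
print(degrees_of_separation(graph, "Ana", "Dee"))
print(degrees_of_separation(graph, "Ana", "Gus"))
```

In a graph dominated by super-connectors, most searches pass through the same few hub artists, which matches the pattern described above.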

dylan604
0 replies
3h20m

I was halfway expecting a Six Degrees of Kevin Bacon reference here. Disregarding the actual Bacon, I was almost hoping for a similar effect, where any 2 artists can be connected in 1 Bacon or less.

itshossein
1 replies
7h46m

Great job! There is a form to report typos. Is there anywhere to report duplicates and more complicated errors?

mg
0 replies
7h34m

What is the difference between a typo and a duplicate? If you mean that two ways of writing the same name are both legit, then you have to decide on one being the more "correct" one. After a while Gnod will figure out which one is the more common name.

And "complicated errors"?

alwyn
1 replies
12h55m

You are the creator? Thank you for what you do! I've used it with pleasure for many years.

mg
0 replies
12h49m

Yes. Happy you like it!

quenix
0 replies
4h23m

This one is mesmerizing. Highly recommend checking it out.

ubutler
0 replies
13h40m

Thanks for sharing that map, I’m going to start using it to discover new artists :)

I’d love to see a semantic map of the internet, I’m considering having a crack at it as well, but it’d be a monumental task. There is this cool map but it’s quite dated: http://internet-map.net/

jcul
0 replies
12h28m

Very cool. I've immediately found some music I really like that I've never heard before.

epgui
4 replies
8h30m

I think visualizing it like this is very strange. I am not a legal expert but I have read a lot of law textbooks.

Normally, I’d expect blackletter law to form a somewhat sparse, tentacle-like structure.

Case law (or “cases” or “jurisprudence”) is by its nature largely interstitial: it consists of judges “filling in the holes” that are left by any unclear meaning (requiring interpretation) of blackletter law, or in some cases by the absence of such.

Having case law and blackletter law form two distinct clusters makes no sense to me: I really think it’s a domain modelling error. It’s what I would expect to see if one applied a text similarity measure naively to some data set, without regard for the domain models.

ubutler
3 replies
8h10m

As I note in my article, the language and style employed in Australian judgments is different from that employed in statute. Furthermore, in common law countries like Australia, you have many legal concepts that have developed independently of statute and either remain independent or have been formalised into statute (see, eg, torts: https://www.alrc.gov.au/publication/traditional-rights-and-f...).

epgui
2 replies
7h34m

I understand that, but there is a difference between text similarity and semantic similarity. You claim to have performed semantic clustering, but what I am seeing, and what you are saying in your response to my comment, has less to do with semantics and more to do with superficial textual encodings.

Case law and blackletter law will obviously look very different in terms of their textual representation, style, formatting, etc... And this will be true even when they pertain to the same ideas and the same concepts.

To state the obvious, semantics is about the meaning of things, not about style and not about specific word choices or specific syntactical forms (although sometimes these carry meaning as well).

ubutler
1 replies
7h28m

Furthermore, in common law countries like Australia, you have many legal concepts that have developed independently of statute and either remain independent or have been formalised into statute.

This is the bigger point. In my own university studies, there was a clear segmentation between the common law and statute, although they are certainly interrelated.

It’s also worth noting that the boundaries between cases and legislation were not absolute: there were areas of the cases ‘mainland’ that contained legislation.

My point on the style was that in addition to differences in purposes, they are also textually different, which can indeed bleed into semantics.

epgui
0 replies
7h24m

The point is not lost on me. Certainly tort law, contract law, administrative law, and many other areas of law aren't usually sourced from blackletter law as much as from jurisprudence or other sources of law.

I think this very point you're trying to make would be more persuasive if the analysis had modelled the relationships that do exist between blackletter law and case law. As we have already discussed, text similarity may not suffice to reveal these relationships. And while these relationships don't always exist, when they do exist they are very strong.

Simon_ORourke
3 replies
11h39m

Does this make it any way easier to replace lawyers with an LLM or expert system?

ivyirwin
1 replies
11h22m

Not OP but working on a project in a similar domain (ndaok.com). The technology is definitely making it easier to replace lawyers. The biggest barrier right now is lawyers themselves. In fact our project stopped trying to sell to lawyers because it's almost like they purposefully refuse to adopt new technology. Instead we've had success with customers trying to find a way not to use lawyers when they are not needed.

Simon_ORourke
0 replies
8h26m

trying to find a way not to use lawyers when they are not needed.

Kudos to you guys, the elimination of the need for lawyers is up there with any societal issue you care to name. It may do more for social justice than funding anything else

lmeyerov
0 replies
8h42m

This is the heart of most real generative AI systems for reasoning about text: index data using this basic technique (chunked document embeddings), and when talking to the AI, the AI looks up documents from these clusters and loads them in as context for making the answer. Many ways to improve over this, but it's the heart.

In our case (louie.ai), users will have vector indexed their documents into a scalable database like OpenSearch/Elasticsearch, or we help them do it, and they can talk to the data, visualize it, run analytics, etc. For example, "get everything on koala adoption from the last decade and draw as a clustered map" would generate a hybrid query to find "semantically similar" documents based on vectors and also symbolically on the timestamps, run it, and then decide to do the follow-up step of visualizing it using the same family of viz technique in the article. We haven't tried law yet, but already do this for areas like disaster, crime, & misinfo intelligence from social media & news. (Imagine: "Alert me when ..." or "summarize what...").

We find this approach fast and easy, but for very important questions, lower quality than we would like. Imagine a scenario like case law around koalas changing precedent over time. RAG using LangChain/LlamaIndex + OpenAI over a vector index doesn't solve that kind of thing out of the box. But these problems are solvable, and it's pretty fun to work through these kinds of issues :)
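
To make the "hybrid query" idea concrete, here is a minimal sketch: filter documents symbolically on a timestamp, then rank what is left by vector similarity. This is not louie.ai's implementation and does not use OpenSearch. The `Doc` class, `hybrid_search`, the titles, the years and the 3-dimensional vectors are all made up, and in practice the vectors would come from an embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    year: int
    vector: list  # stand-in for a real document embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(docs, query_vector, min_year, top_k=2):
    """Symbolic filter on year, then semantic ranking by cosine similarity."""
    recent = [d for d in docs if d.year >= min_year]
    ranked = sorted(recent, key=lambda d: cosine(d.vector, query_vector), reverse=True)
    return ranked[:top_k]


# Invented corpus; the first vector dimension loosely stands for "koalas".
docs = [
    Doc("Koala adoption rules", 2021, [0.9, 0.1, 0.0]),
    Doc("Koala habitat ruling", 1998, [0.8, 0.2, 0.0]),
    Doc("Fishing licence fees", 2020, [0.0, 0.1, 0.9]),
    Doc("Wildlife carer permits", 2017, [0.7, 0.3, 0.1]),
]

query = [1.0, 0.0, 0.0]  # pretend embedding of "koala adoption"
for doc in hybrid_search(docs, query, min_year=2014):
    print(doc.title, doc.year)
```

The retrieved documents would then be passed to the model as context. Note that a similarity ranking like this has no notion of one ruling superseding another, which is the precedent-over-time gap mentioned above.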

jasonjei
2 replies
5h54m

I’ve noticed in many commonwealth countries there is no official codification of case law, administrative law, and statutory law passed by the legislative body and receiving assent from the executive branch.

The US, being a hard fork of the Commonwealth, has the official US Code and state codes (attempts to organize the impacts of case law, admin law, passed law, etc.), whereas Canada has pockets of codification (the Criminal Code), but not all acts of Parliament are organized in a single code. The UK as far as I can tell has no such thing in England or Wales. Hong Kong has some semblance of codification with the Basic Law and ordinances. Does Australia have codification at a federal or state level?

dragonwriter
1 replies
5h48m

The US has the US code and state codes—it attempts to organize impacts of case law, admin law, passed law, etc

Um, yes and no.

“US Code” is statute law. The “Code of Federal Regulations” is admin law. There is no codification of case law; there are reporters, but they are just a flow of case results, similar to the sequential publication of statutes in places that don't codify statute law (and in those that do, too, though where codification exists it is generally the more useful form for most purposes).

The states are generally similar: there is codification of statute and admin law, but not of case law.

jasonjei
0 replies
5h42m

Got it. I was just wondering if other commonwealth countries had an equivalent of a US code or Code of Federal Regulations that documented law in a centralized store. IANAL, but law seems to have so many distinct sources—and more curiously, does codification help a lawyer with the job?

isoprophlex
2 replies
14h3m

Thank you so much for replacing the interactive visuals with screenshots on mobile! Makes for a much better experience reading this on my phone.

ubutler
1 replies
13h3m

I'm glad you appreciated that touch :) Seeing as 59% of my readers are on mobile, I thought it'd be better to have a static image rather than an interactive map which would be pretty unusable on a phone.

amand33p
0 replies
11h46m

I second it. But there's a bug. If we reduce the browser window width, and re-increase it, the charts stay in a non-interactive state.

dleeftink
2 replies
13h33m

"we can also see that Australian case law is a continuum of sorts"

It definitely provides a pretty picture, but just wanted to emphasise the map !== territory adage. The continuum may rather be a function of the projection, chosen similarity metric and so on.

That does not mean we cannot learn from the map, but that the actual 'knowledge structure' of the sum of documents may not be a convenient continuum at all.

In any case, the way you've documented this project is remarkable, and it does provide a novel view of the Australian legal sphere. Thanks for sharing!

ubutler
1 replies
13h6m

It definitely provides a pretty picture, but just wanted to emphasise the map !== territory adage.

You're right — my map does not necessarily represent the underlying semantic structure of Australian law, it is an approximation, one that is biased by the data I used (which as I mentioned, is missing laws and cases from a number of jurisdictions), the embedding model I selected and the dimensionality reduction model I used to project my embeddings into a two-dimensional space, to name a few.

Because I was writing for both legal and data science audiences, I tried to avoid sounding like my inferences are anything more than just inferences but without getting too technical and explaining the inherent limitations of any attempt to semantically map knowledge with today's technology.

I will just say though that, having studied law myself, Australian case law is indeed somewhat of a continuum. A single case may touch on many areas of law and there are no restrictions in terms of subject matter on what precedents a judge may draw upon in reaching a decision, apart from that they are both relevant and binding (or, if they are not binding, are not treated as such).

It was also interesting to observe how the final clusters that developed were uncannily similar to the way in which I was taught law at university. It goes to show, there's a lot of thought put into the design of our legal courses here in Australia. In fact, there are 11 subjects that are mandatory, known as the Priestley 11: https://en.wikipedia.org/wiki/Priestley_11. All of those are reflected on the map, although some have been rolled up into larger categories or divided by other means.

mistermann
0 replies
10h3m

I think it can sometimes be useful to take this map != territory concept further - all instances of map != territory are not equal, some have the potential for higher utility than others. And, I would estimate that concepts/methodologies like this (anything that provides humans new ways to examine and conceptualize important matters) almost certainly have higher potential than standard, run of the mill instances of map != territory (the likelihood of us being able to find and harvest that utility is another layer of complexity, but then so is the notion that utility is often found not only in the destination, but also in the journey). (Unfortunately, modal logic notation seems to currently have no support for describing these sorts of concepts, at least according to ChatGPT).

The "so what?" of it is that if people (particularly smart ones) exclude these additional concepts from their logical consideration, it is possible that the idea could be dismissed, or have its potential importance estimated to be lower than it actually/potentially is, potentially leading to an outcome whereby this map or the underlying methodology (applied to other domains) is not maximally exploited to achieve positive outcomes.

bbor
2 replies
12h47m

Amazing work. As someone doing self-funded web dev, how do you find the time to work on this? Is this a resume booster, a product/prototype, or just a labor of love? To say the least, this is groundbreaking.

I love your technical explanations, even tho I started skimming there. It appears this is all built on modern embedding algorithms, plus traditional ML clustering magic. Now that you have the basic data, have you thought about using full generative models for semantic analysis? Ie “write summaries of this subset of cases and tag them with specific situations or intricacies”, and then do clustering on that? I feel like that’s the natural next computational step, and surely (hopefully?) what the many millions/billions of dollars worth of SWEs that were put to work applying LLMs to case law over the past year in America are up to.

The very best projects on here are ones where I’m tempted to ask to collaborate, even though I know I’m already booked up with work through the horizon! I’ll have to console myself with a comment and a very prestigious place in my “inspirations” bookmark folder :)

defrost
1 replies
12h41m

The blog's About page might interest you:

https://umarbutler.com/about/

    I’m Umar Butler, an Australian data scientist, legal technologist and AI researcher. This is my blog where I write about law, technology, AI and everything in between.

    As part of my research into legal technology and AI, I have published, inter alia, the first dataset for training LLMs on Australian law, the largest open database of Australian law and the first open LLM for Australian law.

    I currently serve as the Assistant Director of Data Science at the Attorney-General’s Department. My work centres around the responsible use of AI to enable, accelerate and enhance public decision making and legal and policy analysis, in addition to consulting on the development of key AI policy.

bbor
0 replies
1h31m

WOW ok, thanks so much for doing my homework for me! I guess I just have to look into a high level government position that encourages me to follow my own interests, easy peasy…

infostud
1 replies
14h43m

Thank you for this effort. Did you access data from http://austlii.edu.au ?

defrost
1 replies
14h27m

Really nice writeup, I appreciate the work you've put into that in both the descriptive analysis of the data and the technical breakdown of the process.

ubutler
0 replies
12h20m

Thank you :)

chottocharaii
1 replies
8h31m

"My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning."

I think Jade.io has had a go at this, IIRC. This isn't to detract from your amazing work though, great stuff.

ubutler
0 replies
8h8m

Thank you :). Would you mind sharing what you have in mind? I haven't come across a visualised semantic map of state and federal Australian laws, cases and regulations before.

boffinAudio
1 replies
14h31m

This is really awesome, thanks for the work and thanks for sharing.

This is a really interesting form of mapping - would you consider doing it for the original occupants' languages, as well?

Australian law itself is fascinating - those outliers on the edges of some of the trails are very curious - is this indicating that some of this material is authored, possibly by the same people/groups whose ontology is transferred with each new document?

I'd love to see this semantic map for the original occupants' languages.

It would also be interesting to see Australia's human rights proclamations and related legislation, as well as its military orders and authorizations for involvement in the 5-eyes catastrophe somehow, semantically, in this context.

defrost
0 replies
12h36m

would you consider doing it for the original occupants' languages, as well?

Bit of a challenge as, of the many languages, few are still actively spoken and, as oral unwritten languages, there's an issue with inconsistent European spelling creating text that truly native speakers still have to learn to read.

For your interest; Aboriginal Language Groups: https://mgnsw.org.au/wp-content/uploads/2019/01/map_col_high...

MisterDizzy
1 replies
6h52m

Seems like quite a project. And very useful.

Australia is the perfect example of when too many well-meaning people who think they can solve everything with more government power are given too much capability to see their vision through to its logical conclusion. It ends up making most of the problems it tries to solve far worse, and nobody has the guts to pull the plug on the programs that aren't functioning.

techbrovanguard
0 replies
6h3m

Clearly, the solution is more neoliberalism.

IIAOPSW
1 replies
7h28m

I've been dealing with some matters in the Australian legal system, for a long while self represented and self taught but recently with a solicitor. I've read a number of acts for myself, procedural civil and criminal, and have even run into the invisible wall between legislation and case law.

This has been shockingly pertinent to my interests and I thank you for compiling it. My only gripe is that you didn't post it several months prior when it would have been most helpful to me ;)

ubutler
0 replies
7h22m

I've read a number of acts for myself, procedural civil and criminal, and have even run into the invisible wall between legislation and case law.

Glad to hear it corresponded with your lived experience, it really was surprising to see how the map correlated with my own understandings of the law developed through my degree!

throwup238
0 replies
7h53m

You need to get in contact with Rob Sitch. They can probably make a whole season of Utopia based around this!

sevenseventen
0 replies
7h32m

Mapping the internet as a whole has been a thing for quite a while, going back to Kumar et al in 2000. https://scholar.google.com/citations?view_op=view_citation&h...

I recall at least one of those papers characterizing the shape as resembling a bow-tie.

This and other early contributions were looking at the link structure of the internet, not textual similarity, though.

sema4hacker
0 replies
6h21m

Most of your work seems over my head, but doesn't the "mammoth" example indicate that by tweaking numbers you can end up getting just about any visual blob you want?

jordanpg
0 replies
9h43m

This is very cool, congratulations.

When I was in law school, I sometimes visualized the "common law" as a web of interdependencies. This is a similar visualization, although it doesn't quite capture the dependencies, at least as I have always imagined it.

For context, the common law refers to law made by (mostly) appellate judges. Sometimes it's built on top of statutory law (e.g., providing meaning, interpretation, or definition to statutory laws) and sometimes it's completely made up, when there is no law "on point." It's made up in the sense that it's constructed on top of a long trail of historical precedent, sometimes going all the way back to Victorian-era England or even older. Really.

(Aside: This is why certain individuals sound so silly when they rail against "judge-made law" in the US. Virtually all law in the US is "judge-made law.")

Anyway, the common law has always seemed to me to be amenable to representation as a graph-like structure where nodes are cases or precedents and the edges somehow encode the strength of the support for the precedent. I think judges might think twice about breaking from precedent (which can be virtuous or not, depending on your viewpoint) if they could see a visualization of how strong the precedent is.

This representation is a step in that direction and I hope your tech can be extended to other common law countries!
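
One way to picture the graph described above, as a rough sketch only: cases are nodes, citations are directed edges, and each edge carries a weight for how the later case treated the earlier one. The case names, the treatment labels and the numeric weights in `TREATMENT_WEIGHTS` are all assumptions made up for illustration. They are not a legal standard, and `precedent_support` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed weights: how strongly each kind of citation supports the cited case.
TREATMENT_WEIGHTS = {
    "followed": 1.0,
    "considered": 0.5,
    "overruled": -1.0,
}


def precedent_support(citations):
    """Sum the weighted incoming citations for every cited case."""
    support = defaultdict(float)
    for citing, cited, treatment in citations:
        support[cited] += TREATMENT_WEIGHTS[treatment]
    return dict(support)


# Invented edges of the form (citing case, cited case, treatment).
citations = [
    ("Case B", "Case A", "followed"),
    ("Case C", "Case A", "followed"),
    ("Case D", "Case A", "considered"),
    ("Case D", "Case B", "followed"),
    ("Case E", "Case C", "overruled"),
]

support = precedent_support(citations)
for case, score in sorted(support.items(), key=lambda item: item[1], reverse=True):
    print(case, score)
```

A simple weighted in-degree like this ignores court hierarchy and how old a citation is, so it is only a starting point for the "strength of the precedent" idea.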

ivanoconnor
0 replies
13h34m

Last year, I had a similar idea to "map out" case law and legislation in the UK — as usual, though, life got in the way and it's ended up joining my vast collection of half-finished projects. Having read your excellent writeup, I'm now feeling rather inspired to give it another try! :)

guidedlight
0 replies
12h4m

*Except Victoria by the looks of it. :-(

green-eclipse
0 replies
8h7m

Would be great to see some of Fisk's cases in here /s

feliixh
0 replies
4h50m

Great job, I intend to reproduce this on a similar dataset I've been collecting!

I will say, it would be great to see the color labeling done on domain URL alone, to see how much of the topography of the map is driven simply by the different formatting characteristics of the websites you're gathering data from.

contingencies
0 replies
12h6m

The problem with Australian law, and I suspect most law, is that the practical problems of the actual system appear to be less about the theory and more about the absence of enforcement, oversight and due process.

amb1337
0 replies
14h33m

This approach could be used to build a global map of AI and/or data privacy legislation and cases that would be potentially very valuable and useful, particularly for startups.

adammarples
0 replies
5h35m

Would it be correct to say that calling this a semantic map, clustered by meaning, might be pushing it? If the data are word embeddings, then you'd hope that they have distilled the semantics in the raw text, but as you said yourself, they are also heavily influenced by style and who knows what else, to the point that semantically identical but syntactically different texts might land in different clusters. Think: if half of the texts were in French, would you keep the same semantic map, or would you have a French continent and an English continent?

TheCaptain4815
0 replies
9h41m

This is such an interesting use of semantic representation. I wonder if it could be used to map out cases vs outcomes, and determine sentencing outliers.

Hammershaft
0 replies
1h59m

The central shape created by this dataviz could make for an interesting island shape.

6510
0 replies
4h50m

This is great. With so many laws, it is hard to get any kind of overview.