
Visualizing Neocities Community

Neocities allows you to host a static website. You can search for other websites from its main page and follow any website you find interesting. The question is: how is this social network organized? Is there any structure? I performed the analysis using my graph projection tools.



Introduction

Does Neocities look like other blogging platforms? Maybe, maybe not. First of all, Neocities is an IPFS hosting service, so its users may not be interested in its social network features at all.

To perform the analysis, I gathered two types of information:

  • Followers
  • Tags

Gathering Data

There is no dedicated API for gathering this information from Neocities websites, so I downloaded the .html pages and extracted the necessary information from them.
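A minimal sketch of the extraction step, using only Python's standard library. The markup is an assumption: it supposes that a followers (or following) page links to each site through an anchor whose `href` is `/site/<name>`, optionally prefixed with `https://neocities.org`. The real pages may differ, downloading is left out, and `extract_sites` is a hypothetical helper, not the scraper actually used here.

```python
import re
from html.parser import HTMLParser

# Assumed link pattern for a site profile: /site/<name>, possibly absolute.
SITE_LINK = re.compile(r"(?:https?://neocities\.org)?/site/([A-Za-z0-9_-]+)/?$")


class SiteLinkParser(HTMLParser):
    """Collect the distinct site names linked from an HTML page."""

    def __init__(self):
        super().__init__()
        self.sites = []       # site names in first-seen order
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = SITE_LINK.match(href)
        if match and match.group(1) not in self._seen:
            self._seen.add(match.group(1))
            self.sites.append(match.group(1))


def extract_sites(html: str) -> list[str]:
    parser = SiteLinkParser()
    parser.feed(html)
    parser.close()
    return parser.sites


sample = """
<ul>
  <li><a href="/site/alice">alice</a></li>
  <li><a href="https://neocities.org/site/bob">bob</a></li>
  <li><a href="/site/alice/followers">alice's followers</a></li>
  <li><a href="/browse">browse</a></li>
</ul>
"""
print(extract_sites(sample))  # ['alice', 'bob']
```

Sub-pages such as `/site/alice/followers` are deliberately ignored by the `$` anchor, so only profile links count as sites.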

Processing Data

Nodes Connectivity

We collected information about 12,531 websites.

There are four types of websites:

  • Websites that follow others and are followed (very active)
  • Websites that do not follow anyone but are followed (superstars)
  • Websites that follow others but are not followed (active, but not popular)
  • Websites that neither follow nor are followed (not active at all)

Following   Followed   Count   %
Y           Y          2481    20
N           Y          2853    23
Y           N          6413    51
N           N           784     6

So most websites take part in the Neocities community: only 6% of them neither follow anyone nor are followed by anyone.
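The four categories come straight from the follow edges: a site is "following" if it appears as a follower in at least one edge, and "followed" if it appears as a target. The sketch below assumes the data is a list of `(follower, followed)` pairs; `categorize` and `percentages` are hypothetical helper names.

```python
from collections import Counter


def categorize(sites, edges):
    """Count sites by (follows someone?, is followed by someone?) as Y/N pairs.

    sites: iterable of site names; edges: iterable of (follower, followed) pairs.
    """
    edges = list(edges)  # iterated twice below
    following = {src for src, _ in edges}
    followed = {dst for _, dst in edges}
    return Counter(
        ("Y" if s in following else "N", "Y" if s in followed else "N")
        for s in sites
    )


def percentages(counts):
    """Round each category to a whole percentage of the total."""
    total = sum(counts.values())
    return {key: round(100 * n / total) for key, n in counts.items()}


# Toy graph: a and b follow each other, c follows b, a follows e; d is isolated.
toy = categorize("abcde", [("a", "b"), ("b", "a"), ("c", "b"), ("a", "e")])
print(toy[("Y", "Y")], toy[("N", "Y")], toy[("Y", "N")], toy[("N", "N")])  # 2 1 1 1

# Applied to the counts in the table above, this reproduces its % column.
published = {("Y", "Y"): 2481, ("N", "Y"): 2853, ("Y", "N"): 6413, ("N", "N"): 784}
print(percentages(published))
# {('Y', 'Y'): 20, ('N', 'Y'): 23, ('Y', 'N'): 51, ('N', 'N'): 6}
```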

Filtering

Nodes with very few connections are always difficult to handle in graph analysis. I therefore discarded websites that did not belong to the largest connected component, as well as websites following fewer than 5 other websites (as of 27 April 2022, this excludes me from my own analysis).
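The filtering step can be sketched as follows. The order of the two criteria is an assumption, since the text does not say which was applied first: this version drops sites following fewer than 5 others, then keeps the largest weakly connected component (direction ignored) of what remains. `filter_graph` is a hypothetical name.

```python
from collections import defaultdict, deque

MIN_FOLLOWING = 5  # threshold used in the text


def filter_graph(edges, min_following=MIN_FOLLOWING):
    """Return the set of sites kept after filtering.

    edges: iterable of (follower, followed) pairs.
    Assumed order: first drop sites following fewer than `min_following`
    others, then keep the largest weakly connected component.
    """
    edges = list(edges)
    follows = defaultdict(set)
    for src, dst in edges:
        follows[src].add(dst)
    kept = {s for s, targets in follows.items() if len(targets) >= min_following}

    # Undirected adjacency between kept sites (weak connectivity ignores direction).
    adj = defaultdict(set)
    for src, dst in edges:
        if src in kept and dst in kept:
            adj[src].add(dst)
            adj[dst].add(src)

    # Breadth-first search over each component, remembering the largest one.
    largest, seen = set(), set()
    for start in kept:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for neighbor in adj[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(largest):
            largest = component
    return largest


# Toy graph: a..f all follow each other; g follows only a; x follows five
# sites that follow nobody, so x passes the threshold but ends up isolated.
edges = [(u, v) for u in "abcdef" for v in "abcdef" if u != v]
edges += [("g", "a")] + [("x", f"p{i}") for i in range(5)]
print(sorted(filter_graph(edges)))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Applying the criteria in the other order can keep a different set of nodes, which is one reason to state the order explicitly.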

Tags

I first wanted to use tags, but unfortunately many websites do not add them to their profile: 32% have no tag at all, and the maximum is 5 (a limit fixed by the system). Tags would therefore be very hard to use for building the map, so I kept them for post-analysis.

Tags per site   Share of sites (%)
0               32
1               21
2                7
3                9
4                9
5               20
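A sketch of how such a distribution can be computed, assuming the scraped tags are stored as a dictionary from site name to its list of tags (a hypothetical layout); `tag_count_proportions` is an illustrative name.

```python
from collections import Counter

MAX_TAGS = 5  # per-site limit mentioned in the text


def tag_count_proportions(tags_by_site):
    """Share of sites (in %) having 0, 1, ..., MAX_TAGS tags."""
    counts = Counter(len(tags) for tags in tags_by_site.values())
    total = len(tags_by_site)
    # Counter returns 0 for tag counts that never occur.
    return {n: round(100 * counts[n] / total) for n in range(MAX_TAGS + 1)}


# Hypothetical toy data: site name -> list of tags.
toy = {"a": [], "b": ["art"], "c": ["art", "music"], "d": []}
print(tag_count_proportions(toy))  # {0: 50, 1: 25, 2: 25, 3: 0, 4: 0, 5: 0}
```

Because each percentage is rounded on its own, such a column does not necessarily add up to exactly 100.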

The keyword distribution follows a power law: very few keywords are heavily used, while most appear only a handful of times. The table below counts distinct tags by their number of occurrences \(x\).

Occurrences \(x\)          Number of tags
\(x \geq 1000\)            2
\(1000 > x \geq 100\)      21
\(100 > x \geq 10\)        336
\(10 > x > 1\)             1760
\(x = 1\)                  5213

The vast majority of tags are unique: 5,213 of the 7,332 distinct tags (about 71%) are used only once.
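The binning can be sketched like this, again assuming a hypothetical dictionary from site name to tag list. Each tag is counted at most once per site, which is an assumption about how duplicates were handled.

```python
from collections import Counter


def occurrence_bins(tags_by_site):
    """Number of distinct tags falling in each occurrence range x."""
    # A tag listed twice on the same site is counted once for that site.
    occurrences = Counter(tag for tags in tags_by_site.values() for tag in set(tags))
    bins = {"x >= 1000": 0, "1000 > x >= 100": 0, "100 > x >= 10": 0,
            "10 > x > 1": 0, "x = 1": 0}
    for x in occurrences.values():
        if x >= 1000:
            bins["x >= 1000"] += 1
        elif x >= 100:
            bins["1000 > x >= 100"] += 1
        elif x >= 10:
            bins["100 > x >= 10"] += 1
        elif x > 1:
            bins["10 > x > 1"] += 1
        else:
            bins["x = 1"] += 1
    return bins


toy = {"a": ["art", "music"], "b": ["art"], "c": ["art", "zines"]}
print(occurrence_bins(toy))
# {'x >= 1000': 0, '1000 > x >= 100': 0, '100 > x >= 10': 0, '10 > x > 1': 1, 'x = 1': 2}
```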

Here are the most frequent tags (the 23 tags with at least 100 occurrences) and their number of occurrences:

Tag Occurrences
art 1664
music 1070
personal 950
videogames 939
programming 919
games 392
anime 369
blog 348
writing 322
design 197
food 168
technology 162
html 155
90s 154
education 135
gaming 133
photography 129
comics 126
javascript 115
fun 114
movies 105
retro 102
game 101
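A list like this one can be produced with `collections.Counter.most_common`, keeping only the tags above a threshold. The dictionary layout and the `top_tags` name are assumptions for illustration.

```python
from collections import Counter


def top_tags(tags_by_site, min_occurrences=100):
    """Tags used at least `min_occurrences` times, most frequent first."""
    occurrences = Counter(tag for tags in tags_by_site.values() for tag in set(tags))
    return [(tag, n) for tag, n in occurrences.most_common() if n >= min_occurrences]


toy = {"a": ["art", "music"], "b": ["art", "music"], "c": ["art", "zines"]}
print(top_tags(toy, min_occurrences=2))  # [('art', 3), ('music', 2)]
```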

On the official website here, you can see the top tags sorted by frequency. For some reason, my list is not exactly the same, but most terms appear in both.

Transformation

To represent the neighborhood of a node, I performed a random walk in which every node is weighted by how accessible it is from the current node. Then, I projected the graph using the method previously presented in PolBlog.
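The exact random walk is not specified here, so the sketch below uses one common variant as a stand-in: a random walk with restart (personalized PageRank) from a source node, computed by power iteration. The restart probability of 0.15, the iteration count, and the name `random_walk_weights` are assumptions. Nodes that are easier to reach from the source end up with a larger weight.

```python
def random_walk_weights(adj, source, restart=0.15, iterations=100):
    """Random walk with restart (personalized PageRank) from `source`.

    adj maps each site to the list of sites it follows. At each step the
    walker follows a random outgoing link with probability 1 - restart, or
    jumps back to `source` with probability `restart`. Returns the
    approximate stationary probability of visiting each site.
    """
    nodes = set(adj) | {t for targets in adj.values() for t in targets} | {source}
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(iterations):  # power iteration
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += restart  # total mass stays 1, so the restart mass is `restart`
        for node, mass in p.items():
            targets = adj.get(node, [])
            if targets:
                share = (1 - restart) * mass / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dead end: send the walking mass back to the source.
                nxt[source] += (1 - restart) * mass
        p = nxt
    return p


weights = random_walk_weights({"a": ["b"], "b": ["c"], "c": ["a"]}, "a")
print({k: round(v, 3) for k, v in sorted(weights.items())})
# {'a': 0.389, 'b': 0.33, 'c': 0.281}
```

Computing this vector for every site gives each node a weighted description of its neighborhood, which is the kind of input a projection method can then work from.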

Results

We obtain the following map. You can get the interactive .html version here, where you can select a website and click to visit it. For readability, I kept only some of the links. Don’t worry about the “color” column: it is simply \(\frac{\text{NumberOfTags}}{5}\), the number of tags of a website divided by the maximum of 5.

What you can see is a backbone made of the largest websites in the middle, surrounded by peripheral nodes with almost no followers. This is not surprising: weakly connected nodes are easier to place on the map because fewer constraints apply to them.

The second observation is that these central websites are also the ones with many keywords.


