How to build a graph visualization engine

stuntkite · on Sept 16, 2022

I was an early employee at the first round of graphistry.com. I'm really proud of Leo and that he took it from the brink and put it back on the market and is growing it.

I think more people should build more graph visualization engines. You're going to have a hard time competing with how slick pygraphistry is, but there are not enough alternatives that are worth a shit. Graphistry, when I left, was able to do half a billion nodes (my memory might be fuzzy, could have been 250,000 but I think we ran tests and got 500k) all memory resident in the browser. Our goal was a billion. I'm not sure what it can do now but it has UMAP and some other very fancy features.

Here's a stupid unmaintained thing I made that got me the job at Graphistry. I did not build the graph vis engine, but I did write the procedural color stuff. It's ugly and terrible. Lol. I'm gonna release something some time soon that is a rework on that idea.

https://github.com/millerhooks/graphterm

throwoutway · on Sept 16, 2022

> half a billion nodes (my memory might be fuzzy, could have been 250,000 but I think we ran tests and got 500k) all memory resident in the browser. Our goal was a billion

Did you mean a million? 500k is half a million not billion

mbuda · on Sept 16, 2022

+ Have you seen https://github.com/cosmograph-org/cosmos? Looks like a really scalable thing :D

stuntkite · on Sept 16, 2022

I had not. This is very pretty and I will play with it. Thank you.

mbuda · on Sept 16, 2022

Graphistry is really cool but the point here is not to compete on the visualization side, we needed a tool that scales for a few of our use-cases (mostly as a highly configurable graph visualizer) + a library that can easily be extended :D

lmeyerov · on Sept 16, 2022

Thanks for kind words all :)

Scale - backend: As we're rapids.ai-native (helped start early days of both Apache Arrow + Nvidia RAPIDS.ai), we work with customers doing billion-level nodes/edges in interactive time on GPU servers. Mostly for fast ingest, ETL, + graph neural nets / manifold learning, and we're slowly pushing that into the visual stack.

Scale - frontend: We normally recommend reducing down to about ~2M edges or less. For sensible visual experiences, add in auto algorithms that cut to more like 500K. We've planned a way we think we can do another 100X, just not (yet) an engineering priority. Fun fact: your browser's JS VM is limited to ~1GB of RAM, so we're already at that limit in practice.

RE: Scaling graph visuals as an engineering practice, it looks like memgraph is starting where neo4j reached a few years ago, which makes sense. That approach doesn't really work well for the use cases memgraph advertises for, because as soon as a bunch of user/customer/IT/etc events happen and get visualized, the browser crashes. Optimization approaches like wasm and workers are clever -- a v0 prototype of graphistry did that! -- but we found them too unreliable to be the path to good performance across users of most operational teams ("Works on my machine" syndrome). Do it, but that shouldn't be the main source of 100X performance, just a 2X boost. We end up connecting GPUs in the browser to GPUs in the datacenter not just for 100X'ing this kind of stuff, but for predictable performance that limits how often your users' browsers crash on real datasets.

Also maybe not obvious, this article focuses on interactive rendering, but a lot of the challenge after they figure out how to solve it is interactive analytics too. Most layout algorithms have non-linear complexity, so O(500K) is actually a challenge. A lot of our GPU offloading work nowadays isn't just rendering but layout, ETL, ML/AI clustering, etc. This ends up overwhelming the browser (why we do distributed GPU), and OLTP graph DB's aren't good at that either -- Neo4j basically had to write a V2 DB-in-a-DB to make their Graph Data Science module perform.
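To put rough numbers on the non-linear layout point above, here is a minimal sketch. It assumes a naive force-directed layout that computes repulsion between every pair of nodes on each iteration. That is an assumption for illustration only; the comment does not say which layout algorithms are involved, and `pairwise_interactions` is a hypothetical helper name.

```python
def pairwise_interactions(n_nodes: int) -> int:
    """Number of unordered node pairs a naive all-pairs repulsion pass visits."""
    return n_nodes * (n_nodes - 1) // 2


# Assumed graph sizes, chosen only to show how the pair count grows.
for n in (5_000, 50_000, 500_000):
    print(f"{n:>7} nodes -> {pairwise_interactions(n):,} pairs per iteration")
```

Going from 5K to 500K nodes is 100x the nodes but roughly 10,000x the pair work per iteration, which is one way to see why layout at that size gets pushed to approximations or to GPUs.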

And no worries: There's no competition because Graphistry isn't a graph database :) Most of our users will do something like databricks dashboard / jupyter notebook / powerbi / etc query <> graphistry visual. I bet pairing a great streaming db like memgraph with Graphistry would combine respective engineering strengths quite well!

stuntkite · on Sept 16, 2022

Heh. You and me both. When I worked there something I really wanted to get to was GIS integration and abstraction with graph. They are doing some really, really cool stuff and I think their future offerings will be things to take note of, but it's a service too and it's not open source. If you find a solution to what you're talking about, I'm interested. Let me know.

Also, I talk to Leo. If you have specific requests that you want to list, I'll make sure he reads this thread.

taubek · on Sept 16, 2022

What kind of integration with GIS were you considering? What did you want to accomplish?

stuntkite · on Sept 16, 2022

Well I haven't let go of the idea and am in the process of releasing a GIS data processing platform, but that's a story for another day. My thought is that all data, especially large datasets from the top down, only has a few ways it can be displayed and sliced. Relationships, multi temporal (slicing and/or playing by time but then also playing back multiple time sections so they can be compared. compute vs. real time, time to exec on server vs time experienced by user, and also finding various time slices to compare. So multi temporal being that your timeline playback isn't just a film strip), and spatial. The ground truth for all data is that it happens somewhere in the world so a graph should be able to be put into a GIS space for display, bonus points for 3D, but then also you should be able to build composite fake GIS spaces and visualize the relationships between them. Like for instance, your product's network traffic all going to colocation spaces and to users and back again. That has normal GIS locations and that can provide constraints for graph display but also, especially in a distributed system, there may be artificial geography that can be defined, let's call it "The Astral Plane", that has relationships that are important and grouping just like state and country boundaries that can be defined in shape files and put into a spatial database in a projection you just make up or is defined by the graph data and physical locations, and you should be able to slide between all those things. That's where I wanna go and still intend to get to.

EDIT: Obviously when I say everything has GIS coordinates I'm not quite talking about outer space, but I think outer space is even covered by the rest of this idea. So is tiny space. This realization came to me when I was reading about people using PostGIS as a database for chemistry simulation. I have no link for that, I haven't thought about it in years, but now that I'm talking about it, I'll try to find it and post a link if I find it as I would like to readdress.

taubek · on Sept 16, 2022

Interesting take on GIS application. I used to work only with ArcGIS some 25 years ago. It was a pretty new concept for me at that time. The whole spatial concept of linking databases with geo coordinates.

erwinh · on Sept 16, 2022

Recently a new web-based fully gpu driven graph visualisation engine was released and it could handle quite some big graphs directly in the browser: https://twitter.com/hoogerwoord/status/1568188361503907840?s...

Happy that people are building new graph visualisation engines and I think still a very open space for innovation in interactively exploring graph/network structures.

cizl · on Sept 16, 2022

Cosmos really looks amazing when it comes to performance. Besides visual rendering on the GPU, they also do the graph layout simulation calculations on the GPU as well, which is a first. SigmaJS for example uses graphology for the calculations, and WebGL for rendering.

Cosmos is a really new lib, so when we did our initial research it wasn't even out yet. You can read more about our research over here: https://memgraph.com/blog/you-want-a-fast-easy-to-use-and-po...

From what we've understood, it's somewhat limited when it comes to graph styling. But an amazing technology nonetheless.

xtracto · on Sept 16, 2022

I'm currently using vis-network [1] as the graph display for a Fraud analysis UI. Cosmos looked really good when I checked it but as you very well mention it wasn't able to style vertices with different images/icons, and I'm not sure if it was able to style edge color and width. All of this is heavily used in my use case. Hopefully it gets these features as it matures.

[1] https://github.com/visjs/vis-network

justtoni · on Sept 16, 2022

vis-network is a great library. We actually started with it because we liked the API, the styling capabilities, and the easy-to-use event handlers. But we experienced performance issues when we wanted to simulate and render larger graphs: because it was done in the main thread, the whole UI was blocked by it. We tried to fix it in vis, but the simulation was using the DOM so it was super hard to split simulation and renderer.

Actually that was the main reason (along with the note that the main authors are not contributing to visjs any more [1]) for the creation of Orb, where we fixed the blocking UI issue with graph simulation. The Orb engine has two parts now:

* Simulator that doesn't depend on the DOM so we can move its heavy calculation to the web worker - we use d3-force for it [2]

* Renderer is pretty much influenced by vis-network, using similar style mechanism and canvas drawing capabilities (we credited vis-network in our code for those sections)

[1] https://github.com/almende/vis/issues/4259#issue-412107497

[2] https://github.com/d3/d3-force
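The simulator/renderer split described above can be sketched roughly as follows. This is not Orb's code and does not use d3-force: it is a toy in Python where a background thread and a queue stand in for a web worker and its messages. The names `simulate` and `run`, and the "pull every node toward the origin" update, are all made up for illustration.

```python
import queue
import threading


def simulate(positions, steps, out):
    """Toy simulator: halves every coordinate per step. Touches no UI objects."""
    for _ in range(steps):
        positions = {node: (x * 0.5, y * 0.5) for node, (x, y) in positions.items()}
        out.put(positions)  # post a snapshot, like a worker posting a message
    out.put(None)  # sentinel: simulation finished


def run(positions, steps=3):
    """Start the simulator in the background and collect the frames it posts."""
    frames = queue.Queue()
    worker = threading.Thread(target=simulate, args=(positions, steps, frames))
    worker.start()
    rendered = []
    while (frame := frames.get()) is not None:
        rendered.append(frame)  # a real renderer would draw the frame here
    worker.join()
    return rendered


frames = run({"a": (8.0, 4.0)})
print(frames[-1])
```

The point of the shape is that the simulator only needs plain data in and plain data out, so it can live wherever the heavy work is cheapest, while the renderer just consumes position snapshots.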

mbuda · on Sept 16, 2022

https://github.com/cosmograph-org/cosmos seems super nice! Yep, it's amazing to see better and better tooling around!

nextaccountic · on Sept 16, 2022

Is it open source? Or does it have a website? The tweet wasn't clear

kasupe · on Sept 16, 2022

Wow! Looks nice :) Thanks for sharing

twoWhlsGud · on Sept 16, 2022

Interesting license choice - Creative Commons Attribution-NonCommercial.

erwinh · on Sept 27, 2022

Yeah I respect the choice although would have liked to see a different license.

Noumenon72 · on Sept 16, 2022

"And why you shouldn't" doesn't deserve equal billing in the headline. The article structure is

* first two sentences: You shouldn't build from scratch

* next four sentences: How not building from scratch failed for us

* entire rest of article: How we built from scratch and made an amazing product you should try.

I guess the implication is "you shouldn't build an engine because we just built the engine to end all engines"?

justtoni · on Sept 16, 2022

Great destruct of the existing title to get to the better title!

Even though "end all engines" might not be 100% correct, because the initial idea of the Orb is to make a single interface where the background (simulation and rendering engine) can be changed.

To add to your title, "you shouldn't build an engine because we just built the engine to unify all engines". :)

dang · on Sept 16, 2022

Ok, we've dropped that bit from the title above. It's kind of baity anyhow.

dtomicevic · on Sept 16, 2022

something like that, I think we still agree it's an amazing product and now there is really no reason to rebuild it from scratch, at least for a while...

graphviz · on Sept 16, 2022

Good discussion. It may help to separate the different threads.

Business models that support network visualization: mostly, not such a great story. Customers want to solve problems, not just look at pictures of networks. Inevitably this drives the work toward domain-specific capabilities in areas like computer security, fraud detection or bioinformatics. It's a slippery slope. If you stay focused on core algorithms, your audience is other tool builders i.e. cost centers.

Scaling up network visualization: fascinating technical problem, but a human can't actually see a million objects at once or form a mental map of their locations. So it's more like a clustering problem. Not a big surprise that Graphistry adopted UMAP. It's treating nodes more like points in a big plot. We're not concerned with the same problems as illustration quality rendering of small readable graphs.

Building your own: an appeal of network visualization is that you can get going by just writing some kind of physical simulation, assigning reasonable coordinates to nodes, drawing edges as lines, and poof you're done. If your goal is consistently making concrete diagrams that look like a human drew them (with nodes that have shapes and ports, various kinds of labels, constraints on edge routing, nesting, aspect ratio control, etc.) there are so many intricate subproblems that you could spend years on any of them. But what's the financial incentive?

The research frontier: no doubt machine learning will eventually transform this domain the way it has many others. The combinatorial objectives of network diagramming make it challenging for now. (Can an algorithm learn orthogonal planar layouts with port constraints? Maybe. Would like to see that.) Another frontier is to extend general methods for declarative 2D layouts. People don't want just pictures of networks, they want more elaborate diagrams: computer networks, metabolic pathways, business processes, cryptocurrency transactions. Network visualization is only a subproblem in information visualization. This ties in to the first point, people need to solve problems in a specific domain.

analognoise · on Sept 17, 2022

"Can an algorithm learn orthogonal planar layouts with port constraints? Maybe. Would like to see that."

I think ELK can do that: https://www.eclipse.org/elk/

throwaway202209 · on Sept 16, 2022

Firefox 104.0 64bit Linux, page doesn't complete loading.

When viewed through archive.org the page takes >30 seconds to display any content other than menus, an orange background, and an "are you having problems?" chat prompt.

Guess I'll have to work out how to build my own graph visualization engine :)

chaps · on Sept 16, 2022

Heh same, with noscript blocking 14 unknown domains, where unblocking the main domain doesn't fix it.

No way in hell am I going to trust a company on any technical matter when they have to cobble their own product together through that many external services and trackers.

avmich · on Sept 16, 2022

Yes, I too couldn't pass through all the page requirements.

taubek · on Sept 16, 2022

I managed to load it right now. But as someone said, archive.org is also an option.

https://web.archive.org/web/20220916170642/https://memgraph....

zbird · on Sept 17, 2022

"Building something from scratch is rarely a good idea."

Except when you actually want to learn something. Worst possible advice to give someone and I haven't even read the second sentence of this blog post.

kez_z · on Sept 16, 2022

I recently tried graph visualization and it was a mess. Might check this out, anything to make it easier.

kasupe · on Sept 16, 2022

If it’s at least a bit easier to use than D3.js, I like it :’) Seems like a really cool project.

justtoni · on Sept 16, 2022

D3 is a great product. We really like using it. D3 has a great integration with svg rendering, but we needed a canvas rendering which resulted in a merge of vis.js (guys did a great job on canvas rendering) and D3 (just the simulator: d3-force).

vpavicic · on Sept 16, 2022

I tried it a bit and it does seem easier and it renders way faster

kasupe · on Sept 16, 2022

I see that it draws on canvas, and I used to draw on svg. This is awesome!

justtoni · on Sept 16, 2022

SVG excels at drawing a small number of large elements, where it sometimes performs even better than the HTML5 Canvas or WebGL.

politician · on Sept 16, 2022

This is open source?! It looks far more approachable than ReGraph.

dtomicevic · on Sept 16, 2022

Yep, Apache 2.0 https://github.com/memgraph/orb

kasupe · on Sept 16, 2022

Yep :)

fnordpiglet · on Sept 16, 2022

mkaic · on Sept 16, 2022

> Building something from scratch is rarely a good idea.

In the context of this article, I agree. Professional products should probably not try to build everything from scratch. But I do think it's important to acknowledge that building things from scratch as personal projects is one of the absolute best ways to gain a deep understanding of a topic. Even if you go on to use graph visualization engines built by others in all your future work, that doesn't mean you shouldn't give it a go on your own time if you're interested in trying it out.

Not trying to be contrarian or criticize the article, just my 2 cents. In general I agree with the sentiment of using existing solutions wherever possible.

stuntkite · on Sept 16, 2022

It depends on how you value your time and effort. Not every idea should be valued by its market fit. I don't think people should write code they are uninterested in writing. That's my litmus test for myself. Usually when I'm really, really interested the thing I'm aiming at doing initially isn't even about what I'm really trying to solve. It's picking away at the cruft over something and then I can see what's on the other side. Don't write things from scratch if you absolutely hate it and you are only doing it because you think the market will buy it and you wanna be a big biz, unless you feel like it. I don't know, YMMV. Programming is stupid and I plan to be a pig farmer as soon as possible.

jylam · on Sept 16, 2022

I don't think Parent was talking about business at all. Personal projects are not things you do because of potential business opportunities in my book. I wrote a GameBoy emulator, several raytracers, countless demos on 8-bit machines, countless "IoT" things, an automatic cat feeder, a 3d scanner and much more. I never ever had the goal to transform that into money, that was for fun.

Of course you won't write anything that you hate at first, because it's for fun. But hating something for some days or weeks is part of the fun, too. You are challenging yourself, not doing it the easy way, to appreciate how well the others are doing.

If you want to turn it into a business, so yeah, don't write a graph visualization engine, or crypto stuff, or anything really. Get a job, that can be fun too, and let the entrepreneurs figure out the rest.

stuntkite · on Sept 16, 2022

I see what you're getting at and concede that you may be totally right that I'm not on the same page as Parent, but if the reason for not doing something dumb that you'll probably fail at that other people are doing better at, that lets say you also hate, isn't for building a business... What are you doing that for? Pathological self harm?

I think you and I agree. I'm not being crass, but if you had more thoughts on what I'm trying to say about what you're trying to say, I'd be interested in further discussion.

dtomicevic · on Sept 16, 2022

haha well, while running a pig farm is also a difficult business, there are some interesting challenges you can tackle :D also you could write software to manage some aspects of your pig farm

stuntkite · on Sept 16, 2022

Definitely. I have some giant robots and spend a lot of my time with computer vision. I'm gonna put them on tank treads and have them tend to the farm. I just like... fuckin do not want to ever be asked to and feel like I should care about anything related to javascript.... ever again... past 2024, but with just a tiny nudge, I could just live that life tomorrow.

gregfagan · on Sept 17, 2022

Wow… did JavaScript hurt your children? Have you tried typescript? It’s not so bad.

stuntkite · on Sept 18, 2022

Yes. And it's hurting yours. But you know, some people like all sort of other things that are literal poisonous torture. Eh, get in where you fit in.

xani_ · on Sept 16, 2022

I agree. Making a personal blog engine gave me new hatred for anything frontend and a deal of compassion for people that have to deal with that mess professionally.

stuntkite · on Sept 16, 2022

What blog engine and why did you and I assume your partners feel like the world needed another one? Not being glib. I'm genuinely curious.

xani_ · on Sept 16, 2022

It didn't need another one which is why I won't be showing it ;p. It was also the project where I learned Go and did a bunch of redesigns along the way so it is a bit of a mess. I'd also hate the idea of someone else actually using it, or worse, reporting bugs, because just using some static site generator would've been a much better idea and I kinda designed it for my workflow.

It did a lot of performance wankery, like templating system (not my lib, I just used available one) being just Go code embedded in HTML that needed to be compiled with the rest of the app, or pre-generating HTML from Markdown on load. Using same language to write app as to write templates was nice tho.

Most of it was mostly "how far I can go without cutting features" because really going from 10ms per page to 0.2 ms per page has no difference after client RTT is involved.

Hell, even on localhost for some reason Chrome always has a few ms delay before starting a download compared to any cli client: Chrome shows anywhere between 3 and 8ms to start downloading, while FF sits at 0ms.

I originally planned to have some fun with HTTP2.0 PUSH but, well, while I kinda believed on authority that it is useful for something it turned out to be entirely stupid idea which apparently nobody bothered to test before pushing it into standard so I didn't get to do it. Maybe I should try again with HTTP3.0 hints.

stuntkite · on Sept 16, 2022

It sounds like you learned a lot... but a blog platform to do that? Like why not just pivot to something that might need the enhancements you're providing with go?

I'm asking this for rhetorical purposes. I know, I know deeply why. heh. I'm glad you came out of it ok. Thank you for your honest response.

xani_ · on Sept 16, 2022

Well it started after wordpress fucked up my formatting and replaced -- with — (em dash) one too many times. The whole PHP stack isn't exactly pleasant to manage in the best of days, and looming threat of WP having another bug and getting owned of no fault of my own was also a factor.

I wanted something simple that was just markdown for actual content, no database or anything more fancy. I have considered static gen and even eventually migrated my old blog to it as archive (not in english, and cringe anyway). But it really started as "well, it looks like fun thing to do and I will be scratching the itch I had, why not".

Mind you, that was in 2012, the first version was in asynchronous(!) Perl, Ghost didn't even exist at that point (and I didn't want to touch JS anyway), let alone any other alternative. It even did a respectable ~3ms per render of the page.

I rewrote it in Go to learn some stuff and in the process also yeeted comment processing and just farmed it out to an external (self-hosted, not written by me) app, as that's probably the most annoying and thankless part of a blog engine when you include all kinds of spam detection that would need to be written.

In the nearby cemetery I also have an unfinished Go z80 emu (I did learn a bit about using Dear ImGui from it) and its rewrite in Rust (because what's a better first project than that?), just coz I wanted to see just how much faster C<->Rust is compared to C<->Go interop (answer = a lot, rendering part got from >4ms to below 0.5ms).

I did it to the point where it could run some code, and quite fast too, I think it was down to few ns per instruction, and 8 byte prog ran in like 11 ns, which I kinda didn't expect from Go. It didn't emulate instruction delay tho, which would be required to emulate it with peripherals.

Program decoder was just...256 byte array with function pointers generated out of operator list, which probably helped
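For readers who haven't seen the pattern, a decoder built as a 256-slot table of handlers looks roughly like this. It is a sketch in Python, not the Go code described above: the state layout, the handler names, and the choice of which two opcodes to fill in are assumptions, and flags and instruction timing are ignored entirely.

```python
def op_nop(state):
    state["pc"] += 1


def op_inc_a(state):
    state["a"] = (state["a"] + 1) & 0xFF  # 8-bit register wraps at 256
    state["pc"] += 1


def op_unimplemented(state):
    raise NotImplementedError(f"opcode at pc={state['pc']}")


# One handler slot per possible opcode byte; decoding is a single index lookup.
DECODER = [op_unimplemented] * 256
DECODER[0x00] = op_nop    # NOP
DECODER[0x3C] = op_inc_a  # INC A


def run(program: bytes):
    """Execute single-byte opcodes until the program counter runs off the end."""
    state = {"a": 0, "pc": 0}
    while state["pc"] < len(program):
        DECODER[program[state["pc"]]](state)
    return state


print(run(bytes([0x3C, 0x00, 0x3C])))
```

Because every opcode byte maps straight to a function, there is no chain of comparisons in the hot loop, which fits the "probably helped" guess about speed.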

stuntkite · on Sept 16, 2022

You are a Don Quixote after my own heart. If you ever need a job, get at me. I think we'd work well together. Heh. Not even joking.

xani_ · on Sept 17, 2022

It's just the way I prefer to learn stuff. Learning for learning sake without making anything out of it isn't for me, I learn best when getting knowledge in process of making something.

stuntkite · on Sept 18, 2022

I realize it sounds like I was teasing you. I was not. I really relate and am impressed. Cheers.

taubek · on Sept 16, 2022

Yeah, I'd like also to take a look at it.

mbuda · on Sept 16, 2022

Yep, the history here is that we tried out a couple of standalone tools with the appropriate license (e.g., VisJS, which is a great library), but nothing was suitable after some point, and then the best thing was to build. You learn more + we put it open source + we can integrate some other, maybe more scalable visualization engines in the future :D

dtomicevic · on Sept 16, 2022

agreed, you can learn a lot by building something from scratch. also, some of the successful software tools we use today came from personal projects so it can be a win win

traceroute66 · on Sept 16, 2022

I do hate spam-blogs.

Bait and switch blog posts (i.e that start potentially interesting and then half-way morph into a poorly disguised sales pitch) are so tiresome.

mbuda · on Sept 16, 2022

Check out the Github repo https://github.com/memgraph/orb, it's Apache-2, with a lot of thinking around the software design :D

stuntkite · on Sept 16, 2022

This looks pretty clunky. Is this what Neo4j uses in their viewer? It feels like it's related.

taubek · on Sept 16, 2022

No, Neo4j doesn't use Orb. Orb was released just recently.

mbuda · on Sept 16, 2022

It's not, it's also based on d3 (like Neo4j viewer I think), but built from 0

stuntkite · on Sept 16, 2022

The code giving a "lot of thought to software design" must be pretty good, because this looks like Java Graybeard garbage to me. Yeah, that's a caustic statement. I'm not trying to say that people who made it didn't do a good job, but the display is just.... Not good enough. They need to find a team member to help them get out of 20 years ago with this stuff. If anyone that is working on this reads this. That's my thought. I am not taking shots at your work. It does look well organized, but I cannot see a reason I would choose this.

justtoni · on Sept 16, 2022

What would you expect from a display to be good enough?

Just as a note, Orb is not there to compete with high volume graph visualizations like Cosmograph, Graphistry, Linkurious. It is more as a child from d3 and vis.js, which are great libraries, that uses d3 simulation and vis-like canvas rendering. We really liked what vis.js team did with the styling of the graph and how you can customize it - this is often a limitation for high volume graph visualizations.

We could also discuss the analytics usability of seeing a graph with 1 billion nodes. It is definitely awesome, but it is too much data for a user to grasp. Clustering or other graph algorithms would help. I think the question is: what is the maximum graph size (number of nodes/edges) at which it becomes hard to get any useful visual information except the graph's global state? (E.g. a bar chart with 365 columns (days) is harder to read than a bar chart with coarser sampling, e.g. per week or month.)

I don't know the answer to this, but maybe you do, given your experience with graph visualizations.
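The bar chart analogy is easy to make concrete. A minimal sketch, assuming made-up daily values and a hypothetical `downsample` helper that sums fixed-size buckets:

```python
def downsample(values, bucket_size):
    """Sum consecutive runs of `bucket_size` values; the last bucket may be shorter."""
    return [sum(values[i:i + bucket_size]) for i in range(0, len(values), bucket_size)]


daily = [1] * 365  # placeholder data: one event per day
weekly = downsample(daily, 7)
print(len(daily), "bars ->", len(weekly), "bars")
```

The total is preserved while the number of marks on screen drops by about 7x. Clustering a large graph before drawing it is the same kind of trade: less detail per mark in exchange for something a person can actually read.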

stuntkite · on Sept 16, 2022

You are right. This is a fine rendering layer for datasets of the right size. The space I exist in and am interested in demands a bit more. I think it's just a difference in scale. I do think the article is wrong though. I think we need more graph vis engines of all scales. I think the marketplace for them is just starting to be cracked and there is plenty of room.

kasupe · on Sept 16, 2022

What does Neo4j use?

cizl · on Sept 16, 2022

Neo4j uses a fork of VisJS. They call it NeoVis. We didn't go that route because vis-network is tightly coupled and has a lot of calls to the browser window reference which doesn't work in a WebWorker environment. So simulations end up blocking the main thread.