More

riordan · on Dec 5, 2023

This is genuinely awesome. The crossstreet based search is a perfect approach in NYC. Reminds me of how I used to use Google’s sms service to get directions from about 2004-2010.

riordan · on Nov 17, 2023

Sunday service has always been particularly important to the New York City public libraries. Andrew Carnegie’s original deal always was, he would find the construction of the branches the libraries would run the branches, and the city would fund the branches with seven-day service. For a while. the Carnegie branches open seven days a week, even as they had to follow through on cutbacks at the non-Carnegie branches.

But that ship sailed long ago. Very few were still able to offer Sunday service before this:

- NYPL (Manhattan/Bronx/Staten Island): 8/92 sites

- Brooklyn Public Library: 8/66 sites

- Queens Public Library: 2/66

(Yes, there are three separate public library systems for New York City. They pre-date the consolidation of the city and no matter how hard folks have tried every study on consolidating the three systems into a single organization, winds up costing significantly more than the current status quo.)

riordan · on Nov 8, 2023

This is a big step forward, and is also making me nostalgic for the Friend-of-a-Friend (FOAF)[0] blogrolls of the early 00’s. RDF-in-HTML standard that could express recommendations and relationships.

That said, I’m glad to see Webmention adoption. It’s got a much clearer purpose than FOAF (it might’ve been a little too expressive) and fits nicely into the current web ecosystem.

[0]: https://twobithistory.org/2020/01/05/foaf.html

riordan · on Aug 16, 2023

> In this context-- the section in article where it says present data is of virtually zero importance to analytics is no longer true. We need a real solution even if we apply those (presumably complex and costly) solutions to only the most deserving use cases (and not abuse them).

Totally agreed, though where real-time data is being put through an analytics lens is where CDW's start to creak and get costly. In my experience, these real-time uses shift the burden from being about human-decision-makers to automated decision-making and it becomes more a part of the product. And that's cool, but it gets costly, fast.

It also makes perfect sense to fake-it-til-you-make-it for real-time use cases on an existing Cloud Data Warehouse/dbt style _modern data stack_ if your data team's already using it for the rest of their data platform; after all they already know it and it's allowed that team to scale.

But a huge part of the challenge is that once you've made it, the alternative for a data-intensive use case is a bespoke microservice or a streaming pipeline, often in a language or on a platform that's foreign to the existing data team who's built the thing. If most of your code is dbt sql and airflow jobs, working with Kafka and streaming spark is pretty foreign (not to mention entirely outside of the observability infrastructure your team already has in place). Now we've got rewrites across languages/platforms, and leave teams with the cognitive overhead of multiple architectures & toolchains (and split focus). The alternative would be having a separate team to hand off real-time systems to and only that's if the company can afford to have that many engineers. Might as well just allocate that spend to your cloud budget and let the existing data team run up a crazy bill on Snowflake or BigQuery as long as it's less than the cost of a new engineering team.

------

There's something incredible about the ruthless efficiency of sql data platforms that allows data teams to scale the number of components/engineer. Once you have a Modern-Data-Stack system in place, the marginal cost of new pipelines or transformations is negligible (and they build atop one another). That platform-enabled compounding effect doesn't really occur with data-intensive microservices/streaming pipelines and means only the biggest business-critical applications (or skunk works shadow projects) will get the data-intensive applications[1] treatment, and business stakeholders will be hesitant to greenlight it.

I think Materialize is trying to build that Modern-Data-Stack type platform for real-time use cases: one that doesn't come with the cognitive cost of a completely separate architecture or the divide of completely separate teams and tools. If I already had a go-to system in place for streaming data that could be prototyped with the data warehouse, then shifted over to a streaming platform, the same teams could manage it and we'd actually get that cumulative compounding effect. Not to mention it becomes a lot easier to then justify using a real-time application the next time.

[1]: https://martin.kleppmann.com/2014/10/16/real-time-data-produ...

riordan · on Sept 16, 2020

Indeed they did.

I also fondly remember how wonderful the web development deployment workflow was in Coda. This was back before source control was ubiquitous and CI/CD was required for any real production environment. It was the cleanest way I'd seen to go from saving in your test editor and having a clear path to SFTP it to the server. I half recoil in horror and am still in awe of how considered their experience was.

pilsetnieks · on Sept 16, 2020

> This was back before source control was ubiquitous

Your memory may be playing tricks on you, it wasn't that long ago. Coda 1 had SVN support out of the box, and at the time SVN was already old and in decline, not to mention CVS and other lesser source control systems. By the time Coda 2 came out, git had won the DVCS battle, and was built into it.

Now, granted, back then I wasn't very keen on version control either but it's not because it wasn't there, it's because I was young and inexperienced, and didn't know any better. Plus, those days the only common use for source control was actually, well, source control, unlike today. It was something you had to discipline yourself to use for no immediate benefit.

smoe · on Sept 17, 2020

From my memory in the mid 2000 version control and automatized deploys weren't ubiquitous for the audience Coda mainly targeted: web site development.

I started my career at around that time in a company using Python for web application development and everything was in SVN but already in the progress of being moved to git and while there wasn't a CI/CD server, all deploys were done running a single cli command. On the other hand for years to come I have seen e.g. people and agencies doing websites in Wordpress making changes directly in production.

In that circles, which often included people with no heavy technical background like designers that learned some php/html/css Coda was quite a relevation I think.

joshcain · on Sept 16, 2020

100%. Coda was especially useful if you weren't running your server stack on your local dev machine. Hit "Publish" and all your changed files are pushed up automatically. So much better than hunting through an FTP client every time you make a small edit.

riordan · on Jan 7, 2020

Not gaming related but if you want to know the origins of where everything in computation came from, George Dyson’s “Turing’s Cathedral” is a revelation. It’s the story of computer, told by interviews of the folks who created the architecture of it all. It’s incredible history

riordan · on April 30, 2019

It seems like you're putting the text out under CC-4.0-BY-SA [1] (Attribution, Share-Alike), which is great (the updated version of the same copyright for Wikipedia content). However, you're also collecting a TON of structured and relational data, which seems to be the value generated by your editor. Are you planning on keeping that locked up?

[1]: https://creativecommons.org/licenses/by-sa/4.0

coldacid · on April 30, 2019

My hope is that all the content, not just prose, will be CC licensed. Jude, will it be so?

riordan · on Aug 5, 2018

One of the under-appreciated aspects of Colaboratory is that it's completely integrated into the Google Drive ecosystem, including multiple real-time users of the same notebook (sharing the same VM). This was a real game-changer for me.

The real-time use-case has a nice wow factor to it; I've used it as a way to pair program for data science problems. The input cells sync in real-time (a la Google Docs), and so too do the output cells when one person runs a cell. And it's nice to be able to leave comment threads on a cell that can be resolved as a form of peer review.

But what made Colab a game-changer for me is how it let me seamlessly put my notebooks and a VM into Google Drive, making anything I put in a notebook accessible to anyone within my organization without needing to set up an environment, be it shared or local.

My last organization was a small rare disease research foundation, and I primarily worked on the fundraising side of the house; it was not a technical organization. When thinking about the longevity of my work, I realized that even the one person managing IT for them probably couldn't set up, let alone justify maintaining a networked Jupyter environment. So rather than ask for that and store all my analyses and small utilities on GitHub, I built everything on top of Google Drive and Collab. Folks were used to using Drive for everything else, so it meant my work was adjacent and discoverable to the team it was pertinent to and they could get access to both the outcomes of prior runs or change a few variables and run it again without me being needed. I left recently and I've still heard from a few former colleagues that they're still using many of the these notebooks and discovering others I'd built on their own time.

For a small data analysis operation in a Google Apps organization, Colaboratory is a godsend.

morenoh149 · on Aug 5, 2018

I've actually had the opposite experience, I upgraded my drive storage for an ML project and was still unable to load the datasets into a colab reliably. Hope this story gets better. In the meantime I'm using sagemaker and kaggle kernels.

mynewtb · on Aug 5, 2018

> including multiple real-time users of the same notebook (sharing the same VM)

In my experience our cell contents were synced but our Python state was not. This makes collaborating highly confusing and error prone.

riordan · on Jan 2, 2018

Have you considered teaching your journalism students Docker first?

danso · on Jan 2, 2018

No that’s be too much overhead for a beginners level class.

karussell · on Jan 3, 2018

You could do it on one server and give students access to it using the docs from mapzen

cookiecaper · on Jan 2, 2018

> Have you considered teaching your journalism students Docker first?

I agree. This would be a great prank. As a bonus, it'd help solidify their disinterest in everything computer-ish, making them more motivated journalism students. Do it OP!

riordan · on Jan 2, 2018

This is awesome and exactly the kind of thing that can give the OSM Armchair Mappers some very accurate superpowers.

It'd be great to see this fed into the OSM Tasking Manager [1] or Maproulette[2] for folks to validate (and then build even stronger training data).

[1]: http://tasks.openstreetmap.us/ [2]: http://maproulette.org/

maxerickson · on Jan 2, 2018

If I understand correctly the creator has already validated the output for much of the Northeast.

(They've proposed an import: https://lists.openstreetmap.org/pipermail/imports/2017-Decem... )