_as_text's comments

Yeah, my writeup is I guess what you get when you first remember it _wrong_ and then need to overcorrect.

OK, but she was talking about Riemann.


I know what this will be about without reading.

Python 3.12-style type annotations are a good example IMO; no one uses the type statement because of dataset inertia.


Usually, I remember that type annotations exist when I'm debugging after things aren't working. If you look at the Python code that I've written, type annotations are a sure sign that "there was a problem here". It's like a scar on repaired code.


Type annotations, by themselves, are little more than a comment.


They're more than that, as they allow for easy consistency checking of types across calls. That makes all the difference.
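
For instance (a toy sketch of mine, names made up), once the annotations are there, a checker (e.g. pyright, or a recent mypy) can verify consistency across calls without running anything:

    # Toy sketch (mine): a PEP 695 "type" alias plus ordinary annotations.
    type UserId = int                      # Python 3.12 "type" statement

    def load_user(uid: UserId) -> dict[str, str]:
        return {"id": str(uid)}

    def greet(name: str) -> str:
        return f"hello {name}"

    user = load_user(42)
    greet(user)                            # flagged by the checker: dict[str, str] is not str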


The first and only time I've ever felt like I really know Chinese was when I came across the phrase '洋汽扑鼻' in "Fortress Besieged" by Qian Zhongshu. It literally means "the breath of the sea assaults the nostrils". It's a joke on how fashionable and in demand everything Western was in China in the 1920s. For me it's just laugh-out-loud-for-10-minutes funny. I've tried to explain it to literal dozens of my friends and now I know not to even try.


Probably worth noting that while 洋 does literally mean "sea", it also means "foreign".


I really don't think it's all that different, sorry. The difference is that there is a more direct and established way of talking about these things in China, because it has such a long history of bureaucracy and everyone got used to these dynamics over thousands of years, but even in places like Sweden you can have guanxi. It boils down to doing something for a member of your ingroup strictly because he's in your ingroup.

I mean, even the words 关系 and "network" have parallel etymologies: 系 = threads of silk arranged in a pattern.


Well, that's all fun and games until you start putting off paying your Internet bill for two weeks because it turns out you misconfigured your password app and it actually didn't save your password to the utility service provider, and you realize you have no internet one day when you have a school assignment due, ugh, and maybe your credit score gets 0.5% lower, and yeah, it's all very much your fault. "But you can just be more careful! Handle stuff like this as it arises!" Yeah, sure, just like during Communist times you could easily get more than one pound of coffee per half a year if you were just careful and noted when it turned up in stores.

I believe this whole Apple vs Linux debate is perfectly analogous to the West vs East Germany debate, to the point that almost all intuitions/arguments for the latter are perfectly reusable in the former


> Well, that's all fun and games until you start putting off paying your Internet bill for two weeks because it turns out you misconfigured your password app and it actually didn't save your password to the utility service provider

As opposed to the centralized service that will kindly misconfigure it for you, or just discontinue it out from under you, or ban you because of a false positive, or ban you because of a true positive when you unwittingly violated their broad and ambiguous terms, but you're still just as screwed.

> I believe this whole Apple vs Linux debate is perfectly analogous to the West vs East Germany debate, to the point that almost all intuitions/arguments for the latter are perfectly reusable in the former

The fallacy of Soviet Communism was the fallacy of central planning. The Party decides what's good for you and The Party is infallible so if you try to resist you'll be punished. Freedom of choice is heresy. Divergence is verboten.

Does that sound to you like the typical Linux user, or like Apple?


I have now read the paper and it alone is enough to make me seriously consider devoting a significant amount of my time to the author's project. Here's why:

In my introductory statistics class, I learned that an independent and identically distributed sample is a sequence of random variables X[1], ..., X[n], all of the same signature Omega -> (usually) R. All of them are pairwise independent, and all of them have the same distribution, e.g. the same density function. Elsewhere in probability I have learned that two random variables that have the same density function are, for all intents and purposes, the same.

For all, really? Let's take X[i] and X[j] from some i.i.d. random sample, i != j. They have the same density, which leads us to write X[i] = X[j]. They are also independent, hence

P(X[i] in A, X[j] in A) = P(X[i] in A)*P(X[j] in A),

but X[i] = X[j], so

P(X[i] in A, X[j] in A) = P(X[i] in A, X[i] in A) = P(X[i] in A), so

P(X[i] in A) in {0, 1}.

This was a real problem for me, and I believe I had worse results in that statistics class than I would have had if the concept had been introduced properly. It took me a while to work out a solution to this. Of course, you can now see that the strict equality X[i] = X[j] is indefensible, in the sense that in general X[i](omega) != X[j](omega) for some atom omega. If you think about what needs to be true about Omega in order for it to have two different variables, X[i] and X[j]: Omega -> R, that are i.i.d., then it will turn out that you need Omega to be a categorical product of two probability spaces:

Omega = Omega[i] x Omega[j]

and X[i] (resp. X[j]) to be the same variable X composed with the projection onto the first (resp. second) factor. This definition of "sampling with replacement" withstands all scrutiny.
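
Here's a tiny finite sanity check of that construction (my own toy code; a fair die as the base space):

    from itertools import product
    from fractions import Fraction

    omega_1 = omega_2 = range(6)                 # a fair die as the base space
    omega = list(product(omega_1, omega_2))      # Omega = Omega[i] x Omega[j]
    P = Fraction(1, len(omega))                  # uniform product measure

    def X_i(w): return w[0]                      # X composed with the first projection
    def X_j(w): return w[1]                      # X composed with the second projection

    A = {0, 1}                                   # any event in the target
    p_i  = sum(P for w in omega if X_i(w) in A)
    p_j  = sum(P for w in omega if X_j(w) in A)
    p_ij = sum(P for w in omega if X_i(w) in A and X_j(w) in A)

    print(p_i == p_j)                            # True: identically distributed
    print(p_ij == p_i * p_j)                     # True: independent
    print(any(X_i(w) != X_j(w) for w in omega))  # True: not equal pointwise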

Of course, just like in Buzzard's example of ring localization, it was all caused by someone being careless about using equality.


I really like this. I remember my first serious math book, which was this old-school set theory and topology book, and being all excited about learning set theory up until I encountered the iterated Cartesian product and was forced to accept that ((x, y), z) and (x, (y, z)) are supposed to be indistinguishable to me. An answer on StackExchange said that "you will never, ever, ever need to worry about the details of set theory", but I still did.


I kinda have this idea that while psychedelics like LSD are the easiest to catch a ride with, it is specifically tea that can get you furthest. When I was reading "Dream of the Red Chamber" some of the descriptions of tea drinking were really psychedelic, like being transported into a different realm of heavenly purity that permeates everything with effortless perfection, etc. Certainly I never experienced anything like this with any drug.


What’s in that tea? Something more than caffeine if people are taking trips with it.


Green tea also contains L-theanine, which interacts with caffeine uptake. Anecdotal, but after having a pot of tea at a tea shop in Sheung Wan in Hong Kong I was high as a kite for a while afterwards, in a way I've never experienced before or since, although I wouldn't describe it as psychedelic - more stimulant-like.


Getting "Tea Drunk" is a common pursuit for many fine tea drinkers, myself included. My partner and I enjoy cultivars that give you a buzz. I went down a rabbit hole in trying different varieties and we found some that can only be drunk at the end of the day after work, else you'd write off the rest of the day. We had an aged white tea once that made us giggle like we were on shrooms, without all the overwhelming side effects or fear of a "bad trip".

I started watching "Mei Leaf", a YouTuber with deep knowledge of tea and of gong fu style brewing specifically, and was hooked. He's got a video specifically on getting tea drunk that got me hunting for the feeling in the first place: https://www.youtube.com/watch?v=HrLaKX9J8Uo


Wow did not realize this was a thing. Thanks for sharing this video.

OT but the host is using some kind of phone app to pull autofocus … anyone know what that is?


Adding my own experience: I take 200 mg L-theanine supplements daily and there is a noticeable calmness. It's used a lot in combination with caffeine to soothe the jitters. There isn't a known LD50, and I've taken up to 600 mg; at best I can say it's borderline euphoric. Nowhere comparable to LSD, even at the 20 mcg level.


That's correct. Puerh tea in particular has very strong "ChaQi", which is (part of) why people prize it so much. Some other teas have it as well, and it's really not just caffeine but something much more interesting and deeper. Real ancient tree puerh (for example [1]) or very old tea are excellent examples of teas with a lot of ChaQi.

[1] [link redacted] Edit: both the site and the rambling about this tea and its ChaQi are mine (it's the best example I could think of off the top of my head)


I've just skimmed through it for now, but it has seemed kinda natural to me for a few months that there would be a deep connection between neural networks and differential or algebraic geometry.

Each ReLU layer is just a (quasi-)linear transformation, and a pass through two layers is basically also a linear transformation. If you say you want some piece of information to stay (numerically) intact as it passes through the network, you say you want that piece of information to be processed in the same way in each layer. The groups of linear transformations that "all process information in the same way, and their compositions do, as well" are basically the Lie groups. Anyone else ever had this thought?

I imagine if nothing catastrophic happens we'll have a really beautiful theory of all this someday, which I won't create, but maybe I'll be able to understand it after a lot of hard work.


You might be interested in this workshop: https://www.neurreps.org/

And a possibly relevant paper from it:

https://openreview.net/forum?id=Ag8HcNFfDsg


ReLU is quite far from linear; adding ReLU activations to a linear layer amounts to fitting a piecewise-linear model of the underlying data.


Well, at all but a finite number of points (specifically, all but one point), there is a neighborhood of that point on which ReLU matches a linear function...

In one sense, that seems rather close to being linear. If you take a random point (according to a continuous probability distribution), then with probability 1, if you look in a small enough neighborhood of the selected point, it will be indistinguishable from linear within that neighborhood.

And, for a network made of ReLU gates and affine maps, you still get that it looks indistinguishable from affine on any small enough region around any point outside of a set of measure zero.

So... it depends on what we mean by “almost linear”, I think. One can make a reasonable case for saying that, in a sense, it is “almost linear”.

But yes, of course I agree that in another important sense, it is far from linear. (E.g. it is not well approximated by any linear function)
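
A quick numeric sketch of what I mean (my own toy code; sizes and seed are arbitrary): at a generic point the activation pattern is locally frozen, so the local Jacobian predicts small perturbations up to float noise.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

    def mlp(x):
        h = np.maximum(W1 @ x + b1, 0.0)         # ReLU layer
        return W2 @ h + b2                       # affine output layer

    x = rng.normal(size=4)                       # a generic input point
    mask = (W1 @ x + b1 > 0).astype(float)       # which units are active at x
    J = W2 @ (mask[:, None] * W1)                # Jacobian on this affine region

    for _ in range(5):
        d = 1e-6 * rng.normal(size=4)            # small perturbation
        print(np.max(np.abs(mlp(x + d) - (mlp(x) + J @ d))))  # ~0 up to rounding: locally affine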


Yeah, and we have more than measure zero -- the subsets of the input space on which a fully-ReLU MLP is linear are Boolean combinations of half-spaces. I was coming at it from the heuristic that if you can triangulate a space into a finite number of easily computable convex sets such that the inside of each one has some trait, then it's as good as saying that the space has that trait. But of course this heuristic doesn't always have to be true, or useful.


Everything is something. The question is what this nomenclature gymnastics buys you. Unless you answer that, this is no different from claiming neural networks are a projection of my soul.


Could looking at NNs through the lens of group theory unlock a lot of performance improvements?

If they have inner symmetries we are not aware of, you can avoid wasting effort searching in the wrong directions.

If you know that some concepts are necessarily independent, you can exploit that in your encoding to avoid superposition.

For example, I am using cyclic groups, dihedral groups, and prime powers to encode representations of what I know to be independent concepts in an NN for a small personal project.

I am working on a 32-bit (perhaps float) representation of mixtures of quantized von Mises distributions (time-of-day patterns). I know there are enough bits to represent what I want, but I also want specific algebraic properties so that they will act as a probabilistic sketch: an accumulator, or a monad if you like.

I don't know the exact formula for this probabilistic sketch operator, but I am positive it should exist. (I am just starting to learn group theory and category theory, to solve this problem; I suspect I want a specific semi-lattice structure, but I haven't studied enough to know what properties I want)

My plan is to encode hourly buckets (location) as primes and how fuzzy they are (concentration) as their powers. I don't know if this will work completely, but it will be the starting point for my next experiment: try to learn the probabilistic sketch I want.
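
A very rough sketch of where I'm starting from (not a worked-out design, and it ignores the 32-bit budget for now): one prime per hourly bucket, the concentration level as its exponent, and lcm as the merge, which is idempotent, commutative and associative, i.e. a join-semilattice that keeps the max level per bucket.

    from math import gcd

    PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
              41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89]   # one prime per hour

    def encode(hour, level):
        # one observation: hourly bucket -> prime, concentration level -> exponent
        return PRIMES[hour] ** level

    def merge(a, b):
        # lcm: idempotent, commutative, associative, so accumulating
        # keeps the max level seen for each hour
        return a * b // gcd(a, b)

    def decode(sketch):
        levels = {}
        for hour, p in enumerate(PRIMES):
            while sketch % p == 0:
                sketch //= p
                levels[hour] = levels.get(hour, 0) + 1
        return levels

    s = merge(merge(encode(9, 2), encode(9, 3)), encode(17, 1))
    print(decode(s))    # {9: 3, 17: 1}: the max level per bucket survives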

I suspect that I will need different activation functions than you'd normally use in an NN, because linear or ReLU or similar won't be good for representing in finite space what I am searching for (likely a modular form or L-function). Looking at Koopman operator theory, I think I need to introduce non-linearity in the form of a theta function neuron or the Ramanujan tau function (which is very connected to my problem).


I would argue that there are a few fundamental ways to make progress in mathematics:

1. Proving that a thing or set of things is part of some grouping

2. Proving that a grouping has some property or set of properties (including connections to or relationships with other groupings)

These are extremely powerful tools, and they buy you a lot because they allow you to connect new things with mathematical work that has been done in the past. So, for example, if the GP surmises that something is a Lie group, that buys them a bunch of results stretching back to the 19th century which can be applied to understand these neural nets, even though they are a modern concept.


> what this nomenclature gymnastics buys you?

???

Are you writing off all abstract mathematics as nomenclature gymnastics, or is there something about this connection that you think makes it particularly useless?


I did a little spelunking some time ago reacting to the same urge. Tropical geometry appears to be where the math talk is at.

Just dropping the reference here, I don't grok the literature.


> deep connection between neural networks and differential or algebraic geometry

I disagree with how you came to this conclusion (because it ignores non-linearity of neural networks), but this is pretty true. Look up gauge invariant neural networks.

Bruna et al.'s Mathematics of Deep Learning course might also be interesting to you.


What? The very point of neural networks is representing non-linear functions.

