Hacker News | kasmura's comments

This is THE book. I'd say it is suitable for beginners with a basic math background, equivalent to a CS undergrad.


I disagree. Ring attention and tree attention are so general that the core ideas are independent of the details of modern GPUs. Maybe that's true for flash attention, but not for these. I also disagree because these algorithms are fundamentally about enabling long context by distributing attention across GPUs, and that would not be enabled by "Moore's law for GPU hardware".


Yes, it is just a way of computing self-attention in a distributed way.
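As a rough illustration, here is a single-process NumPy simulation of the idea (not an actual multi-GPU implementation; the function name and block sizes are made up): each "device" owns one Q block, sees the K/V blocks one at a time as if they were rotated around a ring, and keeps a numerically stable online softmax so no full attention matrix is ever materialized in one place.

```python
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Simulated ring attention over lists of (rows, dim) blocks."""
    outputs = []
    for q in q_blocks:                         # one "device" per Q block
        m = np.full(q.shape[0], -np.inf)       # running row-max of logits
        l = np.zeros(q.shape[0])               # running softmax denominator
        acc = np.zeros_like(q)                 # running weighted sum of V
        for k, v in zip(k_blocks, v_blocks):   # one ring rotation per block
            s = q @ k.T / np.sqrt(q.shape[1])  # local attention logits
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)          # rescale old accumulators
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ v
            m = m_new
        outputs.append(acc / l[:, None])
    return np.concatenate(outputs)
```

Run over blocks of Q, K, V, this reproduces ordinary softmax attention exactly; the point is that each device only ever needs one K/V block at a time.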


That cannot be done when using OpenAI API calls as far as I know


Nobody in the original post or this entire discussion said anything about OpenAI until your comment.

I thought it was fairly obvious that we were talking about a local LLM agent... if DataHerald is a wrapper around only OpenAI, and no other options, then that seems unfortunate.


The agent is LLM-agnostic, and you can use it with OpenAI or self-hosted LLMs. For self-hosted LLMs, we have benchmarked performance with Mixtral for tool selection and CodeLlama for code generation.


Integrated information theory, which is mentioned in the article, is abstract and supposedly applies to any type of physical system.


Is the following quote at odds with what you are saying about 50-way classification?

"On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."


The difficulty of the problem is that of a 50-way classification. If the only goal was to minimize WER, a simple post-processing step choosing the nearest sentence in the training set could easily bring the WER down further. They've chosen to do it the way they did it presumably to show that it can be done that way, and I don't fault them for it.
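To make that post-processing step concrete, here is a sketch of nearest-sentence snapping (the helper names and sentence lists are invented; the real training set has a fixed set of sentences):

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two word lists."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

def snap_to_training(decoded, training_sentences):
    """Replace a decoded word sequence with the closest training sentence."""
    return min(training_sentences, key=lambda s: edit_distance(decoded, s))

train = [["the", "dog", "ran"], ["a", "cat", "sat"]]
print(snap_to_training("the dog sat".split(), train))  # ['the', 'dog', 'ran']
```

Because the closed sentence set is small, this step would collapse most near-miss decodings onto an exact training sentence, driving WER down without the model learning anything new.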

They claim that word-by-word decoding implies that the network has learned to identify words. This may well be true, but it isn't possible to claim that from their result. For example, say you average all electrode samples over the relevant timespan, transform that representation with a feedforward neural net, and feed it into an RNN decoder. It would still predict word by word, on a representation that necessarily does not distinguish between words (because the time dimension has been averaged over). Such a model can still output words in the right order, just from the statistics of the training sentences being baked into the decoder RNN.
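A minimal sketch of that hypothetical architecture (all shapes and weights are made up, and untrained; it only shows the information flow, not a working decoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ECoG recording for one sentence: T time samples x E electrodes.
ecog = rng.normal(size=(250, 64))

# Step 1: average over time. Word-level temporal structure is gone here.
pooled = ecog.mean(axis=0)                    # shape (64,)

# Step 2: a feedforward transform of the pooled representation.
W_ff = rng.normal(size=(64, 32)) * 0.1
h = np.tanh(pooled @ W_ff)

# Step 3: an RNN decoder unrolled for a fixed number of word slots.
# It emits one word per step even though its input never distinguished
# between words; any correct ordering must come from sentence statistics
# baked into the decoder weights during training.
W_h = rng.normal(size=(32, 32)) * 0.1
W_out = rng.normal(size=(32, 50)) * 0.1       # 50-word vocabulary
words = []
for _ in range(6):
    h = np.tanh(h @ W_h)
    words.append(int(np.argmax(h @ W_out)))
```

The decoder here predicts word-by-word despite its input being a single time-averaged vector, which is the point of the counterexample.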


I am not much of an expert either, but as I understand it, Zermelo–Fraenkel set theory is the most common foundation of mathematics, and it is stronger than Peano arithmetic, so I think Gödel's results hold in general.


Gödel's proof of inferential undecidability (incompleteness) does not work in strongly typed theories because his proposition I'mUnprovable does not exist.


Internet Access is a NEGATIVE right


I am excited about this. Competition in the market is good for both innovation and prices.


Pedro Teixeira has made a screencast about CoffeeScript in Node.js on his website nodetuts.com: http://nodetuts.com/tutorials/16-introduction-to-coffeescrip...

