
The code is written in Rust. The core algorithm can be found here: https://github.com/allthemusicllc/atm-cli/blob/master/src/ut... (Specifically the function gen_sequences. It basically uses multi_cartesian_product to iterate over all possible sequences, i.e. a Cartesian product rather than permutations in the strict sense.)
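For anyone who doesn't want to open the Rust source, the idea can be sketched in a few lines. This is not the project's code: it uses Python's itertools.product as a stand-in for Rust's multi_cartesian_product, and the function name, pitch set and melody length below are made-up illustration values.

```python
from itertools import product

def all_sequences(pitches, length):
    """Yield every sequence of `length` notes drawn from `pitches` (repeats allowed)."""
    # product(pitches, repeat=length) is the Cartesian product pitches x ... x pitches.
    return product(pitches, repeat=length)

# Hypothetical parameters, chosen only to keep the output small.
PITCHES = ["C", "D", "E"]
LENGTH = 2

melodies = list(all_sequences(PITCHES, LENGTH))
print(len(melodies))  # equals len(PITCHES) ** LENGTH
print(melodies[:4])
```

The number of sequences is len(PITCHES) ** LENGTH, so the output grows exponentially with melody length while the generator stays a few lines long.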

Isn't it a bit of wasted space to store the generated data on archive.org (https://archive.org/download/allthemusicllc-datasets)? I mean, as we see, it's trivial to create them. The code above is kind of a self-extracting archive, so it's just the same thing but in compressed form. And I guess it should not matter (legally) whether you store it compressed or uncompressed (or less compressed).



At the very beginning of this excellent interview https://www.youtube.com/watch?v=sfXn_ecH5Rw that I found in the comments here, they explain this.

Apparently, for this to work, they have to "affix them to a physical medium".

Strongly recommend watching that video! Excellent info.


Hm, they don't really explain this, or do they? They just say "by saving those melodies to a hard drive they have affixed them to a physical medium which is all that is necessary to copyright them".

So, by the same argument, they could copy the self-extracting archive to a hard drive, and then have the same thing, or not?

They even say that they already store it in a compressed form on the hard drive.

What if some future compressor (future ZIP format) is clever enough to see that you are going to save all possible permutations of something, and then saves this in a much better-compressed way? Then suddenly, when compressors do this, does using such a compressor mean you no longer have copyright on the data?


Very true.

I think the focus has to be on the spirit not the technology. Saying to a lawyer/judge/jurist "hey, this script can generate every possible melody!" _feels_ quite different from saying "every single melody is already written down, on this hard-drive I am waving at you! Look!"

Tangent: reminds me of the Hutter Prize for compression (hey, the prize pot recently increased to €500K and I submitted it to HN but it didn't get any votes http://prize.hutter1.net/ https://en.wikipedia.org/wiki/Hutter_Prize).


Take any algorithm that can generate the digits of pi. It is proven that all the sequences of digits are present somewhere in these digits. Can you claim copyright on all the books of the universe because you can show the algorithm that can generate every possible book?


"It is proven that all the sequences of digits are present somewhere in these digits"

This has never been proven for the digits of π. Of course, there are other digit-sequences for which it is trivially true. For example, just concatenating together every finite sequence of digits, in some suitable order.


I stand corrected. For pi, it is only a conjecture. My point remains true for the Champernowne sequence.
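A rough sketch of why the Champernowne sequence works here: it concatenates 1, 2, 3, ... in decimal, so every finite digit string without a leading zero shows up at the latest when the counter reaches that number. The function names and the scan limit are my own, purely for illustration.

```python
from itertools import count

def champernowne_digits():
    """Yield the decimal digits of 1, 2, 3, ... concatenated: 1234567891011..."""
    for n in count(1):
        yield from str(n)

def first_index(target, limit=1_000_000):
    """Return the 0-based position where `target` first appears,
    or None if it is not found within the first `limit` digits."""
    if not target:
        return 0
    window = ""
    for i, digit in enumerate(champernowne_digits()):
        if i >= limit:
            return None
        # Keep only the last len(target) digits seen so far.
        window = (window + digit)[-len(target):]
        if window == target:
            return i - len(target) + 1

print(first_index("2020"))
```

To "point at" a book you would still need to store the index of its first occurrence, which is the bad-compression issue raised elsewhere in the thread.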


Without being any kind of copyright expert, this argument doesn't make much sense. If you have the relevant indices into pi, I guess you could claim copyright. The index will be much larger than the original book though. Essentially you have a really bad compression algorithm. I'm not going to get very far claiming that because my decompression algorithm could output any sequence, all sequences are mine.


It has already been done (even though it is only a conjecture):

https://news.ycombinator.com/item?id=13869691


So in other words you need to have used the compressor to compress what you are copyrighting, and that which you used it on must have been physically stored in its entirety. Is it provable that this all occurred, if there's no requirement to still have the original? It would require that you also supply the compressor to the court, so that somebody can use your self-extracting magic decompressor, recompress the result, and end up back in the same place. But you could make your compressor simply emit your decompressor, ignoring the input!


It is to avoid FUD. It ensures no one can derail the conversation into a debate about what constitutes being ‘affixed to a physical medium’.

“But are the tunes really affixed to the medium?”

“Yes, this hard drive in my hand literally contains a bunch of MP3 files” is a lot stronger than: “Yes, in a way, because this tune-generating script technically constitutes a self-extracting archive, your honour.”


Presumably musicians/composers using purely electronic tools such as Ableton Live have no problem asserting copyright already.


The musician has to cut a CD or something physical.

https://www.tunecore.com/guides/copyrights-101 the first paragraph is:

> For a work to be “copyrightable,” it must be original and fixed in tangible form, such as a sound recording recorded (affixed to) on a CD or a literary work printed (affixed to) on paper.


Is a hard drive not a tangible form? Come to think of it, so is a brain, and everything else capable of storing information. That language is atrocious.


It’s about “permanence”, not being able to copyright a live performance that wasn’t recorded for example. It just means you need to be able to distribute / replay the recording, and that means using one of the current common audio technologies.


It made a lot more sense when it was originally written.


Saying 'I don't actually have that melody written down anywhere, but I easily could have, if I'd run this program' feels a bit like saying 'I don't have that melody written down anywhere, but I easily could have if I'd just thought of it first'.

You may say that in fact having the code that generates the data is effectively the same as having the data. This may convince reasonable, rational people, but the fact that we're dealing with the law that we are should tell you that we're not dealing with reasonable, rational people.


It's just the practicalities of copyright law. You don’t copyright the idea of a song, a plan for a song, or code that generates a song. You register the tablature accompanied by a recording. See https://medium.com/@dawn_ellmore_employment/what-does-tangib...


You can trivially write a program which will enumerate all the possible byte sequences in the universe. So you can claim that this tiny program contains everything including god. So you can claim copyright for everything.
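To make "trivially" concrete, here is a minimal sketch of such a program. It enumerates every finite byte string, shortest first; the function name is my own.

```python
from itertools import count, islice, product

def all_byte_strings():
    """Yield every finite byte string exactly once, shortest first."""
    for length in count(0):
        # All 256**length strings of this length, in lexicographic order.
        for combo in product(range(256), repeat=length):
            yield bytes(combo)

# Take the empty string, all 256 one-byte strings, and the first two-byte string.
first = list(islice(all_byte_strings(), 258))
print(first[0], first[1], first[256], first[257])
```

Any particular string of n bytes only appears after more than 256 ** (n - 1) others, so in practice the loop never reaches anything longer than a handful of bytes.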


> You can trivially write a program which will enumerate all the possible byte sequences in the universe. So you can claim that this tiny program contains everything...

Not to get too far off topic, but even an infinitely long byte sequence cannot represent all numbers. It's possible to construct something not representable by that sequence via Cantor's diagonal argument: https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument
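A finite toy version of the diagonal construction, for anyone who hasn't seen it. The real argument is about infinite sequences; this sketch only shows the mechanism on a 4x4 example, and the names are my own.

```python
def diagonal_escape(sequences):
    """Given n bit sequences (each at least n long), build one that
    differs from the i-th sequence at position i, so it matches none of them."""
    return [1 - seq[i] for i, seq in enumerate(sequences)]

listed = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

escaped = diagonal_escape(listed)
print(escaped)            # differs from listed[i] at index i
print(escaped in listed)  # False
```

Whatever list you start with, the flipped diagonal is missing from it, which is why no enumeration of infinite sequences can be complete.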


It doesn't contain the floating point representation of that number, which would be infinitely long, but it does contain the UTF8-encoded constructive proof that uniquely identifies that number.


Incorrect.

Most real numbers have infinite-length constructive proofs.

Uncountable is uncountable is uncountable. If any correspondence/encoding existed between naturals and reals, the reals would be countable. (But they are not, so it is a fool's errand to search for such an encoding.)


Surely not? If there are more numbers than can be represented in any amount of bytes (as shown by the diagonal argument), then you cannot represent a constructive proof for each of them in any amount of bytes.


Kind of. A constructive proof by definition means something like you can explain exactly what the number is in a finite number of bytes.

The standard diagonalization argument is a constructive proof, identifying a specific number. Normally a constructive proof is preferable. But with a bit of hand waving you can make it into a non-constructive proof that there are "unknowable" numbers that cannot be described in any finite amount of bytes.


In practice, constructible for reals means that you can approximate them to arbitrarily small known precision, e.g. by a sequence of rational numbers q_n, each no more than 1/n away from the actual real number.

The point is that such a sequence needs to be constructive in the usual sense, so there are still only a countable number of them.
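As a small illustration of that definition, here is a finite program that produces such a sequence q_n for sqrt(2), using exact rationals and bisection. The choice of sqrt(2) and the function name are mine; any computable real would do.

```python
from fractions import Fraction

def sqrt2_approx(n):
    """Return a rational q with |q - sqrt(2)| <= 1/n, found by bisection."""
    lo, hi = Fraction(1), Fraction(2)  # invariant: lo <= sqrt(2) < hi
    while hi - lo > Fraction(1, n):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

for n in (1, 10, 1000):
    q = sqrt2_approx(n)
    print(n, q, float(q))
```

The program text is finite, and there are only countably many finite program texts, which is where the countability comes from.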


The argument still applies, yes. Nevertheless the string still contains all finite substrings, which includes all English sentences describing such numbers.

The implication is that human minds wouldn't be able to represent it either, at least not with language.


Well, it could contain the floating-point representation, since floating-point is a finite representation.

But it couldn't contain the real number representation.


Perhaps I am misunderstanding the diagonal argument, but it doesn't appear to show what you claim it shows.

The diagonal argument seems to be a proof that there are uncountably many infinite byte sequences. So while it proves that it would be impossible to "enumerate" every infinite byte sequence, it doesn't prove that there exists a number that cannot be represented by some infinite byte sequence.

Indeed, I believe the opposite can be shown to be true by constructing a mapping from every finite number to an infinite byte sequence. ASCII trivially provides such a mapping for real numbers, and finite tuples of real numbers (such as complex numbers) can be mapped by alternating digits from each element of the tuple.

Edit: The key distinction is that the set of finite byte sequences is infinite, but countable, while the set of infinite byte sequences is not only infinite, but also uncountable. Which of these two sets is deemed the set of "possible byte sequences" seems to be the critical distinction and the turn of phrase "all possible byte sequences in the universe" seems to imply all byte sequences of finite length.


All the possible byte sequences contain all the possible programs.

All the possible programs can receive input of infinite length and output bytes of infinite length depending on input.

So if we can supply any input to any program and treat its output as part of result, does this argument still work? Is there such a number that can't be constructed by some program receiving some input?


> All the possible byte sequences contain all the possible programs.

Correct.

> Is there such a number that can't be constructed by some program receiving some input?

Yes. Most numbers are not representable by a program...unless you allowed a program to be infinite length, in which case the first statement is no longer true.

There's nothing particularly unique about a digit-based encoding for Cantor's diagonal argument.


> in which case the first statement is no longer true.

Why is that? What would be an example of an infinite program that could not be represented by an infinite byte sequence? Indeed, it seems trivial to map an infinite series of machine code instructions to an infinite series of bytes.

Edit: It seems like the first statement is false only if you use a different understanding of "possible" for "possible byte sequences" and "possible programs" where the former excludes infinite length and the latter does not.


I internally editorialized your first statement to be "all the possible [finite] byte sequences contain all the possible programs".

Because you cannot enumerate all infinite byte sequences, according to Cantor's diagonal argument as linked earlier. [1]

Calling certain numerical representations "computer programs" in no way changes the fundamentals. There is no way -- no matter how clever your encoding -- to enumerate all real numbers.

[1] https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument


I am not the person you were originally responding to.

You don't have to enumerate all real numbers to create a mapping from real numbers to infinite byte sequences.

I am merely pointing out that the first statement only becomes false if you editorialize it to "all the possible [finite] byte sequences contain all the [infinite] possible programs". If you treat the meaning of "possible" consistently in that sentence, I believe it remains true regardless of whether you define infinity as possible or not.


Yes, you're right. I did not do a clear job of explaining the flaw.


> even an infinitely long byte sequence cannot represent all [real] numbers

You misread. The previous comment said

> you can trivially write a program which will enumerate all the possible [finite] byte sequences in the universe

No one is interested in your infinitely long musical score.


> an infinitely long byte sequence cannot represent all numbers

Okay, all natural numbers then. I don't think that changes the argument.

(Reals are a funny thing because any format you choose has numbers that are infinitely long in that format. Which is basically the same thing you said.)


Not all numbers, such as real and complex numbers. But you indeed can represent all integers, all possible programs, all digital music, digital images, digital everything. Not sure if those other music / images which cannot be digitized are so relevant.


All you have to do is enumerate an infinity of infinities. It's no problem at all.


One could make the same argument for a mathematician with a typewriter. The fact that it could come up with a particular sequence is, I think, less important than whether or not it actually has recorded a particular sequence.


Claiming copyright of everything won't save you from producing all that child pornography and hate speech, not to mention unauthorized copies of state secrets. ;)


For the state secrets, the state would have to identify the specific strings that constitute the violation, so that's unlikely to be a problem in practice.


Nah, they have no problem identifying the string, they'll just slap you with a gag order, which itself will also have a gag order.


That's all great until the judge asks for a demonstration to show it can generate the material you're presumably suing over.


At that point, you simply explain Cantor's diagonalization proof and the concept of countable and uncountable infinities.


But is it enough to prove that it could produce any melody? After all, couldn't a binary adder produce every executable binary possible? Well, eventually, and with enough memory.


No, because each thing you want to claim copyright on must be “fixed in physical form.”


By that logic, the library of Babel owns the copyright on every piece of literature that could be written, which would be a terrible idea.

https://libraryofbabel.info/


They don't pre-generate all the "books", right? So they can't claim the copyright (it's not "affixed to a physical medium")


So what is the difference between a program that views some text by reading it from a huge file and a program that views the same text by generating it on the fly? Especially if the file is generated by the very same process that happens on the fly in the second case? Why would you even have to actually build one of the two programs? Enumerating all possible character sequences of some given length is a somewhat obvious idea, and actually implementing that idea, especially if the text is only generated on the fly, does not bring the text into existence substantially more than just pondering the idea.

I guess for matters of copyright you have to produce a specific artifact, just blindly enumerating all possible artifacts is not good enough. Which would also mean that the existence of all those MIDI files will not make a difference.


There is something missing here.

If I were to write a "random book generator" that just regurgitates infinite text, then I can copyright a (starting index, length) pair, but not simply all possible finite sequences of letters.


I would argue writing a book, composing a song, or inventing a device should be viewed as a search process. You are wandering around in the space of all possible books, songs, or devices looking for one that is interesting to read, pleasant to listen to, or useful to use. Your work is mainly identifying special items in the vast space of possible items; making it physical once identified is only secondary, at least for the purpose of this discussion. So writing a computer program to generate all possible songs is finding a specific and - at least somewhat - useful program among all possible programs and - ignoring its triviality and the general questions about copyright - seems copyrightable to me. On the other hand, using this program to turn the space of all possible songs from something in your head into a huge pile of MIDI files on a disk seems not copyrightable to me, because you do not single out interesting songs. Finding a good song did not get significantly easier: you are now searching through and listening to a sea of MIDI files instead of playing all possible combinations of notes on a piano and judging the sound of them.


By that logic, you wouldn't be able to claim copyright on works that are distributed compressed or encrypted.


Nice link; thanks!


I'm not sure if it would hold up legally anyway, but with it already being generated I can see it working out a bit better.

If you only provide the code as a means of generating any melody, you could just as well replace it with any musical instrument and claim you can generate any melody with it.


Ad absurdum (or maybe not so much?), any universal Turing machine can be programmed to list every possible program, specifically the one generating every melody. So the copyright claim could then just as well point to Conway's game of life or rule 110.


Yep. They could've used that space for rainbowtables. I'm old enough to remember when someone from shm00 had rainbowtables available for download, but I guess the network bills added up or the feds threatened them.


I guess you need it stored somewhere to claim it existed on a given date...

If someone is using archive.org for this purpose, please consider donating :-)


> Isn't it a bit of wasted space to store the generated data on archive.org

Could the algorithm be considered a form of compression?


Yes, mathematically. But copyright requires copying, strictly speaking, so you need a copy from which to have copied.

An algorithm that generates all possible English sentences doesn't demonstrate that I copied this sentence from it, and the balance of probabilities (used in tort cases) suggests that I came up with the sentence, run-on as it is, rather than performing that algo and then selecting that sentence. Moreover, 17 USC (102? sorry I don't recall) says that media needs to be "fixed" to acquire copyright; so me having to perform the algo makes it no impediment to my owning copyright of a sentence that, if performed, the algo would be expected to produce (which is sensible really: perhaps the algo is errant and can't make this sentence, or maybe it only forms grammatical sentences ...).


Does the algorithm represent the Kolmogorov Complexity of the dataset?


Of the "all possible melodies" dataset? Yes. Of the "all possible melodies humans would conceivably consider music, and nothing else" dataset? Not at all.



