It doesn't even seem to be on GitHub, in fact the source doesn't seem to be list...

qwantim1 · on Jan 7, 2021

No, I think you’re correct. Incomplete source is bad in any world.

Unfortunately, it’s that world we live in for pretty much everything.

Reproducibility? What if all of the source were to depend on part of a CPU instruction set that we stop using? How long must things be reproducible? We don’t even make lab equipment exactly the way it was made for the experiments our current sciences are based on.

However, I give a thumbs up to Groundhog for trying to do the right thing.

corty · on Jan 7, 2021

Needing reproducibility down to CPU bit differences is a sign that you did something wrong, usually calculating with insufficient precision and giving no thought to the range of simulation error. Simulation must be treated like a measurement: there is a maximum precision for your instrument, and you have to know and apply it.

And even if you might disagree for the single-threaded case, most things running in parallel will eat that free lunch of bit-identical results due to timing differences.
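
A minimal Python sketch of that point, not taken from the thread: floating-point addition is not associative, so the same values summed in a different order (as can happen when parallel workers finish in a different order) may differ in the last bits. The helper name `agree_within` and the tolerance are illustrative assumptions.

```python
import math

# The same three values, added in two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def agree_within(a, b, rel_tol=1e-12):
    """Compare two results like measurements: equal up to a stated tolerance.

    rel_tol is an illustrative choice, not a recommended value.
    """
    return math.isclose(a, b, rel_tol=rel_tol)


# Bit-for-bit comparison fails on IEEE 754 doubles...
bitwise_equal = left_to_right == right_to_left
# ...while a tolerance-based comparison treats the results as the same.
within_tolerance = agree_within(left_to_right, right_to_left)
```

The tolerance plays the role of the instrument's precision: it should come from an error analysis of the simulation, not be picked until the test passes.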

cowsandmilk · on Jan 7, 2021

Is it not on GitHub at https://github.com/CredibilityLab/groundhog ?

roel_v · on Jan 7, 2021

While this specific project does have a GitHub page, the R world is 'complete amateur, avoid avoid avoid'. It's not really a 'programming language' in the way software engineers would see it. It's more a loose collection of stats functionality that is tied together with text interfaces in a way that somewhat looks like programming to the uninitiated. I mean, batch scripting is technically 'programming', and Excel (even without VBA) is technically Turing complete, but neither of those would be considered 'programming' by software engineers, at least not under an intuitive understanding of what 'programming' is. (By that I mean, it's easy to be pedantic and argue that R and batch files and Excel files are 'programming' because of [xyz], where [xyz] will probably involve real 'definitions' and selection criteria etc.; but despite those tools being useful, you can't do real software engineering in them, which you sometimes want/need.)

vharuck · on Jan 7, 2021

This argument seems elitist. R is more than just technically Turing complete.

It's definitely a specialized language. It's not the go-to for managing servers or anything with a lot of I/O, but it has those capabilities because they're useful for managing projects. And I'd be hard-pressed to justify using a language for statistical analysis if it doesn't focus on statistical analysis. It'd be like rolling my own cryptography.

You need to differentiate between "base R" (everything that comes with a new install) and community-contributed packages. Base R is amazingly reliable. It has detailed documentation[0].

User-package land is more of a Wild West, that's true. I would personally not use anything that's not on CRAN unless I can walk up to the maintainer's desk (in non-pandemic times).

[0] https://cran.r-project.org/manuals.html

roel_v · on Jan 7, 2021

shrug. It's largely opinion-based, I guess. My pet peeve (which also illustrates my point, but again, in an opinion-based way): there is no documented, 'officially supported' way to get the path of the current script in R. That is not a problem for amateur programmers who don't think about things like robustness, distribution etc, and it's needlessly complicated and bolted on in SAS, too. But it's still silly and indicative of R's typical use cases. Excel is reliable and well documented too, and I still wouldn't call even complicated workbooks 'software engineering'.

And CRAN... well... let's just say that people used to point to CPAN as a strength of Perl, too... All archives of that sort, after the first few years in which the contributors are mostly people with deep knowledge who can produce high-quality libraries, turn into dumping grounds for trivial half-assed 'libraries' under the guise of 'community contributions'. Example: try to do trivial compound interest simulations in R. So basic that it's barely worth calling 'finance'. There are (at least) three packages on CRAN that claim to do this, except that (depending on which variable in the equation you want to solve for) they all provide only part of the solution, in mostly incompatible ways. And this is because very few of the people putting code into CRAN know how to... well... write good code. This is not an indictment of those people; many of them are much more intelligent than a bunch of us combined. It's just that for them coding is a byproduct, and with good intentions they share what has been useful for them. It just leads to a situation of 'in the land of the blind, the one-eyed man is king'.
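
For reference, a sketch of what "solving for any variable" would mean here, assuming the simplest model: a single lump sum compounded once per period, FV = PV * (1 + r)^n. It is written in Python rather than R, all function names are hypothetical, and it does not reproduce any of the CRAN packages mentioned above.

```python
import math


def future_value(pv, rate, periods):
    """FV = PV * (1 + r)^n."""
    return pv * (1.0 + rate) ** periods


def present_value(fv, rate, periods):
    """Solve the same equation for PV."""
    return fv / (1.0 + rate) ** periods


def rate_per_period(pv, fv, periods):
    """Solve for r: r = (FV / PV)^(1/n) - 1."""
    return (fv / pv) ** (1.0 / periods) - 1.0


def number_of_periods(pv, fv, rate):
    """Solve for n: n = ln(FV / PV) / ln(1 + r)."""
    return math.log(fv / pv) / math.log(1.0 + rate)


# Round trip: 1000 at 5% per period for 10 periods, then recover each input.
fv = future_value(1000.0, 0.05, 10)
pv_back = present_value(fv, 0.05, 10)
rate_back = rate_per_period(1000.0, fv, 10)
periods_back = number_of_periods(1000.0, fv, 0.05)
```

The four functions are one equation rearranged four ways, which is the whole of the "complete solution" for this simple model; periodic contributions or intra-period compounding would need a richer formula.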

epistasis · on Jan 7, 2021

> you can't do real software engineering

This is completely, 100%, absolutely wrong.

Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.

R has so much good software engineering that clever people with no software engineering background can easily make their own packages!

And come on, the R language is a masterpiece. It's not cobbled together like JavaScript or bash. It's got an impeccable functional programming pedigree; you can even look at the AST of a function directly from inside code.

I'm not sure how you came to any of your conclusions, other than by not bothering to understand the language to start with. It's a beautiful language with a messy, user-contributed set of stats code.

huijzer · on Jan 7, 2021

> Of course you can. There's packages, with excellent software engineering structure, that are designed to include documentation and tests.

For me, the problem with R is that the language is inconsistent. Many packages arose to address many problems, but they all feel like a hack on top of the core language. Take the whole Tidyverse; it just redoes the dataframes from core R, but from the ground up. Now, users can choose between the core language dataframes and the Tidyverse dataframes. The same holds for plotting. The core issue, I think, is that the core language lacks some essential features which other languages do have nowadays. For example, a type system. In R, since types are missing, everything is a table (dataframe), which I find just weird.

> It's not cobbled together like JavaScript or bash.

But also not as good as my favorite: Julia. Comparing it to Bash is like saying that it's better than COBOL. We all know Bash is quite old, but for certain situations it just works.

epistasis · on Jan 7, 2021

The tidyverse is the benefit and the curse of metaprogramming, something that R takes from lisp, and something that has cursed (helped?) C++ since it was added.

As for type systems, there are really two different kinds of "types". The first is individual typed objects that can have generic functions attached to them, etc. This is not as well known, and there are actually several object systems for this kind of typing:

http://adv-r.had.co.nz/OO-essentials.html

But these sorts of objects are not created by programmers quite as often, because the second kind of "type" is much more useful: the data frame, which is kind of a vectorization of structs. This is what would be used in data-oriented design, which is apparently much more common in modern game design.

Hansi · on Jan 7, 2021

https://github.com/CredibilityLab/groundhog