Recharging the Python standard library (oreilly.com)
53 points by ingve on Oct 11, 2013 | hide | past | favorite | 40 comments


Summary: The author wishes that the third-party requests module were part of the Python standard library (ignoring the wishes of the person who wrote the requests module).

The title is link bait. The "batteries" work fine - some are venerable, but they are well tested and documented.

The author ignores stellar parts of the standard library such as: collections, itertools, contextlib, functools, itertools, sqlite3, hashlib, struct, json, elementtree, etc.

Essentially, the author is bashing the entire standard library because the urllib modules are somewhat dated. IMO, the rant is not worthy of being on the front page of Hacker News.


I agree with you that the title is baiting.

His complaints are essentially: 1. The 'sys' and 'os' modules require too much typing and aren't 'beautiful' enough. 2. 'Requests' isn't stdlib. (because obviously) 3. Rargh time zones are annoying. (not python's fault that we, as people, split time into arbitrary 'zones') 4. I prefer a different testing module.

And his solutions are: 1. To use a wrapper that requires a little less typing. 2. To complain about it. 3. To use a third party lib that 'solves' the issue because it gets updated more frequently. 4. To use a different testing module.

I don't see how any of this means that Python's stdlib has 'dead batteries.' What a garbage article.

On a side note, I feel this is a terrible idea:

>The proposal introduces a provisional stage that allows packages to go into the standard library without enforcing the hard guarantees of backwards compatibility and API stability that previously made standard library inclusion undesirable.

The standard lib should be made of things that are rock solid. You don't build a house on unreliable-but-so-fun-to-use materials, nor should Python mess with stability so that blog writers bitch less.


I get a sense the author just wanted to complain, since he seems to agree with why most of his examples aren't in the stdlib.

Sidenote: the json module lives in the stdlib now but started out as a third party module, simplejson. Not sure what that says vis-a-vis the author's fuzzy thesis.


You mentioned itertools twice. Accident or pun intended?


Hmm, I was going to list additional standard libraries that I use on a regular basis to "prove" the author wrong, but I could only come up with two (in addition to your list): csv & re.

Still, the only thing I use with any frequency that is an outright replacement for standard library functionality is requests and beautifulsoup. Well, if you don't include ipython, which is an absolute necessity.


I can come up with quite a few of them.

    os
    sys
    logging
    os.path
    re
    subprocess
    datetime
    json
    pickle
    itertools
    urllib2
    signal
    traceback
    socket
It has a lot to do with my job - devops for remote clients. I can't rely on the ability to use 3rd party python libraries, and my stuff has to work reliably with few problematic corner cases. The standard library fulfills both of those cases pretty well.

To refer back to the "old batteries" argument, the included batteries may be the old lead acid type, but they're dependable in a lot of cases where newer lithium ion batteries would rather catch fire.


If I may ask, what's wrong with datetime? I may not have encountered unusual cases, but for most uses the date + datetime + timedelta scheme seems to work pretty well.
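
For what it's worth, the date + datetime + timedelta scheme in action (dates here are arbitrary examples):

```python
from datetime import date, datetime, timedelta

# date for calendar days, timedelta for arithmetic on them
release = date(2013, 10, 11)
deadline = release + timedelta(days=30)
print(deadline)                   # 2013-11-10
print((deadline - release).days)  # 30

# datetime adds a time-of-day component; subtraction yields a timedelta
start = datetime(2013, 10, 11, 9, 0)
end = datetime(2013, 10, 11, 17, 30)
print(end - start)                # 8:30:00
```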


I don't think anything is wrong with datetime - it's just a stdlib tool that I use on a regular basis. Sorry, wasn't listing just those I think have warts or those replaced by other tools, just libs that I use regularly.


My only gripe with datetime and time are the organization of it. I can never remember whether it's datetime.datetime.now or datetime.now or time.timedelta or datetime.timedelta, and so on.


Yeah, typing datetime.datetime for everything gets pretty old.

I just remember that datetime is the package, datetime.datetime is the class, and datetime.datetime.now() is a @classmethod that returns instances of that class. I do kinda wish it was datetime.now() for the constructor instead.
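
A quick sketch of that module/class/classmethod layering, since it's what trips people up:

```python
import datetime

# datetime (module) . datetime (class) . now() (classmethod)
now = datetime.datetime.now()
print(type(now))  # <class 'datetime.datetime'>

# the usual workaround: import the class directly
from datetime import datetime as dt
assert isinstance(dt.now(), dt)

# note that timedelta lives in the datetime module, not in time
delta = datetime.timedelta(minutes=5)
print(delta.total_seconds())  # 300.0
```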


I also like using argparse. Nice utility library to "power up" helpful scripts.


Argument parsing is an essential utility and part of the standard library in all major programming languages, though. Dating back to getopt becoming a standard in C (or earlier?)

Or do you find Python's argument parsing to be especially notable for some reason?


For a language whose motto is "There should be one - and preferably only one - obvious way to do it", Python has at least four ways to handle command line arguments out-of-the-box:

* sys.argv

* argparse

* optparse

* getopt

There are, of course, additional third party modules (like opster) that also handle arguments.


sys.argv is part of the interpreter. The other three are abstractions over it.

optparse is deprecated (kept around for obvious backward-compat reasons)

getopt is just a different API for people who feel more comfortable with C's getopt API and don't want to learn something else

argparse is the CLI option parser for the future.

tl;dr: There is one way to parse command line arguments: sys.argv. But there are a few abstractions of doing that available in the stdlib.
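
To illustrate the layering: argparse is just bookkeeping on top of sys.argv. A minimal example (the flag names are made up for the demo):

```python
import argparse

parser = argparse.ArgumentParser(description="demo")
parser.add_argument("--verbose", action="store_true")
parser.add_argument("--count", type=int, default=1)

# parse_args() defaults to sys.argv[1:]; pass a list explicitly for testing
args = parser.parse_args(["--verbose", "--count", "3"])
print(args.verbose, args.count)  # True 3
```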


Is getopt part of the C library? I think it only is POSIX, not C.


I like the use of "dated". urllib is horrible and was never good to begin with.


I get the feeling that overloading the division operator for paths is a really bad idea for clarity... To someone who is unfamiliar with the path.py package, it looks like you're trying to divide some sort of object by a string.

Not to mention the fact that, in his code, he aliased path to p, so it's even more confusing as to what is going on there.
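
For anyone unfamiliar with the mechanism: packages like path.py get this behaviour by overloading the division operator. A stripped-down sketch of the idea (this is not the actual path.py implementation):

```python
import os.path

class Path(str):
    """Minimal sketch of a '/'-overloading path type (not path.py itself)."""
    def __truediv__(self, other):
        # '/' on a Path just delegates to os.path.join
        return Path(os.path.join(self, other))

p = Path('.') / 'test_dirs' / 'test.txt'
print(p)  # ./test_dirs/test.txt
```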


Yeah, not the best example I think. This code is not bad at all:

    filepath = os.path.join('.', 'test_dirs')
    os.makedirs(filepath)
    with open(os.path.join(filepath, 'test.txt'), 'w') as f:
        f.write('hello world')
And it's actually obvious what happens with the file handle after write. I'm not sure if it's still open or not at the end of his example really. I wouldn't be surprised by his example in ruby code (overloading with magic functionality), but in python that just looks weird...


It does make the code a lot clearer to read. There are a lot of other python packages that overload specific operators/magic functions to aid readability. For example, numpy arrays overload the slice operator to great effect.

Also, on the same topic, what about using the '+' operator with strings and lists/tuples for concatenation? Or the '|', '&' and '^' operator for sets (which I have to admit I never use because I can never remember which is which)?
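
For the record, a cheat sheet for those set operators, since they are easy to mix up:

```python
a = {1, 2, 3}
b = {2, 3, 4}

print(a | b)  # union: {1, 2, 3, 4}
print(a & b)  # intersection: {2, 3}
print(a ^ b)  # symmetric difference: {1, 4}
print(a - b)  # difference: {1}
```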


> It does make the code a lot clearer to read.

If, and only if, you're familiar with how the path library works. I had to shift mental contexts away from python and into shell to understand that it was a "shortcut" for os.path.join(). The division operator is not typically associated with path joining in programming languages.

Frankly, it's not really offering any greater level of abstraction for paths; just shortcuts. You still have to call makedirs, open, write, etc. I don't find the minute improvements to os.path.join that terribly compelling.


I agree. I inherited some code that did this and it was not immediately obvious what was going on. I thought the variables being used were numbers, and I was very confused.


I guess it's a terrible thing for a dynamic language like Python. boost::filesystem::path has been overloading the '/' operator all along.


The author has some reasonable points. There is old cruft in the standard library, and there are vital 3rd party libraries that will never go into the standard library. I don't think the path example is particularly strong, but things like NumPy, IPython, pytz and requests are essential for some tasks.

The following is rather poor advice, though: "Python programmers should never restrict themselves to the standard library and should be open—even eager—to depend on third party packages that provide the APIs and functionality they need."

Really, you should not be "eager" to depend on things outside the standard library unless you have a good reason to. Every 3rd-party dependency you use adds complexity and you will have to make sure that the latest version doesn't change out from under you. If you are building an open source tool, for example, you may end up supporting multiple versions of the packages you depend on. Sure, use a 3rd-party library if it is worth that overhead, but you should never do it without good reason.


Now that we've (mostly) figured out how to do dependency management, a large standard library has become more of a liability than an asset. As a scala fan I hope we can move more library-type functionality out into, well, libraries, allowing that to evolve on its own schedule, and keeping the core language small.


A while back, while I was a doing Ruby, I remember a discussion that came up regarding moving some of/all of (can't remember) the standard library to Gems so that, as you say, they could "evolve on their own schedule". It'd be interesting if standard libraries of languages adopted this approach. A release of a language would then become the interpreter/compiler and set of vendored libraries deemed the standard library. Then, in projects, if say, the path module gets updated with new features, I can include that version and not wait for the next major/minor release of the language to get new features/bug fixes.


The term “standard library” has a second interpretation that is increasingly relevant in modern programming: as well as meaning “the [one true] standard library”, it could also mean “the library of standards”. It doesn’t necessarily have to provide its own tools. It can also define conventions and frameworks that establish a baseline for interoperability that everyone else’s tools can work with.

I don’t believe it’s realistic to create a standardised toolbox when every new language comes along and expect everything in it to still be a good way of doing things ten years later, as requirements change and new ideas come along. This is probably why so many languages today have at least a de facto standard place where you get other libraries when you need them: PyPI, CPAN, Boost, etc.

On the other hand, there are many recurring themes that appear in those libraries, particularly in the interface they offer and their overall design: basic data types and data structures, application-specific concepts like windows and database connections and file formats, architectural tools like publish-subscribe mechanics and test hooks. Standardising these common ideas, so it’s as easy as possible to connect up libraries from different sources or to replace one library with another, offers many benefits in efficiency and maintainability for a programming language’s ecosystem as a whole.

We already see examples of this being done: many languages have a somewhat standardised interface for connecting to SQL-based database engines, for example. We also see some unfortunate examples where it wasn’t done quickly enough, such as the myriad string types everyone defined in C++ because there was no standard type for so long.

I suspect the most successful programming languages of tomorrow may not have anything like as many tools available “out of the box” as Java and Python and C# do today. However, if I were a betting man, I’d wager they will all provide solid foundations for managing third party libraries and much better frameworks and conventions for interoperability between those libraries.


node.js follows this pattern relatively strictly (the standard library is quite small, with everything else pushed into userland)


I am not sure we've figured out how to do dependency management, on any platform.

E.g. even in scala land I got the feeling ivy/maven dependency management weren't universally loved, but maybe I am wrong?


I think we're getting pretty close to figure it out in the world of node with npm.


please correct me if I'm wrong because I just had a cursory look at NPM some time ago, but I believe

* it doesn't have virtual packages (in the sense, "'email-sender' is a package that provides a sendmail() function")

* it hardcodes a max of two dependency sets: runtime and dev. This may be better than a freeform Gemfile or a 6-scopes pom.xml but it seems quite arbitrary


Some of us use python as a rapid prototyping language before writing "real" code in c or even (horrors) c++ for performance. This is what makes ctypes so popular, python becomes the glue between calls to well tested low level functionality. So the place of the std libraries in this paradigm is to replace the c-like functionality. In c mkdir does not return the directory name, so os.mkdir does not either, allowing translation of the final prototype to c where it becomes another well tested tool in the next evolution of the application.
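
The ctypes glue pattern mentioned above looks roughly like this (a sketch assuming a POSIX system, where CDLL(None) loads the running process and exposes libc symbols):

```python
import ctypes

# load the current process's symbols (libc is linked in on POSIX)
libc = ctypes.CDLL(None)

# declare the C signature: size_t strlen(const char *)
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

print(libc.strlen(b"hello"))  # 5
```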

I can appreciate the need for higher level libraries in the std, but not at the expense of os, sys, and other 'dead' modules.


Batteries included or not, I can always find battery replacements in Python and that's what I love about it.

Do you want an excellent graph library? you have networkx. Do you need more speed? bind an existing C++ library with SWIG. If I want to use .NET or Java libraries there is IronPython and Jython. And you have two excellent IDEs: Visual Studio and PyCharm.


Batteries included or not, I can always find battery replacements in Python and that's what I love about it.

I agree that the breadth of high-quality python libraries is great... but in a language that believes

"There should be one-- and preferably only one --obvious way to do it,"

it'd be nice if we weren't also looking for replacements for the standard library.


the article mentions pytz, but that's pretty low-level. i wrote a wrapper, called simple-date, that tries to make it easier to use. https://github.com/andrewcooke/simple-date (self link, obviously; also 3.2+ only)


Slightly off topic, but does anyone know why is PyCrypto standard part of the python library?


Do you mean why is PyCrypto not part of the standard library? Export controls, mostly.

From PyCrypto's website:

"Unfortunately, cryptography software is still governed by arms control regulations in Canada, the United States, and elsewhere. The controls are fairly loose for free/open-source software, but they exist nonetheless."


Ahh, ok, that makes sense.


Who is this article for?


The author.




