I'd say a gradient is usually a covector / one-form: a map from direction vectors to a scalar rate of change. I.e., df = f_x dx + f_y dy is what you can actually compute without a metric; it lives in T*M, not TM. Given a direction vector (e.g. 2 d/dx), you feed it to df and get a scalar.
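A minimal sketch of that pairing in plain Python, in case it helps. The scalar field f(x, y) = x**2 + 3*y and the helper names `df_at` and `apply_one_form` are made up for illustration; the only point is that the components of df are the partial derivatives, and pairing them with a direction needs no metric.

```python
def f(x, y):
    # Hypothetical scalar field, chosen only for illustration.
    return x**2 + 3.0 * y

def df_at(x, y, h=1e-6):
    # Components (f_x, f_y) of the one-form df, estimated by central differences.
    f_x = (f(x + h, y) - f(x - h, y)) / (2 * h)
    f_y = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (f_x, f_y)

def apply_one_form(omega, v):
    # Pairing of a covector with a vector: omega_i v^i. No metric appears.
    return omega[0] * v[0] + omega[1] * v[1]

omega = df_at(1.0, 2.0)          # roughly (2, 3) at the point (1, 2)
v = (2.0, 0.0)                   # the direction 2 d/dx
rate = apply_one_form(omega, v)  # roughly 4: the rate of change of f along v
```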
I'm not a big Riemannian geometry buff, but I took a look at the definition in Do Carmo's book and it appears that "grad f" actually lies in TM, consistent with what I said above. Would love to learn more if I've got this mixed up.
This would be nice, because it would generalize the "gradient" from vector calculus, which is clearly and unambiguously a vector.
It's probably just a notation/definition issue. I'm not sure "grad f" is defined 100% consistently.
I'm a simple-minded physicist. I just know if you apply the same coordinate transformation to the gradient and to the displacement vector, you get the wrong answer.
My usual reference is Schutz's Geometrical Methods of Mathematical Physics, and he defines the gradient as df, but other sources call that the "differential" and say the gradient is what you get if you use the metric to raise the indices of df.
But that raised-index gradient (i.e. g(df)) is weird and non-physical. It doesn't behave properly under coordinate transformations. So I'm not sure why folks use that definition.
You can see the difference by looking at the differential in polar coordinates. If you have f=x+y, then df=dx+dy=(cos th + sin th)dr + r(cos th - sin th)d th. If you pretend this is instead a vector and transform its components that way, you'd get "df"=(cos th + sin th)dr + (1/r)(cos th - sin th)d th, which just gives the wrong answer.
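A quick numerical check of those two sets of components, as a sketch only. The function names and the sample point (r, th) = (2.0, 0.3) are arbitrary choices for illustration. It differentiates f = x + y directly in polar coordinates and compares the result with the one-form components and with the vector-rule components quoted above.

```python
import math

def f_polar(r, th):
    # f = x + y with x = r cos(th), y = r sin(th)
    return r * math.cos(th) + r * math.sin(th)

def df_polar_numeric(r, th, h=1e-6):
    # Partial derivatives of f with respect to r and th (central differences).
    d_r = (f_polar(r + h, th) - f_polar(r - h, th)) / (2 * h)
    d_th = (f_polar(r, th + h) - f_polar(r, th - h)) / (2 * h)
    return (d_r, d_th)

def one_form_rule(r, th):
    # Components of df on (dr, d th): the theta component carries a factor r.
    return (math.cos(th) + math.sin(th), r * (math.cos(th) - math.sin(th)))

def vector_rule(r, th):
    # Cartesian components (1, 1) transformed as a vector: factor 1/r instead.
    return (math.cos(th) + math.sin(th), (math.cos(th) - math.sin(th)) / r)

r, th = 2.0, 0.3  # arbitrary sample point
numeric = df_polar_numeric(r, th)
as_one_form = one_form_rule(r, th)
as_vector = vector_rule(r, th)

print("numeric     :", numeric)
print("one-form    :", as_one_form)  # agrees with the numeric derivatives
print("vector rule :", as_vector)    # theta component differs by a factor r**2
```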
To be specific, if v=(1,1) in cartesian (ex,ey), then df(v)=2. But (1,1) in cartesian is (1,1/r) in polar (er, etheta), taking th=0 to keep the numbers simple. The "proper" df still gives 2, but the "weird metric one" gives 1+1/r^2, since you get the 1/r factor twice, instead of a 1/r and a balancing r.
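The same arithmetic as a short sketch. It assumes th = 0 (the point where (1,1) in cartesian has polar components (1, 1/r)) and picks r = 2 arbitrarily; the variable names are illustrative only.

```python
r = 2.0  # arbitrary radius; th = 0 is assumed, so cos th = 1 and sin th = 0

df_polar = (1.0, r)       # components of df on (dr, d th) at th = 0
v_polar = (1.0, 1.0 / r)  # components of v = ex + ey on the polar coordinate basis

# Proper pairing df(v): the r in df cancels the 1/r in v.
proper = df_polar[0] * v_polar[0] + df_polar[1] * v_polar[1]

# Pairing the vector-rule components with v instead: the 1/r shows up twice.
naive = v_polar[0] * v_polar[0] + v_polar[1] * v_polar[1]

print(proper)  # 2.0
print(naive)   # 1.25, i.e. 1 + 1/r**2
```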
And I'm just a simple applied mathematician. For me, the gradient is the vector that points in the direction of steepest increase of a scalar field, and the Jacobian (or indeed, "differential") is the linear map in the Taylor expansion. I'll be curious to take a look at your reference: looks like a good one, and I'm definitely interested in seeing what the physicist's perspective is. Thanks!