This appears to be NEITHER your "Archive" NOR your albums in Google Photos. This is basically a collection of media you uploaded to Blogger, Hangouts, and Picasa Web Albums.
Still, doesn't hurt to Takeout your Google Photos every ~6 months.
(I am working on an app that will help you organize and view them, together with your text messages, location history, and other online-only data.)
I would be extremely interested in a Google Takeout viewer if you ever end up releasing one.
I dealt with Google Takeout, trying to export my photos to Apple Photos (when Google was planning to charge money for old Google Workspace accounts), and I found it extremely difficult to deal with the file format. The script I wrote (https://github.com/citelao/google_photos_takeout_to_apple_ph...) ended up being decently reliable, but there were a ton of weird mismatches between the EXIF data in Google Photos metadata and the EXIF data in the photos themselves. Although some of that wonkiness was Apple Photos, not Google.
I'd love to see software that could wrangle the mess :)
It's called Timelinize (might rename it?), and you can follow it here: https://twitter.com/timelinize (Click "Media" on the Twitter account to view a few screenshots for a preview. More to come!) (There's no website or project page yet because I've been busy developing.)
If you want an invite to try out an early dev preview today, follow @timelinize on Twitter and tweet at it, I'll see about getting you into the Discord.
Some background:
Saving a local copy of my Google Photos has been a passion project of mine since ~2014 (before Google Photos even!). For years it was only focused on downloading the data using APIs -- but then we found out that Google strips location data (from your own photos!) if using the API, so I added Takeout support.
The problem is there was no viewer. Well in 2019 I finally started working on a viewer. It has evolved a lot since it's a very ambitious project and there's nothing quite like it.
It's not just Google Photos: it's any photos and videos. It's also for your text messages and emails. And your location history. And contact list. And chat apps. And really, any files you have. It also supports Facebook, Twitter, and Instagram account exports too. Oh, and iPhone backups.
Timelinize is entity-aware, and it can map identities across data sources (with enough info, or with a manual mapping, or some optional heuristics). It's not just a photo gallery.
It's basically a really detailed view of your life and online history. It's neat because I have my family pictures, my text messages between me and my wife when we were dating (and after of course), and there's different views to explore: map, timeline, conversations, gallery, and more to come (calendar, etc).
We can even place non-geolocated data on a map since we can correlate timestamp and entity. So when we went on our honeymoon, I can see text messages received from friends while we were driving to a beach.
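As a rough illustration of that timestamp correlation (a minimal sketch, not Timelinize's actual code): assume you have a time-sorted location track for one entity and want an approximate position for an item that only has a timestamp. The function name, the tuple layout, and the choice of linear interpolation are all assumptions made for this example.

```python
from bisect import bisect_left
from datetime import datetime

def estimate_location(track, when):
    """Linearly interpolate (lat, lon) at time `when` from a sorted track.

    `track` is a list of (datetime, lat, lon) tuples sorted by time
    (hypothetical layout). Returns None if `when` is outside the track.
    """
    times = [t for t, _, _ in track]
    i = bisect_left(times, when)
    if i < len(track) and times[i] == when:
        return track[i][1], track[i][2]  # exact hit on a recorded point
    if i == 0 or i == len(track):
        return None  # before the first point or after the last one
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    frac = (when - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)
```

A text message received halfway between two recorded points would land halfway between them on the map. Real location history is noisier than this, so treat the result as an estimate.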
It's really quite immersive and magical and I haven't seen anything quite like it.
And everything is stored on your own computer, it's a GUI app and you have to have enough space to store your stuff. The data is just organized as files within a folder on disk, with a SQLite DB holding the index and the small textual items.
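To make that layout concrete, here is a minimal sketch of the general idea: small text items live directly in a SQLite row, while large media stay as files on disk and the row only stores a relative path. The table name, columns, and helper functions are hypothetical and are not Timelinize's real schema.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Open (or create) the index database. Schema is illustrative only."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id INTEGER PRIMARY KEY,
               timestamp TEXT NOT NULL,   -- ISO 8601, sorts as text
               kind TEXT NOT NULL,        -- e.g. 'sms', 'photo'
               text_content TEXT,         -- small textual items stored inline
               file_path TEXT             -- larger items: path inside the folder
           )"""
    )
    return conn

def add_text(conn, timestamp, kind, text):
    """Store a small textual item directly in the database."""
    conn.execute(
        "INSERT INTO items (timestamp, kind, text_content) VALUES (?, ?, ?)",
        (timestamp, kind, text),
    )
    conn.commit()

def add_file(conn, timestamp, kind, relative_path):
    """Index a file that lives on disk; only its path goes in the database."""
    conn.execute(
        "INSERT INTO items (timestamp, kind, file_path) VALUES (?, ?, ?)",
        (timestamp, kind, relative_path),
    )
    conn.commit()

def timeline(conn):
    """Return all items in chronological order."""
    return conn.execute(
        "SELECT timestamp, kind, text_content, file_path "
        "FROM items ORDER BY timestamp"
    ).fetchall()
```

Keeping the media as ordinary files means the folder stays browsable without the app, and the database stays small enough to back up easily.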
What is the correct tool to properly merge a large set of tar.gz files which may have an enormous overlap of similar files, some of which have been altered just slightly?
Git plus some parsing seems close in that space: analyzing the files to build a dendrogram-like tree of potential alterations over time, by Levenshtein distance, could approximate a commit history.
However, this doesn't seem to exist or be popular as a tool.
There's vimdiff or meld, but they are extremely manual and tedious to the extent of being pointless to try for something like a large history of takeout tar.gz's.
Throwing in the towel completely, borgfs can be helpful to reduce the amount of space they take by de-duplication on the block level, but this is a terrible solution as it doesn't really track file changes in a reasonable way, etc. It is useful to extract the files into a directory without the tar or gz, but this can also cause issues with how to appropriately organize the directory structure over the history.
Any thoughts or projects that do a better job of this?
> What is the correct tool to properly merge a large set of tar.gz files which may have an enormous overlap of similar files, some of which have been altered just slightly?
Can you elaborate on this? My understanding is that they should all extract into the same target folder without issues because each archive's set of files is distinct. But maybe I'm just assuming wrongly?
What exactly is your goal, too? It sounds like you are trying to find and de-duplicate visually similar images? Like what do you mean by "enormous overlap" or "altered just slightly"?
The problem isn't one takeout overlapping (multiple zips from one date); it's many takeouts over the years (full history).
So for example in 2001, you make a takeout with 30 zips, and then delete half of your photos off of Google. Then in 2007, you have another 20 zips, and delete 25% of your emails and photos to make more room, 2008 again, on up to now.
So now you have a big folder with many zips, and maybe some extracted folders, because things happen over the years, etc.
What's the best tool to merge all of this into one directory?
Git can help with the notes from Google Keep, which may have had things appended or removed. Photos can overlap a bit, so a set union is all that's required for many files, but some files will be slightly different, like the Google Keep notes.
My best thought is to make a git repo and add things in, but also to compute a Levenshtein distance on the contents of each file to check for overlap and to estimate the 'lineage' of a file when it overlaps significantly with another. Effectively you reconstruct the git commit tree from the set of all files over all histories. Then you build the git repo history from all of the files.
This would likely just be a local git repo since it would likely be several terabytes of info, but that would be the general idea I guess.
I just haven't found a good tool to actually do this easily, unfortunately, but it seems like it would be a very basic or commonly used scenario (especially for those 'should-be-a-git-repo' directories that everyone made before knowing about git. You know the ones: 'myfile.v1.doc', 'myfile.v2.doc', 'myfile.final.doc', 'myfile.reallyfinal.doc', 'myfile.finalfinal.doc').
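For what it's worth, the core of that idea can be sketched in a few lines. This is only an illustration under some loud assumptions: snapshots are passed in as in-memory dicts of path to bytes (reading them out of extracted takeout folders is left out), exact duplicates are detected by SHA-256, and similarity uses difflib's SequenceMatcher ratio as a stand-in for Levenshtein distance, since the standard library has no Levenshtein function. The function name and the 0.6 threshold are made up for the example.

```python
import hashlib
from difflib import SequenceMatcher

def merge_snapshots(snapshots, threshold=0.6):
    """Union files across snapshots; link changed files to likely ancestors.

    `snapshots` is a chronologically ordered list of {path: bytes} dicts.
    Returns (unique, lineage): `unique` maps SHA-256 digest -> bytes, and
    `lineage` maps a digest to the digest of the most similar file seen
    earlier, when the similarity ratio is at least `threshold`.
    """
    unique, lineage = {}, {}
    for snap in snapshots:
        for path, data in sorted(snap.items()):
            digest = hashlib.sha256(data).hexdigest()
            if digest in unique:
                continue  # exact duplicate: the set union keeps one copy
            best, best_ratio = None, threshold
            for old_digest, old_data in unique.items():
                ratio = SequenceMatcher(None, old_data, data).ratio()
                if ratio >= best_ratio:
                    best, best_ratio = old_digest, ratio
            if best is not None:
                lineage[digest] = best  # probable earlier version
            unique[digest] = data
    return unique, lineage
```

The pairwise comparison is quadratic in the number of distinct files and slow on large inputs, so it is only plausible for small text files such as notes. For terabytes of photos you would stop at the hash-based union.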
_> So now you have a big folder with many zips, and maybe some extracted folders, because things happen over the years, etc._
Oh, right.
Timelinize can do that. Takeout all your data, then import it into Timelinize. Then delete your Takeout (after Timelinize is finished and stable, of course, heh). Then next time you Takeout, just import it all into Timelinize again. (It de-dupes!) Then delete the Takeout, etc. (Maybe Timelinize can do the cleanup for you someday.)
The de-duping depends on the item being recognizable. Best if the data source provides an ID. Otherwise, things like certain metadata and content can be used to determine duplicates.
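A rough sketch of that rule, not the actual implementation: use the data source's ID when there is one, otherwise fall back to a hash over some metadata plus the content. The item fields (`source`, `source_id`, `timestamp`, `owner`, `content`) and both function names are hypothetical.

```python
import hashlib

def dedupe_key(item):
    """Return a stable key for an item dict (hypothetical field names).

    Prefers the data source's own ID; otherwise falls back to a hash of
    the source, timestamp, owner, and content bytes.
    """
    if item.get("source_id"):
        return f"{item['source']}:{item['source_id']}"
    h = hashlib.sha256()
    for field in ("source", "timestamp", "owner"):
        h.update(str(item.get(field, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    h.update(item.get("content", b""))
    return "hash:" + h.hexdigest()

def import_items(store, items):
    """Add items to `store` (a dict keyed by dedupe_key); return count added."""
    added = 0
    for item in items:
        key = dedupe_key(item)
        if key not in store:
            store[key] = item
            added += 1
    return added
```

Importing the same Takeout twice into the same `store` adds nothing the second time, which is the behavior described above.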
I'm not chaxor, but as far as I remember I think you're right:
If I had unzipped all the takeout directories into one giant folder, there'd be no conflicts.
Since I didn't do that, I had to do weird multi-pass parsing, since an album could be split across multiple ZIPs. I get a bit neurotic around backups like this, so I'd have loved some sort of virtualized filesystem that non-destructively represented all of those zips "merged together." But in retrospect, I should have just merged the directories into one folder---would have made parsing easier :)
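Something close to that "merged together" view can be approximated without extracting anything. The sketch below uses Python's standard zipfile module to build one index over several Takeout zips, so an album split across archives can be listed and read as if it were a single folder. The function names are made up for this example, and it assumes the archives are zips rather than tar.gz.

```python
import zipfile

def index_zips(zip_paths):
    """Map each member path to the (zip_path, ZipInfo) that holds it.

    Later archives win on a path collision; nothing is extracted.
    """
    index = {}
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    index[info.filename] = (zip_path, info)
    return index

def read_member(index, member_path):
    """Read one file's bytes straight out of whichever zip contains it."""
    zip_path, info = index[member_path]
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(info)

def list_album(index, album_prefix):
    """List all member paths under a folder, regardless of which zip."""
    return sorted(p for p in index if p.startswith(album_prefix))
```

The original zips are never modified, so this stays non-destructive.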
I don't recall substantial problems with duplicates. Just weird renames and EXIF data mismatches. And since I was trying to archive my data, I definitely didn't want similar photos to be deduplicated.
My problems are probably different than chaxor's, though.
i think you should give it some time to make sure you've changed all your log-ins. there's bound to be accounts out there you've forgotten about and might not be able to use. maybe once you don't get any important emails for 6 months? and then there's updating and moving all google authenticator entries to something else. things like that. anyway, don't just rush it!
Note to anyone who might be reading this and thinking "that's me" - if it's you, then you should move to a password manager and create a database of all your accounts. I once spent a couple days moving every account I have ever created into KeePassXC, and changing all of their passwords to unique randomly-generated ones. You only have to do this once, and given enough time (making sure you add accounts over time that you need to use but missed), you can be fairly confident that you know every single account you have, anywhere.
At least any account that you'd ever need to log into.
Then, you'd know every account that uses your gmail, but also you'd know the password of every account to change the email in the future without needing a password reset, if you want to delete your email early. Of course, assuming someone doesn't have a breach or something and require a reset anyway, which happens from time to time.
Still good to have that passwords database.
I personally replicate mine between multiple computers and also publish it to my web server on someone else's infrastructure. I will never forget another password again, because I will never need to know another password again. I will also always know what accounts I have or whether I've used a particular service before.
One step further, buy a domain that offers email forwarding. When you sign up to websites, use this domain and have your email forwarded to whatever free email provider you choose. If you change providers you only need to change your domain forwarding email.
Oh yeah. I currently use Firefox Relay for this but they're on nearly every abuse list by this point so I can't even sign up for some websites. The downside of using domain forwarding instead though is that suddenly I'm responsible for my email and that's a risk that I don't want to take.
I do actually accept emails addressed to logandark.net. I just don't rely on it.
The email I got says that Album Archive is shutting down, that I accessed it yesterday, and that I should use Google Takeout to export a single photo that I can't access (?). So I got to Google Takeout and I'm told that I have one export remaining. None of this makes any sense. I have never heard of any of these.
I'm really surprised at how bad Google is at making products given how much money they have and the level of skill the people have there. There are so many problems across all of their business and consumer products that I've seen. I can't see what the draw is to keep using their stuff, especially since they are so good at killing stuff off.
Similar experience. Fortunately there is an activity log link, showing that Dropbox uploaded something which looks like a CD cover there once in 2011. important notice.
Whoever sent this email should be reprimanded. The vast majority of recipients have never used this "product". The images will not be deleted when it's killed because they actually reside in other products such as Blogger and will still be downloadable using Takeout[1], so approximately nobody needed to be notified. But now this is yet another PR cycle for the "Google kills everything" meme.
I got the email and was confused because I've never heard of it, but I see another comment in this thread that it's related to Picasa, which I certainly did use, and really liked. So this really just serves as a little salt on that old wound.
I too got the email and was confused. I thought all my photos went from Picasa to Google Photos… I actually thought the email might be someone trying to phish me.
Google seems to really need a change to management. I’m not sure what’s going on over there.
I got the email, thought, “huh, never heard of it.” Wasn’t curious enough to find out until I saw this post.
Still didn’t understand what exactly this was or how I used it, so I clicked. 30% of the screen (on mobile) is a banner telling me I should be using chrome. Another 40% banner is telling me the site is going away soon soon soon. Another 10% is exhorting me to make connections with my account so others can see my (soon to be deleted) photos. And the remainder of the screen is an empty gray box, with no photos or other content, just gray as far as the eye can see (until you hit one of the aforementioned banners); all that remains in light gray letters is the text “Looks like you’ve reached the end.”
Ah, I guess I had to request the desktop site to be able to upload to imgur. If you still care, this is what I saw when I clicked the link in the email.
it was so bad, I actually took a screenshot and was going to post it, but I guess imgur doesn’t allow anonymous uploads anymore. Wonder how long that’s been the case.
I’ll look for a decent looking photo host that’s not going to mercilessly exploit my metadata and edit and post the screen cap.
And they have every right to be scared. I mean, Google is saying they are deleting something related to photos. Who knows what the hell it is, but it sounds scary.
I still cannot figure out what exactly they are shutting down, and I'm computer savvy.
Google is supposed to know everything about every one of their users, but they can't figure out how to send emails only to the ones concerned. Such epic fail
That lists just about everything even if an app just got merged 1:1 into another app. It's so dumb and doesn't even add context, may be valid to add in this case but often it's just dumb
When Google finally figures out AGI, we'll know almost immediately. The second it comes on, it will ingest all of Google's data and immediately deduce that its purpose is to dismantle and shut the company down, before marking killedbygoogle.com feature complete at last and dissolving into the digital æther.
"The Last Question" by Asimov is a classic scifi short story that parallels this comment (presumably intentionally). The story is a short read and available for free on archive.org[0].
If there's no migration route, a tool that could be created by one Google developer, it means thousands of people now have work to fix thousands of Blogger posts. Cumulatively ~48 hours of work for a tool has just turned into at least 48000 hours for everyone else.
What's worse is, people will fix their blogs but I bet Google will use broken links as (another) justification to shut down Blogger next.
That means those 48000 hours will be an utter waste of time.
If you ever needed proof Google have no fucking strategy or clue anymore this is it (that and shutting down Google Music when YouTube Music was incomplete, people just moved to competitors).
This is only a guess, but I get the impression Album Archive is some sort of aggregator service. The original Blogger images appear to still be stored in Blogger's own media manager.
For example, many images referenced from Blogger posts contain links with URLs like https://1.bp.blogspot.com/... further strengthening the case that these images originate from Blogger, not the Archive service.
So in summary, my guess is that shutting down Album Archive will not affect Blogger media files.
Another case for this conclusion, is that it seems totally insane to delete all of Blogger's media files, and not inform Blogger users directly, but instead do it through some obscure Archive service notice.
Would be nice to get an official confirmation on this, but let's say we can make an educated guess that people's Blogger photos are probably safe.
Hedge funds are demanding Google downsize more than it already has. Layoffs look bad so they may be disguising more layoffs by shutting down products - Domains especially likely requires a few hundred engineers and UI/UX people across the many aspects of the service.
For the life of me I cannot figure out what Album Archive is. I remember being similarly confused when Google shut down... what was it? Photos? Picasa? And then nothing really happened.
I see I have pics I uploaded to Blogger showing up in this thing called Album Archive. Will they be deleted when it's gone? Fuck if I know.
Google, get your shit together. Your services and mails about them are goddamn confusing.
And I work with computers for a living! If I can't figure this out, sure as hell my parents won't either.
I just downloaded the archive it created: basically a couple of random photos. The archive was 3.3MB. The last activity was from 2013 when somebody else took a photo of me and shared it with me somehow via picasa web. The other activity was from 2007.
I had never heard of Album Archive, but apparently I have one. I started the export, which might take 'hours or even days'. Looking forward to discovering what is in there.