Channel: Antarctica Starts Here.

LOCKSS and Git.


The archival community has a saying: LOCKSS. Lots Of Copies Keep Stuff Safe.

Ultimately, if you trust someone else to hold your data for you, there is always a chance that the service will disappear and take your stuff with it. A notorious case in point is Google: the Big G has terminated so many useful services that there is an online graveyard dedicated to them. Some years ago a company called Code Spaces, which was in pretty much the same business as GitHub, was utterly destroyed in an attack. Whoever cracked them got into their Amazon EC2 control panel and left a ransom note, and when the company investigated, the attackers wiped everything, from the virtual machines to the code repositories to the backups. Everybody lost everything.

While anybody who's cloned a Git repo can, in theory, reconstruct the project anywhere they want, there are still repercussions when a hosted project suddenly vanishes. For starters, it's demoralizing as hell. Losing your project hosting is a real kick betwixt wind and water. Collaborators may kick (and in the past, have kicked) the project in the head and give up after such a loss. Additionally, much of the value provided by a project hosting outfit lies in the bug tracker integrated with the code repository, occasionally the wiki, and integration with CI/CD (continuous integration/continuous delivery) pipelines, none of which come along with a clone of the code. While there are software packages out there that integrate all of these things with the code repository, like Fossil, good luck getting anybody to start using them.

So, what can you do?

Of course, you can always stand up your own project hosting software someplace. There are some excellent options out there like Gitea or Gitosis, but merely installing an application doesn't go far enough, because you also have to use it correctly. Plus, you have to figure out just how you're going to use it. So, here's what I did:

Let's take my public repository of Huginn networks as an example. It's on GitHub, which is both the de facto hub of the open source community these days and a potential single point of failure. So on Leandra (a machine I control, because she's installed in my office) I set up a bare Git repository (slightly reformatted for clarity):

    {22:49:18 @ Sat Dec 19} [drwho @ leandra:(3) ~]$ mkdir exocortex-agents
    {15:04:10 @ Sun Dec 20} [drwho @ leandra:(3) ~]$ cd exocortex-agents/
    {15:04:13 @ Sun Dec 20} [drwho @ leandra:(3) exocortex-agents]$ git init --bare
    Initialized empty Git repository in /home/drwho/exocortex-agents/
    {15:04:17 @ Sun Dec 20} [drwho @ leandra:(3) exocortex-agents]$ ls -alF
    drwxr-xr-x drwho drwho   98B Sun Dec 20 15:04:17 2020 ./
    drwxr-xr-x drwho drwho 4.6KB Sun Dec 20 15:04:10 2020 ../
    drwxr-xr-x drwho drwho    0B Sun Dec 20 15:04:17 2020 branches/
    .rw-r--r-- drwho drwho   66B Sun Dec 20 15:04:17 2020 config
    .rw-r--r-- drwho drwho   73B Sun Dec 20 15:04:17 2020 description
    .rw-r--r-- drwho drwho   23B Sun Dec 20 15:04:17 2020 HEAD
    drwxr-xr-x drwho drwho  460B Sun Dec 20 15:04:17 2020 hooks/
    drwxr-xr-x drwho drwho   14B Sun Dec 20 15:04:17 2020 info/
    drwxr-xr-x drwho drwho   16B Sun Dec 20 15:04:17 2020 objects/
    drwxr-xr-x drwho drwho   18B Sun Dec 20 15:04:17 2020 refs/
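If you want to try this without touching a real machine, here is a minimal sketch of the same step done in a throwaway directory. The temporary path and the variable names (workdir, bare, is_bare) are my own for illustration and are not part of the setup on Leandra.

```shell
#!/bin/sh
# Sketch: create a bare repository in a scratch directory and confirm
# that Git considers it bare. Paths here are hypothetical.
set -eu

workdir=$(mktemp -d)
bare="$workdir/exocortex-agents"

mkdir "$bare"
git init --bare --quiet "$bare"

# A bare repository has no working tree; Git reports "true" for it here.
is_bare=$(git -C "$bare" rev-parse --is-bare-repository)
echo "bare repository: $is_bare"

# The same files and directories shown in the listing above should exist.
ls -F "$bare"
```

A bare repository holds only Git's own data (objects, refs, and so on) with no checked-out files, which is what you want for something you only ever push to and pull from.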

Then, in my working copy on windbringer, I set up a Git remote which, as the name implies, is a Git repository accessed remotely (i.e., on a different machine).

    {15:05:14 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ git remote add leandra ssh://leandra/home/drwho/exocortex-agents
    {15:09:01 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ git remote -v
    git.hackers.town  ssh://git@git.hackers.town:2222/drwho/exocortex-agents.git (fetch)
    git.hackers.town  ssh://git@git.hackers.town:2222/drwho/exocortex-agents.git (push)
    gitlab            git@gitlab.com:virtadpt/exocortex-agents.git (fetch)
    gitlab            git@gitlab.com:virtadpt/exocortex-agents.git (push)
    leandra           ssh://leandra/home/drwho/exocortex-agents (fetch)
    leandra           ssh://leandra/home/drwho/exocortex-agents (push)
    origin            git@github.com:virtadpt/exocortex-agents.git (fetch)
    origin            git@github.com:virtadpt/exocortex-agents.git (push)

If you look at the above output, you'll note that I have multiple remotes for that code repository. The new one (leandra) that I just added breaks down like this:

    leandra ssh://leandra/home/drwho/exocortex-agents (fetch)
    leandra ssh://leandra/home/drwho/exocortex-agents (push)
  • leandra - The name of the remote. You refer to it by name for convenience.
  • ssh:// - The remote is accessed over SSH, so I can work with it at home.
  • leandra - Leandra's hostname.
  • /home/drwho/exocortex-agents - Full path to the repository on Leandra.
  • (fetch) - This means that I can pull from this copy of the repo with the URL on that line.
  • (push) - This means that I can also push to that copy of the repo with the URL on that line.
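The (fetch) and (push) URLs in that breakdown can also be read back one at a time. This is a small sketch in a scratch repository; the scratch directory and variable names are made up for the example, and nothing here contacts Leandra, because adding a remote and reading its URL are purely local operations.

```shell
#!/bin/sh
# Sketch: add a remote and read back its fetch and push URLs.
# Only the repository's local config is touched; no network access happens.
set -eu

workdir=$(mktemp -d)
git init --quiet "$workdir/work"
cd "$workdir/work"

git remote add leandra ssh://leandra/home/drwho/exocortex-agents

# By default the push URL is the same as the fetch URL.
fetch_url=$(git remote get-url leandra)
push_url=$(git remote get-url --push leandra)

echo "fetch: $fetch_url"
echo "push:  $push_url"
```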

At the moment it's empty. Let's fix that.

    {15:17:05 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ git push leandra
    X11 forwarding request failed
    Enumerating objects: 67, done.
    Counting objects: 100% (67/67), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (67/67), done.
    Writing objects: 100% (67/67), 27.90 KiB | 1.16 MiB/s, done.
    Total 67 (delta 35), reused 0 (delta 0), pack-reused 0
    To ssh://leandra/home/drwho/exocortex-agents
     * [new branch]      master -> master

Now there is a full copy of the repo in question on Leandra. Let's test it.

    {15:18:26 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ cd ~/tmp
    {15:18:27 @ Sun Dec 20} [drwho @ windbringer tmp]()$ git clone ssh://leandra/home/drwho/exocortex-agents
    Cloning into 'exocortex-agents'...
    X11 forwarding request failed
    remote: Enumerating objects: 67, done.
    remote: Counting objects: 100% (67/67), done.
    remote: Compressing objects: 100% (67/67), done.
    remote: Total 67 (delta 35), reused 0 (delta 0), pack-reused 0
    Receiving objects: 100% (67/67), 27.90 KiB | 595.00 KiB/s, done.
    Resolving deltas: 100% (35/35), done.
    {15:18:41 @ Sun Dec 20} [drwho @ windbringer tmp]()$ cd exocortex-agents/
    {15:18:45 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ ls
    butterfly-in-china.json         searx-answering-api-examples.json
    coronavirus-news-agents.json    shake-rattle-and-roll.json
    demo-weather-forecaster.json    test-matrix-integration.json
    elephant.json                   test-scenario.json
    mastodon-integation-demo.json   tripwire.json
    README.md                       twitter-activity-monitor.json
    sample-rss-feed-consumer.json   user_credentials.json
    searcherizer.json

There we go.
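Cloning and eyeballing the file list works, but a cheaper check is to compare commit hashes. The sketch below is one way to do that, not the check I ran above: it uses a local bare repository standing in for Leandra, and the names mirror, workdir, local_tip and remote_tip are invented for the example.

```shell
#!/bin/sh
# Sketch: push to a mirror, then confirm the mirror's branch tip matches
# the local one by comparing commit hashes instead of cloning.
set -eu

workdir=$(mktemp -d)
git init --bare --quiet "$workdir/mirror.git"   # stand-in for the copy on Leandra
git init --quiet "$workdir/work"
cd "$workdir/work"

echo "demo" > README.md
git add README.md
# Identity is passed inline so the sketch runs without any ~/.gitconfig.
git -c user.name="Example" -c user.email="example@example.invalid" \
    commit --quiet -m "Initial commit"

branch=$(git symbolic-ref --short HEAD)
git remote add mirror "$workdir/mirror.git"
git push --quiet mirror "$branch"

local_tip=$(git rev-parse HEAD)
# ls-remote prints "<hash><TAB><ref>"; keep only the hash.
remote_tip=$(git ls-remote mirror "refs/heads/$branch" | cut -f1)

if [ "$local_tip" = "$remote_tip" ]; then
    echo "mirror is up to date"
else
    echo "mirror differs from local branch" >&2
fi
```

With a real SSH remote the only change would be the remote's URL; git ls-remote asks the other side for its refs without downloading any objects.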

As you can see from earlier, that particular project has a bunch of remotes. Now, when I'm working in a repository I have to push updates to each and every one of them. I could push to each one in sequence, but that kind of sucks as a workflow because it's easy to forget one. There's an easier way that someone showed me elsewhere on the net (I wish I could remember whom - please ping me and I'll credit you). When you use Git you can set up a .gitconfig file in your home directory to set some personal defaults. Here's mine:

    {15:23:02 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ cat ~/.gitconfig
    [user]
        email = drwho at virtadpt dot net
        name = The Doctor
        signingkey = 0x807B17C1
    [push]
        default = simple
    [alias]
        pushall = !git remote | xargs -L1 -P0 git push --all --follow-tags

The [user] and [push] bits are there because Git yells at you if they're not set, which is a bit of a misfeature as far as I'm concerned. But it is what it is. It's the [alias] block that is of interest to us. Here's what it means when you break it down:

  • pushall - The name of the new git command to create.
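  • !git - The leading ! tells Git to run everything after it as a shell command instead of treating it as a Git subcommand. The command being run here is git.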
  • remote - List just the names of the configured remotes, without their URLs.
  • | - Run the output into another command.
  • xargs - A basic command line utility (manpage) that basically means "for every thing you pass me that is separated by a newline or whitespace, I will do the following thing to it."
  • -L1 - Take at most one full line from the input to xargs at a time.
  • -P0 - Run as many processes simultaneously as possible. This basically speeds up the xargs run, because the pushes happen in parallel. You probably don't need this but I find it handy.
  • git push - Push new commits.
  • --all - Push all branches with new commits, all at once.
      • This is a thing folks usually do at work. If it's just you there isn't really much of a need for this. The command line option won't hurt anything, though.
  • --follow-tags - Also push any annotated tags that point at the commits being pushed and that the remote doesn't have yet.
      • Same. If you use tags, you know. If you don't use tags, don't worry.
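To see the alias do its thing without pushing to real servers, here is a rough sketch that wires it up against two local bare repositories. The remote names first and second and the scratch paths are hypothetical, and the alias is set in the repository's own config rather than in ~/.gitconfig so the experiment doesn't touch your personal settings.

```shell
#!/bin/sh
# Sketch: define the pushall alias for one scratch repository and push
# a commit to two local bare "remotes" with a single command.
set -eu

workdir=$(mktemp -d)
git init --bare --quiet "$workdir/first.git"
git init --bare --quiet "$workdir/second.git"
git init --quiet "$workdir/work"
cd "$workdir/work"

echo "demo" > README.md
git add README.md
# Identity is passed inline so the sketch runs without any ~/.gitconfig.
git -c user.name="Example" -c user.email="example@example.invalid" \
    commit --quiet -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)

git remote add first "$workdir/first.git"
git remote add second "$workdir/second.git"

# Same alias as in the .gitconfig above, stored in this repo's config only.
git config alias.pushall '!git remote | xargs -L1 -P0 git push --all --follow-tags'

git pushall

# Both remotes should now hold the same commit as the working copy.
local_tip=$(git rev-parse HEAD)
first_tip=$(git --git-dir="$workdir/first.git" rev-parse "refs/heads/$branch")
second_tip=$(git --git-dir="$workdir/second.git" rev-parse "refs/heads/$branch")
echo "local:  $local_tip"
echo "first:  $first_tip"
echo "second: $second_tip"
```

xargs appends each remote name to the end of the command, so what actually runs is git push --all --follow-tags first and git push --all --follow-tags second, side by side.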

Once the above line is in your ~/.gitconfig file you can use it regardless of what you're working on. Let's try it out:

    {15:25:36 @ Sun Dec 20} [drwho @ windbringer exocortex-agents]()$ git pushall
    X11 forwarding request failed
    Everything up-to-date
    Host key fingerprint is SHA256:nThb...
    Host key fingerprint is SHA256:HbW3...
    Host key fingerprint is SHA256:IyW9...
    X11 forwarding request failed on channel 1
    X11 forwarding request failed on channel 1
    Everything up-to-date
    X11 forwarding request failed on channel 1
    Everything up-to-date
    Everything up-to-date

As you can see, I just pushed all of my changes (there weren't any at the moment I wrote this, but just pretend there were) to all four remotes. The output is a little out of order due to the -P0 argument to xargs, but that's okay.
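If the jumbled output bothers you, or you want to know exactly which remote refused a push, a plain loop is an alternative to the xargs one-liner. This is a sketch of that different approach, not the alias I use: it pushes to one remote at a time and collects the names of any that fail. The remotes good and broken (the latter pointing at a path that doesn't exist) are invented to show both outcomes.

```shell
#!/bin/sh
# Sketch: push to every remote in sequence and report the ones that failed.
# This trades the parallelism of "xargs -P0" for ordered, attributable output.
set -eu

workdir=$(mktemp -d)
git init --bare --quiet "$workdir/good.git"
git init --quiet "$workdir/work"
cd "$workdir/work"

echo "demo" > README.md
git add README.md
# Identity is passed inline so the sketch runs without any ~/.gitconfig.
git -c user.name="Example" -c user.email="example@example.invalid" \
    commit --quiet -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)

git remote add good "$workdir/good.git"
git remote add broken "$workdir/does-not-exist.git"   # deliberately invalid

failed=""
for remote in $(git remote); do
    if git push --quiet --all --follow-tags "$remote"; then
        echo "pushed to $remote"
    else
        # Remember the remote that failed and keep going with the rest.
        failed="${failed:+$failed }$remote"
    fi
done

if [ -n "$failed" ]; then
    echo "push failed for: $failed" >&2
fi
```

Git prints its own error for the broken remote, and the loop carries on to the next one instead of stopping.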

And there we go. I hope you find this useful. Happy hacking!

