Git LFS
Laura Hausmann edited this page 2023-11-19 15:03:32 +01:00

This repository uses Git LFS. Please make sure it is installed before cloning this repository.
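
Whether git-lfs is available can be checked from a shell before cloning. The following is a minimal sketch, not part of the project's tooling. The function name require_git_lfs is hypothetical, and it relies on git lfs version exiting with a non-zero status when the extension is missing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the git-lfs extension can be run.
require_git_lfs() {
  if git lfs version >/dev/null 2>&1; then
    return 0
  fi
  echo "git-lfs does not seem to be installed" >&2
  return 1
}

# Usage:
# require_git_lfs && git clone https://iceshrimp.dev/iceshrimp/iceshrimp.git
```

After installing git-lfs for the first time, running git lfs install once sets up the LFS filters in your Git configuration.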

Fully offline development

If you want to work on this project without a reliable internet connection, and specifically to check out branches or commits other than the current one, you will need to pre-fetch the LFS objects by running git lfs fetch origin <branchname> (assuming your remote is called origin). If you want a local copy of all LFS data, use git lfs fetch --all instead.
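
Since git lfs fetch takes a remote followed by any number of refs, several branches can be pre-fetched in one call. A small sketch; the wrapper name prefetch_lfs and the branch names in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fetch the LFS objects for the given refs from one remote.
# $1 = remote name, remaining arguments = branches, tags, or commit IDs
prefetch_lfs() {
  local remote=$1
  shift
  git lfs fetch "$remote" "$@"
}

# Usage (branch names are placeholders):
# prefetch_lfs origin dev my-feature-branch
```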

Rebasing your fork

Should your rebase fail with strange errors, please run git lfs fetch upstream --all && git lfs fetch --all && git lfs push origin --all (assuming your remote is called origin and the main repo remote is called upstream) before the rebase command to get the LFS objects in sync.

Fixing up a preexisting cloned repo

If you cloned the iceshrimp repository before the LFS migration, make sure git-lfs is installed, and you have a backup of the repository in case something goes wrong. Then use the following commands to get it back in sync:

git fetch --all
git rebase
git lfs pull
git pull --prune
git tag | xargs git tag -d
git fetch --tags

If you've deleted all branches that still reference the old tree, you can run git reflog expire --expire=now --all && git gc --prune=now --aggressive to massively clean up disk space.

Should you have more remote-tracking branches, run git switch <branchname> && git rebase for them as well. If you have local-only branches, run git switch <branchname> && git rebase origin/dev for each of them, replacing origin with the name of your remote.

Migrating a fork to Git LFS

First, create a new fork in the Forgejo Web UI. After making sure you have git-lfs installed, clone the new fork into a new local directory, leaving the old repository untouched. Then, execute the following commands for every branch you want to copy:

cd /path/to/old/clone
git switch branchname
git format-patch <last shared commit hash> --stdout > /path/to/tmp/folder/branchname.patch

cd /path/to/new/clone
git branch -c branchname
git switch branchname
git am /path/to/tmp/folder/branchname.patch

Then resolve any merge conflicts like you normally would. Once you're done, push your changes & delete the old repository in the Forgejo Web UI.
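
If there are many branches to copy, the export half of the steps above can be looped. This is only a sketch under assumptions: the function name export_branch_patches is hypothetical, it assumes every branch shares the same last common commit (pass a different one per call otherwise), and it uses the range form base..branch so that the branch does not need to be checked out first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one patch file per branch.
# $1 = old clone, $2 = last shared commit, $3 = output folder, rest = branch names
export_branch_patches() {
  local old_clone=$1 base=$2 out_dir=$3 branch
  shift 3
  mkdir -p "$out_dir"
  for branch in "$@"; do
    # base..branch selects the commits on the branch that are not reachable from base.
    # Slashes in branch names are replaced so the patch lands directly in out_dir.
    git -C "$old_clone" format-patch --stdout "$base..$branch" \
      > "$out_dir/${branch//\//_}.patch"
  done
}

# Usage (paths and names are placeholders):
# export_branch_patches /path/to/old/clone <last shared commit hash> /path/to/tmp/folder branchname
```

Each resulting file can then be applied in the new clone with git am, as shown above.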

Copying this repository to a different server

When copying this repository and not using the "fork" function, use the following steps to ensure the LFS files are transferred as well.

  • Before doing anything else, make sure you have a full copy of the LFS data by running git lfs fetch --all.
  • Now, add your new remote or change the url of the existing one, e.g. with git remote set-url origin git@iceshrimp.dev:myusername/new-repo.git
  • Next, run git push origin --all (swap origin for the name of your new remote if necessary)
  • Finally, copy the LFS files by running git lfs push origin --all, again swapping origin with the name of your new remote

And that's it! You should have a complete copy of this repository in your new location. To verify everything worked, check if the logo is displayed correctly in the README.

Migration

Things we had to do in preparation:

  • Merge or close all open PRs, as they cannot be preserved (closed & merged ones are easily fixable, see below)
  • Forcibly orphan all forks of the main repository with a note telling people to re-fork the repository after the migration and manually rebase their patches on the new repo
  • Unprotect all branches
  • Create the backup repo or enable PUSH_TO_CREATE
  • Disable the creation of new forks and PRs during the migration

Example nginx config for the last point:

location /repo/fork/1 {
        add_header Content-Type "text/plain" always;
        return 503 'Forks are disabled while we are migrating to Git LFS';
}

location /iceshrimp/iceshrimp/compare {
        add_header Content-Type "text/plain" always;
        return 503 'PRs are disabled while we are migrating to Git LFS';
}

Here are the commands we used to migrate to git-lfs for future reference:

# Set up variables
folder=iceshrimp-lfs-migration
source=https://iceshrimp.dev/iceshrimp/iceshrimp.git
target=git@iceshrimp.dev:iceshrimp/iceshrimp
backup=git@iceshrimp.dev:iceshrimp/iceshrimp-pre-lfs-migration

# Clone the repo into a fresh directory
git clone --mirror "$source" "$folder"
cd "$folder"

# Backup the current state of the repository to a different repo
git remote add backup "$backup"
git push --mirror backup
git remote remove backup

# Save pre-migration rev-list
git rev-list --all > ~/rev-pre.txt

# Migrate all binary files to LFS
git lfs migrate import --include "*.zip,*.xcf,*.ai,group1-shard?of6,*.mp3,*.afdesign,*.blend,*.glb,*.psd,*.gz,*.woff2,*.enc,*.lockb,*.webp,*.png,*.jpg,*.ico,*.svg,*.gif" --everything

# Strip the now-invalid commit signatures
git remote remove origin
git filter-repo -f --replace-refs=update-no-add
git remote add target "$target"

# Re-sign commits we have the key for
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --commit-filter 'if [[ "$GIT_COMMITTER_EMAIL" = "laura@hausmann.dev" ]] || [[ "$GIT_COMMITTER_EMAIL" = "zotan@zotan.pw" ]]; then git commit-tree -S "$@"; else git commit-tree "$@"; fi;' -- --all "$(git rev-list --all --committer="laura@hausmann.dev" --committer="zotan@zotan.pw" | tail -n1).."

# Clean up tree
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Optimize repo
git reflog expire --expire=now --all && git gc --prune=now --aggressive

# Save post-migration rev-list
git rev-list --all > ~/rev-post.txt

# Upload the rewritten history to the forge
git push --mirror target -f

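# Build commit id mappings
# Load both rev-lists into arrays; the loop below pairs them by line index
readarray -t rev_pre < ~/rev-pre.txt
readarray -t rev_post < ~/rev-post.txt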
echo 'map_hash_max_size 32768;' >> ~/rev-mapping-nginx-map.conf
echo 'map_hash_bucket_size 512;' >> ~/rev-mapping-nginx-map.conf
echo 'map $uri $new_uri {' >> ~/rev-mapping-nginx-map.conf

for (( i=0; i<${#rev_pre[@]}; i++ )); do
  # Save the mapping for future reference
  echo "${rev_pre[$i]} ${rev_post[$i]}" >> ~/rev-mapping.txt
  # SQL commands to fix up all closed/merged PRs
  echo "UPDATE \"pull_request\" SET \"merge_base\"='${rev_post[$i]}' WHERE \"merge_base\"='${rev_pre[$i]}' AND \"base_repo_id\" = 1;" >> ~/rev-mapping-pr.sql
  # Redirect rules for nginx. We tried using rewrites at first, but at 26k rules the performance penalty is quite severe, especially when fetching LFS objects over HTTP.
  echo -e "\\t\"/iceshrimp/iceshrimp/commit/${rev_pre[$i]}\" \"/iceshrimp/iceshrimp/commit/${rev_post[$i]}\";" >> ~/rev-mapping-nginx-map.conf
done

echo '}' >> ~/rev-mapping-nginx-map.conf
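
The loop above pairs entry i of the pre-migration list with entry i of the post-migration list, so it only produces a correct mapping if both files line up. A cheap sanity check before generating the SQL and nginx rules might look like this; the function name same_line_count is hypothetical, and matching line counts are a necessary condition only, not proof that the order is the same:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both files have the same number of lines.
same_line_count() {
  local pre_count post_count
  pre_count=$(wc -l < "$1" | tr -d ' ')
  post_count=$(wc -l < "$2" | tr -d ' ')
  [[ "$pre_count" -eq "$post_count" ]]
}

# Usage (file names as in the commands above):
# same_line_count ~/rev-pre.txt ~/rev-post.txt || echo "rev lists differ in length" >&2
```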

Note: We tried replace refs of various kinds. However, as of this Gitea commit, Gitea (and by extension Forgejo) does not respect replace refs, so we had to settle for nginx redirects and updating the database manually.

To find issue comments with commit ids (paste into sudo -u postgres psql -d forgejo):

SELECT regexp_matches("content", '(?<=[^0-9a-f/\-\"])[0-9a-f]{10}(?=[^0-9a-f/\-\"])', 'g') FROM comment;
SELECT regexp_matches("content", '(?<=[^0-9a-f/\-\"])[0-9a-f]{40}(?=[^0-9a-f/\-\"])', 'g') FROM comment;
SELECT regexp_matches("content_text", '(?<=[^0-9a-f/\-\"])[0-9a-f]{10}(?=[^0-9a-f/\-\"])', 'g') FROM issue_content_history;
SELECT regexp_matches("content_text", '(?<=[^0-9a-f/\-\"])[0-9a-f]{40}(?=[^0-9a-f/\-\"])', 'g') FROM issue_content_history;
SELECT regexp_matches("content", '(?<=[^0-9a-f/\-\"])[0-9a-f]{10}(?=[^0-9a-f/\-\"])', 'g') FROM issue;
SELECT regexp_matches("content", '(?<=[^0-9a-f/\-\"])[0-9a-f]{40}(?=[^0-9a-f/\-\"])', 'g') FROM issue;

Once you have a text file with the commit IDs, use a script like the following to fix them. Modify the script as appropriate for each of the columns and tables mentioned above. If you get hits for abbreviated commit IDs, you'll have to adjust the WHERE clause and the regexp_replace search string as well:

readarray -t set < ~/set_issue_content.txt
readarray -t rev_pre < ~/rev-pre.txt
readarray -t rev_post < ~/rev-post.txt

for (( i=0; i<${#set[@]}; i++ )); do
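	# Index of ${set[$i]} in rev_pre: replace the match with '/', keep only the text
	# before the first '/', and count the words (= entries) that precede the match
	idx=$(echo ${rev_pre[@]/${set[$i]}//} | cut -d/ -f1 | wc -w | tr -d ' ')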
	if [[ ${rev_post[$idx]} != '' ]]; then
		echo "UPDATE \"issue\" SET \"content\" = regexp_replace(\"content\", '(?<=[^0-9a-f/\-\\\"])${set[$i]}(?=[^0-9a-f/\-\\\"])', '${rev_post[$idx]}', 'g') WHERE \"content\" ~ '(?<=[^0-9a-f/\-\\\"])${set[$i]}(?=[^0-9a-f/\-\\\"])';"
	fi
done
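
The index lookup in the script above rescans the whole rev_pre array for every commit ID. As an alternative sketch (not what was used during the migration), a Bash 4+ associative array can map old IDs to new ones directly. The names rev_map, load_rev_map and lookup_rev are hypothetical, the two input files are assumed to correspond line by line, and only full-length IDs are handled:

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for associative arrays.
declare -A rev_map

# Hypothetical helper: fill rev_map from two files whose lines correspond by position.
# $1 = file with old commit IDs, $2 = file with new commit IDs
load_rev_map() {
  local -a pre post
  local i
  readarray -t pre < "$1"
  readarray -t post < "$2"
  for (( i = 0; i < ${#pre[@]}; i++ )); do
    rev_map["${pre[$i]}"]="${post[$i]:-}"
  done
}

# Hypothetical helper: print the new ID for an old one; fail if there is none.
lookup_rev() {
  local new="${rev_map[$1]:-}"
  [[ -n "$new" ]] || return 1
  printf '%s\n' "$new"
}

# Usage:
# load_rev_map ~/rev-pre.txt ~/rev-post.txt
# new_id=$(lookup_rev "${set[$i]}") && echo "$new_id"
```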

Things we had to do after the migration:

  • Run git reflog expire --expire=now --all && git gc --prune=now --aggressive in the forgejo repo directory to clean up all the old refs and actually get the repo size improvements

Things we missed:

  • Disabling email notifications during the migration would've prevented a couple of people from being spammed when rewritten commits mentioned issues they were subscribed to
  • Disabling detection of "closes #issue-number" during the migration would've prevented a couple of issues from being closed, but those were easily reopened after we noticed it.

As this process rewrites all history, we had to reset the cloned repo on every CI system, and manually fix up all the forks. This was a lot of work, but the efficiency improvements were absolutely worth it.