Repo Sync: File Overlay And Merge For Customization

by Admin

Hey guys! Ever felt like the current file sync system is a bit too rigid? Like, you either get the whole file replaced or nothing at all? That's a real pain when you need to tweak just a few settings in a synced file without losing all the org-wide defaults. Let's dive into how we can solve this with a cool new feature: file overlay/merge support!

Problem Statement: The Struggle is Real

The current file sync system is binary: a file is either replaced wholesale or excluded from sync. That's a real hurdle when a repo needs to tailor a few fields in a synced file while still picking up org-wide defaults. You're forced to choose between losing org-wide updates and accepting a configuration that doesn't fit your repo's needs. Let's break down the current workarounds and why they're not ideal:

  1. Excluding the entire file from sync: This is like throwing the baby out with the bathwater. You lose all the valuable org-wide updates, which defeats the purpose of having a centralized sync system in the first place. It's a no-go for maintaining consistency across the organization.
  2. Manually editing after each sync: Talk about a never-ending task! This is about as sustainable as a sugar rush. Every time the file syncs, you have to go in and make your changes all over again. It's time-consuming, error-prone, and frankly, just plain annoying. Who has time for that?
  3. Accepting org-wide config that doesn't fit the repo's needs: This is like wearing shoes that are a size too small – you can do it, but it's not comfortable. You're stuck with a configuration that doesn't quite work for your repo, which can lead to inefficiencies and potential issues down the road. Not the best way to keep things running smoothly.

Real-World Example: Renovate.json Woes

Imagine this: your org-wide renovate.json sets "rebaseWhen": "behind-base-branch" as a safe default, but some of your repos need "rebaseWhen": "conflicted" because of specific CI requirements. Today you're stuck choosing between excluding renovate.json entirely and accepting the wrong setting. That's exactly the problem overlay/merge aims to solve: org-wide defaults with repo-specific tweaks, so you get to have your cake and eat it too. Let's get into the proposed solution.

Proposed Solution: Merge to the Rescue!

To address this, we're proposing adding a merge or overlay configuration option in .github/sync-config.yml. This will allow repos to specify:

  1. Which files should be merged instead of replaced. This is the core of the solution – the ability to selectively merge files instead of wholesale replacement.
  2. Repo-specific overrides that take precedence over org-wide defaults. This ensures that your repo's unique needs are met, while still benefiting from the organization's standards.

Configuration Schema: How it Works

Here's how the configuration would look in .github/sync-config.yml:

sync:
  files:
    # Existing options
    skip: false
    exclude: []
    
    # NEW: Merge/overlay support
    merge:
      - path: "renovate.json"
        strategy: "deep-merge"  # or "shallow-merge" / "overlay"
        overrides:
          rebaseWhen: "conflicted"
          prConcurrentLimit: 5
      
      - path: ".github/workflows/ci.yml"
        strategy: "overlay"
        overrides:
          jobs:
            test:
              timeout-minutes: 60  # Override just this field

In this example, we're configuring renovate.json and .github/workflows/ci.yml to be merged with repo-specific overrides. The strategy option determines how the merge is performed, and the overrides section specifies the values that should take precedence over the org-wide defaults. This approach is straightforward and easy to understand, making it simple to configure and maintain.
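To make the schema a bit more concrete, here's a minimal sketch of how a sync action might read the merge entries once sync-config.yml has been parsed into a dict. The names MergeRule, KNOWN_STRATEGIES, and load_merge_rules are made up for illustration and are not part of the proposal; only the keys and strategy names come from the schema above.

```python
from dataclasses import dataclass, field
from typing import Any

# Strategy names come from the proposal; everything else is illustrative.
KNOWN_STRATEGIES = {"deep-merge", "shallow-merge", "overlay"}


@dataclass
class MergeRule:
    """One entry under sync.files.merge (hypothetical helper type)."""
    path: str
    strategy: str = "deep-merge"
    overrides: dict[str, Any] = field(default_factory=dict)


def load_merge_rules(config: dict[str, Any]) -> list[MergeRule]:
    """Read sync.files.merge from an already-parsed sync-config.yml."""
    entries = config.get("sync", {}).get("files", {}).get("merge", [])
    rules = []
    for entry in entries:
        strategy = entry.get("strategy", "deep-merge")
        if strategy not in KNOWN_STRATEGIES:
            raise ValueError(f"{entry.get('path')}: unknown strategy {strategy!r}")
        # "overlay" is described as an alias, so normalize it early.
        if strategy == "overlay":
            strategy = "deep-merge"
        rules.append(MergeRule(entry["path"], strategy, entry.get("overrides", {})))
    return rules


# The example config from above, as it would look after YAML parsing.
config = {
    "sync": {
        "files": {
            "skip": False,
            "exclude": [],
            "merge": [
                {
                    "path": "renovate.json",
                    "strategy": "deep-merge",
                    "overrides": {"rebaseWhen": "conflicted", "prConcurrentLimit": 5},
                },
                {
                    "path": ".github/workflows/ci.yml",
                    "strategy": "overlay",
                    "overrides": {"jobs": {"test": {"timeout-minutes": 60}}},
                },
            ],
        }
    }
}

rules = load_merge_rules(config)
```

Normalizing the alias up front means the rest of the code only ever has to deal with two strategies.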

Merge Strategies: Deep vs. Shallow

Let's break down the different merge strategies:

deep-merge (default for JSON/YAML):

  • Recursively merge objects/maps. This means that nested objects are merged as well, allowing you to override specific values within complex configurations.
  • Arrays are replaced (not merged). This is a deliberate choice to avoid ambiguity when merging arrays. If you need to merge arrays, consider using objects with keys instead.
  • Repo-specific values override org-wide values at the leaf level. This ensures that your repo's customizations take precedence.

shallow-merge:

  • Only merge top-level keys. Nothing below the top level is merged.
  • Entire nested objects are replaced if overridden. For example, overriding jobs replaces the whole org-wide jobs map, not just the keys you listed.

overlay (alias for deep-merge):

  • A more intuitive name for configuration files. It behaves exactly like deep-merge.

The deep-merge strategy is the default choice for JSON and YAML files because it provides the most flexibility and control over the merge process. However, the shallow-merge strategy can be useful in certain situations where you only need to merge the top-level keys of a configuration.
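Here's a rough Python sketch of the difference between the two strategies. It's only an illustration of the rules listed above (the proposal itself leans on jq/yq), and the function names and sample values are assumptions picked for the example.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge overrides into a copy of base; arrays are replaced."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are maps: descend and merge key by key.
            result[key] = deep_merge(result[key], value)
        else:
            # Scalars and arrays: the repo value replaces the org value.
            result[key] = copy.deepcopy(value)
    return result


def shallow_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Merge top-level keys only; an overridden nested object is replaced whole."""
    result = copy.deepcopy(base)
    result.update(copy.deepcopy(overrides))
    return result


# Sample data, chosen only to show the difference.
org = {
    "rebaseWhen": "behind-base-branch",
    "lockFileMaintenance": {"enabled": True, "schedule": ["before 5am on monday"]},
}
repo = {
    "rebaseWhen": "conflicted",
    "lockFileMaintenance": {"enabled": False},
}

deep = deep_merge(org, repo)        # keeps the org-wide "schedule"
shallow = shallow_merge(org, repo)  # drops "schedule" with the replaced object
```

With deep-merge the nested schedule survives because only the enabled leaf is overridden. With shallow-merge the whole lockFileMaintenance object is swapped out.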

Implementation Flow: How it All Comes Together

Here's how the merge process will work:

  1. Fetch both files: the org-wide template file and the repo-specific overlay config (the overrides in .github/sync-config.yml).
  2. Parse both: pick the JSON or YAML parser based on the file extension.
  3. Merge: apply the configured strategy (deep-merge or shallow-merge), with repo overrides taking precedence.
  4. Validate: make sure the merged result is still valid JSON/YAML.
  5. Compare: check whether the merged result differs from the file currently in the repo.
  6. Update: create a PR only if changes are detected.

This process ensures that the merged file is valid and that changes are only made when necessary. This reduces the risk of introducing errors into the repo.
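The steps above can be sketched end to end for the JSON case. This is a simplified illustration, not the actual sync action: fetching (step 1) is left out and the file contents are passed in as strings. The helper names are hypothetical, and comparing parsed data instead of raw text is an assumption made here so that formatting-only differences don't trigger a PR.

```python
import json
from typing import Any, Optional


def merge_objects(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Minimal recursive merge: nested dicts merge, everything else is replaced."""
    result = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_objects(result[key], value)
        else:
            result[key] = value
    return result


def same_content(current_text: Optional[str], merged: dict[str, Any]) -> bool:
    """Step 5: compare parsed data so whitespace differences are ignored."""
    if current_text is None:  # the file does not exist in the repo yet
        return False
    try:
        return json.loads(current_text) == merged
    except json.JSONDecodeError:
        return False  # an unparseable repo file always counts as different


def plan_update(
    org_text: str, overrides: dict[str, Any], current_text: Optional[str]
) -> Optional[str]:
    """Return the new file content if a PR is needed, else None."""
    org_data = json.loads(org_text)                    # step 2: parse
    merged = merge_objects(org_data, overrides)        # step 3: merge
    merged_text = json.dumps(merged, indent=2) + "\n"
    json.loads(merged_text)                            # step 4: raises if invalid
    if same_content(current_text, merged):             # step 5: compare
        return None
    return merged_text                                 # step 6: content for the PR


org_text = '{"rebaseWhen": "behind-base-branch", "labels": ["deps"]}'
first_run = plan_update(org_text, {"rebaseWhen": "conflicted"}, org_text)
second_run = plan_update(org_text, {"rebaseWhen": "conflicted"}, first_run)
```

The first run returns new content because the repo file still has the org default. The second run returns None because the repo file already matches the merged result, so no PR is opened.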

PR Behavior: Keeping You Informed

When merge is configured:

  • PR title: chore(sync): sync and merge organization files

  • PR body includes section:

    ## Files Merged with Repo Overrides
    - `renovate.json` (deep-merge strategy)
    
  • Marker fields in merged files (optional; JSON has no comment syntax, so the markers are ordinary keys):

    {
      "_org_sync_merged": true,
      "_merge_strategy": "deep-merge",
      "rebaseWhen": "conflicted"
    }
    

This ensures that you're always aware of when files have been merged and what strategy was used. Transparency is key to building trust and ensuring that everyone is on the same page.
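As a small illustration, the PR body section could be generated from the list of merged files along these lines. The function name build_pr_body is hypothetical; the title and the section format are the ones shown above.

```python
PR_TITLE = "chore(sync): sync and merge organization files"


def build_pr_body(merged_files: list[tuple[str, str]]) -> str:
    """Render the 'Files Merged with Repo Overrides' section.

    merged_files holds (path, strategy) pairs for every file that was merged.
    """
    lines = ["## Files Merged with Repo Overrides"]
    for path, strategy in merged_files:
        lines.append(f"- `{path}` ({strategy} strategy)")
    return "\n".join(lines)


body = build_pr_body([("renovate.json", "deep-merge")])
```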

Use Cases: Where This Shines

Let's look at some real-world use cases where this feature can make a big difference:

  1. Renovate.json customization

    • Org: "rebaseWhen": "behind-base-branch"
    • Repo override: "rebaseWhen": "conflicted"
    • Repo keeps org-wide label rules, extends config, etc.
  2. Workflow customization

    • Org: Standard CI workflow with 30min timeout
    • Repo override: 60min timeout for integration tests
    • Repo keeps org-wide secrets, runner config, etc.
  3. Dependabot configuration

    • Org: Standard update schedule and commit message
    • Repo override: Different schedule for specific ecosystem
    • Repo keeps org-wide ignore conditions, reviewers, etc.
  4. Issue templates

    • Org: Standard bug report template
    • Repo override: Additional custom fields for specific project type
    • Repo keeps org-wide template structure and labels

These are just a few examples, but the possibilities are endless. This feature empowers you to customize your repos to meet your specific needs, while still benefiting from org-wide standards and best practices.

Implementation Considerations: Under the Hood

File Type Support: Starting Simple

Initial support (JSON in rollout Phase 1, YAML in rollout Phase 2):

  • JSON files (.json)
  • YAML files (.yml, .yaml)

Later (future work):

  • Markdown files (append sections)
  • TOML files (.toml)
  • INI files (.ini)

We're starting with JSON and YAML because they're the most common configuration file formats. Per the rollout plan, JSON ships first and YAML follows. We'll add support for other file types in the future, based on demand and feasibility.

Merge Logic: Keeping it Simple

We'll use existing tools to perform the merge:

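  • JSON: jq with the * operator, which merges objects recursively (the right-hand side wins)
  • YAML: yq with its merge operator
  • Keep it simple, leverage standard tools
  • Caveat: jq's * keeps an explicit null as a value and lets a mismatched type overwrite the original, so the null-removal and type-check rules under Edge Cases need a small extra step on top of the merge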

By leveraging existing tools, we can keep the implementation simple and avoid reinventing the wheel. This also makes it easier to maintain and update the merge logic in the future.

Validation: Ensuring Quality

  • Schema validation after merge (if schema available)
  • Syntax validation (valid JSON/YAML)
  • Fail PR if merge produces invalid file

We'll validate the merged file to ensure that it's valid and doesn't contain any errors. This helps prevent issues from being introduced into the repo.

Edge Cases: Handling the Unexpected

  1. Array handling: Always replace (don't merge arrays)

    • Rationale: Merging arrays is ambiguous (concat? unique? position-based?)
    • Alternative: Use object with keys instead of arrays
  2. Conflicting types: Repo override type must match org-wide type (an explicit null is the one exception, see item 3)

    • If org has {"foo": "bar"} and repo wants {"foo": [1,2,3]}, reject
    • Show clear error in action logs
  3. Null values: Explicit null in override removes the field

    overrides:
      someField: null  # Removes someField from merged result
    
  4. Comment preservation: Not supported. JSON has no comments, and YAML comments are lost in the parse → merge → stringify round trip.

We've considered various edge cases and have designed the merge logic to handle them gracefully. This ensures that the merge process is robust and reliable.
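Here's a hedged sketch of how the first three rules could fit together in one merge function. The names merge_with_rules, MergeTypeError, and json_type are invented for this example. Two choices are assumptions rather than part of the proposal: ints and floats both count as the JSON type "number", and an org-wide null places no constraint on the override's type.

```python
import copy
from typing import Any


class MergeTypeError(ValueError):
    """Raised when an override's JSON type differs from the org-wide value's."""


def json_type(value: Any) -> str:
    """Map a parsed value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def merge_with_rules(
    base: dict[str, Any], overrides: dict[str, Any], path: str = ""
) -> dict[str, Any]:
    """Deep merge that applies the array, type, and null rules."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        where = f"{path}.{key}" if path else key
        if value is None:
            result.pop(key, None)  # rule 3: explicit null removes the field
            continue
        if key in result and result[key] is not None:
            expected, actual = json_type(result[key]), json_type(value)
            if expected != actual:  # rule 2: reject conflicting types
                raise MergeTypeError(
                    f"{where}: org-wide value is {expected}, override is {actual}"
                )
        if isinstance(value, dict):
            existing = result.get(key)
            nested_base = existing if isinstance(existing, dict) else {}
            result[key] = merge_with_rules(nested_base, value, where)
        else:
            result[key] = copy.deepcopy(value)  # rule 1: arrays are replaced
    return result


org = {"foo": "bar", "timeout": 30, "labels": ["a", "b"], "someField": "x"}
merged = merge_with_rules(org, {"timeout": 60, "labels": ["c"], "someField": None})
```

Passing {"foo": [1, 2, 3]} as the override raises MergeTypeError with the offending key path in the message, which is the kind of clear error the action logs would need.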

Performance: Keeping it Fast

  • Merge happens during sync action (no extra API calls)
  • Small overhead: parse → merge → stringify
  • Negligible for typical config files (<100KB)

The merge process is designed to be efficient and have minimal impact on performance. This ensures that the sync process remains fast and responsive.

Alternatives Considered: Why This Approach?

We considered several alternative approaches, but ultimately decided that the overlay/merge approach was the best solution.

1. Template Variables

# renovate.json in org repo
{
  "rebaseWhen": "{{ RENOVATE_REBASE_WHEN | default('behind-base-branch') }}"
}

Rejected: Complex, requires variable management, not as flexible

2. Include/Extend Syntax

# repo renovate.json
extends: "smykla-labs/.github:templates/renovate.json"
overrides:
  rebaseWhen: "conflicted"

Rejected: Requires native support in each tool (Renovate has it, but not universal)

3. Multiple Config Files

// renovate.json (org synced)
{ "extends": ["renovate.repo.json"] }

// renovate.repo.json (repo-specific, excluded from sync)
{ "rebaseWhen": "conflicted" }

Rejected: Not all tools support extends/includes

4. Post-Sync Hook

Allow repos to define .github/sync-hooks/post-sync.sh that runs after sync.

Rejected: Security concerns, complex to implement safely

We carefully evaluated these alternatives and concluded that the overlay/merge approach provided the best balance of flexibility, simplicity, and security.

Rollout Plan: Phased Approach

Phase 1: MVP (JSON only)

  • Implement deep-merge for .json files
  • Add sync.files.merge config schema
  • Update documentation
  • Test with renovate.json across 2-3 repos

Phase 2: YAML Support

  • Add YAML file support
  • Test with workflow files
  • Gather feedback

Phase 3: Polish

  • Add validation
  • Improve error messages
  • Add dry-run preview showing merge result

We'll roll out this feature in a phased approach to ensure that it's stable and reliable. We'll start with JSON files and then add support for YAML files in the next phase. We'll also gather feedback from users and make improvements based on their suggestions.
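For the dry-run preview in Phase 3, one possible shape is a unified diff between the current repo file and the merged result. This is only a sketch using Python's standard difflib module. The function name dry_run_preview and the a/ and b/ path prefixes are assumptions, not part of the plan.

```python
import difflib
import json
from typing import Any


def dry_run_preview(path: str, current: dict[str, Any], merged: dict[str, Any]) -> str:
    """Return a unified diff of current vs. merged content ('' if identical)."""
    # Sorting keys keeps the diff focused on value changes, not key order.
    before = json.dumps(current, indent=2, sort_keys=True).splitlines()
    after = json.dumps(merged, indent=2, sort_keys=True).splitlines()
    diff = difflib.unified_diff(
        before, after, fromfile=f"a/{path}", tofile=f"b/{path}", lineterm=""
    )
    return "\n".join(diff)


preview = dry_run_preview(
    "renovate.json",
    {"rebaseWhen": "behind-base-branch"},
    {"rebaseWhen": "conflicted"},
)
```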

Related: Building on Existing Work

  • Existing issue: Special handling for renovate.json (detects manual modifications)
  • This feature would replace that special case with a general solution

This feature builds on existing work and provides a general solution for customizing synced files. It replaces the special handling for renovate.json with a more flexible and extensible approach.