The genetic connections between DNA repair pathways and human cancer predisposition have fueled interest in the proteins that recognize and repair specific sites of DNA damage. The repair enzymes are remarkably conserved from bacteria to fungi to humans, underscoring the premium placed on maintaining genomic integrity in the face of a mutagenic burden. DNA is susceptible to damage caused by errors committed during replication and by environmental factors, such as radiation, oxidants, or alkylating agents. Repair reactions involve the excision of chemically altered or mispaired bases from the DNA duplex. Resulting gaps are filled in by DNA polymerases; this reaction leaves a nick at or flanking the site of repair. An analogous process occurs during chromosomal DNA replication, whereby the 5’-RNA segments that prime discontinuous synthesis of Okazaki fragments are excised, and the intervening gaps are filled in by DNA polymerase.
The DNA repair and replication pathways converge on a common final step in which the continuity of the repaired DNA strand is restored by DNA ligase, an enzyme that converts nicks into phosphodiester bonds. Nicks are potentially deleterious DNA lesions that, if not corrected, may give rise to lethal double-strand breaks. Accordingly, the total loss of DNA ligase function is lethal.
DNA Ligase Reaction
DNA ligases catalyze the joining of a 5’-phosphate-terminated strand to a 3’-hydroxyl-terminated strand. Ligation depends on magnesium and a high-energy cofactor, either ATP or NAD+. The reaction mechanism involves 3 sequential nucleotidyl transfer reactions. In the first step, nucleophilic attack on the alpha-phosphorus of ATP (adenosine triphosphate) or NAD+ (nicotinamide adenine dinucleotide) by ligase results in release of pyrophosphate or NMN (nicotinamide mononucleotide) and formation of a covalent intermediate (ligase-adenylate) in which AMP is linked via a phosphoamide (P-N) bond to the epsilon-amino group of a lysine. In the second step, the AMP is transferred to the 5’-end of the 5’-phosphate-terminated DNA strand to form DNA-adenylate — an inverted pyrophosphate bridge structure, AppN. In this reaction, the 5’-phosphate oxygen of the DNA strand attacks the phosphorus of ligase-adenylate; the active-site lysine side chain is the leaving group. In the third step, ligase catalyzes attack by the 3’-OH of the nick on DNA-adenylate to join the 2 polynucleotides and liberate AMP.
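The three nucleotidyl transfer steps can be summarized schematically as follows. This is a shorthand sketch rather than a set of balanced chemical equations. The labels are introduced here only for illustration: E-Lys denotes the ligase active-site lysine, pDNA the 5’-phosphate-terminated strand, DNA-OH the 3’-hydroxyl-terminated strand, and DNA-p-DNA the sealed phosphodiester product.

```latex
\begin{align*}
\text{Step 1:}\quad & \text{E-Lys} + \text{ATP (or NAD}^{+}) \longrightarrow \text{E-Lys-AMP} + \text{PP}_i\ (\text{or NMN}) \\
\text{Step 2:}\quad & \text{E-Lys-AMP} + \text{pDNA} \longrightarrow \text{E-Lys} + \text{AppDNA} \\
\text{Step 3:}\quad & \text{DNA-OH} + \text{AppDNA} \longrightarrow \text{DNA-p-DNA} + \text{AMP}
\end{align*}
```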
Living organisms comprise 3 domains: eubacteria, archaebacteria, and eukaryotes. All organisms encode 1 or more DNA ligases. The ligases are grouped into 2 families, ATP-dependent ligases and NAD+-dependent ligases, according to the nucleotide substrate required for ligase-adenylate formation. ATP-dependent DNA ligases are found in all 3 domains. NAD+-dependent DNA ligases (LigA) are ubiquitous in bacteria, where they are essential for growth and present attractive targets for anti-infective drug discovery. NAD+-dependent ligases are encountered only sporadically outside the bacterial domain of life, e.g., in halophilic archaea and certain DNA viruses, and were presumably acquired in these taxa by horizontal gene transfer.
Eukaryotic Cellular ATP-dependent Ligases
ATP-dependent DNA ligases are found in all eukaryotic species. Mammalian cells contain four DNA ligase isozymes. Amino acid-sequence comparisons suggest that a core catalytic domain common to all ATP-dependent ligases is embellished by additional isozyme-specific domains located at the amino or carboxyl termini of the proteins. It is thought that these flanking segments mediate the binding of mammalian DNA ligases to other proteins involved in DNA replication, repair, and recombination. The mammalian isozymes are referred to as ligase I, ligase IIIa, ligase IIIb, and ligase IV. DNA ligase I is a 919-amino acid polypeptide, expressed in all tissues, which catalyzes the joining of Okazaki fragments during DNA replication and also plays a role in DNA repair. DNA ligases IIIa (922 amino acids) and IIIb (862 amino acids) are the products of a single gene; they differ in amino acid sequence only at their carboxyl termini as a consequence of alternative mRNA splicing. Ligase IIIa is expressed ubiquitously, is implicated in DNA repair, and is essential for mitochondrial function. Ligase IIIb expression is restricted to the testis, specifically to spermatocytes undergoing meiosis. DNA ligase IV is a 911-amino acid polypeptide that plays a role in the repair of double-strand DNA breaks via non-homologous end joining (NHEJ).
Yeast cells contain 2 separately encoded DNA ligases, which are homologous to mammalian DNA ligases I and IV, respectively. The DNA ligase I (Cdc9p) of the budding yeast Saccharomyces cerevisiae is essential for cell growth. Genetic experiments implicate ligase I in sealing Okazaki fragments and in the completion of DNA excision repair. In contrast, yeast DNA ligase IV is not essential for cell growth. However, deletion of the LIG4 gene elicits phenotypes indicating that ligase IV catalyzes the repair of double-strand breaks in the NHEJ pathway. Budding yeast have no apparent homologue of mammalian DNA ligase III.
Viral ATP-Dependent DNA Ligases
Bacterial DNA viruses, such as the E. coli bacteriophages T4, T6, T7, and T3, encode their own ATP-dependent DNA ligases. ATP-dependent DNA ligases are also encoded by eukaryotic DNA viruses that conduct some or all of their replication cycle in the cytoplasm. These include vaccinia virus, African swine fever virus, and Chlorella virus PBCV1. The bacteriophage and eukaryotic viral DNA ligases are smaller than their cellular counterparts. Vaccinia DNA ligase, a 552-amino acid polypeptide, is strikingly similar at the amino acid-sequence level to mammalian DNA ligase III. Indeed, ligase III is more closely related to vaccinia ligase than to mammalian ligases I and IV. The ligases of T4 (487 amino acids), T7 (359 amino acids), T3 (346 amino acids), and Chlorella virus (298 amino acids) are smaller still. We have shown that the Chlorella virus ligase can complement the growth of a yeast strain in which the DNA ligase I gene has been deleted. This result suggests that the protein segments unique to the much larger DNA ligase I are not essential for yeast cell growth.
Nick-Sensing by DNA Ligases
We have examined the interaction of eukaryotic ligases with DNA using virus-encoded enzymes as models. In the absence of magnesium, vaccinia virus DNA ligase and Chlorella virus DNA ligase each form a discrete complex with a singly nicked DNA ligand, and this complex can be resolved from free DNA by native polyacrylamide gel electrophoresis. The viral ligases do not form stable complexes with the following ligands: (i) DNA containing a 1-nucleotide or 2-nucleotide gap; (ii) the sealed duplex DNA product of the ligation reaction; (iii) a singly nicked duplex containing a 5’-OH terminus at the nick instead of a 5’-phosphate; or (iv) a singly nicked duplex containing an RNA strand on the 5’-phosphate side of the nick (10 to 15). Thus, viral ATP-dependent DNA ligases have an intrinsic nick-sensing function.
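The binding rules listed above can be restated as a small decision function. The following Python sketch is only a bookkeeping aid for the four observations, not a model of binding energetics. The `Ligand` class, its field names, and `forms_stable_complex` are hypothetical names introduced here, and treating every gap larger than zero as non-binding is an assumption that extends the 1- and 2-nucleotide gaps actually tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    """Hypothetical description of a duplex DNA ligand offered to a viral ligase."""
    gap_nt: int = 0                    # nucleotides missing at the break (0 = simple nick)
    sealed: bool = False               # True for the ligated duplex product
    five_prime_phosphate: bool = True  # False models a 5'-OH terminus at the nick
    rna_on_5p_side: bool = False       # True if the 5'-phosphate strand is RNA


def forms_stable_complex(ligand: Ligand) -> bool:
    """Return True only for the ligand class reported to form a stable complex."""
    if ligand.sealed:                    # (ii) sealed duplex product
        return False
    if ligand.gap_nt > 0:                # (i) gapped DNA (assumption: any gap size)
        return False
    if not ligand.five_prime_phosphate:  # (iii) 5'-OH instead of 5'-phosphate
        return False
    if ligand.rna_on_5p_side:            # (iv) RNA on the 5'-phosphate side
        return False
    return True                          # singly nicked duplex DNA with a 5'-phosphate


if __name__ == "__main__":
    print(forms_stable_complex(Ligand()))          # singly nicked DNA
    print(forms_stable_complex(Ligand(gap_nt=1)))  # 1-nucleotide gap
```

Writing each exclusion as a separate early return keeps the correspondence with items (i) to (iv) explicit.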
Nick recognition by vaccinia DNA ligase and Chlorella virus DNA ligase also depends on occupancy of the AMP binding pocket on the enzyme — i.e., mutations of the ligase active site that abolish the capacity to form the ligase-adenylate intermediate also eliminate nick recognition; whereas a mutation that preserves ligase-adenylate formation but inactivates downstream steps of the strand joining reaction has no effect on binding to nicked DNA. Sequestration of an extrahelical nucleotide by DNA-bound ligase is reminiscent of the “base-flipping” mechanism of target site recognition and catalysis used by other DNA modification and repair enzymes.
Although the 5’-phosphate moiety is essential for the binding of Chlorella virus ligase to nicked DNA, the 3’-OH moiety is not required for nick recognition. Chlorella virus ligase binds to a nicked ligand containing 2’,3’-dideoxy and 5’-phosphate termini but cannot catalyze adenylation of the 5’-end. Thus, the 3’-OH is important for step 2 chemistry even though it is not itself chemically transformed during DNA-adenylate formation.
To delineate the ligase-DNA interface, we footprinted the ligase binding site on DNA. The size of the exonuclease III footprint of ligase bound to a single nick in duplex DNA is 19 to 21 nucleotides. The footprint is asymmetric, extending 8 to 9 nucleotides on the 3’-OH side of the nick and 11 to 12 nucleotides on the 5’-phosphate side.
Crystal Structure of Eukaryotic DNA Ligase-Adenylate
Chlorella virus DNA ligase (ChVLig) is the smallest eukaryotic ATP-dependent ligase known. As the “minimal” DNA ligase, it presented an attractive target for structure determination. We crystallized ChVLig and determined its structure at 2 Å resolution. The enzyme consists of a larger N-terminal nucleotidyltransferase (NTase) domain and a smaller C-terminal OB domain with a cleft between them. An AMP moiety was covalently linked to Nζ of Lys27 at the active site. Thus, we have the structure of a genuine catalytic intermediate.
Within the NTase domain is an adenylate binding pocket composed of the six peptide motifs (I, Ia, III, IIIa, IV, and V) that define the covalent nucleotidyltransferase enzyme superfamily that includes DNA and RNA ligases and mRNA capping enzymes. Motif I (KxDGxR) contains the lysine to which AMP becomes covalently linked in the first step of the ligase reaction. Amino acids in motifs Ia, III, IIIa, IV, and V contact AMP and play essential roles in one or more steps of the ligation pathway. The OB domain consists of a five-stranded antiparallel beta barrel plus an alpha helix.
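As an illustration of how the motif I consensus (KxDGxR, where x is any amino acid) might be located in a protein sequence, here is a minimal Python sketch using a regular expression. The function name `find_motif_I` and the short test peptide are invented for this example; the peptide is not a real ligase sequence. A consensus match alone does not establish that a protein is a nucleotidyltransferase.

```python
import re

# Motif I consensus from the text: K-x-D-G-x-R, with x = any residue.
# The lookahead lets overlapping candidates be reported as well.
MOTIF_I = re.compile(r"(?=(K.DG.R))")


def find_motif_I(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the lysine, matched hexapeptide) for each hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(seq)]


if __name__ == "__main__":
    toy_peptide = "MSTAKVDGIRAL"  # made-up sequence for illustration only
    for position, hexapeptide in find_motif_I(toy_peptide):
        print(f"candidate motif I lysine at residue {position}: {hexapeptide}")
```

The reported position is that of the candidate nucleophilic lysine, which is the residue that would be mutated to test the assignment experimentally.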
Structural Basis for Nick Recognition by a Minimal “Pluripotent” DNA ligase
Although ChVLig lacks the large N- or C-terminal flanking domains found in eukaryotic cellular DNA ligases, it can sustain mitotic growth, DNA repair, and nonhomologous end joining in budding yeast when it is the only source of ligase in the cell. ChVLig can even perform the essential functions of mammalian Lig3 in mitochondrial DNA metabolism. We proposed that ChVLig represents a stripped-down “pluripotent” ligase owing to its intrinsic nick sensing function, the basis of which was illuminated when we solved the 2.3 Å crystal structure of ChVLig-AMP bound to a 3’-OH/5’-PO4 nick in duplex DNA.
ChVLig encircles the DNA as a C-shaped protein clamp. The NTase domain binds to the broken and intact DNA strands in the major groove flanking the nick and also in the minor groove on the 3’-OH side of the nick. The OB domain binds across the minor groove on the face of the duplex behind the nick. A novel “latch” module – consisting of a beta-hairpin loop that emanates from the OB domain – occupies the major groove and completes the circumferential clamp via contacts between the tip of the loop and the surface of the NTase domain. The latch is critical for clamp closure and is a key determinant of nick sensing.
Comparison of the crystal structures of the free and nick-bound ChVLig-AMP reveals large domain rearrangements accompanying nick recognition. In the free ChVLig-AMP, the OB domain is reflected away from the NTase domain to fully expose the DNA-binding surface above the AMP-binding pocket. The peptide segment that is destined to become the latch is disordered in the free ligase and sensitive to proteolysis, but it is protected from proteolysis when ChVLig binds to nicked DNA. DNA binding entails a nearly 180° rotation of the OB domain around a swivel, so that the concave surface of the OB beta barrel fits into the DNA minor groove. This transition elicits a 63 Å movement of the OB domain and places the latch deep in the DNA major groove.
A network of interactions with the 3’-OH and 5’-PO4 termini in the active site illuminated the DNA adenylylation mechanism and the critical roles of the AMP in nick-sensing and catalysis. Addition of a divalent cation triggered nick sealing in crystallo, thereby establishing that the nick complex is a bona fide intermediate in the DNA repair pathway.
Structure of NAD+-dependent DNA Ligase bound to Nicked DNA-adenylate
NAD+-dependent DNA ligases (referred to as LigA) are a distinctive and structurally homogeneous clade of enzymes found in all bacteria. E. coli LigA (671-aa) is the prototype of this family. LigA has a modular architecture built around a central ligase core composed of a NTase domain and an OB domain. The core is flanked by an N-terminal “Ia” domain and three C-terminal modules: a tetracysteine zinc-finger, a helix-hairpin-helix (HhH) domain, and a BRCT domain. Each step of the ligation pathway depends upon a different subset of the LigA domains, with only the NTase domain being required for all steps. Domain Ia is unique to NAD+-dependent ligases, is responsible for binding the NMN moiety of NAD+, and is required for the reaction with NAD+ to form the ligase-AMP intermediate.
We found that the NAD+-dependent E. coli DNA ligase can support the growth of Saccharomyces cerevisiae strains deleted singly for CDC9 or doubly for CDC9 plus LIG4. This is the first demonstration that an NAD+-dependent enzyme is biologically active in a eukaryotic organism. Subsequent studies (in collaboration with Maria Jasin) showed that E. coli LigA could suffice for ligase function in mouse ES cells lacking the essential Lig3 enzyme.
Our crystal structure of E. coli LigA bound to the nicked DNA-adenylate intermediate revealed that LigA also encircles the DNA helix as a C-shaped protein clamp. The protein-DNA interface entails extensive DNA contacts by the NTase, OB, and HhH domains over a 19-bp segment of duplex DNA centered about the nick. The NTase domain binds to the broken DNA strands at and flanking the nick, the OB domain contacts the continuous template strand surrounding the nick, and the HhH domain binds both strands across the minor groove at the periphery of the footprint. The Zn-finger module plays a structural role in bridging the OB and HhH domains. Domain Ia makes no contacts to the DNA duplex, consistent with its dispensability for catalysis of strand closure on an AppDNA substrate.
The LigA NTase and OB domains are positioned similarly on the DNA circumference to the NTase and OB domains of the ATP-dependent DNA ligases, and they “footprint” similar segments of the DNA strands. Yet the topology of the LigA clamp is starkly different from the clamps formed by ChVLig and human DNA ligase 1 (HuLig1, determined by Tom Ellenberger and colleagues). The kissing contacts that close the LigA clamp are sui generis, involving the NTase domain and the C-terminal HhH domain. Based on available structural data, it is clear that DNA ligases have evolved at least three different means of encircling DNA.
Comparisons of the E. coli LigA•AppDNA complex with structures of other bacterial ligases captured as the binary LigA•NAD+ complex (step 1 substrate), binary LigA•NMN complex (the post-step 1 leaving group), and covalent ligase-AMP intermediate (step 1 product after leaving group dissociation) highlight massive protein domain rearrangements (on the order of 50 to 90 Å) that occur in sync with substrate binding and catalysis. DNA binding and clamp formation by LigA entail a nearly 180° rotation of the OB domain so that the concave surface of the OB beta barrel fits into the minor groove, similar to what is seen or inferred for ChVLig and HuLig1. The four-point binding of the HhH domain at the periphery of the LigA-DNA footprint stabilizes a DNA bend centered at the nick. The LigA-DNA interactions immediately flanking the nick induce a local DNA distortion, resulting in adoption of an RNA-like A-form helix, again echoing the findings for the HuLig1-DNA cocrystal.
Mechanism of lysine adenylylation by ATP-dependent and NAD+-dependent polynucleotide ligases
The auto-adenylylation reaction of polynucleotide ligases is performed by a nucleotidyltransferase (NTase) domain that is conserved in ATP-dependent DNA and RNA ligases and NAD+-dependent DNA ligases. The NTase domain includes defining peptide motifs that form the nucleotide-binding pocket. Motif I (KxDG) contains the lysine that becomes covalently attached to the AMP. As Robert Lehman pointed out in 1974, it is unclear how lysine (with a predicted pKa value of ~10.5) loses its proton at physiological pH to attain the unprotonated state required for attack on the α phosphorus of ATP or NAD+. In principle, ligase might employ a general base to deprotonate the lysine. Alternatively, the pKa could be driven down by positive charge potential of protein amino acids surrounding lysine-Nζ. Several crystal structures of ligases absent metals provided scant support for either explanation. In these structures, the motif I lysine nucleophile is located next to a motif IV glutamate or aspartate side chain. The lysine and the motif IV carboxylate form an ion pair, the anticipated effect of which is to increase the pKa of lysine by virtue of surrounding negative charge. It is unlikely that a glutamate or aspartate anion could serve as general base to abstract a proton from the lysine cation. A potential solution to the problem would be if a divalent cation abuts the lysine-Nζ and drives down its pKa.
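The scale of the deprotonation problem can be made concrete with the textbook Henderson-Hasselbalch relation, which gives the fraction of an amine in the unprotonated form as 1 / (1 + 10^(pKa − pH)). The Python sketch below is a back-of-the-envelope calculation, not a description of any ligase active site. The pH of 7.4 is an assumed value for "physiological pH", and the lowered pKa values of 9.5 and 8.5 are hypothetical, chosen only to show the trend.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine present as the neutral base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    PH = 7.4  # assumed value for physiological pH
    # 10.5 is the predicted lysine pKa cited in the text; the lower values
    # are hypothetical pKa shifts used only for comparison.
    for pka in (10.5, 9.5, 8.5):
        fraction = fraction_unprotonated(pka, PH)
        print(f"pKa {pka:4.1f}: {fraction:.2%} of lysine unprotonated")
```

Under these assumptions, a lysine with a pKa of 10.5 is unprotonated less than 0.1% of the time, whereas lowering the pKa by two units raises that fraction to several percent. This is why a local positive charge that depresses the pKa could have a large effect on the availability of the nucleophile.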
A metal-driven mechanism was revealed by our recent crystal structure of Naegleria gruberi RNA ligase (NgrRnl) as a step 1 Michaelis complex with ATP and manganese (its preferred metal cofactor). The key to capturing the Michaelis-like complex was the replacement of the lysine nucleophile by an isosteric methionine. The 1.9 Å structure contained ATP and two manganese ions in the active site. The “catalytic” metal was coordinated with octahedral geometry to five waters that were in turn coordinated by the carboxylate side chains of conserved residues in motifs I, III, and IV. The sixth ligand site in the catalytic metal complex was occupied by an ATP α phosphate oxygen, indicative of a role for the metal in stabilizing the transition state of the auto-adenylylation reaction. A key insight, fortified by superposition of the Michaelis complex on the structure of the covalent NgrRnl-(Lys-Nζ)–AMP intermediate, concerned the role of the catalytic metal complex in stabilizing the unprotonated state of the lysine nucleophile prior to catalysis, via local positive charge and atomic contact of Lys-Nζ to one of the metal-bound waters. The NgrRnl Michaelis complex revealed a second metal, coordinated octahedrally to four waters and to ATP β and γ phosphate oxygens. The metal complex and the ATP γ phosphate were engaged by an ensemble of amino acid side chains (unique to NgrRnl) that collectively orient the PPi leaving group apical to the lysine nucleophile. Consistent with a single-step in-line mechanism, the α phosphate was stereochemically inverted during the transition from NgrRnl•ATP Michaelis complex to lysyl–AMP intermediate.
DNA ligases are thought to have evolved separately from RNA ligases, initially by fusion of an ancestral ATP-utilizing NTase domain to a C-terminal OB domain (to comprise the minimal catalytic core of a DNA ligase), and subsequently via the fusion of additional structural modules to the NTase-OB core (7). NAD+-dependent DNA ligases (LigA enzymes), which are ubiquitous in bacteria and essential for bacterial viability, acquired their specificity for NAD+ via fusion of an NMN-binding Ia domain module to the N-terminus of the NTase domain. Escherichia coli DNA ligase (EcoLigA) was the first cellular DNA ligase discovered and characterized and it remains the premier model for structural and functional studies of the NAD+-dependent DNA ligase family. Interest in LigA mechanism is propelled by the promise of targeting LigA (via its signature NAD+ substrate specificity and unique structural features vis à vis human DNA ligases) for anti-bacterial drug discovery.
We solved a 1.55 Å crystal structure of EcoLigA as a Michaelis complex with NAD+ and magnesium. The structure reveals a one-metal mechanism in which a ligase-bound Mg2+(H2O)5 complex lowers the lysine pKa and engages the NAD+ α phosphate, but the β phosphate and the nicotinamide nucleoside of the NMN leaving group are oriented solely via atomic interactions with protein elements that are unique to the LigA clade. The two-metal (for ATP-dependent ligase) versus one-metal (for NAD+-dependent ligase) dichotomy demarcates a branchpoint in ligase evolution.