Protein structure and function
Proteins are the building blocks of cell structures and the motors of cellular activities. They are modular in nature, and their interactions with other molecules in the cell rely on the presence of specific functional domains. The precise shape of a domain, which results from noncovalent bonds between residues in the polypeptide chain, determines its function. The best-known example of the shape-function relationship is the “lock and key” theory of enzymatic function. A change in the enzymatic pocket, due to mutation or modification of an amino acid residue, changes the affinity and/or specificity of the enzyme. In short, the better the fit between two molecules, the more bonds they can form, the faster a signal can pass, and the more strongly the two molecules connect (think adhesion molecules).
The 3D conformation of a protein depends on the interactions between amino acids in the polypeptide chain. Since the sequence of amino acids is specified by the genetic code, the shape of the protein is encoded in the DNA. Proteins have four levels of organization. Primary structure refers to the linear sequence of amino acids connected by peptide bonds. Secondary structure consists of the local packing of the polypeptide chain into α-helices and β-sheets, held together by hydrogen bonds between the N-H and C=O groups of the peptide backbone. Tertiary (3D) structure is the shape that results from the folding of secondary structures and is determined by interactions between the side chains of amino acids. Quaternary structure describes the arrangement of several polypeptide chains in a multi-subunit protein.
This video shows the 4 levels of protein structure.
Adapted from RCSB Protein Data Bank under a CC BY licence.
Everything needed to give a protein its unique shape, and therefore its unique function, is “written” in a fragment of DNA known as a gene. Every time a gene is expressed, whether over the lifetime of one cell or in any cell that carries the same DNA, natural or recombinant, the resulting proteins turn out alike and assume their pre-programmed function.
Primary structure of proteins
Proteins are the most important and versatile class of macromolecules in the cell. Their roles range from transporting nutrients and catalyzing biochemical reactions to serving as structural components of cells or as molecular motors. Proteins are linear polymers of amino acids connected by peptide bonds. They are synthesized on ribosomes from a messenger RNA copy of a gene, so each protein contains a unique and specific linear sequence of amino acids known as its primary structure.
Only twenty amino acids are necessary and sufficient for generating the thousands of proteins in a cell. That does not mean that only twenty amino acids exist; this is a common misconception. Many other amino acids occur in nature, but they take part in other metabolic reactions and not in protein synthesis. An individual protein gets its identity from the ordered combination of its amino acids, which determines all its characteristics.
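To get a feel for how much variety twenty building blocks can produce, the short Python sketch below counts the possible ordered sequences of a given length. The function name `count_sequences` is made up for this illustration, and the result is the size of the theoretical sequence space, not a claim about how many proteins a cell actually makes.

```python
# Number of amino acids used in protein synthesis, as stated in the text.
AMINO_ACID_COUNT = 20


def count_sequences(length):
    """Return how many distinct ordered chains of `length` residues exist."""
    # Each position in the chain can hold any of the 20 amino acids.
    return AMINO_ACID_COUNT ** length


for n in (2, 3, 10):
    print(f"{n} residues: {count_sequences(n)} possible sequences")
```

Even a chain of only ten residues has more than ten trillion possible sequences, and most proteins are far longer than that.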
A series of amino acids connected by peptide bonds is called a polypeptide chain. The sequence of amino acids in the chain is dictated by the gene. The variety of possible sequences provides the diversity vital for meeting the demands of life. Conservation of specific protein sequences is so important that the cell has quality-control mechanisms in place to limit the production of faulty proteins. Each sequence has a unique order that conveys a unique function. Changing even a single amino acid in the chain can alter its function, and function can be jeopardized or lost completely if the sequence is out of order. But not all mutations or protein modifications lead to disastrous consequences. Some of them make the cell and the organism better adjusted to environmental pressures, a process you know as evolution.
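The link between a gene and its primary structure, and the different outcomes a single-base change can have, can be sketched in a few lines of Python. The codon table below is the standard genetic code written for the coding DNA strand; the function `translate` and the three 15-base sequences are invented for this illustration and do not come from a real gene.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate a coding-strand DNA sequence into one-letter amino acids."""
    protein = []
    # Read the sequence three bases at a time, ignoring an incomplete last codon.
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if aa == "*":  # stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy "gene" and two single-base variants of its third codon.
original = "ATGGCTGAAAAGTGA"
missense = "ATGGCTGTAAAGTGA"  # GAA -> GTA changes the amino acid
silent = "ATGGCTGAGAAGTGA"    # GAA -> GAG leaves the amino acid unchanged

for label, dna in (("original", original), ("missense", missense), ("silent", silent)):
    print(label, translate(dna))
```

In this toy case one substitution swaps a charged residue (E, glutamic acid) for a hydrophobic one (V, valine), while the other leaves the primary structure untouched, which is one reason not every mutation has a visible effect.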
Properties of amino acids and their side chain differences
Amino acids have the same base structure, which is important for proper chemical bond formation between adjoining molecules. Each amino acid has a central carbon designated as the α-carbon. The α-carbon always has the following four groups attached to it:
- –NH2 a basic amino group
- –COOH an acidic group (known as a carboxyl group)
- –H a hydrogen atom
- –R a side chain
R symbolizes the variable side chain, which is the only chemical group that differs among the twenty amino acids. Essentially, the side chain makes each amino acid unique and can be thought of as its fingerprint.
The most important property of amino acids, one that affects the folding and subsequently the function of the entire protein molecule, is their known and predictable interaction with water. Amino acids can, therefore, be divided into hydrophilic and hydrophobic groups. Hydrophobic, or non-polar, amino acids have side chains made up largely of hydrocarbons. These amino acids are alanine, valine, methionine, leucine and isoleucine, plus two amino acids with aromatic rings, tryptophan and phenylalanine. Hydrophobic amino acids play an essential role in protein folding because they tend to draw together and cluster away from water. They usually form transmembrane domains and are found deeply buried in the hydrophobic interior of most globular proteins.
Hydrophilic amino acids interact easily with water. This group includes amino acids that ionize and become electrically charged (negatively or positively) and amino acids that are polar but uncharged. Amino acids whose side chains carry a carboxyl group, in addition to the carboxyl group at the α-carbon that is used to form the peptide bond, are negatively charged. These residues are glutamic acid and aspartic acid; notice that their names contain the term “acid”, owing to the presence of TWO carboxyl groups.
The side chains of lysine, arginine and histidine have basic groups and can carry a positive charge. Lysine and arginine are strongly basic, whereas histidine is only weakly basic and is charged only part of the time at physiological pH. Hydrophilic amino acids that are polar but uncharged are asparagine, glutamine, serine, threonine and tyrosine. Hydrophilic and charged side chains are exposed on the surface of the protein and are especially common in enzymatic pockets and transport molecules. The exposed electric charges convey the nature and activity of the protein to other molecules and act like magnets, attracting oppositely charged groups.
Several amino acids contribute to protein structure through features unique to their side chains. Proline differs from the other amino acids in that its side chain is bonded to the nitrogen as well as to the central carbon. This amino acid is chemically nonreactive (hydrophobic), but because of its five-membered ring it disrupts the geometry of a folding protein, causing abrupt shifts in conformation by physically introducing kinks and bends into the polypeptide chain. Glycine has the smallest side chain of all, just a second hydrogen atom attached to the α-carbon. Being small and lacking strong polar character, it is typically seen in places where parts of the polypeptide chain bend and come close to one another.
Cysteine is an amino acid well known for greatly affecting protein structure. It has a sulfhydryl group responsible for the formation of disulfide bonds, which stabilize the tertiary structure of proteins and contribute greatly to molecular functions that you will learn about later in this text.
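The grouping of side chains described in this section can be written down as a small lookup table. The Python sketch below uses the standard one-letter amino acid codes; the class labels, the function name `side_chain_profile` and the example sequence are hypothetical choices for this illustration, and proline, glycine and cysteine are kept in their own "special" group because the text discusses them separately.

```python
from collections import Counter

# One-letter codes grouped the way this section groups the side chains.
SIDE_CHAIN_CLASSES = {
    "hydrophobic": "AVMLIWF",    # Ala, Val, Met, Leu, Ile, Trp, Phe
    "acidic": "DE",              # Asp, Glu (negatively charged)
    "basic": "KRH",              # Lys, Arg, His (positively charged)
    "polar_uncharged": "NQSTY",  # Asn, Gln, Ser, Thr, Tyr
    "special": "PGC",            # Pro, Gly, Cys (structural roles)
}

# Reverse lookup: amino acid letter -> class name.
CLASS_OF = {
    aa: name
    for name, letters in SIDE_CHAIN_CLASSES.items()
    for aa in letters
}


def side_chain_profile(sequence):
    """Count how many residues of each side-chain class a sequence contains."""
    return Counter(CLASS_OF[aa] for aa in sequence.upper())


# Made-up peptide: a hydrophobic stretch flanked by charged residues.
example = "KDLLIVAFLLER"
print(side_chain_profile(example))
```

A long uninterrupted run of "hydrophobic" residues in such a profile is the kind of pattern the text associates with transmembrane domains and buried protein cores, although real predictions rely on more detailed scales than this five-way split.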
The secondary structure and all the loops
How do we know what proteins really look like when they are folded? Two classical methods let us glimpse protein structure: X-ray diffraction and nuclear magnetic resonance (NMR). X-ray diffraction produces a three-dimensional contour map of the electrons in a protein crystal, based on how X-rays are scattered as they pass through the sample. NMR measures distances between atoms within a protein in concentrated solution, and these distance constraints are used to work out how the chain is folded. Together, these two methods help us understand the folded shape of a protein.
The shape of a protein is determined by the amino acid sequence of the polypeptide chain. That’s right: just as with DNA, a unique code makes a unique design. Protein folding is the result of the physical properties of the amino acids’ side chains and their interactions with the environment around them. Proteins fold into the most energetically favorable shape, called the native state, in several steps that correspond to the levels of protein structure.
Protein folding and architecture
When exposed to the conditions in the cytosol or the lumen of the ER, polypeptide chains assume a localized organization called secondary structure, which optimizes the interactions of amino acids with each other and with water. The polypeptide backbone folds into the spirals and ribbons of α-helices and β-sheets, respectively. Both the α-helix and the β-sheet are segments of the polypeptide that have a regular geometry; they are laced together with gentle and not-so-gentle turns and separated by less organized loops.
The α-helix is a coiled structure in which the backbone rotates at angles that favor strong hydrogen bonding between residues on successive turns, with the side chains pointing outward from the coil. β-sheets are flat structures composed of several β-strands, each bound to its neighboring strands through hydrogen bonds. In β-sheets, adjacent strands can run in the same direction (parallel) or in opposite directions (anti-parallel). Hydrogen bonds are more stable when the β-sheet has anti-parallel rather than parallel strands. Parallel sheets tend to be buried inside the protein structure. The secondary structures are connected by unstructured stretches that form multiple loops.
The tertiary structure of the protein
There are many ways in which secondary structures can bundle together into a larger 3D shape. The tertiary structure of a protein is the three-dimensional combination of α-helices and β-sheets that fold next to each other as a result of noncovalent interactions between the amino acids’ side groups and the environment surrounding the single polypeptide. At this stage, proteins start solidifying their structure with additional bonds, such as disulfide bonds between two cysteines.
The most important feature of tertiary structure is the presence of regions with a defined function, known as functional domains. Tertiary structures are not rigid; indeed, most of them change shape during the lifetime of the protein, often multiple times. Conformational changes within functional domains are the basis for the protein’s function. They can be permanent, as during protein folding and maturation, or reversible, serving as a way of regulating protein activity on a reaction-by-reaction scale.
Protein domains are regions of similar activity. They don’t necessarily have a conserved sequence. For example, a kinase domain, responsible for attaching a phosphate group, has a different shape and sequence depending on the substrate to which the phosphate group is attached. The secondary structures that form a domain do not have to lie next to each other in the polypeptide chain. They might even belong to several different polypeptides in the case of multimeric proteins.
Motifs are a subgroup of functional domains that have evolutionarily conserved sequences, which in turn give them a conserved shape. One example is the coiled-coil motif, a very regular superstructure of two α-helices paired up to form a fibrous configuration that is the basis of stable dimers. Usually, two identical α-helices are wrapped around each other in a left-handed conformation and stabilized by hydrophobic interactions. Ionic bonds between side chains within each α-helix, spaced about one turn (3.6 residues) apart, leave the hydrophobic residues free to interact with the matching motif on the opposing protein.
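The figure of 3.6 residues per turn means that each residue is rotated by about 100 degrees relative to the previous one when the helix is viewed end-on. The Python sketch below uses that arithmetic to check which positions end up on the same side of an α-helix. The function names and the 60-degree tolerance are assumptions made for this illustration, and the model ignores the slight distortion a real coiled coil imposes on its helices.

```python
# 360 degrees per turn / 3.6 residues per turn = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100


def wheel_angle(position):
    """Angle (in degrees) of a residue on an end-on view of the helix."""
    return (position * DEGREES_PER_RESIDUE) % 360


def same_face(i, j, tolerance=60):
    """Return True if residues i and j point in roughly the same direction."""
    difference = abs(wheel_angle(i) - wheel_angle(j)) % 360
    # Take the shorter way around the circle.
    return min(difference, 360 - difference) <= tolerance


# Compare residue 0 with each of the next seven residues.
for offset in range(1, 8):
    print(offset, wheel_angle(offset), same_face(0, offset))
```

Residues three, four and seven positions apart land on the same face, which is consistent with side chains about one turn apart being able to interact and with hydrophobic residues at such spacings forming a continuous stripe along the helix.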
The quaternary structure results from the assembly of two or more polypeptides into one functional multimeric protein. Subunits are assembled through interactions between domains or regions of the proteins and are held together by hydrophobic interactions (think of two wet mirrors) and disulfide bonds. If the subunits are the same, the structure is described with the prefix homo-; if they are different, with the prefix hetero- (as in the muscle glycogen phosphorylase homodimer or the heterotrimeric G proteins).
Intracellular processes such as signaling depend on interactions between molecules. The better the fit between two molecules, the more bonds they can form and the stronger the interaction, that is, the affinity between them. The amino acid sequence dictated by a gene, and in turn the properties of the amino acids’ side chains, determine the shape of the protein and therefore its interactions.