I am assuming that you want to express a stable, folded DOMAIN of a larger
protein. Correct?
We routinely optimize protein domain length, using various genetic tools to
generate arrays of progressive truncations. So far we do it manually, but
AFAIK there are folks at UAlabama who use a robotic system to generate
residue-by-residue truncation arrays and identify stable, soluble domains of
larger proteins in this fashion. In the end it really does not matter what
you use, as long as it is easy and inexpensive.
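To make the idea of a truncation array concrete, here is a minimal sketch in Python that enumerates candidate constructs from a sequence. It is only an illustration of the bookkeeping, not anyone's actual pipeline: the function name `truncation_array`, the `step` and `min_len` parameters, and the toy sequence are all assumptions made for this example.

```python
from typing import List, Tuple


def truncation_array(seq: str, step: int = 1, min_len: int = 50) -> List[Tuple[int, int, str]]:
    """Enumerate progressive N- and C-terminal truncations of seq.

    Returns (start, end, subsequence) tuples, where start and end are
    1-based, inclusive residue numbers. step and min_len are
    hypothetical knobs: step=1 gives a residue-by-residue array.
    """
    n = len(seq)
    constructs = []
    # Walk the N-terminus inward, 'step' residues at a time.
    for start in range(0, n, step):
        # For each N-terminal boundary, walk the C-terminus inward.
        for end in range(n, start, -step):
            if end - start < min_len:
                break  # every shorter construct is below the cutoff too
            constructs.append((start + 1, end, seq[start:end]))
    return constructs


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # made-up 10-residue sequence, for illustration only
    for start, end, sub in truncation_array(toy, step=2, min_len=6):
        print(f"{start}-{end}\t{sub}")
```

With step=1 the number of constructs grows roughly with the square of the sequence length, so in practice you would narrow the range of boundaries first and only then go residue by residue.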
Normally, you won't want to truncate if your goal is a complete protein,
unless you have indications that the protein has acquired some sort of
regulatory sequence, signal peptide, etc. that does not affect its function
but does affect solubility, aggregation, etc.
As far as publications are concerned, there definitely are a few out there -
as suggested, try PubMed. Unfortunately, the exact references escape me at
the moment :)
Look up possible similarities with known protein domain structures and try
to make an intelligent guess about where to truncate the sequence :) If all
else fails, shotgun truncations complemented by optimization of expression
may do the trick.
A.G.E.