Playing with Chemical Lego Blocks: Rational Design of Semiconducting Polymers for Thermoelectric Materials


PI: Paulette Clancy (Whiting School of Engineering)

Co-I: Howard Katz

Alan Heeger and colleagues won the Nobel Prize in 2000 for their 1977 discovery of a conducting polymer.

Since that time, semiconducting polymers have become popular in electronic applications ranging from thermoelectrics and organic solar cells to light-emitting diodes, field-effect transistors, and sensors. Because the palette of organic moieties available to modify conducting polymers is essentially limitless, finding the combination of organic groups that will yield better performance is quite an art form. Our co-investigator, Howard Katz, has a track record of excelling at that art, drawing on his experience and chemical intuition to produce some of the highest-performing, most energy-efficient thermoelectrics in the world.

With the advent of “materials discovery,” the umbrella under which this proposal falls, we can start to shift from relying on expert human knowledge to a data-centric, AI-guided approach. In that vein, this work seeks to identify novel, high-performing polymer candidates using Bayesian optimization over a judiciously curated design space. Bayesian optimization is a well-known machine learning (ML) technique for the global optimization of black-box functions. It is especially suitable in data-scarce situations: much can be achieved with tens of data points rather than the thousands typically needed for a deep-learning neural network approach. Bayesian optimization requires a training set. Here, the training data will come from first-principles density functional theory (DFT) calculations, which describe the distribution of electron density in a polymer ‘repeat unit’ (see Fig. 1 for some sample molecules). The target of the optimization is the minimum Gibbs free energy of solvation (ΔGsolv), a metric that describes the strength of a polymer’s interaction with surrounding solvent molecules. This is known to be an indicator of good film quality, but it is hard to determine directly from experiments. If such an ML model can be trained and validated, it opens the opportunity for a data-driven search for disruptively successful polymeric materials, which are notoriously hard to model at the atomic level. Our goal is to create an ML-generated computational model that can inform “human-in-the-loop” experimental organic chemists and materials scientists in a feedback cycle that is significantly more efficient than relying on trial-and-error and expert chemical intuition alone.
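To make the idea of Bayesian optimization over a discrete candidate list concrete, the following is a minimal sketch, not the PAL 2.0 implementation. It assumes a Gaussian-process surrogate with a squared-exponential kernel and an expected-improvement acquisition function; the two-dimensional descriptors, the `toy_objective` function standing in for a DFT-computed ΔGsolv, and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, X_new, jitter=1e-4):
    """Posterior (mean, std) at X_new for a GP fit to mean-centered targets."""
    y_bar = sum(y) / len(y)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [v - y_bar for v in y]))
    out = []
    for x in X_new:
        k_star = [rbf(x, a) for a in X]
        mu = y_bar + sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(L, k_star)
        var = max(1.0 - sum(t * t for t in v), 1e-12)
        out.append((mu, math.sqrt(var)))
    return out


def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization, given the best value seen so far."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(candidates, objective, n_init=5, n_iter=10, seed=0):
    """Evaluate n_init random candidates, then n_iter candidates chosen by EI."""
    rng = random.Random(seed)
    tried = rng.sample(range(len(candidates)), n_init)
    values = [objective(candidates[i]) for i in tried]
    for _ in range(n_iter):
        pool = [i for i in range(len(candidates)) if i not in tried]
        post = gp_posterior([candidates[i] for i in tried], values,
                            [candidates[i] for i in pool])
        best = min(values)
        scores = [expected_improvement(m, s, best) for m, s in post]
        nxt = pool[scores.index(max(scores))]
        tried.append(nxt)
        values.append(objective(candidates[nxt]))
    return tried, values


def toy_objective(x):
    """Hypothetical stand-in for an expensive DFT evaluation of the target."""
    return (x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2


# Hypothetical candidate list: 60 random two-dimensional descriptor vectors.
_rng = random.Random(1)
candidates = [(_rng.random(), _rng.random()) for _ in range(60)]
tried, values = bayes_opt(candidates, toy_objective)
print("evaluations:", len(tried), "best value found:", min(values))
```

The loop evaluates only 15 of the 60 candidates. In the setting described above, each call to the objective would correspond to one DFT calculation, which is why keeping the number of evaluations small matters.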

Polymers based on the diketopyrrolopyrrole (DPP) motif have proven to be successful p-type semiconducting polymers (Refs. 1-3). In those papers, expensive but accurate DFT calculations were used to characterize the electronic properties of repeat units of various DPP-based polymers, which were also synthesized and characterized experimentally for thin-film device performance. This is a great starting point for our work: the polymers studied in these works (e.g., Fig. 1) are clearly similar to those that interest us and vary only in their subunits. However, there are “infinitely” more functional groups, repeat units, monomer lengths, and side-chains that could have been investigated, even if the design space were restricted to include a DPP unit in every polymer. Further, there are numerous solvent choices (eco-friendly or otherwise) for solution processing these polymers, vastly increasing the complexity of the design space. The problem of investigating such a large space of polymer compositions and available solvents very quickly becomes intractable, whether experimentally or computationally.

The goal of this work, therefore, will be to create hypothetical DPP-based polymers by combining ‘building blocks’ (Lego pieces) of functional groups similar to those studied in Refs. 1-3, and then to train a Bayesian optimization model on DFT calculations as a means to identify high-performing polymer/solvent compositions. We are, in essence, putting together Lego pieces with different characteristics and determining whether they lead to an improved design. The role of Bayesian optimization here is to sort through the limitless Lego pieces in our box.

A major assumption of this work is that a higher-magnitude (more negative) ΔGsolv will yield a higher-performing polymer because of superior thin-film morphology brought about by favorable solute-solvent interactions. This assumption has its foundations in previous work, both by us and by others, demonstrating the impact of solution-phase chemistry on end-device thin-film uniformity. Our end goal is to create a multiscale model that correlates DFT results with end-device performance, an elusive task across materials research.

Our design strategy is summarized in Fig. 2, where we represent the otherwise dauntingly complex polymer chains as segments of different types of moieties (labeled A, B, C in Fig. 2) that typically appear in sequence in real conducting polymers. This pares the taxonomy down to a manageable set of categories, with a list of potential candidate moieties in A, B, and C, as shown in Fig. 2. We have considered 8,424 possible combinations (4 × 9 × 9 × 26) of A × B × C × solvent choices. A polymer repeat unit is created by selecting one moiety from each of the A, B, and C categories, as shown in Fig. 2, and the sequence is immersed in an implicit solvation model of one of the solvent choices before the polymer/solvent system undergoes DFT relaxation. From the DFT results, we can characterize aspects of the polymer repeat unit’s electron density that relate to solvation and are theoretically consistent with solubility metrics like the Hansen and Hildebrand solubility parameters, which are influenced by the system’s electron density. These are the model’s inputs. We can then directly calculate ΔGsolv using the Solvation Model based on Density (SMD) framework. The Bayesian optimization model’s inputs and target value are readily available experimental properties: isotropic quadrupole moment, isotropic polarizability, electrostatic potential minimum and maximum, polarity index, percentage of the surface area that is polar, and the dielectric constant.
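The size of the design space can be checked by enumerating it directly. The sketch below is illustrative only: the labels `A1`, `B1`, `C1`, and `solvent1` are hypothetical placeholders for the actual moieties and solvents listed in Fig. 2, and only the category sizes (4, 9, 9, and 26) come from the text.

```python
from itertools import product

# Placeholder labels (hypothetical); the real moieties and solvents are in Fig. 2.
A_BLOCKS = [f"A{i}" for i in range(1, 5)]          # 4 choices for segment A
B_BLOCKS = [f"B{i}" for i in range(1, 10)]         # 9 choices for segment B
C_BLOCKS = [f"C{i}" for i in range(1, 10)]         # 9 choices for segment C
SOLVENTS = [f"solvent{i}" for i in range(1, 27)]   # 26 solvent choices

# Every (A, B, C, solvent) tuple is one candidate polymer/solvent system.
design_space = list(product(A_BLOCKS, B_BLOCKS, C_BLOCKS, SOLVENTS))

print(len(design_space))   # 4 * 9 * 9 * 26 = 8424
print(design_space[0])     # first candidate in enumeration order
```

Indexing candidates this way gives each polymer/solvent pairing a stable integer identifier, which is convenient when an optimizer needs to track which candidates have already been evaluated.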

We are using our in-house Bayesian code, called PAL 2.0, for this task. It allows a user, whose expertise with these systems may range from amateur to expert, to provide a posited (uncurated) list of possible property data (like the list above) that PAL 2.0 can use to create a chemistry-informed surrogate model. We have used PAL 2.0 with success for chemical systems in solution, for space actuators and orthopedic screw alloy choices, which gives us the confidence to attack the high-dimensional parameter space of this project.

So far, we have conducted DFT sampling of 7,000 of the 8,424 candidates in the design space, to provide the “right answer” for the most favorable (most negative) free energy of solvation. The distribution of ΔGsolv values for each polymer/solvent choice is shown in Fig. 3. Based on the assumption that this is the right ‘metric of success,’ we are at the point where we can, if funded, perform exploratory experiments to confirm that the top choices are high performers. If they are not, then our assumption is incorrect, and we will look for a different target (objective function).

Looking at the results in Fig. 3, the ΔGsolv values we calculated differ among candidate polymer/solvent pairings by up to almost 3.0 eV. This is a large range, and hence highly discriminatory, which is encouraging from the point of view of candidate triage. Preliminary analysis suggests that the top performers (1%) in the training set generally exhibit the highest-magnitude electrostatic potential extrema, polarizabilities, and quadrupoles. This is consistent with the solvation theories discussed earlier: the more the polymer’s electron density can distort to accommodate solvent near neighbors, the greater the free energy of solvation. Further analysis will reveal, in greater detail, trends that are not readily apparent; indeed, this is the strength of AI and ML. Symbolic regression will then look to ground these trends in physical realizability.

Our planned research will take a small portion of this very extensive data set and use Bayesian optimization to predict the polymer/solvent combination with the lowest value of ΔGsolv and compare it to the known best result. Further, we can use the fact that we have the best result for other investigations:

How few data can we use and still find the best result?

What can we learn from “feature engineering” to find the properties that correlate best with the data?

Is there a symbolically-regressed, closed-form function we can identify that provides physical interpretation and realizability to our Bayesian “gray box” model?
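For the first of these questions, a useful reference point is how well blind random sampling would do with the same evaluation budget. The sketch below computes that baseline for sampling without replacement. It is a back-of-the-envelope comparison, not part of the proposed method, and the choice of 84 candidates as the "top 1%" of the 8,424-member design space is an assumption made for illustration.

```python
from math import comb

N_CANDIDATES = 8424   # size of the design space quoted in the text
N_TOP = 84            # assumed size of the "top 1%" (illustrative rounding)


def p_hit_best(n_total, k):
    """Probability that k uniform random draws (no replacement) include the single best."""
    if not 0 <= k <= n_total:
        raise ValueError("k must lie between 0 and n_total")
    return k / n_total


def p_hit_top(n_total, n_top, k):
    """Probability that k random draws include at least one of the n_top best candidates."""
    if not 0 <= k <= n_total:
        raise ValueError("k must lie between 0 and n_total")
    # Complement of drawing all k candidates from the (n_total - n_top) non-top ones.
    return 1.0 - comb(n_total - n_top, k) / comb(n_total, k)


for k in (10, 50, 100, 500):
    print(k,
          round(p_hit_best(N_CANDIDATES, k), 4),
          round(p_hit_top(N_CANDIDATES, N_TOP, k), 4))
```

A Bayesian optimization run that reliably finds the known best candidate within a budget where the random baseline probability is still small would be evidence that the surrogate model is learning useful structure.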

This project represents the first time that Bayesian optimization has been used to rationally design semiconducting polymers with atomic-scale granularity using chemical functional group building blocks. If our guess that the lowest ΔGsolv is the right target turns out not to be representative of a high performer, we will harness our expertise in “feature engineering” to find a better metric. Once we make headway with this, it will open the door to considering a more complicated model for polymer design. Possible funding sources include several NSF divisions and programs: CMMI (Civil, Mechanical and Manufacturing Innovation), Future Manufacturing, DMR (Materials Research), and Chemistry. Our last IDIES seed proposal, a joint WSE/APL collaboration, produced two papers in high-quality journals and an NSF EAGER award. A full proposal to NSF CMMI is planned once we have preliminary data from this IDIES proposal.

