  • Open access
  • Published: 18 March 2024

Will we ever be able to accurately predict solubility?

  • P. Llompart 1 , 2 ,
  • C. Minoletti 2 ,
  • S. Baybekov 1 ,
  • D. Horvath 1 ,
  • G. Marcou   ORCID: orcid.org/0000-0003-1676-6708 1 &
  • A. Varnek   ORCID: orcid.org/0000-0003-1886-925X 1  

Scientific Data volume 11, Article number: 303 (2024)

  • Cheminformatics
  • Physical chemistry

Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to curate aqueous solubility data, aiming at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained here and the quality-assessed datasets are publicly available.

Introduction

Aqueous solubility is a strategic parameter in synthetic, medicinal and environmental chemistry. It is one of the main parameters affecting bioavailability. Thus, a better understanding of this property is expected to improve success in drug design 1, as a key player in pharmacokinetics and ADME-Tox (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiling 2. Solubility governs the fraction of the active substance available for absorption in the gastro-intestinal tract. Besides, poor solubility of a compound or of a metabolite can be a threat for the patient: the substance may accumulate and crystallize, as exemplified by kidney stone diseases. Galenic formulation can improve the therapeutic potential of a compound 3, but a soluble drug candidate is always a safer option for clinical trials.

However, measuring aqueous solubility is not always feasible at the early discovery stage because of the low throughput and large sample requirements 4, 5. For this reason, in silico predictive approaches have become highly valuable to prioritize drug candidates and reduce the number of experimental tests. Recent progress in this field is mainly due to (i) the organization of aqueous solubility prediction challenges, shedding new light on existing tools; (ii) the public release of large aqueous solubility datasets; and (iii) the advent of new machine learning methods promising unprecedented predictive performances. The current status quo in solubility prediction, which this study aims to analyze, is therefore very intricate.

In the first part of this study, we first recall the theoretical background of the aqueous dissolution process, underlining the ambiguities and complexity of this measure. Next, we review the large number of datasets already published. Third, we critically discuss published models. This enables us, in a second part, to propose new guidelines to process thermodynamic aqueous solubility data. We applied them to existing datasets and proceeded to a modeling exercise resulting in new QSAR models. All curated datasets and obtained models are publicly available at https://doi.org/10.57745/CZVZIA 6.

Background of aqueous solubility

Several types of solubility measurements are reported in the literature, depending on the method and conditions of measurement. The thermodynamic solubility is described as the maximum concentration of a compound in solution, at equilibrium with its most stable crystalline form. This solubility is usually measured during lead optimization phases and is used as the data source for in silico regression models 7. However, the above definition is not unambiguous, as the solute may, beyond physically dissolving, also chemically interact with water, with significant impact on the equilibrium. Therefore, no fewer than three distinct “thermodynamic” solubility measures are in use: water, apparent and intrinsic. The water solubility is measured with pure water as the added solvent. At equilibrium, the solution is a mixture of the potentially many protolytic microspecies of the solute, and the sum of their concentrations counts as “water solubility”. Acid-base interactions induce self-buffering effects, stabilizing the solution at a specific pH value, which must be reported as well. By contrast, the apparent solubility is defined in a fixed-pH buffer solution; it is also called buffer solubility and reflects the relative population of dissolved microspecies at the buffer pH. Finally, the intrinsic solubility (S0) is the maximum concentration of the neutral compound: the pH of the solution is adjusted so that the non-ionized compound becomes the predominant microspecies. Under certain assumptions and approximations, the Henderson-Hasselbalch equation (HH, Eq. (3)) estimates the aqueous solubility (S) from the intrinsic solubility (S0), the acidity or basicity constant (pKa or pKb), and the pH 8. Additionally, the kinetic solubility is often preferred during the early phase of drug discovery, at the level of screening platforms. It is frequently described as the lowest concentration at which the species starts to precipitate when diluting a 10 mM DMSO stock solution in buffer, usually Phosphate-Buffered Saline (PBS) at pH 7.4. The kinetic solubility is usually perceived as a crude estimate of the thermodynamic solubility. Although these values are related, they quantify distinct phenomena: in kinetic measurements, there is no control or knowledge of the precipitating crystalline or amorphous form 9, and artefacts due to supersaturation cannot be excluded. Additionally, there may be large variations in the experimental setup between providers of kinetic solubility values; as a result, many of them cannot be used together 9.

Accurately predicting thermodynamic solubility remains a challenge, as numerous physicochemical and thermodynamic factors are involved. Among them are the solid-to-solvated phase transition, the solid state (amorphous or crystalline), temperature, polymorphism, solute-solvent intermolecular interactions and the co-occurring ionic forms of electrolytes 10. Even though numerous drugs are electrolytes, their solubility is still hard to predict at a specific pH, as it results from co-occurring microspecies 11, 12. Over the past decades, several approaches have been developed to identify poorly soluble compounds early.

Experimental techniques

To ensure high-quality data, experiments should use a pure substance, temperature control and sufficient time for the solute to reach equilibrium. The current OECD 105 Guideline for the testing of chemicals 13 recommends two approaches for measuring thermodynamic water solubility: (i) the shake-flask method for chemicals with a solubility above 10 mg/L, and (ii) the column elution or slow-stir method for chemicals with solubilities below 10 mg/L.

The shake-flask method consists of mixing a solute in water until thermodynamic equilibrium between the solid and solvated phases is reached. Then, the two phases are separated by either centrifugation or filtration. The column elution method consists of pumping water through a column coated with the chemical. The water flows at a constant rate through the column and is recirculated until equilibrium. For each method, the concentration of compound in the filtrate is measured to obtain the thermodynamic solubility. When working with surfactants, the slow-stir method should be used. Surfactants are amphiphilic organic compounds highly miscible in water. However, agitation and high concentration can induce micelle formation, distorting the measurements. The concentration above which micelles form is called the Critical Micelle Concentration (CMC). The slow-stir method avoids emulsions and helps solubilize low-density compounds by using controlled magnetic stirring.

A more advanced technique, CheqSol, was suggested by Llinas et al. 14. Developed by Stuart et al. 15 to establish thermodynamic equilibrium conditions during measurement, the technique can measure the intrinsic and kinetic solubility of ionizable compounds. It is an automated titration method in which the pH is adjusted until the solute precipitates or until the precipitate redissolves. The concentration of uncharged species is deduced from the point of equilibrium and the pKa; this process is called Chasing Solubility. The method works down to 1 mg/L and is restricted to mono- and di-protic compounds with known pKa / pKb.

Limit of detection and quantification

The limit of quantification (LoQ) is the lowest concentration of an analyte that can be quantified by the method with precision and confidence. The limit of detection (LoD) is the lowest concentration at which the method can detect the analyte. Thus, the LoQ defines the limit associated with a 95% probability of obtaining a correct value. Determining both limits is important, as they define the sensitivity of the analytical method used. Measurements lower than the LoD or LoQ therefore carry a higher probability of error. Compounds labeled “below LoD/LoQ” should not be used in regression models, as their effective solubility is not precisely known, but they can safely be labeled as “insoluble” in categorical models.
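
The handling of censored values described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout (dictionaries with the hypothetical keys `id`, `log_s` and `below_limit`) is an assumption made for the example.

```python
def split_censored(records):
    """Separate usable regression data from 'below LoD/LoQ' entries.

    Each record is a dict with hypothetical keys:
      'id'          - compound identifier
      'log_s'       - reported log10 solubility, or None if only a bound is known
      'below_limit' - True when the source flags the value as below LoD/LoQ
    """
    regression, insoluble = [], []
    for rec in records:
        if rec.get("below_limit"):
            # Exact value unknown: keep only as an 'insoluble' class label.
            insoluble.append(rec["id"])
        elif rec.get("log_s") is not None:
            regression.append((rec["id"], rec["log_s"]))
    return regression, insoluble
```

Entries flagged as below the limit never reach the regression set, even when a numeric value accompanies the flag, since that value is at best a bound.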

Dataset description

Thermodynamic solubility datasets gather these measurements for use in property prediction. Over the years, the body of data has continued to grow and now comprises more than 20 libraries available online, some of them containing more than 50,000 entries (Fig. 1). Depending on their source, experimental conditions such as the temperature (°C), pH, cosolvents and others may be reported. These metadata should also be taken into account when refining data for modeling.

figure 1

Network of the reported thermodynamic aqueous solubility datasets. Supersets composed by merging previously available datasets are connected to the latter by directed edges, on which a hollow square connector designates the superset. For example, Raevsky et al. 132 includes Schaper et al. 133, and is included in both OChem2020 and AqSolDB2020. The node size reflects the number of entries in each dataset. The node color reflects the age of the dataset, from dark blue (old) to white (recent). ECP stands for eChemPortal, and ChemID+ stands for ChemIDplus.

These libraries largely overlap, forming a very complex network of relationships. Numerous modelers have used the dataset of Huuskonen et al. 16 from 2000, which gathers entries from AquaSol 17 and PhysProp 18. AquaSol was published in 1990 by Yalkowsky et al., reporting almost 20,000 records for 6,000 compounds. At that time, it was the most extensive compilation of thermodynamic solubility measurements for unionized compounds. Before that, PhysProp, published in 1994 by Syracuse, was the first large set containing values for 1,297 organic compounds. The ESOL 19 library was disclosed in 2004 by Delaney; it contains 2,874 measurements for both ionized and unionized compounds.

As of now, these sets are still widely used and found in other libraries such as EPI Suite 20, Wang et al. 10 from 2007, Wang et al. 21 from 2009 and Kim et al. 22 from 2020. Reporting recent measurements, their size ranges from 1,676 entries for Wang et al. from 2007 to 8,031 entries for EPI Suite. Fusion of datasets into ever-growing supersets raises the problem of proper management of “duplicate” entries. If both merged sets independently include the same experimental value taken from the same source, trivial duplication of the entry must be avoided, because it risks placing one item in the training set and its identical copy in the validation set. This concerns EPI Suite 2009, ESOL 2004, OPERA 2018, Tetko et al. 23 and Huuskonen et al. 16. Moreover, it appears that the actual types of solubility reported by the sets differ. Some, such as Wiki-pS0 of 2020 and Llinas et al. of 2008, only contain intrinsic solubility entries. Llinas et al. 14 of 2008 reports 105 measurements available online. They were obtained using the CheqSol technique and used during the Solubility Challenge 2 (SC2). Wiki-pS0 24 is a private database of drug-like compounds owned by in-ADME research. As of 2009, Wiki-pS0 contained 6,355 entries for 3,014 unique compounds. Entries were obtained from CheqSol measurements, or through the conversion of aqueous to intrinsic solubility using pDISOL-X.
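
The removal of trivial duplicates when merging datasets can be sketched as below. This is only an illustration under stated assumptions: each entry is a dictionary with hypothetical keys `structure` (a canonical structure identifier such as a standard InChIKey), `log_s` and `source`, and values are compared after rounding to two decimals, which is an arbitrary tolerance chosen for the example.

```python
def merge_without_trivial_duplicates(*datasets):
    """Merge datasets, keeping one copy of each (structure, value, source) triple.

    Independent measurements of the same compound (different value or
    different source) are deliberately kept, since they carry information
    about experimental variability.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for entry in dataset:
            key = (entry["structure"], round(entry["log_s"], 2), entry["source"])
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged
```

Only entries that agree on structure, value and source are collapsed; replicate measurements from distinct sources survive the merge and can later be used to estimate inter-laboratory variability.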

However, other datasets such as AqSolDB 25 and OChem 26 are undefined mixtures of intrinsic, apparent and water solubility data. They now represent the largest thermodynamic solubility repositories freely available. OChem is an online platform reporting property measurements linked to scientific articles and offering a modelling interface. As of September 2022, the OChem “Water Solubility” dataset (property = 46 in the OChem database structure) contains 51,602 entries for almost 15,000 compounds and different solubility types, labeled as intrinsic solubilities. It also contains a dataset of “Water Solubility at pH” (property = 363 in the OChem database structure). The database aggregates entries from almost 150 sources, federating most of today’s measurements. However, it remains rarely used by the community, with only three applications for aqueous solubility data in 2021–2023, by Panapitiya et al. 27, Wiercioch et al. 28, and Lowe et al. 29. In comparison, AqSolDB, which was published in 2020, has already been used in 2021 by Francoeur et al. 30 and Sluga et al. 31, in 2022 by Meng et al. 32 and Lee et al. 33, and in 2023 by Lowe et al. 29. AqSolDB is one of the largest publicly accessible sets, with 9,982 entries. It compiles nine open-source datasets. AqSolDB is known to contain quality measurements obtained from liquid, solid, or crystallized substances. Due to their diversity in solubility types, conditions and measurement techniques, these datasets require thorough curation before being used for modeling.

Yet, some sets remain poorly shared or used by the community. In particular, this concerns PubMed, QikProp 34, ChemIDplus 35, Khune et al. 36 of 1995, eChemPortal 37 and Wiki-pS0. eChemPortal provides free public access to information on the properties of chemicals. Most of them are part of ECHA REACH 38, within which details about experimental conditions, protocol and substance composition can be found. ChemIDplus is a database containing information from the Toxicology Data Network. It contains chemical records of drugs, pesticides, pollutants, and toxins. Although relatively old, these datasets are overlooked resources that contain a wealth of experimental data.

Solubility prediction

Predictive approaches are based either on theoretical equations or on Machine Learning (ML) methods, including Neural Networks (NN). The few approaches based on first principles are mainly applied to estimate the solvation energy changes associated with a solute transitioning from its solid state to its solvated state.

From a thermodynamic point of view, solubilization can be described in one or two steps starting from a solid material. It can proceed either by sublimation from solid to gas or by fusion from solid to liquid, followed by an energy transfer to water. Hence, in 1965 Irmann 39 coupled the entropy of fusion (ΔSm) to the melting point (MP) through a group contribution approach to predict water solubility. Then, in 1968, Hansch et al. 40 found that the water solubility of organic liquid compounds was linearly dependent on the octanol/water partition coefficient (Log P o/w). Yalkowsky et al. 41 combined these results in 1980 to develop the General Solubility Equation (GSE) and estimate the base-10 logarithm of water solubility, Log10(Sw), using the MP and Log P o/w (see Eq. (1)).

The equation is restricted to solid nonelectrolytes, but it usually performs well (RMSE: 0.7–0.8 log) when employed with experimental values 42. Here, an electrolyte is a chemical substance that produces mobile charges. As most drugs are electrolytes, only a few are covered by the GSE. Also, High Throughput Screening (HTS) does not usually include the measurement of MP and Log P o/w, which are thus replaced by predicted values. Their use can introduce major discrepancies in the estimation of thermodynamic solubility, not to mention that the prediction of MP is itself a challenge. Thus, the GSE is not practically useful for large-scale predictions.
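
As a worked illustration of the GSE, the sketch below uses its widely cited form, Log10(Sw) = 0.5 - 0.01 (MP - 25) - Log P o/w, with MP in °C. Eq. (1) is not reproduced in this text, so these coefficients are an assumption based on that common form; the function name is illustrative. By the usual convention, the melting term is set to zero for compounds that are liquid at 25 °C.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of molar water solubility with the General Solubility Equation.

    Assumed form: log10(Sw) = 0.5 - 0.01 * (MP - 25) - logP, MP in degrees Celsius.
    The melting term vanishes for compounds that are liquid at 25 C.
    """
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p
```

For a solid melting at 125 °C with Log P o/w = 2.0, the estimate is 0.5 - 1.0 - 2.0 = -2.5. An error of 50 °C on a predicted MP alone shifts the result by 0.5 log unit, which illustrates why predicted inputs degrade the estimate.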

20 years of solubility modelling

Most of today’s models are Quantitative Structure Property Relationship (QSPR) models. These methods seek a mathematical function Y = f(X), where X is a set of N molecular descriptors [D1, D2, …, DN] to be correlated with the response value Y. Of course, the inner representation of a chemical graph by a GNN (Graph Neural Network) is no different. In our case, Y is the base-10 logarithm of the molar thermodynamic solubility, expressed as \(\log_{10}(S)\).
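
Since source databases often report solubility in mass units, the response value has to be converted to the molar logarithmic scale before modeling. A minimal sketch of that conversion, assuming a value in mg/L and a known molar mass (the function name is illustrative):

```python
import math


def log_s_from_mg_per_l(solubility_mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert a solubility in mg/L to log10 of the molar solubility (mol/L)."""
    if solubility_mg_per_l <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("solubility and molar mass must be positive")
    # mg/L -> g/L -> mol/L
    mol_per_l = solubility_mg_per_l / 1000.0 / molar_mass_g_per_mol
    return math.log10(mol_per_l)
```

For example, 180.16 mg/L of a compound with a molar mass of 180.16 g/mol corresponds to 1 mM, that is a LogS of -3.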

Machine learning methods are mainly used to develop regression models built on the compound’s topological, electronic and structural 2D/3D features, and molecular fragment counts. Models are then optimized using many ML methods to best fit the descriptor set. Recently, feature-based NN, graph-based NN (GNN) and structural attention methods have been used to develop powerful solubility predictive models. Tables 1 to 3 report a representative but not exhaustive list of aqueous solubility models developed over the last 20 years. It aims to highlight significant trends and achievements in this area. While the tables include models using diverse methods, caution is advised regarding overly optimistic performances. Depending on the data and approach employed, three periods can be distinguished. Prior to 2008, models were trained on vast datasets such as AquaSol, PhysProp and their aggregation, Huuskonen et al. 16 (Table 1). Few methods (ANN, SVM, MLR and theoretical equations) were applied, as the most decisive parameter of an ML model’s performance was the size and diversity of its training set. From them, two lessons can be drawn:

The relationship between solubility and the classical descriptors used here tends to be largely non-linear. Therefore, in this context, ANNs clearly outperformed linear regression.

The prediction performances are limited by the quality of the experimental data. This quality is usually measured using the Inter-laboratory Standard Deviation (SDi), see Eq. (2). The SDi is considered a lower bound on the achievable prediction error, and it was pointed out that it can reach up to 1.0 log unit.

The SDi depends on the average value \(\bar{x}\) of the n replicated measurements \(x_i\).
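
The SDi can be computed per compound from replicated measurements, as in the sketch below. Eq. (2) is not reproduced in this text, so the sample standard deviation with an n - 1 denominator is assumed here; the original definition may use n instead. Function names and the `(compound_id, log_s)` tuple layout are illustrative.

```python
import math
from collections import defaultdict


def interlab_sd(values):
    """Sample standard deviation of replicated LogS values (n - 1 denominator, assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicated measurements are required")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sdi_per_compound(entries):
    """Group (compound_id, log_s) pairs and return the SDi of each replicated compound."""
    grouped = defaultdict(list)
    for compound_id, log_s in entries:
        grouped[compound_id].append(log_s)
    # Compounds measured only once have no defined SDi and are skipped.
    return {cid: interlab_sd(vals) for cid, vals in grouped.items() if len(vals) >= 2}
```

Two measurements of -3.0 and -4.0 for the same compound give an SDi of about 0.71 log unit, already close to the RMSE reported by many models.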

A few attempts were also made to predict the intrinsic solubility using the HH equation 43. An ANN was trained on PhysProp to obtain the predicted aqueous solubility. Acidity and basicity constants (pKa and pKb) required by HH were estimated by pKaPlugIn from ChemAxon 44. The HH equation depends on the ionization state of the compounds and can thus be used by Hansen’s combined model to compute the intrinsic solubility (\(\log(S_0)\)) as a function of pH (see Eq. (3)).
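
The HH relationship can be illustrated with a short sketch. Eq. (3) is not reproduced in this text, so the forms used here are the standard monoprotic expressions and should be read as an assumption: log S = log S0 + log10(1 + 10^(pH - pKa)) for an acid, and log S = log S0 + log10(1 + 10^(pKa - pH)) for a base, where pKa is that of the conjugate acid. Function names are illustrative.

```python
import math


def hh_log_s(log_s0: float, pka: float, ph: float, kind: str) -> float:
    """pH-dependent log solubility of a monoprotic acid or base (assumed HH form)."""
    if kind == "acid":
        exponent = ph - pka      # ionized fraction grows above the pKa
    elif kind == "base":
        exponent = pka - ph      # ionized fraction grows below the pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(1.0 + 10.0 ** exponent)


def hh_log_s0(log_s: float, pka: float, ph: float, kind: str) -> float:
    """Invert the relation: intrinsic log solubility from a value measured at a given pH."""
    ionization_term = hh_log_s(0.0, pka, ph, kind)
    return log_s - ionization_term
```

For an acid with a pKa of 4 at pH 7.4, shifting the pKa estimate by one unit shifts the computed solubility by about one log unit, which shows how directly pKa errors propagate into the result.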

In 2007, Johnson et al. 45 renewed this approach by postulating an ansatz describing the intrinsic solubility as a function of the pKa, pKb, pH, crystal packing \({\chi }_{pack}\) and degree of ionization \(F_I\) (see Eq. (4)). The influence of the crystal lattice on the solubility was simulated by molecular dynamics 45.

It should also be noted that:

Solubility is an equilibrium between solute-solvent interactions and crystal formation. Yalkowsky et al . 41 proposed to use the melting point in the GSE as an early attempt to integrate crystal lattice effects. As MP depends on the polymorph, this approach is sensitive to polymorphism of solutes. So, the GSE requires either an experimental knowledge of the MP of the solutes or a precise knowledge of the polymorph. In both cases, it may be easier to measure the solubility directly.

Additionally, the solubility of a compound is highly dependent on its acid-base properties, particularly when the solution pH is within 2 log units of the compound’s pKa. Any error in estimating the pKa can lead to large deviations in solubility values. Thus, it may be safer to rely on experimental determination of these properties rather than trying to estimate them in QSPR models.

The abundance of modeling approaches motivated Llinas et al. 14 to organize the Solubility Challenge (SC1) in 2008. Its goal was to correctly predict the intrinsic solubility of 32 compounds using a given training set of 100 compounds. The challenge data covered a wide and high range in measurements, from 0.5 to 3.0 log unit. To predict it, participants used the full range of existing methods. Models’ performances highlighted difficulties in the prediction of highly and poorly soluble compounds. Overall, only about one-third of the compounds were correctly predicted by the best performing models, with the lowest RMSE around 0.6 log 46. SC1 sparked debates on how to enhance the predictive methods as well as the quality of the measurements. It also triggered the development of numerous models by the community, for which estimating the quality of the data took precedence over enhancing accuracy.

These methods employed novel neural network architectures (Table 2). For instance, Lusci et al. 47 introduced in 2013 a method based on Undirected Graphs (UG). Their approach was applied with a 10-fold internal Cross-Validation (CV) to ESOL, Llinas et al. 2008, and Huuskonen et al. 16 and reached a low RMSE of 0.58 log. A number of other approaches were introduced during this period: MLR by Huuskonen et al. 48 in 2008, PLS by Zhou et al. 49 in 2008, MLR by Wang et al. 21 in 2009 and CPANN by Eric et al. 50 in 2012.

This rise of powerful machine-learning methods motivated Llinas and Avdeef 51 to organize a second Solubility Challenge (SC2) in 2019. This time, they invited participants to apply their own models to two datasets. Set 1 consisted of 100 druglike compounds with an average SDi of 0.17 log. Set 2 contained 32 molecules with an average SDi of 0.62 log. Participants were asked to use their own training set. No significant improvements were found compared to SC1 52. Every method worked equally well and achieved a minimal RMSE of 0.70 log 14, 51, 53.

The current period is marked by the emergence of deep learning architectures and molecular embedding inputs (Table 3). In 2018, Goh et al. 54 introduced SMILES2vec, the first interpretable DNN to use SMILES for chemical property prediction. The developed NN was inspired by Word2Vec, a DL technique commonly used in NLP research. By comparing the performance of different Bayesian optimization techniques for hyperparameter tuning on the ESOL dataset, they were able to identify the most effective architecture, CNN-GRU. Applied to the ESOL validation set, their model achieved an RMSE of 0.63 log and demonstrated interpretability by highlighting chemical functions, using a residual NN as a mask to identify important characters from the input. Their model accuracy outperformed feature-based methods.

A similar approach was taken by Cui et al. 55 in 2020, who adapted the well-known ResNet to accept PubChem fingerprints as input. They constructed N-layer (N = 14, 20, or 26) CNN models based on the architecture of ResNet. Models were evaluated with a 10-fold CV on 9,943 compounds from ChemIDplus and PubMed. They achieved an RMSE of 0.68 log, highlighting the advantage of going deeper. However, this contradicts the results of Francoeur et al. 30 from 2021, who concluded that smaller networks performed better.

In their study, Francoeur et al. optimized a Molecular Attention Transformer (MAT), called SolTranNet, to predict aqueous solubility from SMILES representations. Their method is based on the MAT architecture developed by Maziarka et al. 56. MAT functions by applying self-attention to a molecular graph where each node is defined as a feature vector. Vectors are then combined with the adjacency matrix before being fed to the NN layers. The MAT hyperparameters were optimized by minimizing the RMSE on an AqSolDB subset. To validate their model, SolTranNet was applied to three different test sets: the SC2 test set, the Cui et al. 2020 dataset, and the Boobier et al. 22 2017 dataset, resulting in RMSE values of 1.295, 0.813 and 0.845 log, respectively. SolTranNet has comparable performance to current ML models. However, Francoeur et al. 30 point out that the small size of the community test sets limits the conclusions that can be drawn from their reported performances. Even when trained on large sets, models may not generalize to other datasets, especially those from specific domains, such as compounds of pharmaceutical interest, as also mentioned by Lovrić et al. 57.

We hypothesized that the published performances might be optimistic because of (i) inaccurate delimitation, or failure, of the applicability domain, if one is defined, and (ii) a lack of independent external validation sets. Yet, caution is warranted when comparing model efficacy across studies, given the significant variability in test sets and methodologies. As of now, numerous models are still published without validation on completely independent sets. Different validation strategies, such as internal and external, can be distinguished, varying in levels of rigor. Internal validation makes use of the same data from which the model was fitted. External validation requires an independent dataset to correctly assess the model’s reproducibility and generalizability, and thus its application to other chemical spaces (CS). However, it is a common misconception that splitting a dataset into a training and a validation set (random split or k-fold CV) is sufficient, especially with GNNs, where data leakage can happen. Data leakage occurs when information from the test set is used in the training process, which can lead to a biased assessment of model performance. In CV, the test sets are independent to some extent 58, but the training sets largely overlap. In the case of GNNs, leakage can happen if the network has seen test set chemical structures during pre-training. This problem has been discussed in various studies, offering alternative validation techniques as potential solutions 59. Despite these criticisms, the efficacy of cross-validation remains undisputed, as empirically demonstrated in works by Breiman & Spector 60 and further supported theoretically by Vapnik 61. The importance of the test set size, coverage and quality is supported by Francoeur et al. 30. Ideally, this set should be meaningful and be excluded from the model training to ensure realistic performances. For instance, Cui et al. in 2020 validated their DNN models on two small test sets of 62 and 5 compounds, obtaining RMSE of 0.681 and 0.689 LogS unit, respectively. These test sets are arguably small, but the former was aggregated from recent literature while the latter was composed of new in-house data. In this publication, models’ performances were also compared to human expert performances. This contrasted with previously reported results in Boobier et al. in 2017. In that study, models were trained and tested on 100 compounds from the DLS-100 dataset, which gathers S0 entries, mostly from Llinas et al. 2008 and Rytting et al. 62. Data were used following a train/test split of 75/25 compounds. As a result, humans performed as well as ML models, with an RMSE of 1.087 log for the former against 1.140 log for the latter.
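
A simple guard against the most direct form of leakage is to check whether test structures also occur in the training (or pre-training) data. The sketch below assumes that each compound is already represented by a canonical identifier, for instance a standard InChIKey computed upstream; function names and the `(key, log_s)` tuple layout are illustrative.

```python
def split_overlap(train_keys, test_keys):
    """Return the structures shared by both sets and the fraction of the test set affected."""
    train, test = set(train_keys), set(test_keys)
    shared = train & test
    fraction = len(shared) / len(test) if test else 0.0
    return shared, fraction


def external_only(test_entries, train_keys):
    """Keep only (key, log_s) test entries whose structure is absent from the training set."""
    train = set(train_keys)
    return [entry for entry in test_entries if entry[0] not in train]
```

Such a check only detects exact structural duplicates. It says nothing about close analogues, which is one reason an applicability domain is still needed.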

For this study, we used two public thermodynamic solubility datasets: AqSolDBc (our cleaned version of AqSolDB) and OChem. Our intent was to externally validate models trained on AqSolDBc by testing them on public data. The datasets are summarized in Table 4.

Chemical space maps

The distribution of the CS over the map is shown in Fig. 2 and Fig. 3. The dense population at its center corresponds to small and diverse compounds. The solubility landscape displays multiple gradients from high to poor thermodynamic solubility. The distinct chemical sets were represented on the map as class landscapes, to help comprehend how they are positioned relative to one another in CS (Fig. 4). The set specific to OChem fills vacant regions of the AqSolDBc CS.

figure 2

GTM density landscape of the chemical space jointly covered by AqSolDBc and OChem. White spaces are unpopulated areas. Colors represent the number of molecules per node, from blue (low) to red (high).

figure 3

GTM landscape of the thermodynamic solubility from AqSolDBc and OChem datasets. Colors represent the experimental LogS of the aqueous solubility going from blue (poor) to red (high). Chemical space zones pertaining to specific chemotypes are highlighted. Squares and circles define areas representing respectively AqSolDBc and OChem compounds.

figure 4

Class landscape of the test sets versus the training set, AqSolDBc. The color represents the proportion of compounds from each dataset. Blue regions are populated with structures from AqSolDBc. White spaces are unpopulated areas and red spaces are from compounds specific to OChem datasets.

External validation

Public models were validated using public data from OChem. Priority was given to NN and models trained on AqSolDB. The validation process also involved testing the GSE (described above). We additionally trained Random Forest (RF) and MPNN (ChemProp 63 ) models on AqSolDBc.

Public data

To confirm the difficulty of predicting test chemical spaces not covered by our training set, the best performing models were applied to OChem data. We report in Fig. 5 the MSE performances over the set specific to OChem, which range from 1.74 to 2.17 log. AqSolPred shows the best performance on the two sets with an MSE of 1.74 log and R2 of 0.56. ChemProp presents a close MSE of 1.84 log.

figure 5

Predicted thermodynamic solubility against experimental solubility for the set specific to OChem. The red line represents a ± 1.0 log interval. The hexbins represent the density of points in the plot.
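
The quantities read from such a plot can be computed as in the sketch below. The definitions are the usual ones (root-mean-square error, coefficient of determination, and the fraction of predictions within a tolerance of the experimental value); whether the reported figures were produced in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted LogS."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within(y_true, y_pred, tolerance=1.0):
    """Share of predictions within +/- tolerance log units of the experimental value."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)
```

With experimental values of -1, -2, -3, -4 and predictions of -1, -2, -3, -6, a single two-log error yields an RMSE of 1.0, an R2 of 0.2 and 75% of predictions inside the ±1.0 log interval.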

Applicability domain

The AD of a predictive model is a theoretical region of the CS covered by the model features. It delineates a region of the CS based on the similarity to the training set. Predictions on compounds in AD are considered reliable whereas out of AD they are considered uncertain. Still, few thermodynamic solubility models are delivered with an AD: Hewitt et al . 53 , Chevillard et al . 64 , Cao et al . 65 and Lusci et al . 2013.

The results of applying an Isolation Forest based AD to RF models with MOE2D descriptors are illustrated in Fig. 6. Comparable behavior is obtained using other ML approaches. The general trend is a decrease of the RMSE as the AD gets more restrictive (decreasing test set coverage) with the increase of the contamination value. At some point, the test set coverage becomes too small, and the validation becomes unstable. This effect is visible on OChem data.

figure 6

Performance of the RF model (MOE2D) using the IsolationForest Applicability Domain. Performances were computed for each increment of the contamination parameter, from 0.0 to 0.99. Normalized RMSE is the external validation RMSE at contamination X divided by the RMSE at contamination zero.

Effect of the cleaning procedure from AqSolDB to AqSolDBc

To assess the impact of the cleaning procedure, several models were built on both AqSolDB and AqSolDBc datasets to observe the difference. RF models were constructed using MOE2D (n = 203) and ISIDA 66 (8 sets, n = 284 to 22,880) descriptors. Data were split into 10 folds. For RF, nine folds were used as the training set, and one as the test set. The test set was kept consistent for all models to ensure a fair comparison. Additionally, MPNN (ChemProp) models were trained. For MPNN, eight folds were used as the training set, one as the validation set, and one as the test set. The GSE was also applied. The RMSE of MPNN, GSE, and RF are reported in Table  5 . Performances over AqSolDBc should be compared to those of AqSolDB. Overall, the curation of AqSolDB resulted in a systematic improvement of the RMSE by ~0.10 log, supporting the proposed curation procedure, despite the reduced absolute training set size due to curation.

Recommendations for the curation of solubility data

Based on this analysis, we propose a decision tree for the curation of thermodynamic solubility data (Fig. 13). It starts with a verification of the chemical structure, which can be done by looking up the CAS number in a structural database.

figure 7

Comparison of the MAE from AqSolDB and AqSolDBc. MAE from the 10-fold CV computed over all models for AqSolDB (blue) and AqSolDBc (red) against the solubility range.

figure 8

Boxplots of the experimental standard deviation (SDi) of compounds in the OChem database. Data shared with AqSolDB (blue) are also present in AqSolDBc, and data specific to OChem (red) are absent from AqSolDBc. Boxplots are restricted to SDi > 0.01 log.

figure 9

REC curve for each AqSolDBc subset corresponding to the major microspecies at pH7.0: Uncharged, Zwitterion, Negative and Positive ions. The y-axis is the proportion of AqSolDBc predicted better than a threshold MAE value on the x-axis; MAE in log from the 10-fold CV computed over all models for AqSolDBc.

figure 10

REC curve of each of the 9 AqSolDB data sources. The y-axis is the proportion of AqSolDBc predicted better than a threshold MAE value on the x-axis; MAE from the 10-fold CV computed over all models for AqSolDBc.

figure 11

Structures and compound ID from the 20 hardest-to-predict compounds from AqSolDBc. The first letter of the ID corresponds to the source of the entry (see Fig.  10 ).

figure 12

Structures and compound ID from the 20 hardest-to-predict compounds colored using ColorAtom. Coloration of compounds according to the fragment-based RF model. Red and blue regions correspond, respectively, to negative and positive contributions to LogS. Dark colors correspond to large positive or negative atomic contributions.

figure 13

Flowchart describing the guidelines followed from compound standardization to data curation. Chemical structures are standardized and ionized using Chemaxon tools. To resolve some ambiguities the structures are verified in the ChemSpider database and in the CSD. Experimental meta-data are systematically retrieved, and the main chemical structure is extracted. The data are filtered according to the experimental conditions. When several thermodynamic solubility values are available, an entry is discarded if there is a doubt about which value to keep; otherwise, the median value is conserved.

The next step concerns the experimental protocol and its resulting SDi, when replicate measurements are available. A crucial point to look at is the confidence in the measurement. Values obtained below the LOD/LOQ are subject to uncertainties and should not be used when developing regression models. Another source of variability is the substance purity, as the components in solution greatly affect the measured value.

To avoid these issues, the training set should be restricted to mono-constituent substances measured at room temperature and neutral pH.

The last point revolves around the compound stability and hydrophobicity. The OECD guideline 105 recommends a water solubility cut-off of 10 mg/L for the shake-flask method. Below that, the column elution or slow-stir method should be applied, depending on the substance state, stability, and volatility. An initial idea of the method is formulated in the well-documented reviews presented by Ferguson et al. 67 in 2009, and Birch et al. 68 in 2019. These authors introduced additional rules depending on the compound’s expected stability. Since shake-flask and column elution take a few hours to days to equilibrate, the half-life cut-off is set to 24 hours. Meanwhile, the cut-off is set to 7 days for the slow-stir method as it may require weeks to equilibrate.
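The method and stability rules above can be sketched as two small checks. This is only an illustration under stated assumptions: the method labels are hypothetical strings, and treating exactly 10 mg/L as suitable for the shake-flask method is an assumption, since the text only gives the cut-off value.

```python
# Half-life (DT50) cut-offs quoted in the text, expressed in hours.
HALF_LIFE_CUTOFF_H = {
    "shake-flask": 24.0,
    "column elution": 24.0,
    "slow-stir": 7 * 24.0,
}

# Water solubility cut-off quoted from OECD guideline 105, in mg/L.
SHAKE_FLASK_MIN_MG_L = 10.0


def is_method_suitable(method, solubility_mg_l):
    """Shake-flask at or above the cut-off (boundary is an assumption),
    column elution or slow-stir below it."""
    if method == "shake-flask":
        return solubility_mg_l >= SHAKE_FLASK_MIN_MG_L
    return solubility_mg_l < SHAKE_FLASK_MIN_MG_L


def is_stable_enough(method, dt50_hours):
    """Keep a compound only if its DT50 reaches the cut-off of the method."""
    return dt50_hours >= HALF_LIFE_CUTOFF_H[method]
```

A hydrophobic compound measured by slow-stir with a DT50 of 100 hours would be rejected here, because it degrades faster than the 7-day cut-off.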

Since 2017, thermodynamic solubility prediction has become a sandbox for the application of cutting-edge NN. These models present RMSE ranging from 0.35 to 1.71 log units. Displaying good internal validation statistics may be misleading for drug designers seeking the best model. As mentioned earlier, these models often lack extensive external validation, and thus their performance should be considered with skepticism, particularly when applied to New Chemical Entities.

To confirm the difficulty of predicting test chemical spaces uncovered by our training set, the best performing models were applied to OChem data. The relevance of previously performed external validation may be questioned. For instance, evaluating performances using sets too small, internal, or distant from a target application (i.e. pharmaceutical data) may be an issue. Validation sets, which are meant to evaluate models in the context of their specific characteristics, should be carefully chosen based on their composition, diversity, size, and quality. It is important to note that each external test set presents its own challenges due to its peculiarities (size, diversity, predominance of various chemotypes, etc.), and past success on external validation does not guarantee future performance on different test sets. Moreover, Neural Network architectures do not display any breakthrough performances. As hypothesized previously, certain prediction errors may be avoided by using an Applicability Domain (AD) with published models.

Inter-laboratory standard deviation

The other possible source of prediction error could be the presence of poorly reproducible or variable training data. If the thermodynamic solubility is not known with sufficient accuracy or exhibits significant variability, it can introduce uncertainty into the models and distort their assessment. We analyzed the SDi of the OChem sets and the Median Absolute Error (MAE) of the set specific to OChem. The MAE is the median of the absolute difference between predictions and measurements for a given compound. Here we discuss the MAE using results from a 10-fold cross-validation of ChemProp on OChem data, as a representative example model.
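As a minimal sketch of the per-compound MAE defined above, assuming several predictions are available for one compound (for instance from different models or folds) and a single measured LogS. The function name is hypothetical.

```python
from statistics import median


def median_absolute_error(predictions, measurement):
    """Median of |prediction - measurement| for a single compound."""
    return median(abs(p - measurement) for p in predictions)


# Three hypothetical predictions for a compound measured at -3.0 log.
example_mae = median_absolute_error([-3.0, -2.5, -4.5], -3.0)
```

Using the median rather than the mean makes the per-compound error less sensitive to a single badly behaved model.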

As OChem comprises datasets from various sources, the independent quality of each source can be investigated. To do so, the distributions of the SDi are compared across the sources of their entries (Fig. 8). The X-axis defines the source datasets found in OChem. To better highlight the quality of AqSolDBc, the set specific to OChem and the set shared with AqSolDBc are displayed as separate boxes. It is important to note that errors could be attributable to a range of factors such as measuring the solubility of the wrong compound, different solution compositions, and typos in recorded numbers or units. Furthermore, care must be taken when combining data from different temperatures or techniques to minimize the introduction of errors.

Overall, the compounds specific to OChem exhibit high SDi and MAE, which appear to be correlated. This suggests that the difficulties in predicting properties of compounds specific to OChem could stem from its relatively poorer data quality. The boxplots for SDi also show qualitative agreement. It should be noted that most compounds are well predicted, but the portion of the dataset with the highest SDi accounts for most of the reported error.

To summarize, these results illustrate that a decrease in measurement reliability negatively impacts the quality of models and validation.

Impact of the data characteristics

The MAE (Median Absolute Error) was computed using the results of the 10-fold CV from all RF and MPNN models (Fig.  7 ) on the AqSolDBc dataset. Models trained on the AqSolDBc are overall more predictive in the high and low solubility ranges compared to those trained on AqSolDB. For compounds with thermodynamic solubility ranging from -4.0 to 0.0 log, the MAE remains below 1.0 log. It also tends to rise the further one strays from this range.

We investigated the influence of the ionization state of the principal microspecies at pH 7.0 on the error of prediction. The Charge Ratio (CR), which is the sum of charges divided by the number of charges, was used to assign compounds to subsets:

Non-Electrolytes

Uncharged: CR = 0

Electrolytes

Positive: CR =  + 1

Negative: CR = −1
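The assignment above can be illustrated with a short sketch that works on the formal charges of the atoms of the major microspecies. Several points are assumptions: the "number of charges" is counted as the number of charged atoms, each charged atom is expected to carry a charge of +1 or −1, and species with mixed signs are labeled "Zwitterion" because no CR rule is listed for them here.

```python
def assign_subset(formal_charges):
    """Assign a compound to a subset from its per-atom formal charges."""
    charged = [q for q in formal_charges if q != 0]
    if not charged:
        # No charged atom: CR is taken as 0 (non-electrolyte).
        return "Uncharged"
    charge_ratio = sum(charged) / len(charged)
    if charge_ratio == 1:
        return "Positive"
    if charge_ratio == -1:
        return "Negative"
    # Mixed signs: labeled as zwitterion (assumption, see lead-in).
    return "Zwitterion"
```

For example, a carboxylate anion would give a single −1 charge and fall into the "Negative" subset, whereas an amino acid with one ammonium and one carboxylate group would be labeled "Zwitterion".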

Figure  9 presents the Regression Error Characteristic (REC) curves for each of these subsets obtained from the results of the 10-fold CV. They display the error tolerance expressed as MAE on the X-axis against the percentage of points predicted within the tolerance. An ideal model should be represented by a REC reaching the top left corner of the plot. It should be noted that the presence of microspecies in solution can affect the measurement, resulting in a slight difference in solubility value. Here, the defined subsets are used to highlight which compounds may be prone to these variabilities and thus give larger predictive errors. From these plots, zwitterions appear easier to predict than positively and negatively charged species. Finally, the most difficult targets are uncharged species. This is probably due to the fact that most poorly soluble species are actually uncharged, and some neutral species may be incorrectly identified as uncharged by the machine learner for rare groups.
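A REC curve of this kind can be computed in a few lines. The sketch below assumes a list of per-compound errors (MAE in log units) and a list of tolerance values chosen by the user; both names are hypothetical.

```python
def rec_curve(errors, tolerances):
    """Fraction of compounds whose error is within each tolerance."""
    n = len(errors)
    return [sum(1 for e in errors if e <= t) / n for t in tolerances]


# Four hypothetical per-compound errors evaluated at three tolerances.
example_rec = rec_curve([0.1, 0.4, 0.9, 2.0], [0.5, 1.0, 2.0])
```

Plotting the tolerances on the x-axis against the returned fractions on the y-axis gives the curve; the faster it rises toward 1, the easier the subset is to predict.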

Since AqSolDB and AqSolDBc are aggregations of public datasets, it was also possible to study the influence of data sources on the measured performances of the models (Fig. 10). The Huuskonen dataset is certainly the easiest data collection to predict. The largest errors are observed on the Raevsky, EPI Suite 2020 and, above all, eChemPortal 2020 datasets. The eChemPortal provides a large share of the input data to AqSolDB, but it appears that it might be a large source of erroneous entries. Therefore, the eChemPortal dataset requires a closer look, which is out of the scope of this study.

Hard-to-predict compounds

Finally, the information concerning the 20 hardest-to-predict compounds (having the largest MAE) from AqSolDBc is reported in Table 6 and Fig. 11. Most of them are hydrophobic compounds from eChemPortal, measured using the shake-flask method. However, OECD guideline 105 advises using column elution for poorly soluble molecules. The usual lack of confidence in measurements of poorly soluble substances can be partially explained by non-compliance with the OECD guideline.

Interpretation of the model

To evaluate the contribution of each atom to the modelled solubility, we employed ColorAtom 69, 70. This interface used our RF model based on ISIDA fragment descriptors to produce chemical structures where each atom bears an atomic contribution to the value calculated by the model. The 20 hardest-to-predict compounds were passed to ColorAtom. Their colored structures are reported in Fig. 12. As expected, the polar parts of the molecules are usually colored in blue (high solubilization) whereas aromatic and aliphatic moieties are in red (poor solubilization).

Key results

In our study, we conducted an extensive analysis of thermodynamic solubility using two datasets: AqSolDBc and OChem. Our findings underscored the complexities and challenges of solubility prediction, but also highlighted potential strategies for improvement.

The mapping of chemical space revealed a diverse range of the solubility subspaces, highlighting the value of using diverse and complementary datasets. Despite the diversity of data, external validation revealed that all models struggled. This finding underscored the importance of model refinement and the need to consider the applicability domain when applying models to novel data. Moreover, the curation of AqSolDB into AqSolDBc significantly improved the RMSE, showing that data cleaning procedures can substantially enhance prediction accuracy.

Our study also revealed that inter-laboratory variability and the source of data can significantly influence model performance. This highlights the importance of measurement reliability and stringent data validation procedures, raising questions about the quality of datasets like eChemPortal.

Our study corroborates the findings of Lowe et al. 29, emphasizing the complexity and challenges in solubility prediction across diverse chemical spaces. We found that RF models provide a balanced and interpretable framework. The model’s interpretation underscored the essential role of fragment-based modeling approaches in elucidating the underlying mechanisms of the predictions. These insights underline the importance of the application of OECD 68 principles for enhancing predictive accuracy and interpretability. Additionally, we investigated the 20 hardest-to-predict compounds, most of which were hydrophobic and measured using unsuitable methods. This underscored the need to carefully select entries based on their experimental procedure, which we addressed by delivering a decision tree for the curation of solubility data.

Overall, our findings indicate that while advancements have been made in the field of solubility prediction, challenges remain. These insights offer valuable guidance for future research and model refinement.

Published solubility models often display attractive performances. However, these same models very often fail in prospective predictions. This work aimed at clarifying the reasons for these repeated failures.

First, we compiled a comprehensive list of solubility datasets and highlighted their interconnections. It appears that some data sources are overlooked and others frequently aggregated.

Second, we observed that the use of sophisticated neural network architectures did not lead to any breakthrough, although major scientific discussions were triggered by both solubility challenges 1 and 2.

Third, when applied to an external public dataset, all models performed poorly. This is probably due to an applicability domain issue.

Fourth, we conducted a thorough reevaluation of the popular AqSolDB dataset to address potential inconsistencies and improve its quality. Our analysis led to the creation of a new version of the dataset, which exhibits improved internal consistency by ensuring that the data points are more reliable and better adhere to the principles of solubility prediction. This revised dataset allows for a more accurate assessment of factors that impact the performance of solubility prediction models, ultimately leading to better model development and evaluation. This allowed us to observe the influence of factors impacting the performances of the models: the laboratory standard deviation, the ionic state of the solute, and the source of the solubility data. It appears that the eChemPortal probably contains some corrupted data and requires careful data cleaning.

Lastly, we provide a thoroughly curated version of AqSolDB called AqSolDBc, obtained following a decision tree based on experimental conditions. With these rules, we hope to offer a correct way to curate aqueous solubility data. This set was used to train RF and MPNN models for solubility prediction and IsolationForest models for Applicability Domain. Models trained on public data, applied during this project are publicly available ( https://chematlas.chimie.unistra.fr/WebTools/predictor_solubility.php ).

Data curation

For these approaches to produce accurate predictions over a vast CS, a high-quality and diversified training set is a must. However, preserving accurate measurements necessitates accounting for experimental variability, often evaluated with the SDi. Experimental thermodynamic solubility data can have inaccuracies up to 1.5 log, according to John C. Dearden 71. Additionally, Llinas et al. reported that measurements between laboratories may vary by 0.5 to 0.6 log. Poor reproducibility can be the consequence of unintentional mistakes brought on by combining entries with heterogeneous conditions, or of poor quality 52.

In the following, we propose a guideline for the improvement of thermodynamic solubility data set quality, which we applied to AqSolDB. This dataset, aggregated by Sorkun et al. 25 in 2020, was chosen for its size, diversity, and well-referenced entries. To curate AqSolDB and obtain an experimentally homogeneous library, we followed the flowchart illustrated in Fig. 13. Chemaxon’s JChem 72 software was employed for structural database standardization. In case of ambiguities, chemical structures were verified in ChemSpider 73 to benefit from its crowd-sourced annotations. When possible, these structures were also searched in the CSD, where the values of bond lengths, angles and torsions help to disambiguate the nature of chemical functions. CAS numbers were verified using SciFinder 74 before using them to retrieve manually described experimental conditions from eChemPortal 75, EPI Suite 20, and PubChem 76 if available. Overall, 608 entries containing partial records on start and final pH, measurement limitation, composition, origin, stability, or cosolvents were reported (Fig. 14). The aforementioned experimental conditions and their importance to modelers are discussed below.

figure 14

Number of non-valid entries in AqSolDB identified with the help of the meta-data of measurement.

pH sensitive species

The thermodynamic solubility of ionizable compounds strongly depends on the pH and the presence of buffer or ions. These factors can influence the microspecies equilibrium by interacting with the solute. For instance, the counter-ion effect can increase or decrease this solubility. Therefore, several control steps are recommended:

Verifying the validity of the reported salt structure using its CAS number. This can be done with the SciFinder 74 database and, when possible, cross-checked in the Cambridge Structural Database 77 (CSD).

Selecting measurements without buffer, added acids/bases, cosolvents and surfactants.

Restricting the data to entries reporting a final pH = 7 ± 1.

Ionized compounds obtained through standardization should correspond to the major microspecies in solution. The microspecies distributions have been obtained using ChemAxon pKa Plugin 44 . Compounds presenting too many microspecies (more than 4) and those with uncertain major microspecies at pH 7.0 have been excluded, because we could not decide which structure to use for modeling.

Overall, 399 entries from AqSolDB obtained in the presence of buffer, cosolvent, or undesirable pH were excluded. Five entries were also deemed uncertain for having ionized structures different from the major microspecies or poor microspecies distribution.
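The control steps on experimental conditions can be sketched as a simple filter. The dictionary keys used below (`final_pH`, `buffer`, `added_acid_or_base`, `cosolvent`, `surfactant`, `n_microspecies`) are hypothetical names chosen for illustration; they are not the column names of any released file.

```python
def passes_ph_controls(entry):
    """Return True if an entry satisfies the pH-related control steps."""
    # Reject measurements made with buffer, added acids/bases,
    # cosolvents or surfactants.
    additives = ("buffer", "added_acid_or_base", "cosolvent", "surfactant")
    if any(entry[key] for key in additives):
        return False
    # Keep only entries that report a final pH within 7 +/- 1.
    final_ph = entry["final_pH"]
    if final_ph is None or abs(final_ph - 7.0) > 1.0:
        return False
    # Exclude compounds with more than 4 microspecies.
    return entry["n_microspecies"] <= 4


clean_entry = {"final_pH": 6.8, "buffer": False, "added_acid_or_base": False,
               "cosolvent": False, "surfactant": False, "n_microspecies": 2}
buffered_entry = dict(clean_entry, buffer=True)
```

Entries without a reported final pH are rejected in this sketch, which follows the recommendation to keep only entries that report one.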

Substance composition

Water solubility is a property of pure compounds. However, it is sometimes reported for substances. The solubilities of pure compounds cannot be considered together with those of complex substances. The European Chemical Agency 38 describes three types of substance:

UVCB (Unknown or Variable composition, Complex reaction products or Biological materials) substances contain several chemicals without a complete understanding of their identity. Their composition is variable and often unknown. They usually originate from industrial processes or biological extracts.

Multi-constituent substances are a mix of known chemicals and impurities. Reported ingredients should represent 10% to 80% of the substance.

Mono-constituent refers to a solute that only contains one major component with up to 20% impurity. However, this level of impurity is still high and can have a significant impact on solubility, bioactivity, and other important factors. It should be taken into consideration when interpreting the results.

Ninety-nine entries from AqSolDB were found and eliminated for being UVCB, or multi-constituent substances (Fig.  14 ).

Unstable species

Chemical stability is related to degradation processes. In solution, a compound can be subject to hydrolysis, hydration (R-(C=O)-R’ → R-C(OH)2-R’), photolysis, oxidation, biodegradation, and polymerization. These processes generally depend on pH and temperature. Hydrolysis is the most difficult one to avoid during experimentation. Photolysis can be limited by using amber glass bottles or aluminum, or by working in the dark. Oxidation can be limited by working under anaerobic conditions, through nitrogen or argon flushing, or by limiting the air headspace. Chemicals for which hydrolysis occurs rapidly should be excluded to avoid measurements altered by reaction products. Care should be taken with compounds containing reactive functional groups such as mono- and poly-halogenated aliphatics (alkyl halides), epoxides, organophosphorus esters, carboxylic acid esters, carbamates, nitriles, organometallics, and peroxides. The Degradation Time (DT50) can be used to investigate a compound’s stability. The DT50 is the period after which half of the original amount of chemical is degraded. Hydrophilic compounds with a DT50 lower than 24 hours and hydrophobic compounds with a DT50 lower than 7 days should be discarded 68. We identified 52 such entries in AqSolDB. Reversible reactions with water, such as hydration of activated aldehydes or internal hemiacetal formation in sugars, do not de facto signal compound instability but are sources of prediction error because the actual “solute” structures differ from the input standard form of the molecule.

Other errors

We identified 17 suspicious entries in AqSolDB resulting from either averaging measurements of similar chemicals or predictions with ML methods. In our opinion, such values should not be used for model building. Lastly, the experimental procedure may be biased. For example, two entries were discarded because the calibration of instruments was performed under different conditions than those used to run the test samples.

Duplicate measurements

A common outcome of dataset aggregation is the occurrence of duplicated measurements. Managing them is a chance to investigate uncertainties. However, it is desirable to maintain one value per structure, preferably the median. This only makes sense when reported values are relatively close. When there are only two very different values, or there are two or three clusters of different values associated with compounds with the same InChI Key, the median or average value becomes meaningless. Such cases are filtered out by a SDi > 0.5 log threshold.

The result of applying this process to AqSolDB is labeled AqSolDBc in the following.
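The duplicate-handling rule can be sketched as follows for the values sharing one InChI Key. Computing the SDi as a sample standard deviation is an assumption, as is the function name.

```python
from statistics import median, stdev

SD_THRESHOLD_LOG = 0.5  # threshold on SDi quoted in the text


def merge_duplicates(values):
    """Return the median LogS of duplicated measurements, or None when
    the values disagree too much to be merged."""
    if len(values) == 1:
        return values[0]
    # Sample standard deviation is an assumption for the SDi.
    if stdev(values) > SD_THRESHOLD_LOG:
        return None
    return median(values)
```

With this rule, three close values such as −3.0, −3.2 and −3.1 collapse to their median, while two values three log units apart are discarded rather than averaged.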

Test Set Curation

Based on the number of entries, OChem represents the largest thermodynamic solubility repository. More than half of them are from AqSolDB, EPI Suite, VEGA 78 , TEST 79 and OPERA 80 . Following standardization, 7,463 unique structures remained, with values ranging from –13.17 to 1.70 log units. Out of these, 70% are found to overlap with AqSolDBc. To assess the model’s performance on both overlapping and unique compounds from the OChem dataset, it was divided into two subsets: a set shared with AqSolDBc containing 5,212 compounds and a set specific to OChem with 2,251 compounds, which were harder to predict.

The various compound sets were compared using Generative Topographic Mapping (GTM). The GTM method inserts a manifold into an N-dimensional molecular descriptor space populated by a set of representative chemical structures. By shifting the centers of Radial Basis Functions, the technique maximizes the log likelihood (LLh) while fitting the manifold to data. Subsequently, the data points are projected onto the manifold before unbending it. A vector of normalized probabilities (responsibilities), computed on the nodes of a grid over the manifold, is used to represent each compound in the latent space. The complete data set can therefore be described as a vector of cumulative responsibilities, which is displayed as a map and termed a landscape.

Here, a combined dataset composed of 4,463 unique structures was created from AqSolDBc and OChem. ISIDA descriptors were employed for GTM training, as previous studies demonstrated their comprehensive coverage of the relevant chemical space and their ability to effectively represent molecular structures 81. The descriptor space includes descriptors related to aromaticity as well as ISIDA counts of sequences and fragments from 2 to 3 atoms, representing a total of 6,121 distinct fragments (Nomenclature: IIAB(2-3)_CI) 82. The GTM manifold was trained using 100 iterations before being resampled to obtain a map of 8,000 nodes. The map is colored based on property and class values, which subsequently generate property and class landscapes for data set comparisons. To achieve this, each node’s mean class/property value is obtained as the responsibility-weighted mean of the class labels/property values of its resident objects 83.
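The coloring of a property landscape by a responsibility-weighted mean can be illustrated with a small sketch. It assumes that the responsibilities are already available as one row per compound and one column per node; the GTM training itself is not shown, and the function name is hypothetical.

```python
def property_landscape(responsibilities, values):
    """Responsibility-weighted mean property value on each node.

    responsibilities[i][k] is the responsibility of node k for compound i
    and values[i] is the property (e.g. LogS) of compound i.
    """
    n_nodes = len(responsibilities[0])
    landscape = []
    for k in range(n_nodes):
        weight = sum(row[k] for row in responsibilities)
        if weight == 0:
            # Unpopulated node: no value can be assigned.
            landscape.append(None)
            continue
        weighted_sum = sum(row[k] * y for row, y in zip(responsibilities, values))
        landscape.append(weighted_sum / weight)
    return landscape


# Two hypothetical compounds projected on three nodes.
example_map = property_landscape([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]], [-2.0, -4.0])
```

Nodes that receive no responsibility stay empty, which corresponds to the white, unpopulated areas of a landscape.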

Public models were validated using public data from OChem. Priority was given to NN and models trained on AqSolDB. The validation process also involved testing the GSE (described above).

AqSolPred is a consensus predictor based on 3 models originally trained with a version of AqSolDB depleted of eChemPortal and EPI Suite subsets. The authors used 123 2D descriptors in NN, RF and XGBoost methods. Their consensus model scored an RMSE of 0.35 log on the Huuskonen benchmark dataset.

SolTranNet also uses the SMILES representation. It is built upon a molecule attention transformer (MAT) architecture. It applies self-attention to molecular graph representation, where each node is characterized by a feature vector which is then combined with the adjacency and distance matrices of the molecule. The distance matrix is built on a minimized 3D model of the molecule.

For training QSAR models on AqSolDBc we used Random Forest (RF) and MPNN (ChemProp 63). The RF uses the scikit-learn 84 implementation with MOE2D 85 descriptors, excluding LogS and (number of descriptors = 203) to limit the usage of predicted properties as descriptors. Using other descriptor suites such as ISIDA led to similar results. We also used OChem models (LogPo/w: ALOGPS 2.1, 2016; MP: Best estate, 2015) to predict LogPo/w and MP and used the computed values as input to the GSE. The ChemProp MPNN model is a Directed Message Passing Neural Network (D-MPNN) renowned for producing reliable predictive models of chemical properties. Finally, ChemProp was used alone and in consensus with AqSolPred.

The consensus prediction was conducted to improve the applicability of AqSolPred, as it was trained with a version of AqSolDB lacking eChemPortal and EPI Suite. Following the guidelines shared by the authors, models were used as intended, and the reported performances were reproduced. Models were applied to 7,463 compounds from OChem.
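For orientation, the commonly cited form of the General Solubility Equation (Yalkowsky and co-workers) is LogS = 0.5 − 0.01 (MP − 25) − LogPo/w, with MP in °C. The sketch below assumes that this is the form used here, and that the melting point term is dropped for compounds melting at or below 25 °C, as is usual for liquids.

```python
def gse_log_s(log_p, melting_point_c):
    """Estimate LogS (mol/L) from LogPo/w and the melting point in Celsius."""
    # The melting point term only applies to solids at 25 degrees Celsius.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


# Hypothetical compound with LogP = 2.0 melting at 125 degrees Celsius.
example_log_s = gse_log_s(2.0, 125.0)
```

Because both inputs are themselves predicted in this workflow, errors in LogPo/w and MP propagate directly into the estimated LogS.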

We used Isolation Forest 86 models as AD to verify our hypothesis. The Isolation Forest method constructs an ensemble of trees for a given dataset. During the tree-building process, each tree is grown by recursively selecting a random feature and a random split value between the minimum and maximum values of the selected feature to partition the observations. Instances with short average path lengths within the trees are identified as outliers. The essence of the Isolation Forest algorithm lies in this random partitioning to identify outliers. The IsolationForest models were trained with AqSolDBc (MOE2D descriptors, n = 203) using scikit-learn 84 with an increasing contamination parameter, from 0.0 to 0.99.

The contamination parameter defines the expected proportion of outliers within the training set and is used by the Isolation Forest as a threshold to discriminate outliers from inliers. In other words, a contamination of 0 corresponds to a 100% coverage of the applicability domain (no molecule rejected) and a contamination of 1 corresponds to a 0% coverage of the applicability domain (all molecules rejected). OChem’s set was applied to these models. The RMSE of the compounds within the AD was computed for each increment of the contamination (Fig. 15).
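The evaluation loop can be sketched without the outlier detector itself, assuming that anomaly scores have already been computed for training and test compounds (higher meaning more typical of the training set). The thresholding below is a simplified stand-in for what the Isolation Forest does with its contamination parameter, and all names are hypothetical.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def ad_curve(train_scores, test_scores, test_errors, contaminations):
    """Coverage and normalized RMSE of the test set per contamination."""
    ordered = sorted(train_scores)
    baseline = rmse(test_errors)  # RMSE at contamination zero
    rows = []
    for c in contaminations:
        if c == 0:
            threshold = -math.inf  # nothing is rejected
        else:
            # Score below which a fraction c of the training set falls.
            index = min(int(c * len(ordered)), len(ordered) - 1)
            threshold = ordered[index]
        kept = [e for s, e in zip(test_scores, test_errors) if s >= threshold]
        coverage = len(kept) / len(test_errors)
        normalized = rmse(kept) / baseline if kept else None
        rows.append((c, coverage, normalized))
    return rows


example_rows = ad_curve(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],  # training scores
    [0.05, 0.35, 0.95, 0.65],                            # test scores
    [2.0, 1.0, 0.5, 0.5],                                # test errors (log)
    [0.0, 0.5],
)
```

When few test compounds remain inside the AD, the normalized RMSE rests on a handful of errors, which is one way to read the instability observed at high contamination values.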

figure 15

Decision tree proposed for the curation of thermodynamic solubility data. Red nodes define non-valid conditions or chemical states, and green nodes account for correct entries.

Data availability

The authors declare that the data supporting the findings of this study are available free of charge 6 . The repository features multiple datasets that have been curated for this research. The repository contains the following files:

File AqSolDBc.csv

Curated data from the AqSolDB. The available columns are:

• ID Compound ID (string)

• InChI InChI code of the chemical structure (string)

• Solubility Mole/L logarithm of the thermodynamic solubility in water at pH 7 (+/−1) at ~300 K (float)

• SMILEScurated Curated SMILES code of the chemical structure (string)

• SD Standard laboratory Deviation, default value: −1 (float)

• Group Data quality label imported from AqSolDB (string)

• Dataset Source of the data point (string)

• Composition Purity of the substance: mono-constituent, multi-constituent, UVCB (Categorical)

• Error Identifier error on the data point, default value: None (String)

• Charge Estimated formal charge of the compound at pH 7: Positive, Negative, Zwiterion, Uncharged (Categorical)
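As a usage sketch for this file, the snippet below reads the columns listed above with the standard `csv` module and keeps only the entries that have a measured standard deviation. It assumes a comma-separated file with a header row and that the default value is written as an ASCII `-1`; neither is confirmed here.

```python
import csv
import io


def load_entries_with_sd(handle):
    """Return (ID, Solubility, SD) for rows with a measured SD."""
    kept = []
    for row in csv.DictReader(handle):
        sd = float(row["SD"])
        if sd < 0:
            # Default value -1: no replicate measurements available.
            continue
        kept.append((row["ID"], float(row["Solubility"]), sd))
    return kept


# Small in-memory stand-in for the file, with hypothetical rows.
sample = io.StringIO("ID,Solubility,SD\nA-1,-3.2,0.15\nB-2,-5.0,-1\n")
entries = load_entries_with_sd(sample)
```

For the real file, the same function could be called on `open("AqSolDBc.csv", newline="")`.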

File OChemUnseen.csv

Solubility data from OChem, curated and orthogonal to AqSolDB. The available columns are:

• SMILES Curated SMILES code of the chemical structure (string)

• LogS Mole/L logarithm of the thermodynamic solubility in water at pH 7 ( + /−1) (float)

File OChemOverlapping.csv

Solubility data from OChem, curated; chemical structures are also present inside AqSolDB. The available columns are:

File OChemCurated.csv

Solubility data from OChem, curated. The available columns are:

• Name Compound name (string)

• SDi Standard laboratory Deviation, default value: −1 (float)

• Reference Unformatted bibliographic reference from which the data point originates (string)

• EXTERNALID Compound ID as appearing in its data source, default value: None (string)

• CASRN CAS number of the compound, default value: None (string)

• ARTICLEID Source ID linked to the column Reference (string)

• Temperature Temperature of the measure, in K (float)

Code availability

No custom code has been used.

Kennedy, T. Managing the drug discovery/development interface. Drug Discov. Today 2 , 436–444 (1997).

Kola, I. & Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3 , 711–716 (2004).

Article   CAS   PubMed   Google Scholar  

Millard, J., Alvarez-Núñez, F. & Yalkowsky, S. Solubilization by cosolvents. Establishing useful constants for the log-linear model. Int. J. Pharm. 245 , 153–166 (2002).

Jouyban, A. & Abolghassemi Fakhree, M. A. Solubility prediction methods for drug/drug like molecules. Recent Pat. Chem. Eng. 1 , 220–231 (2008).

Article   CAS   Google Scholar  

van de Waterbeemd, H. Improving compound quality through in vitro and in silico physicochemical profiling. Chem. Biodivers. 6 , 1760–1766 (2009).

Article   PubMed   Google Scholar  

Llompart, P. et al Will we ever be able to accurately predict solubility? Recherche Data Gouv https://doi.org/10.57745/CZVZIA (2023)

Wang, J. & Hou, T. Recent advances on aqueous solubility prediction. Comb. Chem. High Throughput Screen. 14 , 328–338 (2011).

Elder, D. P., Holm, R. & Diego, H. L. Use of pharmaceutical salts and cocrystals to address the issue of poor solubility. Int. J. Pharm. 453 , 88–100 (2013). de.

Saal, C. & Petereit, A. C. Optimizing solubility: Kinetic versus thermodynamic solubility temptations and risks. Eur. J. Pharm. Sci. 47 , 589–595 (2012).

Wang, J. et al . Development of reliable aqueous solubility models and their application in druglike analysis. J. Chem. Inf. Model. 47 , 1395–1404 (2007).

Johnson, S. R. & Zheng, W. Recent progress in the computational prediction of aqueous solubility and absorption. AAPS J. 8 , E27–E40 (2006).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Delaney, J. S. Predicting aqueous solubility from structure. Drug Discov. Today 10 , 289–295 (2005).

OECD. Test No. 105: Water Solubility. OECD Guidelines for the Testing of Chemicals, Section 1 https://read.oecd-ilibrary.org/environment/test-no-105-water-solubility_9789264069589-en (1995).

Llinàs, A., Glen, R. C. & Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 48 , 1289–1303 (2008).

Stuart, M. & Box, K. Chasing Equilibrium:  Measuring the Intrinsic Solubility of Weak Acids and Bases. Anal. Chem. 77 , 983–990 (2005).

Huuskonen, J., Rantanen, J. & Livingstone, D. Prediction of aqueous solubility for a diverse set of organic compounds based on atom-type electrotopological state indices. Eur. J. Med. Chem. 35 , 1081–1088 (2000).

Yalkowsky, RM & Dannenfleser, SH. Aquasol database of aqueous solubility. Version 5. https://hero.epa.gov/hero/index.cfm/reference/details/reference_id/5348039 (2009).

Bloch, D. Computer Software Review. Review of PHYSPROP Database (Version 1.0). ACS Publications https://pubs.acs.org/doi/pdf/10.1021/ci00024a602 (2004) https://doi.org/10.1021/ci00024a602 .

Dalanay, J. S. ESOL:  Estimating Aqueous Solubility Directly from Molecular Structure. J. Chem. Inf. Comput. Sci. 44 , 1000–1005 (2004).

US EPA. EPI Suite. https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface

Wang, J., Hou, T. & Xu, X. Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas. J. Chem. Inf. Model. 49 , 571–581 (2009).

Boobier, S., Hose, D. R. J., Blacker, A. J. & Nguyen, B. N. Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat. Commun. 11 , 5753 (2020).

Article   CAS   PubMed   PubMed Central   ADS   Google Scholar  

Tetko, I. V., Tanchuk, V. Y., Kasheva, T. N. & Villa, A. E. P. Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices. J. Chem. Inf. Comput. Sci. 41 , 1488–1493 (2001).

Avdeef, A. Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database. ADMET DMPK 8 , 29 (2020).

Article   PubMed   PubMed Central   Google Scholar  

Sorkun, M. C., Khetan, A. & Er, S. AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Sci. Data 6 , 143 (2019).

Sushko, I. et al . Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J. Comput. Aided Mol. Des. 25 , 533–554 (2011).

Panapitiya, G. et al . Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS Omega 7 , 15695–15710 (2022).

Wiercioch, M. & Kirchmair, J. Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance. Artif. Intell. Life Sci. 1 , 100021 (2021).

CAS   Google Scholar  

Lowe, C. N. et al . Transparency in Modeling through Careful Application of OECD’s QSAR/QSPR Principles via a Curated Water Solubility Data Set. Chem. Res. Toxicol. 36 , 465–478 (2023).

Francoeur, P. G. & Koes, D. R. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J. Chem. Inf. Model. 61 , 2530–2536 (2021).

Sluga, J., Venko, K., Drgan, V. & Novič, M. QSPR Models for Prediction of Aqueous Solubility: Exploring the Potency of Randić-type Indices. Croat. Chem. Acta 93 (2020).

Meng, J. et al . Boosting the predictive performance with aqueous solubility dataset curation. Sci. Data 9 , 71 (2022).

Lee, S. et al . Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks. ACS Omega 7 , 12268–12277 (2022).

Article   MathSciNet   CAS   PubMed   PubMed Central   Google Scholar  

Schrödinger. QikProp. (2015).

United States National Library of Medicine. ChemIDplus advanced. https://pubchem.ncbi.nlm.nih.gov/source/ChemIDplus (2011).

Kühne, R., Ebert, R.-U., Kleint, F., Schmidt, G. & Schüürmann, G. Group contribution methods to estimate water solubility of organic chemicals. Chemosphere 30 , 2061–2077 (1995).

Article   ADS   Google Scholar  

OECD. eChemPortal: The Global Portal to Information on Chemical Substances, https://www.echemportal.org/echemportal/ (2023).

European Chemicals Agency. ECHA. https://echa.europa.eu/fr/ (2023).

Irmann, F. Eine einfache Korrelation zwischen Wasserlöslichkeit und Struktur von Kohlenwasserstoffen und Halogenkohlenwasserstoffen. Chem. Ing. Tech. 37 , 789–798 (1965).

Hansch, C., Quinlan, J. E. & Lawrence, G. L. Linear free-energy relationship between partition coefficients and the aqueous solubility of organic liquids. J. Org. Chem. 33 , 347–350 (1968).

Yalkowsky, S. H. & Valvani, S. C. Solubility and partitioning I: Solubility of nonelectrolytes in water. J. Pharm. Sci. 69 , 912–922 (1980).

Ran, Y. & Yalkowsky, S. H. Prediction of drug solubility by the general solubility equation (GSE). J. Chem. Inf. Comput. Sci. 41 , 354–357 (2001).

Hansen, N. T., Kouskoumvekaki, I., Jørgensen, F. S., Brunak, S. & Jónsdóttir, S. Ó. Prediction of pH-Dependent Aqueous Solubility of Druglike Molecules. J. Chem. Inf. Model. 46 , 2601–2609 (2006).

ChemAxon. Marvin. https://chemaxon.com/products/marvin (2023).

Johnson, S. R., Chen, X.-Q., Murphy, D. & Gudmundsson, O. A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects. Mol. Pharm. 4 , 513–523 (2007).

Hopfinger, A. J., Esposito, E. X., Llinàs, A., Glen, R. C. & Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. ACS Publications https://pubs.acs.org/doi/pdf/10.1021/ci800436c (2008).

Lusci, A., Pollastri, G. & Baldi, P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model. 53 , 1563–1575 (2013).

Huuskonen, J., Livingstone, D. J. & Manallack, D. T. Prediction of drug solubility from molecular structure using a drug-like training set. SAR QSAR Environ. Res. 19 , 191–212 (2008).

Zhou, D., Alelyunas, Y. & Liu, R. Scores of Extended Connectivity Fingerprint as Descriptors in QSPR Study of Melting Point and Aqueous Solubility. J. Chem. Inf. Model. 48 , 981–987 (2008).

Erić, S., Kalinić, M., Popović, A., Zloh, M. & Kuzmanovski, I. Prediction of aqueous solubility of drug-like molecules using a novel algorithm for automatic adjustment of relative importance of descriptors implemented in counter-propagation artificial neural networks. Int. J. Pharm. 437 , 232–241 (2012).

Llinas, A. & Avdeef, A. Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 log) and Loose (SD ∼ 0.62 log) Test Sets. J. Chem. Inf. Model. 59 , 3036–3040 (2019).

Llinas, A., Oprisiu, I. & Avdeef, A. Findings of the Second Challenge to Predict Aqueous Solubility. J. Chem. Inf. Model. 60 , 4791–4803 (2020).

Hewitt, M. et al . In silico prediction of aqueous solubility: the solubility challenge. J. Chem. Inf. Model. 49 , 2572–2587 (2009).

Goh, G. B., Hodas, N., Siegel, C. & Vishnu, A. SMILES2vec: Predicting Chemical Properties from Text Representations. Preprint at arXiv:1712.02034 (2018).

Cui, Q. et al . Improved Prediction of Aqueous Solubility of Novel Compounds by Going Deeper With Deep Learning. Front. Oncol . 10 (2020).

Maziarka, Ł. et al. Molecule Attention Transformer . (2020).

Lovrić, M. et al . Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J. Chemom. 35 , e3349 (2021).

Kohavi, R. & Wolpert, D. H. in International Conference on Machine Learning Bias Plus Variance Decomposition for Zero-One Loss Function (1996).

Dwork, C. et al . The reusable holdout: Preserving validity in adaptive data analysis. Science 349 , 636–638 (2015).

Article   MathSciNet   CAS   PubMed   ADS   Google Scholar  

Breiman, L. & Spector, P. Submodel Selection and Evaluation in Regression. The X-Random Case. Int. Stat. Rev. Rev. Int. Stat. 60 , 291–319 (1992).

Rao, R. B., Fung, G. & Rosales, R. in Proceedings of the 2008 SIAM International Conference on Data Mining (SDM) On the Dangers of Cross-Validation. An Experimental Evaluation. 588–596 (Society for Industrial and Applied Mathematics, 2008).

Rytting, E., Lentz, K. A., Chen, X. Q., Qian, F. & Vakatesh S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J . 7 , E78–105, https://doi.org/10.1208/aapsj070110 (2005).

Heid, E. et al . Chemprop: A Machine Learning Package for Chemical Property Prediction. J. Chem. Inf. Model . 64 , 9–17, https://doi.org/10.1021/acs.jcim.3c01250 (2024).

Chevillard, F. et al . In Silico Prediction of Aqueous Solubility: A Multimodel Protocol Based on Chemical Similarity. Mol. Pharm. 9 , 3127–3135 (2012).

Cao, D.-S., Xu, Q.-S., Liang, Y.-Z., Chen, X. & Li, H.-D. Prediction of aqueous solubility of druglike organic compounds using partial least squares, back‐propagation network and support vector machine. J. Chemometrics. 24 , 584–595 (2010).

Ruggiu, F., Marcou, G., Varnek, A. & Horvath, D. ISIDA Property-Labelled Fragment Descriptors. Mol. Inform. 29 , 855–868 (2010).

Ferguson, A. L., Debenedetti, P. G. & Panagiotopoulos, A. Z. Solubility and Molecular Conformations of n-Alkane Chains in Water. J. Phys. Chem. B 113 , 6405–6414 (2009).

Birch, H., Redman, A. D., Letinski, D. J., Lyon, D. Y. & Mayer, P. Determining the water solubility of difficult-to-test substances: A tutorial review. Anal. Chim. Acta 1086 , 16–28 (2019).

Marcou, G., Horvath, D. & Solov, V. Interpretability of SAR/QSAR Models of any Complexity by Atomic Contributions. Mol Inf .

OECD. Principles For The Validation, For Regulatory Purposes, of QSAR models. https://www2.oecd.org/chemicalsafety/risk-assessment/37849783.pdf (2004).

Dearden, J. C. In silico prediction of aqueous solubility. Expert Opin. Drug Discov. 1 , 31–52 (2006).

ChemAxon. JChem Base, version 22.19.0 (2022).

Ayers, M. ChemSpider: The Free Chemical Database. Royal Society of Chemistry https://www.chemspider.com (2023)

CAS. SciFinder. https://scifinder.cas.org (2023).

OECD, eChemPortal, https://www.echemportal.org/echemportal/ .

Kim, S. et al . PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49 , D1388–D1395 (2021).

Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. The Cambridge Structural Database. Acta Crystallogr. Sect. B Struct. Sci. Cryst. Eng. Mater. 72 , 171–179 (2016).

Article   CAS   ADS   Google Scholar  

Pedretti, A., Mazzolari, A., Gervasoni, S., Fumagalli, L. & Vistoli, G. The VEGA suite of programs: an versatile platform for cheminformatics and drug design projects. Bioinformatics. 37 , 1174–1175 (2021).

US EPA. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool) A Program to Estimate Toxicity from Molecular Structure. https://www.epa.gov/chemical-research/users-guide-test-version-42-toxicity-estimation-software-tool-program-estimate (2016).

Mansouri, K., Grulke, C. M., Judson, R. S. & Williams, A. J. OPERA models for predicting physicochemical properties and environmental fate endpoints. J. Cheminformatics 10 , 10 (2018).

Lin, A. et al . Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds. ChemMedChem 13 , 540–554 (2018).

Article   CAS   PubMed   ADS   Google Scholar  

Bonachera, F. Isida/fragmentor 2017 user guide. 25.

Gaspar, H. A., Baskin, I. I., Marcou, G., Horvath, D. & Varnek, A. GTM-Based QSAR Models and Their Applicability Domains. Mol. Inform. 34 , 348–356 (2015).

Pedregosa, F. et al Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2825–2830 (2011).

Chemical Computing Group ULC. Molecular Operating Environment (MOE). (2022).

Liu, F. T., Ting, K. M. & Zhou, Z.-H. in 2008 Eighth IEEE International Conference on Data Mining . Isolation Forest. 413–422 (2008).

Huuskonen, J., Salo, M. & Taskinen, J. Neural Network Modeling for Estimation of the Aqueous Solubility of Structurally Related Drugs. J. Pharm. Sci. 86 , 450–454 (1997).

Bruneau, P. Search for Predictive Generic Model of Aqueous Solubility Using Bayesian Neural Nets. J. Chem. Inf. Comput. Sci. 41 , 1605–1616 (2001).

Liu, R. & So, S.-S. Development of Quantitative Structure−Property Relationship Models for Early ADME Evaluation in Drug Discovery. 1. Aqueous Solubility. J. Chem. Inf. Comput. Sci. 41 , 1633–1639 (2001).

Klamt, A., Eckert, F., Hornig, M., Beck, M. E. & Bürger, T. Prediction of aqueous solubility of drugs and pesticides with COSMO-RS. J. Comput. Chem. 23 , 275–281 (2002).

Engkvist, O. & Wrede, P. High-Throughput, In Silico Prediction of Aqueous Solubility Based on One- and Two-Dimensional Descriptors. J. Chem. Inf. Comput. Sci. 42 , 1247–1249 (2002).

Chen, X., Cho, S. J., Li, Y. & Venkatesh, S. Prediction of aqueous solubility of organic compounds using a quantitative structure–property relationship. J. Pharm. Sci. 91 , 1838–1852 (2002).

Wegner, J. K. & Zell, A. Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method. J. Chem. Inf. Comput. Sci. 43 , 1077–1084 (2003).

Cheng, A. & Merz, K. M. Prediction of Aqueous Solubility of a Diverse Set of Compounds Using Quantitative Structure−Property Relationships. J. Med. Chem. 46 , 3572–3580 (2003).

Yan, A. & Gasteiger, J. Prediction of Aqueous Solubility of Organic Compounds by Topological Descriptors. QSAR Comb. Sci. 22 , 821–829 (2003).

Lind, P. & Maltseva, T. Support vector machines for the estimation of aqueous solubility. J. Chem. Inf. Comput. Sci. 43 , 1855–1859 (2003).

Yan, A., Gasteiger, J., Krug, M. & Anzali, S. Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods. J. Comput. Aided Mol. Des. 18 , 75–87 (2004).

Hou, T. J., Xia, K. & Zhang, W. ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach. J. Chem. Inf. Comput. Sci. 44 , 266–275 (2004).

Fröhlich, H., Wegner, J. K. & Zell, A. Towards Optimal Descriptor Subset Selection with Support Vector Machines in Classification and Regression. QSAR Comb. Sci. 23 , 311–318 (2004).

Votano, J. R., Parham, M., Hall, L. H., Kier, L. B. & Hall, L. M. Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation. Chem. Biodivers. 1 , 1829–1841 (2004).

Clark, M. Generalized Fragment-Substructure Based Property Prediction Method. J. Chem. Inf. Model. 45 , 30–38 (2005).

Catana, C., Gao, H., Orrenius, C. & Stouten, P. F. W. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds. J. Chem. Inf. Model. 45 , 170–176 (2005).

Wassvik, C. M., Holmén, A. G., Bergström, C. A. S., Zamora, I. & Artursson, P. Contribution of solid-state properties to the aqueous solubility of drugs. Eur. J. Pharm. Sci. 29 , 294–305 (2006).

Schwaighofer, A. et al . Accurate Solubility Prediction with Error Bars for Electrolytes:  A Machine Learning Approach. J. Chem. Inf. Model. 47 , 407–424 (2007).

Cheung, M., Johnson, S., Hecht, D. & Fogel, G. B. Quantitative structure-property relationships for drug solubility prediction using evolved neural networks. in 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) 688–693 (2008). https://doi.org/10.1109/CEC.2008.4630870 .

Duchowicz, P. R., Talevi, A., Bruno-Blanch, L. E. & Castro, E. A. New QSPR study for the prediction of aqueous solubility of drug-like compounds. Bioorg. Med. Chem. 16 , 7944–7955 (2008).

Hughes, L. D., Palmer, D. S., Nigsch, F. & Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 48 , 220–232 (2008).

Du-Cuny, L., Huwyler, J., Wiese, M. & Kansy, M. Computational aqueous solubility prediction for drug-like compounds in congeneric series. Eur. J. Med. Chem. 43 , 501–512 (2008).

Obrezanova, O., Gola, J. M. R., Champness, E. J. & Segall, M. D. Automatic QSAR modeling of ADME properties: blood–brain barrier penetration and aqueous solubility. J. Comput. Aided Mol. Des. 22 , 431–440 (2008).

Duchowicz, P. R. & Castro, E. A. QSPR Studies on Aqueous Solubilities of Drug-Like Compounds. Int. J. Mol. Sci. 10 , 2558–2577 (2009).

Ghafourian, T. & Bozorgi, A. H. A. Estimation of drug solubility in water, PEG 400 and their binary mixtures using the molecular structures of solutes. Eur. J. Pharm. Sci. 40 , 430–440 (2010).

Muratov, E. N. et al . New QSPR equations for prediction of aqueous solubility for military compounds. Chemosphere 79 , 887–890 (2010).

Jain, P. & Yalkowsky, S. H. Prediction of aqueous solubility from SCRATCH. Int. J. Pharm. 385 , 1–5 (2010).

Eric, S. et al . The importance of the accuracy of the experimental data for the prediction of solubility. J. Serbian Chem. Soc. 75 , 483–495 (2010).

Louis, B., Agrawal, V. K. & Khadikar, P. V. Prediction of intrinsic solubility of generic drugs using MLR, ANN and SVM analyses. Eur. J. Med. Chem. 45 , 4018–4025 (2010).

Fatemi, M., Heidari, A. & Ghorbanzadeh, M. Prediction of Aqueous Solubility of Drug-Like Compounds by Using an Artificial Neural Network and Least-Squares Support Vector Machine. Bull. Chem. Soc. Jpn. 83 , 1338–1345 (2010).

Salahinejad, M., Le, T. C. & Winkler, D. A. Aqueous solubility prediction: do crystal lattice interactions help? Mol. Pharm. 10 , 2757–2766 (2013).

McDonagh, J. L., Nath, N., De Ferrari, L., van Mourik, T. & Mitchell, J. B. O. Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Inf. Model. 54 , 844–856 (2014).

Kim, S., Jinich, A. & Aspuru-Guzik, A. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes. J. Chem. Inf. Model. 57 , 657–668 (2017).

Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J. Chem. Inf. Model. 57 , 1757–1772 (2017).

Cho, H. & Choi, I. S. Enhanced Deep-Learning Prediction of Molecular Properties via Augmentation of Bond Topology. ChemMedChem 14 , 1604–1609 (2019).

Cho, H. & Choi, I. S. Enhanced Deep-Learning Prediction of Molecular Properties via Augmentation of Bond Topology. Chem Med Chem 14 , 1604 (2019).

Deng, T. & Jia, G. Prediction of aqueous solubility of compounds based on neural network. Mol. Phys. 118 , e1600754 (2020).

Gao, P., Zhang, J., Sun, Y. & Yu, J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys. Chem. Chem. Phys. 22 , 23766–23772 (2020).

Falcón-Cano, G., Molina, C. & Cabrera-Pérez, M. A. ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches. ADMET DMPK 8 , 251–273 (2020).

PubMed   PubMed Central   Google Scholar  

Shen, W. X. et al . Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat Mach Intell 3 , 334–343 (2021).

Tosca, E. M., Bartolucci, R. & Magni, P. Application of Artificial Neural Networks to Predict the Intrinsic Solubility of Drug-Like Molecules. Pharmaceutics 13 , 1101 (2021).

Wieder, O. et al . Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks. Molecules 26 , 6185 (2021).

Chen, J.-H. & Tseng, Y. J. Different molecular enumeration influences in deep learning: an example using aqueous solubility. Briefings Bioinf 22 , bbaa092 (2021).

Panapitiya, G. et al . Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations. ACS Omega 7 , 15695–15710 (2022).

Hou, Y., Wang, S., Bai, B., Chan, H. C. S. & Yuan, S. Accurate Physical Property Predictions via Deep Learning. Molecules 27 , 1668 (2022).

Raevsky, O. A., Grigor’ev, V. Y., Polianczyk, D. E., Raevskaja, O. E. & Dearden, J. C. Calculation of aqueous solubility of crystalline un-ionized organic chemicals and drugs based on structural similarity and physicochemical descriptors. J Chem Inf Model . 54 , 683–91, https://doi.org/10.1021/ci400692n (2014).

Schaper, K.-J., Kunz, B. & Raevsky, O. Analysis of water solubility data on the basis of HYBOT descriptors. Part 2. QSAR Comb. Sci . 22 , 943–958, https://doi.org/10.1002/qsar.200330840 (2003).

Download references

Author information

Authors and affiliations.

Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France

P. Llompart, S. Baybekov, D. Horvath, G. Marcou & A. Varnek

IDD/CADD, Sanofi, Vitry-Sur-Seine, France

P. Llompart & C. Minoletti


Contributions

P.L. is the main author. Data collection, annotation process supervision, modeling and statistical analysis of results were carried out by P.L., C.M. and G.M. Figures and tables preparation by P.L. and G.M. Kinetic data contributed by S.B. Supervision by C.M., G.M. and A.V. The first version of this article was written by P.L. and G.M.; G.M., D.H., C.M. and A.V. led the subsequent revisions.

Corresponding author

Correspondence to G. Marcou .

Ethics declarations

Competing interests.

C. Minoletti and P. Llompart are Sanofi employees and may hold shares and/or stock options in the company. S. Baybekov, D. Horvath, G. Marcou, and A. Varnek have nothing to disclose.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Llompart, P., Minoletti, C., Baybekov, S. et al. Will we ever be able to accurately predict solubility?. Sci Data 11 , 303 (2024). https://doi.org/10.1038/s41597-024-03105-6


Received : 01 September 2023

Accepted : 29 February 2024

Published : 18 March 2024

DOI : https://doi.org/10.1038/s41597-024-03105-6


13 Effect of Temperature and Solvent on Solubility

To evaluate the solubility of two solid solutes in two different solvents at different temperatures.

Expected Learning Outcomes

  • Determine the solubility curve of substances.
  • Explain the features of solubility curves based on intermolecular forces.

Textbook Reference

This experiment illustrates concepts from Tro, Chemistry: Structures and Properties , 2nd Ed., Ch. 13.2 and 13.4.

Introduction

The solubility of a compound in a given solvent is the maximum mass of solute that can be dissolved in a given amount of solvent at a given temperature. The solubility is typically expressed as

[latex]\textrm{solubility} = \frac{\textrm{g solute}}{100 \textrm{ g solvent}}[/latex]

When such a solution is formed, it is referred to as a saturated solution; no more solute can be dissolved, and any additional solute remains suspended as undissolved solid. [1]

The Solvation Process

A saturated solution marks the point of equilibrium between the undissolved (suspended) solid and the dissolved solute. In practice, you have reached this point when you first see specks of solid suspended in the liquid.

For sucrose in water, we have

\begin{equation} \textrm{C}_{12}\textrm{H}_{22}\textrm{O}_{11} (s) \rightleftharpoons \textrm{C}_{12}\textrm{H}_{22}\textrm{O}_{11} (aq) \end{equation}

For an ionic compound, as you may recall from CHEM-C 105 (Tro, Chemistry: Structures and Properties , 2nd Ed, Ch. 8), we need to account for the dissociation of the ionic compound.

Sodium chloride, NaCl, will dissociate when dissolved in water to form [latex]\textrm{Na}^+[/latex] and [latex]\textrm{Cl}^-[/latex] ions:

\begin{equation} \textrm{NaCl}(s) \rightleftharpoons \textrm{Na}^+ (aq) + \textrm{Cl}^- (aq) \end{equation}

The solvation process can be considered as a three-step process:

  • Breaking the solute-solute interactions.
  • Breaking the solvent-solvent interactions.
  • Forming solvent-solute interactions – attraction between the solute and solvent particles.

The first two processes above are endothermic while the last process is exothermic.   The overall thermodynamics is summarized in the following enthalpy level diagram; note that the overall enthalpy of solvation

Hess' Law diagram for the solvation of sodium chloride
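As a hedged sketch, the three steps listed above can be combined with Hess's law into a single expression. The symbols below are labels introduced here for illustration and may differ from those used in the diagram:

\begin{equation} \Delta H_{\textrm{soln}} = \Delta H_{\textrm{solute}} + \Delta H_{\textrm{solvent}} + \Delta H_{\textrm{mix}} \end{equation}

Here [latex]\Delta H_{\textrm{solute}} > 0[/latex] and [latex]\Delta H_{\textrm{solvent}} > 0[/latex] correspond to the two endothermic steps, and [latex]\Delta H_{\textrm{mix}} < 0[/latex] corresponds to the exothermic step, so the sign of [latex]\Delta H_{\textrm{soln}}[/latex] depends on their relative magnitudes.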

In this experiment, you will study the solubility of potassium dichromate ([latex]\textrm{K}_2\textrm{Cr}_2\textrm{O}_7[/latex]) or oxalic acid ([latex]\textrm{H}_2\textrm{C}_2\textrm{O}_4[/latex]) using two different solvents:

  • Water
  • 70:30 water:1,4-dioxane (by volume) mixture

1,4-dioxane is (largely) nonpolar due to its symmetrical shape. The structures of both oxalic acid and 1,4-dioxane are illustrated below:

structures of oxalic acid and 1,4-dioxane

Effect of Temperature on Solubility

The temperature dependence of the solubility of a substance is depicted graphically on a solubility curve.

Solubility vs temperature graphs of mercury(II) chloride, potassium chloride, sodium chloride (goes up) and cadmium selenate (goes down).

For many solid solutes in liquid solvents (as we see in everyday life), the solubility of the solute increases with temperature. However, this is not a hard and fast rule. For example, the solubility curve of cadmium selenate above shows that its solubility decreases as the temperature increases. As you can also see above, how strongly the solubility changes with temperature varies significantly between substances.

In this experiment, you will explore this behavior for both solids in both water and the water:1,4-dioxane mixture. We will explore this in a more quantitative manner in the experiment Thermodynamics of the Solvation of Calcium Hydroxide.

Solubility Differences of a Solute in Different Solvents

As discussed above, the energetics of the solvation process involves the solute-solute, solvent-solvent, and solute-solvent interactions. Because of entropic considerations, solvation does not have to be exothermic overall; nevertheless, solvation is unlikely unless the solute-solvent interactions are comparable to (or stronger than) the solute-solute and solvent-solvent interactions.

Sodium chloride forms ion-dipole interactions with water, which (mostly) counterbalance the ionic bonds (within sodium chloride) and hydrogen bonds (within water) that are broken. As a result, sodium chloride is soluble in water.

On the other hand, sodium chloride is insoluble in cyclohexane ([latex]\textrm{C}_6\textrm{H}_{12}[/latex]), a non-polar solvent. This is because the energy required to break the ionic bonds in sodium chloride is much greater than the gain in van der Waals energy when sodium and chloride ions interact with cyclohexane molecules.

A common way of thinking about this is the "like dissolves like" approach, which largely holds, although there are subtle nuances to consider.

  • Ionic/polar solutes are more likely to be soluble in polar solvents
  • Non-polar solutes are more likely to be soluble in non-polar solvents.

In this experiment, you will compare the solubility of each of the two solutes in the two solvent systems studied, and evaluate the difference in the context of this discussion.
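Comparing two solvent systems is easiest at a common temperature, but the crystallization temperatures you record will rarely coincide. The sketch below estimates the solubility at a chosen temperature by linear interpolation between measured points. Linear interpolation is an assumption made here for illustration (real solubility curves are generally not linear), and the function name and data values are hypothetical.

```python
def interpolate_solubility(points, temp_c):
    """Estimate solubility at temp_c from (temperature, solubility) pairs.

    Uses straight-line interpolation between the two nearest measured points,
    which is only a rough approximation of a real solubility curve.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("at least two measured points are needed")
    if not pts[0][0] <= temp_c <= pts[-1][0]:
        raise ValueError("temperature outside the measured range")
    for (t0, s0), (t1, s1) in zip(pts, pts[1:]):
        if t0 <= temp_c <= t1:
            if t1 == t0:
                return s0
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


# Hypothetical data: (temperature in deg C, g solute per 100 g solvent).
in_water = [(48.0, 32.0), (55.0, 40.0), (70.0, 64.0)]
in_mixture = [(40.0, 30.0), (65.0, 80.0)]

for label, data in (("water", in_water), ("mixture", in_mixture)):
    print(label, interpolate_solubility(data, 60.0))
```

The function refuses to extrapolate beyond the measured range, since a straight line is least trustworthy outside the data.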

In this experiment, you will be assigned to measure the solubility curves for either potassium dichromate  or oxalic acid.  You will then (before leaving the lab) share data with another group of students who did the measurements for the other solute.

  • Volumes of solvent can be measured using a 5 mL graduated pipet.
  • Part C must be completed in a fume hood.
  • Students may be asked to complete the experiment in a different order from that listed here to help traffic control. This will not affect the results of the experiment.
  • Throughout this experiment, it is important to stir the contents of the test tube continuously and gently so that the temperature is uniform throughout the test tube. Take care not to break the test tube while stirring.

Part A: Solubility of a Solid Solute in Water

  • On a piece of weighing paper, weigh out (as assigned) either 3.1-3.3 g oxalic acid or 2.8-2.9 g potassium dichromate.  Be sure to record the exact mass of your solute.
  • Add the solid into a medium sized test tube.
  • Using a graduated pipet, add 5.0 mL deionized water into the test tube. Use a test tube holder to clamp the test tube.
  • Place the test tube into a 400 mL beaker of warm tap water. Begin heating the beaker on a hot plate, using the beaker as a water bath.
  • Stir the mixture in the test tube regularly using the thermometer, keeping the test tube in the water bath until all of the solid has dissolved. You may also wish to stir (using a glass rod) the beaker of hot water from time to time.
  • When all of the solid has dissolved, take the test tube out of the hot water beaker. Continue stirring the test tube gently while the test tube cools.
  • Record the temperature when you first see for certain crystals of solute come out of solution.  Be careful not to confuse contaminants (e.g. specks of dust) with the solute.
  • Add 3.0 mL deionized water into the test tube. Place the test tube back into the hot water beaker and repeat steps 5-7.
  • Add 2.0 mL deionized water into the test tube. Place the test tube back into the hot water beaker and repeat steps 5-7.

Part B: Solubility of Your Solute in a Dilute Solution

  • Prepare a 400 mL beaker containing ice. You may wish to add some salt to the ice as well.
  • Weigh out and place into a clean, dry test tube approximately 0.9 g of your assigned solid.
  • Add 5.0 mL of deionized water into the test tube.
  • Repeat steps 5-7 from Part A.  If the solute is still completely dissolved at room temperature, place the test tube into the ice/salt bath and allow the mixture to cool until crystals of your solid are observed.
  • Add 2.0 mL deionized water into the test tube and repeat step 14 twice.

Part C: Solubility of Your Solute in a Water:1,4-Dioxane Mixture

  • 1.4-1.6 g potassium dichromate
  • 3.2-3.5 g oxalic acid
  • Move your test tube, graduated pipet, two 400 mL beakers (one containing your ice bath), thermometer, and test tube holder to a space in the fume hood.
  • Measure out 5.0 mL of the 70:30 (v/v) water:dioxane mixture into the test tube.
  • Repeat step 14 from Part B above.
  • Add 3 mL of the 70:30 (v/v) water:dioxane mixture to the test tube and repeat step 14 from Part B again.
  • Repeat step 20 two more times.
  • Be sure to obtain the experimental data on the solid you did not study from another pair of students before you leave the laboratory.

Waste Management

All waste must be collected and discarded into the appropriate beakers placed in the fume hood.  There will be two separate waste beakers: one for waste containing oxalic acid and one for waste containing potassium dichromate.

Data Analysis

For this experiment, you will need to first determine the solubility of the solid for each trial.  Since the volume of the solvent was measured at room temperature, the density value used should be that at room temperature (25°C):

Solvent Density (g/mL)
water 0.9975
70:30 (v/v) water:dioxane mixture 1.023

Using these density values, calculate the mass of solvent used for each data point in this experiment.  Note that the volume of solvent used should be the total volume of solvent added, not the amount added at that point.

If you have first added 10.0 mL water, then added another 5.0 mL water, the total volume of solvent added is [latex]10.0 \textrm{ mL} + 5.0 \textrm{ mL}= 15.0\textrm{ mL}[/latex].  You will then calculate the mass of the solvent as:

[latex]?\textrm{ g} = 15.0\textrm{ mL} \times \frac{0.9975\textrm{ g}}{\textrm{mL}} = 14.96 \textrm{ g}[/latex]

From this, calculate the concentration of the solute (in g solute/(100 g solvent)) for each data point.

If the mass of solute is 5.912 g (should be the same for the entire part) and for that given trial you had 15.0 mL water (and hence the mass of solvent is 14.96 g from above), the concentration of this solution is

\begin{eqnarray} ? \frac{\textrm{g solute}}{100\textrm{ g solvent}} &=& 100\textrm{ g solvent} \times \frac{5.912\textrm{ g K}_2\textrm{Cr}_2\textrm{O}_7}{14.96\textrm{ g H}_2\textrm{O}} \\ &=& 39.5\textrm{ g solute/100 g solvent} \end{eqnarray}
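
The two calculations above (total solvent mass, then concentration) repeat for every data point, so they can be sketched as a short Python helper. This is only an illustration under stated assumptions: the function name `solubility_points` and the dictionary keys are invented for this sketch, and the inputs are the example numbers from the text (5.912 g solute, 10.0 mL of water followed by another 5.0 mL), not real measurements.

```python
from itertools import accumulate

# Room-temperature densities from the table above (g/mL).
DENSITY_G_PER_ML = {"water": 0.9975, "water:dioxane 70:30": 1.023}


def solubility_points(solute_mass_g, additions_ml, solvent):
    """Return g solute per 100 g solvent after each solvent addition.

    additions_ml lists the volume added at each step; the running total
    (not the individual addition) is used for each data point.
    """
    density = DENSITY_G_PER_ML[solvent]
    points = []
    for total_ml in accumulate(additions_ml):
        solvent_mass_g = total_ml * density
        points.append(100.0 * solute_mass_g / solvent_mass_g)
    return points


# Example numbers from the text: 10.0 mL of water, then another 5.0 mL.
points = solubility_points(5.912, [10.0, 5.0], "water")
print([round(p, 1) for p in points])  # [59.3, 39.5]
```

Each concentration is then paired with the temperature at which crystals first appeared for that trial to give one point on the solubility curve.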

Make two plots (one for each solute) where you plot the solubility in each solvent (along the  y -axis) as a function of temperature (along the x -axis).

On each graph, there should be two data sets along the same axes.

  • The solute in water (Parts A and B). The data for both those parts (for a given solute) should fall along a single, smooth curve.
  • The solute in a 70:30 water:1,4-dioxane mixture (Part C). The data for this part (for a given solute) should fall on a separate, smooth curve.

Each set should be plotted with different symbols, and a smooth curve to illustrate the trend in the data (as best as possible) should be included to guide the eye for each set as shown in the illustration below.  It is common for there to be anomalous data points in this data, so you should not expect the curve to go through every data point or for every “jump” to follow the overall trend.

Two solubility curves, one in water (Parts A and B, 7 data points) and one in water:dioxane mixture (Part C, 4 data points)

  • Supersaturated solutions, in which more solute is dissolved than the solubility allows, do exist. However, such solutions are not thermodynamically stable and will not be considered in this experiment.

IU East Experimental Chemistry Laboratory Manual Copyright © 2022 by Yu Kay Law is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


52 Solubility

LumenLearning

Solubility is the relative ability of a solute (solid, liquid, or gas) to dissolve into a solvent and form a solution.

LEARNING OBJECTIVES

Recognize the various ions that cause a salt to generally be soluble/insoluble in water.

KEY TAKEAWAYS

  • Solubility is the relative ability of a solute to dissolve into a solvent.
  • Several factors affect the solubility of a given solute in a given solvent. Temperature often plays the largest role, although pressure can have a significant effect for gases.
  • To predict whether a compound will be soluble in a given solvent, remember the saying, “Like dissolves like.” Highly polar ionic compounds such as salt readily dissolve in polar water, but do not readily dissolve in non-polar solvents such as benzene or chloroform.
  • solute : the compound that dissolves in solution (can be a solid, liquid, or gas)
  • solubility : the relative ability of a solute to dissolve into a solvent
  • solvent : the compound (usually a liquid) that dissolves the solute

Definition of Solubility

Solubility is the ability of a solid, liquid, or gaseous chemical substance (referred to as the solute ) to dissolve in solvent (usually a liquid) and form a solution . The solubility of a substance fundamentally depends on the solvent used, as well as temperature and pressure. The solubility of a substance in a particular solvent is measured by the concentration of the saturated solution. A solution is considered saturated when adding additional solute no longer increases the concentration of the solution.

The degree of solubility ranges widely depending on the substances, from infinitely soluble (fully miscible), such as ethanol in water, to poorly soluble, such as silver chloride in water. The term “insoluble” is often applied to poorly soluble compounds. Under certain conditions, the equilibrium solubility can be exceeded, yielding a supersaturated solution.

Solubility does not depend on particle size; given enough time, even large particles will eventually dissolve.

Factors Affecting Solubility

Temperature.

The solubility of a given solute in a given solvent typically depends on temperature. For many solids dissolved in liquid water, solubility tends to increase with increasing temperature. As water molecules heat up, they vibrate more quickly and are better able to interact with and break apart the solute.


The solubility of gases displays the opposite relationship with temperature; that is, as temperature increases, gas solubility tends to decrease. In a chart of solubility vs. temperature, notice how solubility tends to increase with increasing temperature for the salts and decrease with increasing temperature for the gases.

Pressure has a negligible effect on the solubility of solid and liquid solutes, but it has a strong effect on solutions with gaseous solutes. This is apparent every time you open a soda can; the hissing sound from the can is due to the fact that its contents are under pressure, which ensures that the soda stays carbonated (that is to say, that the carbon dioxide stays dissolved in solution). The takeaway from this is that the solubility of gases tends to increase with increasing pressure.

A popular saying used for predicting solubility is “Like dissolves like.” This statement indicates that a solute will dissolve best in a solvent that has a similar chemical structure; the ability for a solvent to dissolve various compounds depends primarily on its polarity. For example, a polar solute such as sugar is very soluble in polar water, less soluble in moderately polar methanol, and practically insoluble in non-polar solvents such as benzene. In contrast, a non-polar solute such as naphthalene is insoluble in water, moderately soluble in methanol, and highly soluble in benzene.

Solubility Chart

The solubility chart shows the solubility of many salts. Salts of alkali metals (and ammonium), as well as those of nitrate and acetate, are almost always soluble. Carbonates, hydroxides, sulfates, phosphates, and heavy metal salts are often insoluble.


Solubility : Solubility of salt and gas solutes in liquid solvent.


This chapter is an adaptation of the chapter “ Precipitation Reactions ” in Boundless Chemistry by LumenLearning and is licensed under a CC BY-SA 4.0 license.

solute : solution component present in a concentration less than that of the solvent

solvent : solution component present in a concentration that is higher relative to other components

Introductory Chemistry Copyright © by LumenLearning is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Principles of Solubility


  • YUCHUAN GONG 3 ,
  • DAVID J.W. GRANT 3 &
  • HARRY G. BRITTAIN 4  

Part of the book series: Biotechnology: Pharmaceutical Aspects ((PHARMASP,volume VI))


Solubility is defined as the maximum quantity of a substance that can be completely dissolved in a given amount of solvent, and represents a fundamental concept in fields of research such as chemistry, physics, food science, pharmaceutical, and biological sciences. The solubility of a substance becomes especially important in the pharmaceutical field because it often represents a major factor that controls the bioavailability of a drug substance. Moreover, solubility and solubility-related properties can also provide important information regarding the structure of drug substances, and in their range of possible intermolecular interactions. For these reasons, a comprehensive knowledge of solubility phenomena permits pharmaceutical scientists to develop an optimal understanding of a drug substance, to determine the ultimate form of the drug substance, and to yield information essential to the development and processing of its dosage forms.



Author information

Authors and affiliations.

Department of Pharmaceutics, College of Pharmacy, University of Minnesota, Minneapolis, MN

YUCHUAN GONG & DAVID J.W. GRANT

Center for Pharmaceutical Physics, Milford, NJ

HARRY G. BRITTAIN


Editor information

Editors and affiliations.

Catholic University of Leuven, Belgium

Patrick Augustijns

Janssen Pharmaceutica N.V., Beerse, Belgium

Marcus E. Brewster


Copyright information

© 2007 Springer

About this chapter

GONG, Y., GRANT, D.J., BRITTAIN, H.G. (2007). Principles of Solubility. In: Augustijns, P., Brewster, M.E. (eds) Solvent Systems and Their Selection in Pharmaceutics and Biopharmaceutics. Biotechnology: Pharmaceutical Aspects, vol VI. Springer, New York, NY. https://doi.org/10.1007/978-0-387-69154-1_1


DOI : https://doi.org/10.1007/978-0-387-69154-1_1

Publisher Name : Springer, New York, NY

Print ISBN : 978-0-387-69149-7

Online ISBN : 978-0-387-69154-1

eBook Packages : Biomedical and Life Sciences Biomedical and Life Sciences (R0)

PrepScholar


The 11 Solubility Rules and How to Use Them


One of the first science experiments I remember was adding salt to a cup of water and waiting eagerly for it to dissolve. Though I was excited to watch the salt seem to “disappear” I definitely didn’t understand the intricacies of solubility. Luckily, solubility follows a list of rules that helps us determine how soluble a substance is, like how likely that salt is to dissolve into that water (sneak peek- it’s very likely). We’re going to go over what solubility is, how it works, and the complete list of solubility rules to help you determine the solubility of substances.

What Is Solubility?

Solubility is a substance's ability to be dissolved . The substance that is dissolved is called a solute, and the substance it is dissolving in is called a solvent. The resulting substance is called a solution. Generally, the solute is a solid and the solvent is a liquid, such as our salt in water example above. However, solutes can be in any state: gas, liquid, or solid. For example, a carbonated beverage is a solution where the solute is a gas and the solvent is a liquid.

A solute is considered insoluble when it is unable to dissolve at a ratio greater than 10000:1. While many compounds are partially or mostly insoluble, there is no substance that is completely insoluble in water , meaning that it can't dissolve at all. You will see in the solubility rules that many compounds that are labeled as insoluble have exceptions, such as carbonates. This is partly why it's important to follow the solubility rules closely.

When you are working on chemical equations or building a hypothesis, solubility rules are helpful in predicting the end states of the substances involved. You will be able to accurately predict what combinations will lead to what results.

The solubility rules apply only to the ability of ionic solids to dissolve in water. While we can calculate solubility by measuring each substance and following an equation, the solubility rules allow us to predict the solubility of a substance before we attempt to make it.

Solubility Rules

It is very important that the rules on this list are followed in order, because if a rule seems to contradict another rule, the rule that comes first is the one that you follow . Substances on this list are given by their elemental names. Referencing the periodic table below will help you work through the elemental names and groups.

1. Salts containing Group I elements (Li+, Na+, K+, Cs+, Rb+) are soluble . There are few exceptions to this rule. Salts containing the ammonium ion (NH4+) are also soluble.

2. Salts containing nitrate ion (NO3-) are generally soluble.

3. Salts containing Cl-, Br-, or I- are generally soluble. Important exceptions to this rule are halide salts of Ag+, Pb2+, and (Hg2)2+. Thus, AgCl, PbBr2, and Hg2Cl2 are insoluble.

4. Most silver salts are insoluble. AgNO3 and Ag(C2H3O2) are common soluble salts of silver; virtually all others are insoluble.

5. Most sulfate salts are soluble. Important exceptions to this rule include CaSO4, BaSO4, PbSO4, Ag2SO4 and SrSO4.

6. Most hydroxide salts are only slightly soluble. Hydroxide salts of Group I elements are soluble. Hydroxide salts of Group II elements (Ca, Sr, and Ba) are slightly soluble. Hydroxide salts of transition metals and Al3+ are insoluble. Thus, Fe(OH)3, Al(OH)3, Co(OH)2 are not soluble.

7. Most sulfides of transition metals are highly insoluble, including CdS, FeS, ZnS, and Ag2S. Arsenic, antimony, bismuth, and lead sulfides are also insoluble.

8. Carbonates are frequently insoluble. Group II carbonates (CaCO3, SrCO3, and BaCO3) are insoluble, as are FeCO3 and PbCO3.

9. Chromates are frequently insoluble. Examples include PbCrO4 and BaCrO4.

10. Phosphates such as Ca3(PO4)2 and Ag3PO4 are frequently insoluble.

11. Fluorides such as BaF2, MgF2, and PbF2 are frequently insoluble.
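
Because the first matching rule wins, the list above behaves like an ordered lookup. Here is a rough Python sketch of that idea, counting the rules 1 to 11 in the order they appear. The function name `predict`, the ion labels (such as "SO4 2-"), and the verdict strings are all made up for this illustration. It also simplifies the "most" and "frequently" wording into flat verdicts, so treat it as a study aid and not a definitive predictor.

```python
# Hypothetical ion labels; a salt is described by its cation and its anion.
GROUP_I_AND_AMMONIUM = {"Li+", "Na+", "K+", "Rb+", "Cs+", "NH4+"}
# Rules 7-11 are simplified here to "this anion usually means insoluble".
INSOLUBLE_ANIONS = {"S2-": 7, "CO3 2-": 8, "CrO4 2-": 9, "PO4 3-": 10, "F-": 11}


def predict(cation, anion):
    """Return (rule number, verdict) from the first rule that applies."""
    if cation in GROUP_I_AND_AMMONIUM:                       # rule 1
        return 1, "soluble"
    if anion == "NO3-":                                      # rule 2
        return 2, "soluble"
    if anion in {"Cl-", "Br-", "I-"}:                        # rule 3
        exception = cation in {"Ag+", "Pb2+", "Hg2 2+"}
        return 3, ("insoluble" if exception else "soluble")
    if cation == "Ag+":                                      # rule 4
        return 4, ("soluble" if anion == "C2H3O2-" else "insoluble")
    if anion == "SO4 2-":                                    # rule 5
        exception = cation in {"Ca2+", "Ba2+", "Pb2+", "Sr2+"}
        return 5, ("insoluble" if exception else "soluble")
    if anion == "OH-":                                       # rule 6
        slightly = cation in {"Ca2+", "Sr2+", "Ba2+"}
        return 6, ("slightly soluble" if slightly else "insoluble")
    if anion in INSOLUBLE_ANIONS:                            # rules 7-11
        return INSOLUBLE_ANIONS[anion], "insoluble"
    return None, "not covered by these rules"


print(predict("Na+", "Cl-"))   # (1, 'soluble')
print(predict("Ag+", "NO3-"))  # (2, 'soluble')
print(predict("Ag+", "Cl-"))   # (3, 'insoluble')
```

Notice that silver nitrate never reaches rule 4: the nitrate check comes first, which is exactly the "earlier rule wins" behavior described above.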

[Image: periodic table of the elements]

Sample Questions

1. Select the compounds that are always soluble in water

2. Label each of the following as soluble or insoluble

3. Which (if any) of these silver salts is soluble: silver chloride, AgCl; silver phosphate, Ag3PO4; or silver fluoride, AgF?

1. Select the compounds that are always soluble in water (correct answers are marked soluble)

a. BaSO4: insoluble (see rule 5)

b. Hg2I2: insoluble (see rule 3)

c. NaOH: soluble (see rule 1)

d. Na2SO3: soluble (see rule 1)

e. AgClO3: soluble (see the note below)

f. CrCl3: soluble (see rule 3)

g. FePO4: insoluble (see rule 10)

Note: Letter e is an example of using the order of the rules to determine solubility. Rule 4 says that most silver salts are insoluble, but chlorate (ClO3-) salts, like the nitrates in rule 2, are generally soluble, and the earlier rules take precedence over rule 4. Note that chlorate is not one of the halide ions (Cl-, Br-, I-) named in rule 3. Silver chlorate, AgClO3, is soluble.

2. Label each of the following as soluble or insoluble

a. LiOH soluble - rule 1

b. Fe(OH)2 insoluble - rule 6

c. PbBr2 insoluble - rule 3

d. Rb2SO3 soluble - rule 1

e. NiI2 soluble - rule 3

f. H3AsO4 soluble - not covered by the rules (arsenic acid is an acid, not an ionic salt)

g. NiCrO4 insoluble - rule 9

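3. Which (if any) of these silver salts is soluble: silver chloride, AgCl; silver phosphate, Ag3PO4; or silver fluoride, AgF?

Going by the rules alone, none of these silver salts is soluble. Rule 4 states that most silver salts are insoluble, with silver nitrate, AgNO3, and silver acetate, Ag(C2H3O2), as the common exceptions. In practice, AgCl and Ag3PO4 are indeed insoluble, but silver fluoride is a further exception and dissolves readily in water.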


How Solubility Works

As we see from our solubility rules, some substances are very soluble, while some are insoluble or have low solubility. Let's take a look at how solubility works to better understand the solubility rules.

Factors That Affect Solubility

Whether or not a substance is soluble, and to what degree, depends on a variety of factors. Solutes typically will dissolve best in solvents that have the most molecular similarities. Polarity is a major factor in a substance's solubility. Molecules where one end carries a partial negative charge and the other a partial positive charge are considered “polar,” meaning that they have electrical poles. If a molecule does not have this separation of charge, it is considered nonpolar.

Generally, solutes are soluble in solvents that are most similar to them molecularly. Polar solutes will dissolve better in polar solvents, and non-polar solutes will dissolve better in non-polar solvents. For example, sugar is a polar solute, and dissolves very well in water. However, sugar would have a low solubility in a nonpolar liquid like vegetable oil. In general, solutes will also be more soluble if the molecules in the solute are smaller than the ones in the solvent.

Other factors that affect solubility are pressure and temperature. In some solvents, when heated the molecules vibrate faster and are able to break apart the solute. Pressure is mainly a factor when a gas substance is involved, and has little to no effect on liquid substances. 

The rate of solution refers to how quickly a substance dissolves, and is separate from solubility. Solubility depends entirely on the physical and chemical properties of the solute and solvent , and isn’t affected by the rate of solution. Rate should not be factored into the solubility of a substance. This can often be confusing when first learning about solubility, since in a visual example, watching something dissolve quickly can feel like an affirmation of its ability to dissolve. However, solubility describes how much of a substance can dissolve, not how fast, so the rate at which it dissolves is not factored into the equation.


Predicting Outcomes

When a solute is mixed with a solvent, there are three possible outcomes: If the solution has less solute than the maximum amount it is able to dissolve (the solubility), it is an unsaturated (dilute) solution . If the amount of solute is exactly the same as the solubility, it is saturated. If there is more solute than is able to be dissolved, the excess separates from the solution and forms a precipitate .

A solution is considered saturated when adding additional solute does not increase the concentration of the solution. Additionally, two substances are miscible when they can be mixed together at any ratio; this mainly applies to liquids, like ethanol, C2H5OH, and water, H2O.
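
The three outcomes can be written as a quick comparison between the amount of solute and the amount the solvent can hold. The Python sketch below is only an illustration: the function name `classify` is invented here, and the solubility figure of 36 g per 100 g of water is an approximate room-temperature value for table salt used as an assumed input.

```python
import math


def classify(solute_g, solvent_g, solubility_g_per_100g):
    """Return (outcome, grams of excess solute) at a fixed temperature."""
    # Maximum mass of solute this much solvent can dissolve.
    capacity_g = solubility_g_per_100g * solvent_g / 100.0
    if math.isclose(solute_g, capacity_g):
        return "saturated", 0.0
    if solute_g < capacity_g:
        return "unsaturated", 0.0
    # Anything beyond the capacity stays undissolved.
    return "saturated with precipitate", solute_g - capacity_g


# Assumed solubility: roughly 36 g of salt per 100 g of water.
print(classify(10.0, 100.0, 36.0))  # ('unsaturated', 0.0)
print(classify(36.0, 100.0, 36.0))  # ('saturated', 0.0)
print(classify(50.0, 100.0, 36.0))  # ('saturated with precipitate', 14.0)
```

The comparison only makes sense at one temperature, since the solubility value itself changes as the solution is heated or cooled.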

Knowing and following the solubility rules is the best way to predict the outcome of any given solution. If we know that a substance is insoluble, it is likely that it would have excess solute, thus forming a precipitate. However, compounds that we know to be highly soluble, like salt, are likely to form solutions at various ratios; in this case, we will be able to determine how much solute and solvent is needed to form each solution, and if it's possible to form one at all.

Thinking about the salt in water experiment now, it’s clear that the salt, also known as NaCl or sodium chloride, would be highly soluble according to our solubility rules. Sodium chloride contains Na, which is almost always soluble according to rule 1, and Cl, which is usually soluble according to rule 3. Though I can tell this just by glancing at the rules, nothing takes away from the magic of watching chemical compounds break down and dissolve right before your eyes. Remember to keep your periodic tables handy, and pay close attention to the solubility rules in your next experiment.

Carrie holds a Bachelors in Writing, Literature, and Publishing from Emerson College, and is currently pursuing an MFA. She worked in book publishing for several years, and believes that books can open up new worlds. She loves reading, the outdoors, and learning about new things.

What Makes a Good Hypothesis? Essential Criteria and Examples

A well-formulated hypothesis is a cornerstone of scientific research, providing direction and focus for investigations. It serves as a bridge between theory and experiment, guiding researchers in their quest to explore, test, and validate scientific phenomena. In this article, we will delve into what makes a good hypothesis by examining its essential criteria and providing illustrative examples.

Key Takeaways

  • A good hypothesis should be clear and precise, avoiding vague language and ambiguity.
  • It must be testable and falsifiable, meaning it can be supported or refuted through experimentation.
  • Grounding in existing knowledge is crucial; a hypothesis should be based on prior research or established theories.
  • Formulating a hypothesis involves identifying variables and constructing if-then statements to define cause-and-effect relationships.
  • Common pitfalls in hypothesis development include vagueness, double-barreled hypotheses, and lack of relevance to research objectives.

Defining a Hypothesis in Research

A hypothesis is a foundational element in scientific research, serving as a proposed explanation for a phenomenon that can be tested through experimentation and observation. It is a precise, testable statement predicting the outcome of a study, typically involving a relationship between an independent variable (what the researcher changes) and a dependent variable (what the researcher measures).

Essential Characteristics of a Good Hypothesis

A well-crafted hypothesis is fundamental to any research endeavor. It serves as a guiding framework for your study, ensuring that your research is focused and meaningful. A good hypothesis is clear and precise, testable and falsifiable, and grounded in existing knowledge.

Formulating a Testable Hypothesis

Creating a testable hypothesis is a crucial step in the research process. A well-formulated hypothesis should be specific and measurable, allowing for clear and definitive testing. The essential steps are to identify the variables involved, construct an if-then statement, and ensure that the hypothesis is measurable.

Common Pitfalls to Avoid in Hypothesis Development

Avoiding Vagueness

One of the most frequent mistakes in hypothesis development is formulating vague or ambiguous hypotheses. A well-defined hypothesis should be clear and specific, leaving no room for multiple interpretations. For instance, instead of saying, "There is a relationship between study habits and academic performance," specify the type of study habits and the metrics for academic performance.

Steering Clear of Double-Barreled Hypotheses

A double-barreled hypothesis combines two or more variables in a single statement, making it difficult to test each variable independently. For example, "Increased exercise and a balanced diet improve mental health" is problematic because it conflates two distinct variables. Instead, separate the hypotheses: "Increased exercise improves mental health" and "A balanced diet improves mental health."

Ensuring Relevance to Research Objectives

Your hypothesis must align with your research objectives. Irrelevant hypotheses can lead to wasted resources and time. Ensure that your hypothesis directly addresses the core question of your research. For example, if your research focuses on the impact of social media on teenage self-esteem, a hypothesis about social media's effect on adult self-esteem would be misaligned.

By avoiding these common pitfalls, you can develop a robust and testable hypothesis that will significantly enhance the validity of your research.

Examples of Effective Hypotheses

Hypotheses in Social Sciences

In social sciences, hypotheses often explore relationships between variables such as behavior, attitudes, and social structures. For instance, a hypothesis might state, "Individuals who participate in community service are more likely to report higher levels of life satisfaction." This hypothesis is clear and specific, making it testable through surveys or observational studies.

Hypotheses in Natural Sciences

Natural sciences frequently involve hypotheses that predict natural phenomena or biological processes. An example could be, "Plants exposed to classical music will grow taller than those that are not." This hypothesis is grounded in existing knowledge about the effects of sound on plant growth and can be tested through controlled experiments.

Hypotheses in Applied Research

Applied research often aims to solve practical problems, leading to hypotheses like, "Implementing a four-day workweek will increase employee productivity." This hypothesis is relevant to organizational studies and can be tested by comparing productivity metrics before and after the implementation of the new work schedule.

Evaluating and Refining Hypotheses

Peer Review and Feedback

Engaging in peer review is crucial for refining your hypothesis. Soliciting feedback from colleagues or mentors can provide new perspectives and identify potential weaknesses. This collaborative approach ensures that your hypothesis is robust and well-grounded in targeted research.

Iterative Refinement

Hypothesis development is an iterative process. After initial feedback, you should revisit and revise your hypothesis. This may involve adjusting variables, rephrasing for clarity, or incorporating new data. The goal is to enhance the testability and precision of your hypothesis.

Aligning with Research Design

Your hypothesis must align with your overall research design. Ensure that it is compatible with your methodology, data collection techniques, and analysis plan. This alignment is essential for the hypothesis to be effectively tested and validated within the context of your study.

Evaluating and refining hypotheses is a crucial step in any research process. It allows you to test your assumptions and improve the accuracy of your findings.

In conclusion, crafting a good hypothesis is a fundamental step in the scientific method and essential for conducting meaningful research. A well-formulated hypothesis should be clear, concise, and testable, providing a predictive statement that can be empirically evaluated. By ensuring that your hypothesis is grounded in existing literature and theory, you enhance its validity and relevance. The examples and criteria discussed in this article serve as a guide to help researchers develop robust hypotheses that can withstand rigorous testing and contribute valuable insights to their respective fields. Ultimately, a strong hypothesis not only guides the direction of your research but also lays the foundation for scientific discovery and advancement.

Frequently Asked Questions

What is a hypothesis in research?

A hypothesis is a testable prediction about the relationship between two or more variables. It serves as a foundation for scientific inquiry, guiding the research process and helping to formulate experiments.

What are the essential characteristics of a good hypothesis?

A good hypothesis should be clear and precise, testable and falsifiable, and grounded in existing knowledge. It should also include an if-then statement that defines the relationship between variables.

How do you formulate a testable hypothesis?

To formulate a testable hypothesis, identify the variables involved, construct an if-then statement, and ensure that the hypothesis is measurable. This process helps in designing experiments that can validate or refute the hypothesis.

What are common pitfalls to avoid when developing a hypothesis?

Common pitfalls include vagueness, double-barreled hypotheses (addressing more than one issue at a time), and lack of relevance to the research objectives. Avoiding these pitfalls ensures that the hypothesis is clear and focused.

Can you provide examples of effective hypotheses?

Effective hypotheses can be found in various fields. For example, in social sciences: 'If social media usage increases, then levels of anxiety among teenagers will increase.' In natural sciences: 'If the temperature of water increases, then the solubility of salt will increase.'

How can hypotheses be evaluated and refined?

Hypotheses can be evaluated and refined through peer review and feedback, iterative refinement, and alignment with the overall research design. This process helps in improving the clarity and testability of the hypothesis.

© 2024 Research Rebels, All rights reserved.

Solubility and Complex-Ion Equilibria

Why Do Some Solids Dissolve in Water?

The sugar we use to sweeten coffee or tea is a molecular solid, in which the individual molecules are held together by relatively weak intermolecular forces. When sugar dissolves in water, the weak bonds between the individual sucrose molecules are broken, and these C12H22O11 molecules are released into solution.

It takes energy to break the bonds between the C12H22O11 molecules in sucrose. It also takes energy to break the hydrogen bonds in water that must be disrupted to insert one of these sucrose molecules into solution. Sugar dissolves in water because energy is given off when the slightly polar sucrose molecules form intermolecular bonds with the polar water molecules. The weak bonds that form between the solute and the solvent compensate for the energy needed to disrupt the structure of both the pure solute and the solvent. In the case of sugar and water, this process works so well that up to 1800 grams of sucrose can dissolve in a liter of water.

Ionic solids (or salts) contain positive and negative ions, which are held together by the strong force of attraction between particles with opposite charges. When one of these solids dissolves in water, the ions that form the solid are released into solution, where they become associated with the polar solvent molecules.

$$\mathrm{NaCl}(s) \xrightarrow{\mathrm{H_2O}} \mathrm{Na^+}(aq) + \mathrm{Cl^-}(aq)$$

We can generally assume that salts dissociate into their ions when they dissolve in water. Ionic compounds dissolve in water if the energy given off when the ions interact with water molecules compensates for the energy needed to break the ionic bonds in the solid and the energy required to separate the water molecules so that the ions can be inserted into solution.

Solubility Equilibria

Discussions of solubility equilibria are based on the following assumption: When solids dissolve in water, they dissociate to give the elementary particles from which they are formed. Thus, molecular solids dissociate to give individual molecules

$$\mathrm{C_{12}H_{22}O_{11}}(s) \xrightarrow{\mathrm{H_2O}} \mathrm{C_{12}H_{22}O_{11}}(aq)$$

and ionic solids dissociate to give solutions of the positive and negative ions they contain.

$$\mathrm{NaCl}(s) \xrightarrow{\mathrm{H_2O}} \mathrm{Na^+}(aq) + \mathrm{Cl^-}(aq)$$

When the salt is first added, it dissolves and dissociates rapidly. The conductivity of the solution therefore increases rapidly at first.

$$\mathrm{NaCl}(s) \rightarrow \mathrm{Na^+}(aq) + \mathrm{Cl^-}(aq)$$

The concentrations of these ions soon become large enough that the reverse reaction starts to compete with the forward reaction, which leads to a decrease in the rate at which Na+ and Cl- ions enter the solution.

$$\mathrm{Na^+}(aq) + \mathrm{Cl^-}(aq) \rightarrow \mathrm{NaCl}(s)$$

Eventually, the Na+ and Cl- ion concentrations become large enough that the rate at which precipitation occurs exactly balances the rate at which NaCl dissolves. Once that happens, there is no change in the concentration of these ions with time and the reaction is at equilibrium. When this system reaches equilibrium it is called a saturated solution, because it contains the maximum concentration of ions that can exist in equilibrium with the solid salt. The amount of salt that must be added to a given volume of solvent to form a saturated solution is called the solubility of the salt.
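The approach to equilibrium described above can be illustrated with a toy simulation. Everything in it is an assumption made for the sketch: the rate law (a constant dissolving rate while solid remains, and a precipitation rate proportional to the product of the two ion concentrations), the rate constants, and the function name `simulate_dissolution`. It is not a model of real NaCl.

```python
def simulate_dissolution(k_dissolve=0.5, k_precip=2.0, dt=0.01, steps=2000):
    """Integrate the ion concentration with simple Euler steps.

    k_dissolve and k_precip are hypothetical rate constants in arbitrary units.
    """
    conc = 0.0  # concentration of Na+ (equal to that of Cl-) in solution
    history = []
    for _ in range(steps):
        forward = k_dissolve              # solid -> ions
        reverse = k_precip * conc * conc  # ions -> solid, grows as ions build up
        conc += (forward - reverse) * dt
        history.append(conc)
    return history


history = simulate_dissolution()
print("early concentration:", round(history[9], 4))
print("final concentration:", round(history[-1], 4))
```

The concentration rises quickly at first, then levels off once the reverse rate matches the forward rate. With these assumed constants the plateau sits where `k_precip * conc**2` equals `k_dissolve`.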

Solubility Rules

  • A salt is soluble if it dissolves in water to give a solution with a concentration of at least 0.1 moles per liter at room temperature.
  • A salt is insoluble if the concentration of an aqueous solution is less than 0.001 M at room temperature.
  • Slightly soluble salts give solutions that fall between these extremes.
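The three definitions above translate directly into a threshold check. The function name `classify_salt` is an assumption for this sketch; the two cutoffs (0.1 M and 0.001 M) come from the list above.

```python
def classify_salt(concentration_mol_per_l: float) -> str:
    """Label a salt from the concentration of its saturated solution
    at room temperature, in moles per liter."""
    if concentration_mol_per_l >= 0.1:
        return "soluble"
    if concentration_mol_per_l < 0.001:
        return "insoluble"
    return "slightly soluble"


# Hypothetical concentrations chosen only to exercise each branch.
for molarity in (1.0, 0.01, 0.00001):
    print(molarity, "M ->", classify_salt(molarity))
```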

Solubility Rules for Ionic Compounds in Water

Soluble Salts

1. The Na+, K+, and NH4+ ions form soluble salts. Thus, NaCl, KNO3, (NH4)2SO4, Na2S, and (NH4)2CO3 are soluble.
2. The nitrate (NO3-) ion forms soluble salts. Thus, Cu(NO3)2 and Fe(NO3)3 are soluble.
3. The chloride (Cl-), bromide (Br-), and iodide (I-) ions generally form soluble salts. Exceptions to this rule include salts of the Pb^2+, Hg2^2+, Ag+, and Cu+ ions. ZnCl2 is soluble, but CuBr is not.
4. The sulfate (SO4^2-) ion generally forms soluble salts. Exceptions include BaSO4, SrSO4, and PbSO4, which are insoluble, and Ag2SO4, CaSO4, and Hg2SO4, which are slightly soluble.

Insoluble Salts

1. Sulfides (S^2-) are usually insoluble. Exceptions include Na2S, K2S, (NH4)2S, MgS, CaS, SrS, and BaS.
2. Oxides (O^2-) are usually insoluble. Exceptions include Na2O, K2O, SrO, and BaO, which are soluble, and CaO, which is slightly soluble.
3. Hydroxides (OH-) are usually insoluble. Exceptions include NaOH, KOH, Sr(OH)2, and Ba(OH)2, which are soluble, and Ca(OH)2, which is slightly soluble.
4. Chromates (CrO4^2-) are usually insoluble. Exceptions include Na2CrO4, K2CrO4, (NH4)2CrO4, and MgCrO4.
5. Phosphates (PO4^3-) and carbonates (CO3^2-) are usually insoluble. Exceptions include salts of the Na+, K+, and NH4+ ions.
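As a rough illustration, the rules above can be written as a lookup. This sketch makes several assumptions: the function name `predict_solubility`, the convention of writing ions as strings such as "SO4^2-", and the choice to treat "usually" and "generally" as hard rules. The halide exceptions are listed in the rules only as exceptions, so returning "insoluble" for them is a simplification.

```python
ALWAYS_SOLUBLE_CATIONS = {"Na+", "K+", "NH4+"}
HALIDES = {"Cl-", "Br-", "I-"}


def predict_solubility(cation: str, anion: str) -> str:
    """Apply the solubility rules for ionic compounds in water."""
    if cation in ALWAYS_SOLUBLE_CATIONS:  # soluble rule 1
        return "soluble"
    if anion == "NO3-":  # soluble rule 2
        return "soluble"
    if anion in HALIDES:  # soluble rule 3 and its exceptions
        if cation in {"Pb^2+", "Hg2^2+", "Ag+", "Cu+"}:
            return "insoluble"
        return "soluble"
    if anion == "SO4^2-":  # soluble rule 4 and its exceptions
        if cation in {"Ba^2+", "Sr^2+", "Pb^2+"}:
            return "insoluble"
        if cation in {"Ag+", "Ca^2+", "Hg2^2+"}:
            return "slightly soluble"
        return "soluble"
    if anion == "S^2-":  # insoluble rule 1
        if cation in {"Mg^2+", "Ca^2+", "Sr^2+", "Ba^2+"}:
            return "soluble"
        return "insoluble"
    if anion in {"O^2-", "OH-"}:  # insoluble rules 2 and 3
        if cation in {"Sr^2+", "Ba^2+"}:
            return "soluble"
        if cation == "Ca^2+":
            return "slightly soluble"
        return "insoluble"
    if anion == "CrO4^2-":  # insoluble rule 4
        return "soluble" if cation == "Mg^2+" else "insoluble"
    if anion in {"PO4^3-", "CO3^2-"}:  # insoluble rule 5
        return "insoluble"
    return "unknown"  # the rules above do not cover this anion


for pair in [("Na+", "Cl-"), ("Ag+", "Cl-"), ("Ca^2+", "SO4^2-"), ("Ca^2+", "CO3^2-")]:
    print(pair, "->", predict_solubility(*pair))
```

Checking the Na+, K+, and NH4+ cations first mirrors the way those ions appear as exceptions to nearly every "insoluble" rule.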


3.2 Solubility

An understanding of the various types of noncovalent intermolecular forces allows us to explain many observable physical properties of organic compounds on a molecular level. One physical property that has links to intermolecular forces is solubility. Whether some organic substance will dissolve in a liquid solvent, and to what extent it will do so, is linked to the structures of the molecules making up this solute and the solvent.

A lot of organic chemistry takes place in the solution phase. In the organic laboratory, reactions are often run in nonpolar or slightly polar solvents such as toluene (methylbenzene), dichloromethane, or diethyl ether. In recent years, much effort has been made to adapt reaction conditions to allow for the use of more environmentally friendly solvents such as water or ethanol, which are polar and capable of hydrogen bonding. So laboratory chemistry tends to occur in these environments.

In biochemistry the solvent is of course water, but the microenvironment inside an enzyme’s active site – where the actual chemistry is going on – can range from very polar to very non-polar, depending on which amino acid residues on the enzyme surround the reactants.

Glass containing two unmixed layers of liquid. Yellow liquid lies on top of a lower layer of clear liquid.

You have probably observed at some point in your life that oil does not mix with water, either in a puddle underneath a car with a leaky oil pan, or in a vinaigrette dressing bottle in the kitchen. The underlying reason for this insolubility (or immiscibility when we talk about liquids) is intermolecular forces that exist (or don’t) between molecules within the solute, the solvent, and between the solute and solvent.

When considering the solubility of an organic compound in a given solvent, the most important question to ask ourselves is: How strong are the noncovalent attractive interactions between the compound and the solvent molecules? If the solvent is polar, like water, then a larger dipole moment, indicating greater molecular polarity, will tend to increase the solubility of a substance in it. If the solvent is non-polar, like the hydrocarbon hexane, then the exact opposite is true.

Imagine that you have a flask filled with water, and a selection of substances that you will test to see how well they dissolve in it. The first substance is table salt, or sodium chloride. This ionic compound dissolves readily in water. Why? Because water, as a very polar molecule, is able to form many ion-dipole interactions with both the sodium cation and the chloride anion, the energy from which is more than enough to make up for energy required to break up the ion-ion interactions in the salt crystal.

Dissolution of ions dissolving in water shown as a reaction. We see 8 ions, 4 cations and 4 anions, in a cubic arrangement to the left of the arrow. To the right of the arrow individual ions are represented as spheres surrounded by water molecules so that the dipoles of the water are oriented close to the dissolved ion with opposite charge. An anion is surrounded by 4 water molecules with their partial positive hydrogens close, and a cation is shown surrounded by 4 water molecules oriented with their partially negative oxygens surrounding it.

The end result, then, is that in place of sodium chloride crystals, we have individual sodium cations and chloride anions surrounded by water molecules – the salt is now in solution. Charged species as a rule dissolve readily in water: in other words, they are very hydrophilic (water-loving).

Biphenyl, like sodium chloride, is a colorless crystalline substance.

Line-bond structure of biphenyl, composed of two benzene rings connected by a single bond.

Biphenyl does not dissolve at all in water. Why is this? It is a very non-polar molecule, with only carbon-carbon and carbon-hydrogen bonds. It has some intermolecular forces bonding it to itself through nonpolar London dispersion forces, but it has no significant attractive interactions with very polar solvent molecules like water. Meanwhile the water molecules themselves are highly connected to one another through hydrogen bonding forces. Thus, the water tends to continue to engage in hydrogen bonding interactions with other molecules of its own kind, and very little is gained in terms of new biphenyl-water interactions. Water is a terrible solvent for nonpolar hydrocarbon molecules: they are very hydrophobic (water-fearing).

Next, you try a series of increasingly large alcohol compounds, starting with methanol (1 carbon) and ending with octanol (8 carbons).

Structural formulas are provided for methanol (very soluble in water), butanol (slightly soluble in water) and octanol (very insoluble in water).

You find that the smaller alcohols – methanol, ethanol, and propanol – dissolve easily in water, at any water/alcohol ratio that you try. This is because the water is able to form hydrogen bonds with the hydroxyl group in these molecules, and the increased stability in the system due to formation of these water-alcohol hydrogen bonds is more than enough to make up for the lost stability from undoing the alcohol-alcohol (and water-water) hydrogen bonds. When you try butanol, however, you begin to notice that, as you add more and more to the water, it starts to form a layer on top of the water. Butanol is only sparingly soluble in water.

The longer-chain alcohols – pentanol, hexanol, heptanol, and octanol – are increasingly insoluble in water. What is happening here? Clearly, the same favorable water-alcohol hydrogen bonds are still possible with these larger alcohols. The difference, of course, is that the larger alcohols have larger nonpolar, hydrophobic regions in addition to their hydrophilic hydroxyl group. At about four or five carbons, the influence of the hydrophobic part of the molecule begins to overcome that of the hydrophilic part, and water solubility is lost.
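The trend across the alcohol series can be summarized in a few lines. The function name `describe_alcohol` and the exact cutoffs are assumptions drawn from the description above (one to three carbons mix freely, butanol is sparingly soluble, longer chains are increasingly insoluble); real solubility changes gradually rather than in steps.

```python
ALCOHOLS = {
    "methanol": 1, "ethanol": 2, "propanol": 3, "butanol": 4,
    "pentanol": 5, "hexanol": 6, "heptanol": 7, "octanol": 8,
}


def describe_alcohol(carbons: int) -> str:
    """Qualitative water solubility of a straight-chain alcohol with one -OH group."""
    if carbons <= 3:
        return "dissolves at any water/alcohol ratio"
    if carbons == 4:
        return "sparingly soluble"
    # The hydrophobic chain now outweighs the single hydrophilic hydroxyl group.
    return "increasingly insoluble"


for name, carbons in ALCOHOLS.items():
    print(f"{name} ({carbons} C): {describe_alcohol(carbons)}")
```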

Now, try dissolving glucose in the water – even though it has six carbons just like hexanol, it also has five hydrophilic hydroxyl (-OH) groups that can engage in hydrogen bonding interactions, in addition to a sixth oxygen that is capable of being a hydrogen bond acceptor.

Line-bond structure for the 6 carbon polyalcohol named glucose, in its cyclic configuration. Glucose in this form has 5 OH groups and a Carbon to oxygen to carbon bond. It is water soluble.

We have tipped the scales to the hydrophilic side, and we find that glucose is quite soluble in water.

We saw that ethanol was very water-soluble (if it were not, drinking beer or vodka would be rather inconvenient!). How about dimethyl ether, which is a constitutional isomer of ethanol but with an ether rather than an alcohol functional group? We find that dimethyl ether is much less soluble in water. Is it capable of forming hydrogen bonds with water? Yes, in fact, it is: the ether oxygen can act as a hydrogen-bond acceptor. The difference between the ether group and the alcohol group, however, is that the alcohol group is both a hydrogen bond donor and acceptor.

Two structures are shown: ethanol and dimethyl ether. Each is shown hydrogen bonding with water molecules. The alcohol group in ethanol can be a hydrogen bond donor and acceptor, and it is very water soluble. Dimethyl ether is a hydrogen bond acceptor only, and is less water soluble.

The result is that the alcohol is able to form more energetically favorable interactions with the solvent compared to the ether, and the alcohol is therefore much more soluble.

‘Like dissolves like’ is a general rule for solubility frequently taught in chemistry classes. This phrase consolidates the patterns described above, and while it loses some of the explanation and is really general, it is helpful.

Photo from shore of the wrecked boat New Carissa on fire.

‘Like’ items are those that are more polar, or capable of hydrogen bonding or interacting with ions. Polar solvents will dissolve polar substances well, and also ionic ones. Nonpolar solvents, in contrast, will not: but they will do a good job of dissolving things that are nonpolar.

Nonpolar solvents are less familiar to non-chemists, but in daily life they do sometimes help when it is necessary to dissolve something nonpolar. For instance, essential oils are oil solutions of fragrance molecules because the fragrance compounds are nonpolar and will not dissolve in water. Cleaning solvents also often are at least somewhat nonpolar, and help to dissolve and therefore remove nonpolar greasy contaminants from tools, bikes, and other places around the house. In the environment, oils tend to float on water and thus can cover wide areas rather than remain confined to a local spill. Small volumes of spilled hazardous materials that are nonpolar can contaminate vast areas.

Summary of factors contributing to water solubility

Evaluating a chemical structure to predict its solubility characteristics can be challenging. But consideration of these factors can often lead to predictions that match real observed behavior of substances:

A: How many carbons? All else being equal, more carbons means more of a non-polar/hydrophobic character, and thus lower solubility in water.

B: How many, and what kind of hydrophilic groups? The more, the greater the water solubility. In order of importance:

  • Anything with a charged group (e.g. ammonium, carboxylate, phosphate) is almost certainly water soluble, unless it has a very large nonpolar group, in which case it will most likely be soluble in the form of micelles, like a soap or detergent.
  • Any functional group that can donate a hydrogen bond to water (e.g. alcohols, amines) will significantly contribute to water solubility.
  • Any functional group that can only accept a hydrogen bond from water (e.g. ketones, aldehydes, ethers) will have a somewhat smaller but still significant effect on water solubility.
  • Other groups that contribute to polarity (e.g. alkyl halides, thiols, sulfides) will make a small contribution to water solubility.

Watch for heteroatoms in molecules, which often are built into functional groups that contribute to molecular polarity, and thus water-solubility.

Introductory Organic Chemistry Copyright © 2021 by Carol Higginbotham is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Science Experiments on Solubility

Many of the substances people use daily, including shampoo, gasoline and milk, are mixtures. When mixtures are homogeneous, meaning the particles of each substance are mixed evenly, they create a solution. Solutions form when the attraction between the solute, a substance that dissolves, and the solvent, a substance like water that does the dissolving, is greater than the attraction between the particles that make up the solute. Solubility measures the amount of a solute that can dissolve in a solvent.


Saturated Solutions

Introduce solubility by testing how much of a solute dissolves in water before the solution becomes saturated. This type of experiment introduces aqueous solutions, or solutions of a substance dissolved in water, to students. The experiment can also spark a discussion about why water is able to dissolve so many substances: the attraction between water and the solute is greater than the attraction between the particles of the solute. The scientific method dictates you must include a hypothesis; for example, predicting that more of one solute will dissolve than another. To test your hypothesis, measure 1 cup each of table salt, Epsom salt and sugar, weigh each one, and place each substance in a separate container. Prepare three plastic cups with 1/2 cup of distilled water each. Add 1 teaspoon of table salt to one plastic cup and stir to dissolve. Continue adding table salt to this cup in small increments until the solute will no longer dissolve. Weigh the remaining salt and subtract it from the weight of the initial cup to find the amount that dissolved. Repeat the steps with the Epsom salt as well as the sugar. Compare how much of each solute was dissolved to determine if your hypothesis was correct. At the end, some crystals of each substance should remain undissolved because the water is already saturated.
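The bookkeeping for this experiment is a subtraction per solute followed by a comparison. In the sketch below, the function names and every number are made-up placeholders, not measured results; substitute your own weights.

```python
def dissolved_mass(initial_g: float, remaining_g: float) -> float:
    """Mass that went into solution: starting mass minus what is left over."""
    if remaining_g > initial_g:
        raise ValueError("remaining mass cannot exceed the initial mass")
    return initial_g - remaining_g


def most_dissolved(measurements: dict) -> str:
    """Return the solute name with the largest dissolved mass.

    measurements maps a solute name to (initial grams, remaining grams).
    """
    return max(measurements, key=lambda name: dissolved_mass(*measurements[name]))


# Placeholder weights in grams, for illustration only.
measurements = {
    "solute A": (300.0, 255.0),
    "solute B": (250.0, 170.0),
    "solute C": (200.0, 90.0),
}

for name, (initial, remaining) in measurements.items():
    print(name, "dissolved", dissolved_mass(initial, remaining), "g")
print("most dissolved:", most_dissolved(measurements))
```

Comparing the dissolved masses is only fair if each cup holds the same volume of water at the same temperature, as the procedure specifies.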

Testing Various Solvents

Water isn't the only liquid that will dissolve solids like salt and sugar. Water is considered the universal solvent because the partial electrical charges on its polar molecules attract many other substances, but students might wonder if other liquids also attract and dissolve solids. Test water, rubbing alcohol, club soda, cooking oil and nail polish to determine which one is the best solvent. Create your hypothesis; for example, that nail polish will dissolve more solutes and cooking oil will be the most ineffective solvent. Prepare plastic cups with 2 teaspoons of each liquid. Measure and add 1 teaspoon of salt to each liquid and stir for 10 to 30 seconds. Record results, indicating if the salt dissolved completely, partially or not at all. Repeat the experiment with other solutes like baking soda, sugar and sand to determine if multiple substances can dissolve in particular solvents.

Results and Explanations

You will find that water is the best solvent, and nonpolar liquids like cooking oil are the worst. Some salt will dissolve in alcohol, but since alcohol is less polar than water, it is not as good a solvent. Club soda will likely dissolve more than alcohol because it is mostly water, although it also contains dissolved carbon dioxide. This experiment also shows that "like dissolves like": salt, an ionic compound, dissolves in polar water but will not dissolve in less polar organic liquids like nail polish. Examine your results to see if your hypothesis was correct.

Temperature and Solubility

A common hypothesis states that hot water will dissolve more solute than cold water. Use this experiment to determine if temperature has any effect on solubility. Add 1/2 cup of lukewarm tap water to a plastic cup. Weigh about 5 tablespoons of salt and gradually add the salt to the tap water, stirring to mix. Stop adding salt when it no longer dissolves. Repeat the mixing steps with 1/2 cup each of ice water and hot water; determine at which temperature more salt dissolves. This experiment shows whether the solubility of a substance depends on temperature. With table salt the effect is small, so expect only slightly more salt to dissolve in hot water than in cold; repeating the experiment with sugar gives a much larger difference.

Peeps Solubility

In 1996, scholars James Zimring and Gary Falcon examined the solubility of Peeps, the bird-shaped marshmallow candy. You can duplicate a similar experiment and hypothesize that the candy is not soluble in water but will dissolve in acetone, or nail polish remover. Fill four plastic cups with 1 cup each of water, acetone, vinegar and rubbing alcohol. Submerge one Peeps candy in each liquid and observe every 20 minutes for an hour. Write down your observations. This experiment demonstrates to students the difference between what is expected versus the outcome. Many students think that candy is made from sugar, and since they know sugar will dissolve in liquids like water, they believe the candy will dissolve. The candy doesn't dissolve in any of these liquids. From these results, students can determine that the candy must be made up of other substances resistant to dissolving in liquids.

  • Science Buddies: Saturated Solutions: Measuring Solubility
  • Education.com: To Test the Solubility of Common Liquid Solvents
  • Peep Research: Solubility Testing

Cara Batema is a musician, teacher and writer who specializes in early childhood, special needs and psychology. Since 2010, Batema has been an active writer in the fields of education, parenting, science and health. She holds a bachelor's degree in music therapy and creative writing.

Experiment: Temperature and Solubility

Introduction

The "Temperature and Solubility" experiment aims to investigate how the solubility of a substance is influenced by the temperature of the solvent. This experiment is based on the hypothesis that the solubility of a solute increases with the temperature of the solvent, a concept fundamental to understanding solutions in chemistry.

Materials You Need

  • Sodium chloride (table salt) or sugar (sucrose)
  • Distilled water
  • Three beakers or glass jars
  • Stirring rods
  • Thermometer
  • Heating source (like a hot plate or Bunsen burner)
  • Balance scale
  • Graduated cylinders

Procedure

  • Label the beakers as 'Cold', 'Room Temperature', and 'Hot'. Measure and pour equal volumes of distilled water into each.
  • Adjust the temperature of the water in each beaker: add ice for 'Cold' to reach about 5°C, leave 'Room Temperature' as is, and heat 'Hot' to approximately 60°C.
  • Weigh out an equal amount of the solute and add it to each beaker.
  • Stir each solution continuously and observe the solute's dissolving rate until saturation.

Observations and Results

Record the time each solute takes to dissolve in the different temperature conditions. Note the amount of undissolved solute at the saturation point for each temperature. Compare the solubility in cold, room temperature, and hot water.
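The comparison above can be put into numbers. The following is a minimal sketch, assuming you weighed both the solute added and the solute left undissolved in each beaker. The function name and the sample masses are hypothetical illustrations, not measured values.

```python
def dissolved_per_100_ml(added_g, undissolved_g, water_ml):
    """Return grams of solute dissolved per 100 mL of water at saturation."""
    if water_ml <= 0:
        raise ValueError("water volume must be positive")
    if undissolved_g > added_g:
        raise ValueError("undissolved mass cannot exceed the mass added")
    return (added_g - undissolved_g) * 100.0 / water_ml


# Hypothetical readings: (mass added in g, mass left undissolved in g).
# Replace these with your own measurements.
readings = {
    "Cold": (50.0, 14.0),
    "Room Temperature": (50.0, 12.0),
    "Hot": (50.0, 10.0),
}

for label, (added, left) in readings.items():
    value = dissolved_per_100_ml(added, left, water_ml=100.0)
    print(f"{label}: {value:.1f} g per 100 mL")
```

Expressing every result per 100 mL makes the three beakers directly comparable, even if the water volumes were not exactly equal.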

Conclusions

Analyze the data to determine if the hypothesis holds true. The conclusion should discuss whether the solubility of the solute was greater in hot water compared to cold, and what this implies about the relationship between temperature and solubility in solutions.

The effect of temperature on solubility

Examine why some solid substances are more soluble in hot water than in cold water

Most solid substances that are soluble in water are more soluble in hot water than in cold water. This experiment examines solubility at various temperatures.

This experiment should take 60 minutes.

Equipment 

  • Eye protection
  • Boiling tubes
  • Beaker to act as ice bath, 250 cm³
  • Beaker to act as a hot water bath, 250 cm³
  • Stirring thermometer (−10 to 110 °C)
  • Measuring cylinder or graduated pipette, 250 cm³
  • Wooden tongs to hold hot boiling tube
  • Ammonium chloride

Health, safety and technical notes

  • Read our standard health and safety guidance.
  • Wear eye protection.
  • Ammonium chloride is harmful if swallowed and an eye irritant; see CLEAPSS Hazcard HC009a.

Procedure

  • Set up a hot water bath and an ice bath. Put 2.6 g of ammonium chloride into the boiling tube. Add 4 cm³ water.
  • Warm the boiling tube in the hot water bath until the solid dissolves.
  • Put the boiling tube in the ice bath and stir with the thermometer. Use wooden tongs to hold it if necessary.
  • Note the temperature at which crystals first appear and record it in the table.
  • Add 1 cm³ water. Warm the solution again, stirring until all the crystals dissolve.
  • Then repeat the cooling and note the new temperature at which crystals appear.
  • Keep adding 1 cm³ water, re-dissolving, cooling and recording the temperature in this way until 10 cm³ water has been used.

This is a good opportunity to introduce the use of quantitative chemical apparatus to younger students.

Students should know that solids are generally more soluble in hot water than in cold water.

  • Plot a graph showing solubility on the vertical axis and temperature on the horizontal axis.
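To plot the graph, each water volume first has to be turned into a solubility value. The sketch below assumes the fixed 2.6 g of ammonium chloride from the procedure and a water density of 1 g/cm³, so that 1 cm³ of water counts as 1 g. The function names are illustrative, and the crystallisation temperatures must come from your own results table.

```python
MASS_G = 2.6  # mass of ammonium chloride used throughout the procedure


def solubility_g_per_100g(water_cm3, mass_g=MASS_G, density_g_per_cm3=1.0):
    """Solubility in g of solute per 100 g of water at the crystallisation point.

    Assumes the solution is just saturated when crystals first appear.
    """
    water_g = water_cm3 * density_g_per_cm3
    return mass_g * 100.0 / water_g


def curve_points(crystal_temps_c):
    """Turn {water volume in cm3: temperature in deg C} into sorted
    (temperature, solubility) pairs ready for plotting."""
    return sorted(
        (temp_c, round(solubility_g_per_100g(volume), 1))
        for volume, temp_c in crystal_temps_c.items()
    )


# Solubility represented by each total volume of water from 4 to 10 cm3.
for volume in range(4, 11):
    print(volume, "cm3 ->", round(solubility_g_per_100g(volume), 1), "g per 100 g water")
```

Passing your recorded temperatures to `curve_points` gives the pairs to plot, with temperature on the horizontal axis and solubility on the vertical axis.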

Additional information

This practical is part of our  Classic chemistry experiments  collection.

Science project: Testing the Solubility of Common Liquid Solvents

Grade Level: 4th - 6th; Type: Physical Science/Mathematics

What is the project about?

Solutions are a special kind of mixture. Solubility is a term used to describe the amount of materials (solids, liquids, or gas) which can be dissolved in a solvent to make a solution. The research aspect of this science fair project is to test the solubility of several common liquid substances.

What are the goals?

Several common liquids, such as water, rubbing alcohol, and club soda, will have solids such as salts, sand, and baking soda added to them to determine which solids dissolve in which liquids at room temperature. Based on the results of this investigation, a data table will be prepared and the results plotted on a series of graphs. A rule of thumb for solubility in solvents is "like dissolves like." This means that in general, polar compounds are soluble in polar solvents and non-polar compounds are soluble in non-polar solvents. One practical benefit of this project is that its results put this rule to the test.

Research Questions:

  • What is a solvent?
  • What is a solute?
  • Which solvent was able to dissolve most or all of the solutes?
  • Which solute was the most soluble in the solvents tested?
  • The term "universal solvent" means the ability to dissolve most substances. Which of the solvents tested best fits this description?

Solutions are a special kind of mixture. Solubility is a term used to describe the amount of materials (solids, liquids, or gas) which can be dissolved in a solvent to make a solution. A solvent is the dissolving agent, e.g. water. A solute is a substance that is dissolved in a solution.

In this science fair project, solutions in which the solvent is a liquid will be investigated. Most liquid solvents are molecular compounds. Whether a compound will dissolve in a particular solvent depends on what that solvent is. The rule of thumb for solubility in molecular solvents is "like dissolves like." This means that in general, polar compounds (chemical compounds whose molecules exhibit electrically positive characteristics at one extremity and negative characteristics at the other) are soluble in polar solvents and non-polar compounds are soluble in nonpolar solvents. Water is an example of a polar solvent. Cooking oil is an example of a nonpolar solvent. Water is the most commonly used liquid solvent. It is sometimes called the "universal solvent" because it can dissolve more substances than any other liquid.

What materials are required?

Rubbing alcohol, club soda, cooking oil, table salt, baking soda, table sugar, Epsom salt, package of plastic drinking cups, coffee stirrers, metric measuring cup, clean playground or beach sand, and rubber or Latex disposable gloves

Where can the materials be found?

All of the items for this project can be purchased locally at most major retail stores (Walmart, Target, dollar stores, etc.).

Experimental Procedure:

  • On a sheet of paper or with the use of a computer and printer draw a table similar to the one shown below.
  • Using a graduated measuring cup, measure out 10 ml of water and pour into a cup.
  • Measure out a teaspoon of table salt and add it to the cup of water and stir using a coffee stirrer.
  • If all of the salt (solute) disappears, then the solute is said to have dissolved in the solvent and a solution is produced. An insoluble solute will settle out of the mixture. Insoluble solutes are usually found at the bottom of the cup or floating on the surface of the liquid.
  • Record the results of each test by writing the words "soluble" if the entire solid dissolves, "insoluble" if the solid does not dissolve, or "partially soluble" if some of the solid dissolves.
  • In another clean cup add 10 ml of water, but this time add a teaspoon of sand and stir. Record the results in the table.
  • Repeat the same procedure for the Epsom salt, baking soda, and sugar. Each time, use a clean cup and coffee stirrer.
  • Follow the same procedure with the rubbing alcohol, club soda, and cooking oil in place of the water.
         
 Solvent          | Table Salt | Baking Soda | Sand | Table Sugar | Epsom Salt
 Water            |            |             |      |             |
 Rubbing Alcohol  |            |             |      |             |
 Club Soda        |            |             |      |             |
 Cooking Oil      |            |             |      |             |
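The same table can be kept in a short program, which also helps answer the research question about which solvent dissolved the most solutes. This is a minimal sketch. The 2/1/0 scoring of "soluble", "partially soluble" and "insoluble" is an assumption made here for illustration and is not part of the project description.

```python
SOLVENTS = ["Water", "Rubbing alcohol", "Club soda", "Cooking oil"]
SOLUTES = ["Table salt", "Baking soda", "Sand", "Table sugar", "Epsom salt"]

# Assumed scoring scheme (not from the project text): higher means more dissolved.
SCORES = {"soluble": 2, "partially soluble": 1, "insoluble": 0}


def blank_table():
    """Create an empty results table: one row per solvent, one cell per solute."""
    return {solvent: {solute: None for solute in SOLUTES} for solvent in SOLVENTS}


def record(table, solvent, solute, result):
    """Store one observation, rejecting anything but the three allowed words."""
    if result not in SCORES:
        raise ValueError(f"result must be one of {sorted(SCORES)}")
    table[solvent][solute] = result


def solvent_totals(table):
    """Add up the scores of all recorded cells for each solvent."""
    return {
        solvent: sum(SCORES[r] for r in row.values() if r is not None)
        for solvent, row in table.items()
    }


def best_solvent(table):
    """Return the solvent with the highest total score."""
    totals = solvent_totals(table)
    return max(totals, key=totals.get)
```

Call `record` once per cup as you work through the procedure, then `best_solvent` once the table is complete.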

Terms/Concepts: Solution; solubility; solvent; solute; polar compound

References:

References to related books

Title: Janice VanCleave's Chemistry for Every Kid: 101 Easy Experiments that Really Work

Author: Janice VanCleave

Publisher: Jossey-Bass, Inc. ISBN-10: 0471620858 and ISBN-13: 978-0471620853

This book contains many experiments designed to be conducted by elementary and middle school children. It also explains basic chemistry concepts that will be useful in conducting this science fair project.

Links to related sites on the web

Title: Solubility of Salts

URL: http://www.elmhurst.edu/~chm/vchembook/171solublesalts.html

Title: What is Solubility?

URL:  http://www.chemistryland.com/CHM107/Water/WaterTutorial.htm

NOTE : The Internet is dynamic; websites cited are subject to change without warning or notice!

May 2, 2013

Solubility Science: How to Grow the Best Crystals

A chemistry challenge from Science Buddies

By Science Buddies

Key concepts: Chemistry, Solubility, Saturation, Crystals, Purification

Introduction

Have you ever wondered how crystals are made? Crystals come in all different shapes and sizes. The purest and cleanest crystals, however, are usually also the ones that grow to be the largest in size. In this activity you'll compare the size and shape of crystals grown in different temperatures. With just water and borax, a household cleaning product, you can discover the method for growing large, pure crystals!

Background

Chemical reactions are constantly happening all around you, and inside of you. For instance, a chemical reaction can turn metal into reddish-brown rust (the iron in the metal is reacting with the oxygen in the air or water, and the end product is what we recognize as rust). Chemists perform chemical reactions to change one chemical compound into another. Sometimes when multiple products are formed the chemist may want to separate one compound from the others. One way this can be done is using a process called recrystallization, where a mixture of compounds can be dissolved in hot water and then cooled. As it cools, one substance crystallizes (becomes crystals), and can be removed from the rest of the liquid, which holds the other compound.

Why do crystals appear as the mixture cools? It has to do with solubility, or the largest amount of something that can be dissolved in something else, such as dissolving the powdered cleaning product, Borax, in water. The solubility of most solids increases with temperature. In other words, more Borax may be dissolved in hot water than cold water. So if a hot, saturated mixture is cooled, there's more Borax than can be contained by the colder water, and so Borax may fall out of the mixture, forming crystals.
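The reasoning above amounts to a simple subtraction: whatever was dissolved in the hot water, minus what the cold water can still hold, comes out as crystals. The sketch below assumes the solution reaches equilibrium at the cold temperature. The function name is illustrative, and the numbers in the usage line are hypothetical placeholders rather than measured borax solubilities.

```python
def crystallized_mass(dissolved_g, cold_solubility_g_per_100ml, water_ml):
    """Estimate the mass of solute (g) that leaves solution on cooling.

    dissolved_g: mass dissolved in the hot water
    cold_solubility_g_per_100ml: solubility at the final, colder temperature
    water_ml: volume of water used
    """
    cold_capacity_g = cold_solubility_g_per_100ml * water_ml / 100.0
    # Nothing crystallizes if the cold water can still hold everything.
    return max(0.0, dissolved_g - cold_capacity_g)


# Hypothetical example values; look up or measure real solubilities before relying on them.
print(crystallized_mass(dissolved_g=30.0, cold_solubility_g_per_100ml=5.0, water_ml=200.0))
```

In practice the amount recovered may be lower, because a cooled solution can stay supersaturated for some time before crystals form.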

Materials

  • Large bowl
  • Ice cubes
  • Water
  • String
  • Scissors
  • Two pencils
  • Two identical jars or large drinking glasses
  • Cooking pot
  • Borax, also called 20-Mule Team household cleaner. It can be found in the cleaning aisle of many grocery stores. (Use caution when handling cleaners; they can harm skin and eyes, and should not be inhaled.)
  • Measuring tablespoon
  • Plastic wrap

Preparation

  • Fill the large bowl half full of ice cubes and then add water until the bowl is about three quarters full.
  • Cut two pieces of string (they should be at least as long as the height of the jars or large drinking glasses). Tie the end of one string around each of the two pencils. Adjust the strings' lengths so that when the pencil is laid across the top of one of the jars or large drinking glasses, the end of the string hangs down to just above the bottom of the jar. Make the strings equal length.
  • Borax is harmful if swallowed, inhaled or contacts eyes, and on rare occasion touching it can result in rashes. Caution and adult supervision is advised when handling it.

Procedure

  • Fill a cooking pot with enough water to fill both jars nearly full. Then bring that water to a boil on the stove. Once the water is boiling, turn the burner off so that the water stops boiling. (Because Borax is harmful if inhaled or contacts eyes it is advised to not dissolve it with boiling water.)
  • Add one tablespoon of Borax to the water and stir until it dissolves. Continue to add one tablespoon at a time until no more dissolves. You will probably need about three tablespoons of Borax for each cup of water. How does the saturated solution look?
  • Carefully pour equal amounts of the saturated Borax solution into the two jars. Each jar should be about three fourths full.
  • Lay a pencil across the top of each jar so that the string hangs down into the saturated solution.
  • Cover the top of the jars with plastic wrap.
  • Leave one jar undisturbed on a countertop or table at room temperature. Place the second jar in the bowl full of ice that you prepared. If needed, adjust the water level in the bowl so that the water reaches at least three fourths the way up the jar, but is not so high that it goes into the jar.
  • Do not disturb the jars for at least five hours. Check the bowl of ice regularly and add ice if it has melted.
  • Check on the jars about once an hour to see how the crystals are forming. It may be difficult to observe the jar in the bowl; try looking at the string through the plastic wrap cover. Do you see crystals forming on the side of one of the jars? Do crystals form in one jar before the other?
  • After at least five hours carefully remove the pencils and observe the crystals on the strings. How do the size, shape and number of crystals on each string compare with one another? Why do you think this is?
  • Extra: In this activity you examined Borax crystal formation at two different temperatures, but you could try other temperatures as well; one way is to put one of the jars in the refrigerator. How does allowing the Borax mixture to cool at a different temperature affect crystal formation?
  • Extra: Try making crystals out of other materials, such as sugar or salt. How well do crystals form using other mixtures with water?
  • Extra: You did this activity for at least five hours. How do your results change if you grow your crystals for a longer period of time? Make sure to keep adding ice cubes to the water bath to keep it cool throughout the activity.

Observations and results

Did smaller, more abundant crystals form in the jar and on the string in the bowl of ice water, whereas larger, fewer, better-shaped crystals formed in the jar at room temperature?

As the hot, saturated mixture of Borax and water cooled, there was more borax than could be contained by the colder water, and so this borax fell out of the mixture and formed crystals. A crystal is made of molecules of a product that have come together in a specific repeated pattern. When the molecules of the crystal come together, other products that are often considered impurities, or the unwanted products of the chemical reaction, do not fit well into the structure, much like the wrong piece of a puzzle does not fit. If the crystals form slowly enough, the impurities will be rejected because they do not fit correctly, and instead will remain in the water. This is why the crystals in the room-temperature jar should have been larger and more cube-like. But if a solution is cooled too quickly, there isn't time to expel the impurities and instead they become trapped within the crystal structure and the pattern is disturbed. Consequently, the crystals in the bowl of ice water should have formed more quickly and in greater numbers, but were smaller and less cubelike in shape because they had more impurities.

More to explore

  • Crystallization, from the Department of Chemistry and Biochemistry at the University of Colorado at Boulder
  • How to Grow Great Crystals: Tips, Tricks and Techniques, from About.com
  • Crystal Chemistry (pdf), from the Royal Society of Chemistry
  • Crazy Crystal Creations: How to Grow the Best and the Largest Crystals, from Science Buddies

This activity is brought to you in partnership with Science Buddies.
