This file was generated on 2024-06-24 by Yarkın Aybars ÇETİN

GENERAL INFORMATION

1. Title of Dataset: SCC-DFTB/MD simulation: guanine-TiO2 adsorption model

2. Author Information
   A. Investigator Contact Information
      Name: Çetin, Yarkın Aybars - ORCID: 0000-0003-2456-5949
      Institution: Universitat Rovira i Virgili
      Address: Departament d'Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007, Tarragona, Catalonia, Spain.
      Email: yarkinaybars.cetin@urv.cat

3. Date of data collection (single date, range, approximate date): approximate dates, Start: 2024-03-01; End: 2024-06-15

4. Geographic location of data collection: Tarragona, Catalonia, Spain.

5. Information about funding sources that supported the collection of the data:

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International)

2. Links to publications that cite or use the data:

3. Links to other publicly accessible locations of the data: No other location.

4. Links/relationships to ancillary data sets:

5. Was data derived from another source? No
   A. If yes, list source(s):

6. Recommended citation for this dataset:
   Çetin, Yarkin Aybars, 2024, "SCC-DFTB/MD simulation: guanine-TiO2 adsorption model", https://doi.org/10.34810/data1312, CORA.Repositori de Dades de Recerca.
   and the related publication:
   Prediction of Electronic Density of States in Guanine-TiO2 Adsorption Model based on Machine Learning, doi: https://doi.org/10.1016/j.csbr.2024.100008

DATA & FILE OVERVIEW

1. File List:
   The dataset contains three main folders, named after the chemistry models studied:
      DEFECTIVE_MODEL
      DEFECTIVE_ORIENTED_MODEL
      STOICHIOMETRIC_MODEL
   The relationship between the subfolders and other components is described in the next section.

2. Relationship between files, if important:
   Each main folder contains the following components.

   First, the initial structure of the model geometry, in a file whose name starts with “Initial_”.

   Second, the folder “Total Energy”, which includes a total energy plot (“TotalEnergy_DFTB+”, with .fig and .png extensions). This plot is provided per model to show that the simulation reached energetic equilibration.

   Third, files named in the format “MDtrajectory_…”, which contain the MD trajectory concatenated over all trajectory steps. In dataset file names, 'K' denotes a thousand, following the convention of the prefix ‘kilo’. The numbers at the end of a file name indicate the trajectory step range (e.g. _0_16K stands for the range from step 0 to step 16000).

   Fourth, files named in the format “MD_DOS_..…tgz”, which include the “band.out”, “dftb_in.hsd”, “dftb.out”, “dos_total.dat”, “DOS.txt”, and “geom.out.xyz” files for each individual MD trajectory step.

   Fifth, the “ML” folder, which includes subfolders for the applied instances of machine learning. These instances are named with consecutive numbers (ML#). Each subfolder of the ML folder provides:
      “RMSE_for_each_column.fig”: figure file showing the root mean square error (RMSE) for all test samples in the testing step.
      “MAX_ERROR_sample_Vs_DFTB+.fig” and “MIN_ERROR_sample_VS_DFTB+.fig”: figure files comparing the predicted DOS with the calculated DOS for the samples with the maximum and minimum RMSE, respectively.
      “myNeuralNetworkFunction.m”: the generated neural network function.
   The figure files are also provided with the .png extension for ease of viewing.

   In the case of “DEFECTIVE_ORIENTED_MODEL”, the subfolders are divided by trajectory step range, because ML is implemented with equilibrated trajectory steps from 6000 to 16000, from 6000 to 36000, and from 6000 to 46000 to generate NNs.

3. Additional related data collected that was not included in the current data package:

4. Are there multiple versions of the dataset? No
   A. If yes, name of file(s) that was updated:
      i. Why was the file updated?
      ii. When was the file updated?

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:
   This dataset houses simulation data for “SCC-DFTB/MD simulation: guanine-TiO2 adsorption model.” This set of simulations is a continuation of the set used in the related publication (https://doi.org/10.1016/j.csbr.2024.100008) and its dataset (https://doi.org/10.34810/data1223).

   The dataset shares Self-Consistent-Charge Density Functional Tight Binding Molecular Dynamics (SCC-DFTB/MD) simulation data for the adsorption of a horizontally oriented guanine molecule on an anatase (101) slab (96 TiO2, 6 trilayers). The initial oxygen-deficient geometry (Initial_Defective_Model) and stoichiometric geometry (Initial_Stoichiometric_Model) were obtained as described in the related publication (https://doi.org/10.1016/j.csbr.2024.100008). Afterward, the guanine molecule was rotated by 90 degrees so that its plane became parallel to the TiO2 surface. In the resulting geometry, an additional distance of 0.5 Angstrom was added between the guanine molecule and the slab surface. Moreover, an alternative starting geometry (Initial_Defective_Oriented_Model) was created for the oxygen-deficient chemistry model by shifting the guanine molecule horizontally so that its oxygen atom sits exactly above the center of the oxygen vacancy.

   Computational details were implemented as described in the related publication. The prepared molecular models were subjected to MD calculations of at least 16000 steps (16 ps, dt = 1 fs). The MD simulation was energetically equilibrated after roughly 6000 steps. As in the related publication, a database of the density of states (DOS) and geometry (GEO) at every trajectory step was created from the MD simulation. It was used to create Neural Network (NN) replicas via the Machine Learning (ML) technique. ML only considered energetically equilibrated steps.

   In the alternative starting geometry case, the calculation continued until the 46000th step. In this case, ML is implemented over the following ranges of equilibrated trajectory steps to generate NNs: 6000 to 16000, 6000 to 36000, and 6000 to 46000.

   Please also see the "2. Relationship between files" section and the research paper: Prediction of Electronic Density of States in Guanine-TiO2 Adsorption Model based on Machine Learning, doi: https://doi.org/10.1016/j.csbr.2024.100008

2. Methods for processing the data:
   Please see the "2. Relationship between files" section and the research paper: Prediction of Electronic Density of States in Guanine-TiO2 Adsorption Model based on Machine Learning, doi: https://doi.org/10.1016/j.csbr.2024.100008

3. Instrument- or software-specific information needed to interpret the data:
   The code scripts run in MATLAB and DFTB+. Installation information can be found via the links below:
      MATLAB, 2023b, https://www.mathworks.com
      DFTB+, 20.2.1, https://www.dftbplus.org

4. Standards and calibration information, if appropriate:

5. Environmental/experimental conditions:
   Please see the "2. Relationship between files" section and the research paper: Prediction of Electronic Density of States in Guanine-TiO2 Adsorption Model based on Machine Learning, doi: https://doi.org/10.1016/j.csbr.2024.100008

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:
   A. Investigator Contact Information
      Name: Çetin, Yarkın Aybars - ORCID: 0000-0003-2456-5949
      Institution: Universitat Rovira i Virgili
      Address: Departament d'Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007, Tarragona, Catalonia, Spain.
      Email: yarkinaybars.cetin@urv.cat
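
EXAMPLE: READING THE STEP RANGE FROM A FILE NAME

The naming convention described under "Relationship between files" (a trailing range such as _0_16K, where 'K' denotes a thousand) can be decoded programmatically. The Python sketch below is illustrative only and is not part of the dataset. The function names, the file-name stem "MDtrajectory_example_0_16K", and the assumption that the range is the last part of the name (extension removed, uppercase 'K' only) are hypothetical. The cutoff of 6000 steps comes from the statement above that the simulation equilibrated after roughly 6000 steps.

```python
import re
from typing import Optional, Tuple

# Trailing step range such as "_0_16K" or "_6K_46K".
# Assumption: an optional uppercase "K" multiplies the number by 1000.
_RANGE_RE = re.compile(r"_(\d+)(K?)_(\d+)(K?)$")

# Approximate equilibration step stated in this README.
EQUILIBRATION_STEP = 6000


def parse_step_range(stem: str) -> Tuple[int, int]:
    """Return (first_step, last_step) encoded at the end of a file-name stem.

    `stem` is the file name without its extension.
    """
    match = _RANGE_RE.search(stem)
    if match is None:
        raise ValueError(f"no step range found in {stem!r}")
    first = int(match.group(1)) * (1000 if match.group(2) else 1)
    last = int(match.group(3)) * (1000 if match.group(4) else 1)
    if first > last:
        raise ValueError(f"inverted step range in {stem!r}")
    return first, last


def equilibrated_range(
    stem: str, cutoff: int = EQUILIBRATION_STEP
) -> Optional[Tuple[int, int]]:
    """Return the part of the step range at or after `cutoff`, or None."""
    first, last = parse_step_range(stem)
    if last < cutoff:
        return None  # the whole file precedes equilibration
    return max(first, cutoff), last


if __name__ == "__main__":
    # Hypothetical stem, used only to illustrate the convention.
    print(parse_step_range("MDtrajectory_example_0_16K"))    # (0, 16000)
    print(equilibrated_range("MDtrajectory_example_0_16K"))  # (6000, 16000)
```

Passing the stem rather than the full file name keeps the sketch independent of the file extensions actually used in the dataset.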