You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PROJECT II: IMPLEMENTING A NEW HOSPITAL IN THE USA HEALTHCARE SYSTEM
INTRO
As a CEO of different hospitals spread across the States I am planning to design and implement a new hospital specialised in some of the most prevalent and resource-demanding health conditions:
Diabetes
Cancer
Hyperthesion
Asthma
Arthritis
Obesity
As a result, a market revision is needed in order to find the best conditions (geographical, population, disease prevalence, etc.) to place the hospital.
OBJECTIVES
Check the prevalence of each condition across gender and age and which one is the most common disease
Study the different admission types. Is there any relation to the age of patients?
Study the effects of test results. Are they related with the length of stay?
Month of admission
Study other hospitals. How many patients do they have of 0- blood type?
Health expenditure in figures. Who are the patients who spent less amount of money (i.e. 1000$-2000$) in hospital stays?
Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.
Which patients stayed in the V.I.P. room (i.e. room 463)
METHODOLOGY
Database with 10,000 patient records from different USA hospitals (Kaggle)
No missing data, no duplicates considered
Data on admissions, bloodtype, medical conditions, test results, age, hospitals and doctors
1. Check the prevalence of each condition across gender and age and which one is the most common disease
query=pd.read_sql_query('''select gender as Gender, medical_cond as Medical_condition, count(medical_cond) as N from patient p join bloodtype bt on p.blood_id=bt.blood_idjoin medicalcond mc on p.medcond_id=mc.medcond_idgroup by gender, medical_condorder by Medical_condition desc;''', engine
)
query
Gender
Medical_condition
N
0
Male
Obesity
790
1
Female
Obesity
838
2
Male
Hypertension
852
3
Female
Hypertension
836
4
Female
Diabetes
825
5
Male
Diabetes
798
6
Male
Cancer
816
7
Female
Cancer
887
8
Male
Asthma
834
9
Female
Asthma
874
10
Male
Arthritis
835
11
Female
Arthritis
815
Most prevalent condition (subquery)
query=pd.read_sql_query('''select medical_cond as Medical_condition, count(p.medcond_id) as N from medicalcond mc join patient p on p.medcond_id=mc.medcond_idgroup by p.medcond_id having N=(select max(N) from (select count(p.medcond_id) as N, medical_cond from medicalcond mc join patient p on p.medcond_id=mc.medcond_idgroup by p.medcond_id)sub1)''', engine
)
query
Medical_condition
N
0
Asthma
1708
2. Study the effect of age. Is there any relation to the admission type or to the health condition?
query=pd.read_sql_query('''select admission_type as AdmissionType, round(avg(age),2) as MeanAge from patient p join admission a on p.pat_id=a.pat_idjoin admission_type aty on a.admtype_id=aty.admtype_idgroup by AdmissionTypeorder by MeanAge desc''', engine
)
query
AdmissionType
MeanAge
0
Urgent
51.96
1
Elective
51.44
2
Emergency
51.35
query=pd.read_sql_query('''select medical_cond as Medical_condition, round(avg(age),2) as MeanAge from patient p join medicalcond mc on p.medcond_id=mc.medcond_idgroup by Medical_conditionorder by MeanAge desc''', engine
)
query
Medical_condition
MeanAge
0
Diabetes
51.80
1
Obesity
51.63
2
Cancer
51.58
3
Arthritis
51.53
4
Asthma
51.45
5
Hypertension
50.74
3. Study the effects of test results. Are they related with the length of stay?
query=pd.read_sql_query('''select test_result as TestResult, datediff(disc_date,adm_date) as LengthStay from admission ad join test_results tr on ad.testres_id=tr.testres_idgroup by TestResult, LengthStayorder by LengthStay desclimit 6''', engine
)
query
TestResult
LengthStay
0
Inconclusive
30
1
Normal
30
2
Abnormal
30
3
Abnormal
29
4
Normal
29
5
Inconclusive
29
4. Which is the month with more admissions?
query=pd.read_sql_query('''select date_format(convert(adm_date,date), '%%M') as Mes, count(*) as N from admissiongroup by Mesorder by N desc''', engine
)
query
Mes
N
0
July
1005
1
April
985
2
October
980
3
August
968
4
May
960
5
June
959
6
January
948
7
December
935
8
March
926
9
November
918
10
September
877
11
February
874
5 Study other hospitals. How many patients do they have 0- blood type?
query=pd.read_sql_query('''select hospital as Hospital, count(blood_type) as N_BloodType from patient p join medicalcond mc on p.medcond_id=mc.medcond_id join admission ad on p.pat_id=ad.pat_id join hospital h on ad.hospital_id=h.hospital_idjoin bloodtype bt on p.blood_id=bt.blood_idgroup by Hospital, blood_typehaving blood_type='o-' order by N_BloodType desclimit 5''', engine
)
query# 0- is the universal blood type so good for transfusions
Hospital
N_BloodType
0
Martinez PLC
9
1
Williams LLC
6
2
Smith-Williams
6
3
Burton, Harris and Phillips
6
4
Booth LLC
6
6. Who are the patients who spent less amount of money (i.e. 0$-800$) in hospital stays?
query=pd.read_sql_query('''select p.pat_id, name as Patient, billing_amount as $ from admission a join patient p on a.pat_id=p.pat_idwhere billing_amount between 0 and 800order by $ desc''', engine
)
query
pat_id
Patient
$
0
9279
Mrs. Sandra Wood
800.0
7. Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.
query=pd.read_sql_query('''select billing_amount as $,CASE WHEN billing_amount >20000 THEN 'Medium bill' WHEN billing_amount >40000 THEN 'High bill' ELSE 'Low bill'END as Bill_Typefrom admission a join patient p on a.pat_id=p.pat_idlimit 5''', engine
)
query
$
Bill_Type
0
26702.0
Medium bill
1
16492.0
Low bill
2
16492.0
Low bill
3
22986.0
Medium bill
4
22986.0
Medium bill
8. Select the maximum bill per hospital (window function).
query=pd.read_sql_query('''select hospital_id as Hospital, max(billing_amount) over (partition by hospital_id) as MaxBill$from admissionwhere billing_amount > 2000order by Hospitallimit 10''', engine
)
query
Hospital
MaxBill$
0
1
35608.0
1
1
35608.0
2
2
4934.0
3
3
7060.0
4
4
4063.0
5
5
39238.0
6
6
38813.0
7
6
38813.0
8
7
19144.0
9
8
30432.0
9. List three patients who stayed in the V.I.P. room (Room 463). How many patients stayed in that room? -- subquery
query=pd.read_sql_query('''select name as Patient from patient where pat_id in (select pat_id from admission where room_number=463)limit 3''', engine
)
query
Patient
0
David Copeland
1
Lori Smith
2
Brian Cox
query=pd.read_sql_query('''select count(*) from patient where pat_id in (select pat_id from admission where room_number=463)''', engine
)
query
count(*)
0
18
10. Add a column with the patient's nationality
altertable admission
add column pat_nation VARCHAR(50) default null after hospital_id;
UPDATE admission
SET
pat_nation ='American'WHERE
adm_id >0;
All patients are from an American Country
# Commit changesconn.commit()
# Close connectionconn.close()
STRENGTHS AND WEEAKNESSESS
Database clean and easy to work with (S)
Nice N for robust conclusions (S)
Mix of real and fake data (S)
Time constraints (W)
Very homogeneus data at some point -- solution: randomised (W)
Some desired variables were missing --> incorporated manually (W)
CONCLUSIONS
By looking a the data, efforts should be directed to asthmatic patients with a mean age of 50, although all the disease are similarly prevalente if we don't take into account other variables (e.g. gender). More staff should be hired during July (tourism and parties?). There are no age differences of patients regarding health disease or test result.
About
This project is based on SQL and Tableau. The aim is to do a market research in the USA healthcare system to analyze the implementation and construction of a new hospital! Let's look for disease prevalence, healthcare providesrs, stays, patients...and let's see what conclusions can we extract from this!